1129 lines
284 KiB
Plaintext
1129 lines
284 KiB
Plaintext
|
{
|
|||
|
"cells": [
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"# Лабораторная работа №3\n",
|
|||
|
"\n",
|
|||
|
"*Вариант задания:* Товары Jio Mart (вариант - 23) "
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Для выполнения лабораторной работы по датасету 'jio mart product items', приведу пример двух бизнес-целей:\n",
|
|||
|
"\n",
|
|||
|
"### Бизнес-цели:\n",
|
|||
|
"\n",
|
|||
|
"1. **Оптимизация ассортимента товаров в онлайн-магазине**\n",
|
|||
|
" \n",
|
|||
|
" **Формулировка:** Разработать модель, которая позволяет онлайн-магазину Jio Mart анализировать, какие товары наиболее востребованы, и автоматизировать оптимизацию ассортимента. Это поможет поддерживать в наличии наиболее популярные продукты и своевременно пополнять запасы.\n",
|
|||
|
" \n",
|
|||
|
" **Цель:** Увеличить объем продаж за счет оптимизации ассортимента и сокращения вероятности отсутствия популярных товаров на складе. Повысить клиентскую удовлетворенность за счет улучшения доступности товаров.\n",
|
|||
|
" \n",
|
|||
|
" **Ключевые показатели успеха (KPI):** \n",
|
|||
|
" - *Точность прогнозирования популярности товаров:* Модель должна иметь точность не менее 90% в прогнозировании популярных товаров.\n",
|
|||
|
" - *Увеличение продаж:* Увеличение продаж наиболее популярных товаров на 15% за счет правильного планирования запасов.\n",
|
|||
|
" - *Снижение потерь от неликвидов:* Снижение доли товаров, которые остаются нераспроданными, до уровня ниже 5%.\n",
|
|||
|
"\n",
|
|||
|
"2. **Оптимизация ценовой политики**\n",
|
|||
|
" \n",
|
|||
|
" **Формулировка:** Разработать модель для автоматической корректировки цен в зависимости от спроса и конкуренции, чтобы максимизировать доход. Модель должна учитывать такие факторы, как сезонные колебания спроса, конкуренция и изменения цен.\n",
|
|||
|
" \n",
|
|||
|
" **Цель:** Повысить доходность онлайн-магазина Jio Mart за счет гибкой и динамической ценовой стратегии.\n",
|
|||
|
" \n",
|
|||
|
" **Ключевые показатели успеха (KPI):** \n",
|
|||
|
" - *Рост среднего чека:* Увеличение среднего чека покупок на 10% за счет оптимизации цен.\n",
|
|||
|
" - *Увеличение объема продаж:* Повышение объема продаж на 20% за счет корректировки цен в зависимости от спроса.\n",
|
|||
|
" - *Конкурентоспособность цен:* Цены должны быть ниже или на уровне с ключевыми конкурентами для 80% ассортимента."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### Технические цели проекта для каждой выделенной бизнес-цели\n",
|
|||
|
"\n",
|
|||
|
"1. **Создание модели для оптимизации ассортимента товаров в онлайн-магазине.** \n",
|
|||
|
" \n",
|
|||
|
" - **Сбор и подготовка данных:** \n",
|
|||
|
" Необходимо собрать данные о продажах товаров, наличии на складе, временных трендах и сезонных изменениях спроса. Провести очистку данных от пропусков, дубликатов, аномальных значений (например, нулевые продажи при наличии товара). Преобразовать категориальные переменные (категории товаров, бренды, регионы) в числовую форму с помощью методов, таких как One-Hot-Encoding или Label Encoding. Выполнить временное сглаживание данных и стандартизацию числовых признаков для приведения их к одному масштабу. Разбить данные на обучающую и тестовую выборки.\n",
|
|||
|
" \n",
|
|||
|
" - **Разработка и обучение модели:** \n",
|
|||
|
" Провести эксперименты с различными алгоритмами машинного обучения, такими как регрессия, градиентный бустинг, нейронные сети, для прогнозирования спроса на товары. Обучить модель с использованием метрик оценки, таких как MAE (Mean Absolute Error) и MSE (Mean Squared Error). Оценить производительность моделей на тестовых данных, обеспечивая точность прогнозирования популярности товаров.\n",
|
|||
|
" \n",
|
|||
|
" - **Развёртывание модели:** \n",
|
|||
|
" Интеграция модели в систему управления запасами магазина для автоматической корректировки ассортимента. Создание API или интерфейса для отображения прогноза спроса и рекомендаций по пополнению запасов товаров. Модель должна предлагать автоматическое обновление ассортимента с учетом прогноза популярности и доступности товаров.\n",
|
|||
|
"\n",
|
|||
|
"2. **Создание модели для оптимизации ценовой политики.** \n",
|
|||
|
" \n",
|
|||
|
" - **Сбор и подготовка данных:** \n",
|
|||
|
" Сбор данных о ценах товаров, продажах, спросе, а также информации о конкурентах и сезонных трендах. Очистка данных от пропусков и аномальных значений. Преобразование категориальных признаков (категории товаров, регионы продаж) в числовой формат. Нормализация числовых данных (например, цены, скидки, объем продаж). Разбиение данных на тренировочную и тестовую выборки для корректного обучения модели.\n",
|
|||
|
" \n",
|
|||
|
" - **Разработка и обучение модели:** \n",
|
|||
|
" Исследование и выбор подходящих моделей для прогнозирования динамических изменений цен с учетом спроса (например, случайные леса, градиентный бустинг, временные ряды). Обучение модели для прогнозирования изменения объема продаж в зависимости от цен и конкурентов. Оценка модели с использованием метрик MSE и RMSE для минимизации ошибки прогнозирования. Прогнозирование оптимальной цены для каждого товара, которая максимизирует продажи и прибыль.\n",
|
|||
|
" \n",
|
|||
|
" - **Развёртывание модели:** \n",
|
|||
|
" Создание системы, которая автоматически рекомендует изменение цен в зависимости от спроса и данных о конкурентах. Разработка API для интеграции в систему ценообразования магазина. Создание интерфейса для мониторинга изменения цен и влияния на продажи в реальном времени."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 3,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Index(['category', 'sub_category', 'href', 'items', 'price'], dtype='object')\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"import pandas as pd\n",
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"import matplotlib.ticker as ticker\n",
|
|||
|
"import seaborn as sns\n",
|
|||
|
"\n",
|
|||
|
"# Загрузка данных\n",
|
|||
|
"df = pd.read_csv(\"..//static//csv//jio_mart_items.csv\")\n",
|
|||
|
"\n",
|
|||
|
"# Срез данных, первые 15000 строк\n",
|
|||
|
"df = df.iloc[:15000]\n",
|
|||
|
"\n",
|
|||
|
"# Вывод\n",
|
|||
|
"print(df.columns)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 4,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>category</th>\n",
|
|||
|
" <th>sub_category</th>\n",
|
|||
|
" <th>href</th>\n",
|
|||
|
" <th>items</th>\n",
|
|||
|
" <th>price</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>0</th>\n",
|
|||
|
" <td>Groceries</td>\n",
|
|||
|
" <td>Fruits & Vegetables</td>\n",
|
|||
|
" <td>https://www.jiomart.com/c/groceries/fruits-veg...</td>\n",
|
|||
|
" <td>Fresh Dates (Pack) (Approx 450 g - 500 g)</td>\n",
|
|||
|
" <td>109.0</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1</th>\n",
|
|||
|
" <td>Groceries</td>\n",
|
|||
|
" <td>Fruits & Vegetables</td>\n",
|
|||
|
" <td>https://www.jiomart.com/c/groceries/fruits-veg...</td>\n",
|
|||
|
" <td>Tender Coconut Cling Wrapped (1 pc) (Approx 90...</td>\n",
|
|||
|
" <td>49.0</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>2</th>\n",
|
|||
|
" <td>Groceries</td>\n",
|
|||
|
" <td>Fruits & Vegetables</td>\n",
|
|||
|
" <td>https://www.jiomart.com/c/groceries/fruits-veg...</td>\n",
|
|||
|
" <td>Mosambi 1 kg</td>\n",
|
|||
|
" <td>69.0</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>3</th>\n",
|
|||
|
" <td>Groceries</td>\n",
|
|||
|
" <td>Fruits & Vegetables</td>\n",
|
|||
|
" <td>https://www.jiomart.com/c/groceries/fruits-veg...</td>\n",
|
|||
|
" <td>Orange Imported 1 kg</td>\n",
|
|||
|
" <td>125.0</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>4</th>\n",
|
|||
|
" <td>Groceries</td>\n",
|
|||
|
" <td>Fruits & Vegetables</td>\n",
|
|||
|
" <td>https://www.jiomart.com/c/groceries/fruits-veg...</td>\n",
|
|||
|
" <td>Banana Robusta 6 pcs (Box) (Approx 800 g - 110...</td>\n",
|
|||
|
" <td>44.0</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" category sub_category \\\n",
|
|||
|
"0 Groceries Fruits & Vegetables \n",
|
|||
|
"1 Groceries Fruits & Vegetables \n",
|
|||
|
"2 Groceries Fruits & Vegetables \n",
|
|||
|
"3 Groceries Fruits & Vegetables \n",
|
|||
|
"4 Groceries Fruits & Vegetables \n",
|
|||
|
"\n",
|
|||
|
" href \\\n",
|
|||
|
"0 https://www.jiomart.com/c/groceries/fruits-veg... \n",
|
|||
|
"1 https://www.jiomart.com/c/groceries/fruits-veg... \n",
|
|||
|
"2 https://www.jiomart.com/c/groceries/fruits-veg... \n",
|
|||
|
"3 https://www.jiomart.com/c/groceries/fruits-veg... \n",
|
|||
|
"4 https://www.jiomart.com/c/groceries/fruits-veg... \n",
|
|||
|
"\n",
|
|||
|
" items price \n",
|
|||
|
"0 Fresh Dates (Pack) (Approx 450 g - 500 g) 109.0 \n",
|
|||
|
"1 Tender Coconut Cling Wrapped (1 pc) (Approx 90... 49.0 \n",
|
|||
|
"2 Mosambi 1 kg 69.0 \n",
|
|||
|
"3 Orange Imported 1 kg 125.0 \n",
|
|||
|
"4 Banana Robusta 6 pcs (Box) (Approx 800 g - 110... 44.0 "
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 4,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Для наглядности\n",
|
|||
|
"df.head()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 5,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>price</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>count</th>\n",
|
|||
|
" <td>15000.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>mean</th>\n",
|
|||
|
" <td>373.427633</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>std</th>\n",
|
|||
|
" <td>463.957949</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>min</th>\n",
|
|||
|
" <td>5.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>25%</th>\n",
|
|||
|
" <td>123.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>50%</th>\n",
|
|||
|
" <td>250.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>75%</th>\n",
|
|||
|
" <td>446.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>max</th>\n",
|
|||
|
" <td>14999.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" price\n",
|
|||
|
"count 15000.000000\n",
|
|||
|
"mean 373.427633\n",
|
|||
|
"std 463.957949\n",
|
|||
|
"min 5.000000\n",
|
|||
|
"25% 123.000000\n",
|
|||
|
"50% 250.000000\n",
|
|||
|
"75% 446.000000\n",
|
|||
|
"max 14999.000000"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 5,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Описание данных (основные статистические показатели)\n",
|
|||
|
"df.describe()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 6,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"category 0\n",
|
|||
|
"sub_category 0\n",
|
|||
|
"href 0\n",
|
|||
|
"items 0\n",
|
|||
|
"price 0\n",
|
|||
|
"dtype: int64\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"category False\n",
|
|||
|
"sub_category False\n",
|
|||
|
"href False\n",
|
|||
|
"items False\n",
|
|||
|
"price False\n",
|
|||
|
"dtype: bool"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 6,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Процент пропущенных значений признаков\n",
|
|||
|
"for i in df.columns:\n",
|
|||
|
" null_rate = df[i].isnull().sum() / len(df) * 100\n",
|
|||
|
" if null_rate > 0:\n",
|
|||
|
" print(f'{i} Процент пустых значений: %{null_rate:.2f}')\n",
|
|||
|
"\n",
|
|||
|
"# Проверка на пропущенные данные\n",
|
|||
|
"print(df.isnull().sum())\n",
|
|||
|
"\n",
|
|||
|
"df.isnull().any()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Нет пропущенных данных."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### Разбиваем на выборки (обучающую, тестовую, контрольную)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 7,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Размер обучающей выборки: 12000\n",
|
|||
|
"Размер контрольной выборки: 3000\n",
|
|||
|
"Размер тестовой выборки: 3000\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"from sklearn.model_selection import train_test_split\n",
|
|||
|
"\n",
|
|||
|
"# Разделение данных на обучающую и тестовую выборки (80% - обучение, 20% - тестовая)\n",
|
|||
|
"train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)\n",
|
|||
|
"\n",
|
|||
|
"# Разделение данных на обучающую и контрольную выборки (80% - обучение, 20% - контроль)\n",
|
|||
|
"train_data, val_data = train_test_split(df, test_size=0.2, random_state=42)\n",
|
|||
|
"\n",
|
|||
|
"print(\"Размер обучающей выборки: \", len(train_data))\n",
|
|||
|
"print(\"Размер контрольной выборки: \", len(val_data))\n",
|
|||
|
"print(\"Размер тестовой выборки: \", len(test_data))"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 8,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjsAAAHHCAYAAABZbpmkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABU3ElEQVR4nO3deVxUVeMG8Gd2NlkEAXEl94XUMA0tNcW9svLN9GepZWqmlVZqvOWWGWm+5ZpWb6m9WaaVtpnmXhruS+5aorgBorLNwKzn9wfOjQFEhBlm5vJ8Px8+NfeeufecWeDxLPcqhBACRERERDKldHcFiIiIiFyJYYeIiIhkjWGHiIiIZI1hh4iIiGSNYYeIiIhkjWGHiIiIZI1hh4iIiGSNYYeIiIhkjWGHiMgFMjMz8ddff8Fisbi7KuREQghcv34dZ86ccXdV6A4w7BAROYHZbMbs2bPRqlUr6HQ6hISEoFGjRti8ebO7q+YVjh49irVr10qPDx06hJ9//tl9FSokJycHb775Jpo0aQKtVovQ0FA0btwYp06dcnfVqIzU7q4Aud6yZcvwzDPPSI91Oh3q1q2LHj16YPLkyYiIiHBj7Yi8n9FoRI8ePbBr1y48//zzmDFjBvz8/KBSqRAbG+vu6nmFnJwcjBo1CpGRkQgNDcXLL7+M3r17o2/fvm6t17Vr19C5c2ekpKTgxRdfRMeOHaHVaqHRaFC/fn231o3KjmGnCnnrrbcQHR2N/Px87NixA4sXL8a6detw9OhR+Pn5ubt6RF5r1qxZ2L17NzZs2IAuXbq4uzpeKS4uTvoBgMaNG2PEiBFurhUwYcIEXLlyBUlJSWjRooW7q0PlxLBThfTu3Rtt27YFADz33HMIDQ3F+++/j++//x6DBg1yc+2IvJPFYsHcuXPx6quvMuhU0Nq1a3H8+HHk5eUhJiYGWq3WrfVJT0/H8uXLsWTJEgYdL8c5O1VY165dAQDJyckAgOvXr+O1115DTEwMAgICEBgYiN69e+Pw4cPFnpufn49p06ahcePG8PHxQc2aNfH444/j77//BgCcO3cOCoXilj+F/yhs27YNCoUCX3/9Nf79738jMjIS/v7+eOSRR3DhwoVi5969ezd69eqFoKAg+Pn5oXPnzti5c2eJbezSpUuJ5582bVqxsl988QViY2Ph6+uL6tWrY+DAgSWev7S2FWaz2TB37ly0aNECPj4+iIiIwKhRo3Djxg2HcvXr18dDDz1U7Dxjx44tdsyS6v7ee+8Ve02BgqGVqVOnomHDhtDpdKhTpw4mTpwIo9FY4mtVWJcuXdCyZcti2+fMmQOFQoFz5845bM/MzMS4ceNQp04d6HQ6NGzYELNmzYLNZpPK2F+3OXPmFDtuy5YtS/xMfPPNN7es47Bhw8o0jFC/fn3p/VEqlYiMjMSTTz6JlJSU2z4XAD788EO0aNECOp0OUVFRGDNmDDIzM6X9p06dwo0bN1CtWjV07twZfn5+CAoKwkMPPYSjR49K5bZu3QqFQoE1a9YUO8eXX34JhUKBpKQkqc7Dhg1zKGN/TbZt2yZt+/333/HEE0+gbt260ns8fvx45OXlOTx32rRpxT5LK1asQOvWreHj44PQ0FAMGjSo2GsybNgwBAQEOGz75ptvitUDAAICAorVGSjb96pLly7S+9+8eXPExsbi8OHDJX6vSlL0ex4WFoa+ffs6vP5Awfdn7NixtzzOsmXLHD7fe/fuhc1mg8lkQtu2bUt9rQBgy5YteOCBB+Dv74/g4GD069cPJ06ccChjfy9OnjyJAQMGIDAwUBq2y8/PL1bfwt93i8WCPn36oHr16jh+/LhD2bL+/qqq2LNThdmDSWhoKADg7NmzWLt2LZ544glER0cjLS0NH330ETp37ozjx48jKioKAGC1WvHQQw9h8+bNGDhwIF5++WXk5ORg48aNOHr0KBo0aCCdY9CgQejTp4/DeRMSEkqsz8yZM6FQKDBp0iSkp6dj7ty5iI+Px6FDh+Dr6wug4JdJ7969ERsbi6lTp0KpVGLp0qXo2rUrfv/9d7Rr167YcWvXro3ExEQAQG5uLkaPHl3iuSdPnowBAwbgueeew9WrV7FgwQJ06tQJBw8eRHBwcLHnjBw5Eg888AAA4Lvvviv2R2zUqFHSfKmXXnoJycnJWLhwIQ4ePIidO3dCo9GU+DrciczMTKlthdlsNjzyyCPYsWMHRo4ciWbNmuHIkSP44IMPcPr0aYeJoBVlMBjQuXNnXLp0CaNGjULdunXxxx9/ICEhAVeuXMHcuXOddq7yeuCBBzBy5EjYbDYcPXoUc+fOxeXLl/H777+X+rxp06Zh+vTpiI+Px+jRo3Hq1CksXrwYe/fuld7Da9euASj4XDdq1AjTp09Hfn4+Fi1ahI4dO2Lv3r1o3LgxunTpgjp16mDFihV47LHHHM6zYsUKNGjQQBrCKavVq1fDYDBg9OjRCA0NxZ49e7BgwQJcvHgRq1evvuXzvvzySzz11FNo1aoVEhMTce3aNcyfPx87duzAwYMHERYWdkf1uJXyfK/sJk2adEfnatq0Kd544w0IIfD333/j/fffR58+fcocaktif2/Hjh2L2NhYvPvuu7h69WqJr9WmTZvQu3dv3HXXXZg2bRry8vKwYMECdOzYEQcOHCgWzAcMGID69esjMTERu3btwvz583Hjxg18/vnnt6zPc889h23btmHjxo1o3ry5tL0ir3OVIUj2li5dKgCITZs2iatXr4oLFy6IlStXitDQUOHr6ysuXrwohBAiPz9fWK1Wh+cmJycLnU4n3nrrLWnbZ599JgCI999/v9i5bDab9DwA4r333itWpkWLFqJz587S461btwoAolatWiI7O1vavmrVKgFAzJs3Tzp2o0aNRM+ePaXzCCGEwWAQ0dHRonv37sXO1aFDB9GyZUvp8dWrVwUAMXXqVGnbuXPnhEqlEjNnznR47pEjR4RarS62/cyZMwKAWL58ubRt6tSpovDX6ffffxcAxIoVKxyeu379+mLb69WrJ/r27Vus7mPGjBFFv6JF6z5x4kQRHh4uYmNjHV7T//3vf0KpVIrff//d4flLliwRAMTOnTuLna+wzp07ixYtWhTb/t577wkAIjk5Wdo2Y8YM4e/vL06fPu1Q9vXXXxcqlUqkpKQIIcr3mVi9evUt6zh06FBRr169UtshRMHrO3ToUIdt//d//yf8/PxKfV56errQarWiR48eDt+LhQsXCgDis88+c6hrWFiYyMjIkMqdPn1aaDQa0b9/f2lbQkKC0Ol0IjMz0+E8arXa4X2Njo4WQ4YMcaiP/Txbt26VthkMhmL1TkxMFAqFQpw/f17aVvjzabFYREREhGjQoIHIzc2Vymzbtk0AEK+++qq0bejQocLf39/h+KtXry5WDyGE8Pf3d3id7+R71blzZ4f3f926dQKA6NWrV7HvQEmKPl8IIf79738LACI9PV3aBkCMGTPmlsex/660f77tj5s3b+7wWtvfi8KvVevWrUV4eLi4du2atO3w4cNCqVQ6vJf29+KRRx5xOPcLL7wgAIjDhw871Nf+uUhISBAqlUqsXbvW4Xl3+vurquIwVhUSHx+PGjVqoE6dOhg4cCACAgKwZs0a1KpVC0DBKi2lsuAjYbVace3aNQQEBKBJkyY4cOCAdJxvv/0WYWFhePHFF4udoyxdzrcyZMgQVKtWTXr8r3/9CzVr1sS6desAFCxFPXPmDP7v//4P165dQ0ZGBjIyMqDX69GtWzf89ttvDsMmQMFwm4+PT6nn/e6772Cz2TBgwADpmBkZGYiMjESjRo2wdetWh/ImkwlAwet1K6tXr0ZQUBC6d+/ucMzY2FgEBAQUO6bZbHYol5GRUaxLu6hLly5hwYIFmDx5crGhhtWrV6NZs2Zo2rSpwzHtQ5dFz18Rq1evxgMPPICQkBCHc8XHx8NqteK3335zKG8wGIq11Wq1lnjsnJwcZGRkOAwblYfRaERGRgbS09OxceNGbNmyBd26dSv1OZs2bYLJZMK4ceOk7wUAjBgxAoGBgcWWRT/zzDNSLykANGrUCI888gjWr18
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjsAAAHHCAYAAABZbpmkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABfP0lEQVR4nO3dd3gUVcM28Ht2N7upm14hjd4Fg2KkCpFqQbA+qIAoikGliIoNxEdR8VEQEfRTAV8LggpYQTqiASVKbwECiUA66WXb+f7Y7JBNgSRsssnk/l3uJTtzduacrXfOOTMjCSEEiIiIiBRK5ewKEBERETUkhh0iIiJSNIYdIiIiUjSGHSIiIlI0hh0iIiJSNIYdIiIiUjSGHSIiIlI0hh0iIiJSNI2zK0BERKQEBoMBOTk5sFgsCAsLc3Z1qAL27BARUZP2+eef48yZM/L9FStW4Ny5c86rUAV79+7Ff/7zHwQEBECn0yE0NBRjx451drWoEoYdBVmxYgUkSZJvrq6u6NChA6ZOnYr09HRnV4+IqF5+++03PPPMMzhz5gw2btyI+Ph4qFTO//lav349+vXrhyNHjuC1117Dpk2bsGnTJnz44YfOrhpVwmEsBZo3bx6io6NRWlqKXbt2YenSpfj5559x6NAhuLu7O7t6RER1Mn36dAwaNAjR0dEAgBkzZiA0NNSpdcrJycHDDz+MYcOGYc2aNdBqtU6tD10ew44CjRgxAr179wYAPPzww/D398c777yD9evX47777nNy7YiI6qZTp044deoUDh06hICAALRt29bZVcLy5ctRWlqKFStWMOg0A87vB6QGN3jwYABAcnIyAOtfJE8//TS6d+8OT09P6PV6jBgxAvv376/y2NLSUsydOxcdOnSAq6srQkNDMWbMGJw6dQoAcObMGbuhs8q3QYMGydvavn07JEnC119/jeeffx4hISHw8PDAbbfdhtTU1Cr73rNnD4YPHw5vb2+4u7tj4MCB+P3336tt46BBg6rd/9y5c6uU/fzzzxETEwM3Nzf4+fnh3nvvrXb/l2tbRRaLBQsXLkTXrl3h6uqK4OBgPProo7h48aJduaioKNxyyy1V9jN16tQq26yu7gsWLKjynAJAWVkZ5syZg3bt2kGn0yE8PBzPPPMMysrKqn2uKho0aBC6detWZfnbb78NSZLs5kkAQG5uLqZNm4bw8HDodDq0a9cOb775JiwWi1zG9ry9/fbbVbbbrVu3at8T33zzTY11nDBhAqKioq7YlqioKPn1UalUCAkJwT333IOUlJRaPXbChAl2yyZPngxXV1ds377dbvkHH3yArl27QqfTISwsDPHx8cjNzbUrU9vntWKdq7vZ2l3xOX333XcRGRkJNzc3DBw4EIcOHaqyn61bt6J///7w8PCAj48Pbr/9dhw9evSKz1vFW8V21/TeragurzsAZGRkYNKkSQgODoarqyuuueYarFy5stptrlixAh4eHujTpw/atm2L+Ph4SJJU5TWrqU62m4uLC6KiojBr1iwYDAa5nG0KwN69e2vc1qBBg+zasHv3bvTs2ROvv/66/Hlo37493njjDbvPAwCYTCa8+uqraNu2LXQ6HaKiovD8889X+Yzanudff/0VPXv2hKurK7p06YLvvvvOrpytvhU/n4cPH4avry9uueUWmEwmeXltPrMtAXt2WgBbMPH39wcAnD59GuvWrcNdd92F6OhopKen48MPP8TAgQNx5MgR+SgCs9mMW265BVu2bMG9996Lp556CgUFBdi0aRMOHTpk99fVfffdh5EjR9rtd/bs2dXW57XXXoMkSXj22WeRkZGBhQsXIi4uDvv27YObmxsA65f1iBEjEBMTgzlz5kClUmH58uUYPHgwfvvtN1x//fVVttu6dWvMnz8fAFBYWIgpU6ZUu++XXnoJd999Nx5++GFkZmZi8eLFGDBgAP755x/4+PhUeczkyZPRv39/AMB3332HtWvX2q1/9NFHsWLFCkycOBFPPvkkkpOT8f777+Off/7B77//DhcXl2qfh7rIzc2V21aRxWLBbbfdhl27dmHy5Mno3LkzDh48iHfffRcnTpzAunXrrnrfNsXFxRg4cCDOnTuHRx99FBEREfjjjz8we/ZsXLhwAQsXLnTYvuqrf//+mDx5MiwWCw4dOoSFCxfi/Pnz+O233+q0nTlz5uCTTz7B119/bfcDN3fuXLzyyiuIi4vDlClTcPz4cSxduhR//fVXvV7rhQsXorCwEABw9OhRvP7663j++efRuXNnAICnp6dd+c8++wwFBQWIj49HaWkpFi1ahMGDB+PgwYMIDg4GAGzevBkjRoxAmzZtMHfuXJSUlGDx4sXo27cv/v7772qDo+15q1iPhlRSUoJBgwbh5MmTmDp1KqKjo7FmzRpMmDABubm5eOqpp2p87MmTJ/H//t//q9P+bJ/hsrIybNy4EW+//TZcXV3x6quv1rsN2dnZ2LVrF3bt2oWHHnoIMTEx2LJlC2bPno0zZ85g2bJlctmHH34YK1euxJ133omZM2diz549mD9/Po4ePVrl+yQpKQn33HMPHnvsMYwfPx7Lly/HXXfdhQ0bNuDmm2+uti6pqakYPnw4OnXqhNWrV0Ojsf60N4fPbKMRpBjLly8XAMTmzZtFZmamSE1NFatWrRL+/v7Czc1N/Pvvv0IIIUpLS4XZbLZ7bHJystDpdGLevHnysk8//VQAEO+8806VfVksFvlxAMSCBQuqlOnatasYOHCgfH/btm0CgGjVqpXIz8+Xl69evVoAEIsWLZK33b59ezFs2DB5P0IIUVxcLKKjo8XNN99cZV833nij6Natm3w/MzNTABBz5syRl505c0ao1Wrx2muv2T324MGDQqPRVFmelJQkAIiVK1fKy+bMmSMqfmx+++03AUB88cUXdo/dsGFDleWRkZFi1KhRVeoeHx8vKn8UK9f9mWeeEUFBQSImJsbuOf2///s/oVKpxG+//Wb3+GXLlgkA4vfff6+yv4oGDhwounbtWmX5ggULBACRnJwsL3v11VeFh4eHOHHihF3Z5557TqjVapGSkiKEqN97Ys2aNTXWcfz48SIyMvKy7RDC+vyOHz/ebtl//vMf4e7uXqfHfvjhhwKAWLx4sV2ZjIwModVqxdChQ+0+P++//74AID799FN5WV2eVxvbc7Ft27Yq62zPacXPsRBC7NmzRwAQ06dPl5f17NlTBAUFiezsbHnZ/v37hUqlEg8++GCVbbdq1UpMnDjxsvWo6b1bXR1r87ovXLhQABCff/65vMxgMIjY2Fjh6ekpfz/Ytrl8+XK53N133y26desmwsPDq7zeNdWp4uOFECIsLEyMHDlSvm/77vzrr79q3NbAgQPt2jBw4EABQMydO9eu3IQJEwQAcfDgQSGEEPv27RMAxMMPP2xX7umnnxYAxNatW+VlkZGRAoD49ttv5WV5eXkiNDRU9OrVq0p9k5OTRU5OjujSpYvo2LGjyMrKsttHbT+zLQGHsRQoLi4OgYGBCA8Px7333gtPT0+sXbsWrVq1AgDodDr5SAaz2Yzs7Gx4enqiY8eO+Pvvv+XtfPvttwgICMATTzxRZR+Vh13q4sEHH4SXl5d8/84770RoaCh+/vlnAMC+ffuQlJSE//znP8jOzkZWVhaysrJQVFSEIUOGYOfOnVW6YEtLS+Hq6nrZ/X733XewWCy4++675W1mZWUhJCQE7du3x7Zt2+zK27q5dTpdjdtcs2YNvL29cfPNN9ttMyYmBp6enlW2aTQa7cplZWWhtLT0svU+d+4cFi9ejJdeeqnKX/pr1qxB586d0alTJ7tt2oYuK+//aqxZswb9+/eHr6+v3b7i4uJgNpuxc+dOu/LFxcVV2mo2m6vddkFBAbKysqoMB9VVWVkZsrKykJGRgU2bNmHr1q0YMmRIrR+/fv16PP7445g1axamTp1qt27z5s0wGAyYNm2a3ZFAjzzyCPR6PX766Se78mazuUr7i4uLr6p9o0ePlj/HAHD99dejT58+8mfnwoUL2LdvHyZMmAA/Pz+5XI8ePXDzzTfL5SoyGAyXfY/b2N6
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjsAAAHHCAYAAABZbpmkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABcO0lEQVR4nO3dd3gU1cIG8Hd2N7upm14hCaF30FAMKqBEqhXsXEVFUAwqRdDYKHpFxStYEPS7CnoVuaKCohTpWAJKLqEIxACBRCCd9LLtfH9sdsiSQhI22WTy/h73MTtzduacnd3k5ZwzM5IQQoCIiIhIoVTOrgARERFRU2LYISIiIkVj2CEiIiJFY9ghIiIiRWPYISIiIkVj2CEiIiJFY9ghIiIiRWPYISIiIkVj2CEiojbPZDIhKysLaWlpzq4KNQGGHSIiajI//PADkpKS5Ofr16/Hn3/+6bwKVZGSkoIpU6YgNDQUWq0WwcHBiImJAW8soDwMO2Rn1apVkCRJfri6uqJr166YPn06MjMznV09ImplDh8+jKeffhopKSnYu3cvHn/8cRQVFTm7Wti7dy8GDRqEHTt24LnnnsOWLVuwdetWrF+/HpIkObt65GAS741FVa1atQoPP/wwFi5ciKioKJSXl+OXX37Bf/7zH0RGRuLIkSNwd3d3djWJqJXIzs7GkCFDcOLECQDA+PHj8c033zi1TgaDAf369YNer8dPP/0Eb29vp9aHmp7G2RWglmnMmDEYMGAAAODRRx+Fv78/3n77bXz33Xe47777nFw7ImotAgMDceTIEfkfSj169HB2lbBhwwYkJyfj+PHjDDptBIexqF5uvPFGAEBqaioAIC8vD8888wz69OkDT09P6PV6jBkzBgcPHqz22vLycsyfPx9du3aFq6srQkNDMX78eJw8eRIAcPr0abuhs0sfw4cPl7e1a9cuSJKE//73v3j++ecREhICDw8P3HrrrUhPT6+273379mH06NHw9vaGu7s7hg0bhl9//bXGNg4fPrzG/c+fP79a2c8//xzR0dFwc3ODn58f7r333hr3X1fbqrJYLFi6dCl69eoFV1dXBAcH47HHHsOFCxfsynXo0AE333xztf1Mnz692jZrqvvixYurvacAUFFRgXnz5qFz587Q6XQIDw/H3LlzUVFRUeN7VdXw4cPRu3fvasvfeustSJKE06dP2y3Pz8/HjBkzEB4eDp1Oh86dO+ONN96AxWKRy9jet7feeqvadnv37l3jZ+Lrr7+utY4PPfQQOnTocNm2dOjQQT4+KpUKISEhuOeeey47abXq62p6VN13fY81AGzatAnDhg2Dl5cX9Ho9Bg4ciNWrVwOo/fNa02fMZDLhlVdeQadOnaDT6dChQwc8//zz1Y5vfdtfUlKC2bNny8ewW7dueOutt6rNdbF9BnU6HaKjo9GjR49aP4M1qdoWtVqNdu3aYerUqcjPz5fLNOb47927F1FRUfjmm2/QqVMnaLVaREREYO7cuSgrK6v2+g8++AC9evWCTqdDWFgY4uLi7OoAXPweJCYmYsiQIXBzc0NUVBRWrFhhV85W3127dsnLzp07hw4dOmDAgAEoLi6Wl1/J95LssWeH6sUWTPz9/QEAp06dwvr163HXXXchKioKmZmZ+PDDDzFs2DAcPXoUYWFhAACz2Yybb74Z27dvx7333ounn34aRUVF2Lp1K44cOYJOnTrJ+7jvvvswduxYu/3Gx8fXWJ9//vOfkCQJzz77LLKysrB06VLExsYiKSkJbm5uAIAdO3ZgzJgxiI6Oxrx586BSqbBy5UrceOON+PnnnzFo0KBq223fvj0WLVoEACguLsa0adNq3PdLL72Eu+++G48++iiys7Px3nvvYejQoThw4AB8fHyqvWbq1Km4/vrrAQDffvst1q1bZ7f+sccek4cQn3rqKaSmpuL999/HgQMH8Ouvv8LFxaXG96Eh8vPz5bZVZbFYcOutt+KXX37B1KlT0aNHDxw+fBhLlizBX3/9hfXr11/xvm1KS0sxbNgwnD17Fo899hgiIiLw22+/IT4+HufPn8fSpUsdtq/Guv766zF16lRYLBYcOXIES5cuxblz5/Dzzz/X+pqlS5fKf6SOHTuG1157Dc8//7zci+Hp6SmXre+xXrVqFR555BH06tUL8fHx8PHxwYEDB7B582bcf//9eOGFF/Doo48CAHJycjBz5ky7z1lVjz76KD799FPceeedmD17Nvbt24dFixbh2LFj1T6Ll2u/EAK33nordu7cicmTJ6N///7YsmUL5syZg7Nnz2LJkiW1vk+1fQbrcscdd2D8+PEwmUxISEjARx99hLKyMvznP/9p0Haqys3NxalTp/D8889j/PjxmD17Nvbv34/FixfjyJEj+PHHH+WwOH/+fCxYsACxsbGYNm0akpOTsXz5cvzxxx/VvpsXLlzA2LFjcffdd+O+++7DV199hWnTpkGr1eKRRx6psS4FBQUYM2YMXFxcsHHjRvmz0pzfyzZBEFWxcuVKAUBs27ZNZGdni/T0dLFmzRrh7+8v3NzcxN9//y2EEKK8vFyYzWa716ampgqdTicWLlwoL/vkk08EAPH2229X25fFYpFfB0AsXry4WplevXqJYcOGyc937twpAIh27dqJwsJCeflXX30lAIh33nlH3naXLl3EqFGj5P0IIURpaamIiooSN910U7V9DRkyRPTu3Vt+np2dLQCIefPmyctOnz4t1Gq1+Oc//2n32sOHDwuNRlNteUpKigAgPv30U3nZvHnzRNWv3s8//ywAiC+++MLutZs3b662PDIyUowbN65a3ePi4sSlX+dL6z537lwRFBQkoqOj7d7T//znP0KlUomff/7Z7vUrVqwQAMSvv/5abX9VDRs2TPTq1ava8sWLFwsAIjU1VV72yiuvCA8PD/HXX3/ZlX3uueeEWq0WaWlpQojGfSbWrl1bax0nTZokIiMj62yHENb3d9KkSXbL7r//fuHu7n7Z115an507d1ZbV99jnZ+fL7y8vMTgwYNFWVmZXdmqn2cb2/u1cuXKauuSkpIEAPHoo4/aLX/mmWcEALFjxw55WX3av379egFAvPrqq3bl7rzzTiFJkjhx4oS8rL6fwdpc+nohrN/Tnj17ys8bc/wnTZokAIiHHnrIrpztu7lhwwYhhBBZWVlCq9WKkSNH2v2+e//99wUA8cknn8jLhg0bJgCIf/3rX/KyiooK0b9/fxEUFCQMBoNdfXfu3CnKy8vF8OHDRVBQkN37JsSVfy/JHoexqEaxsbEIDAxEeHg47r33Xnh6emLdunVo164dAECn00Glsn58zGYzcnNz4enpiW7duuF///ufvJ1vvvkGAQEBePLJJ6vt40rOeHjwwQfh5eUlP7/zzjsRGhqKjRs3AgCSkpKQkpKC+++/H7m5ucjJyUFOTg5KSkowYsQI7Nmzx27YBLAOt7m6uta532+//RYWiwV33323vM2cnByEhISgS5cu2Llzp115g8EAwPp+1Wbt2rXw9vbGTTfdZLfN6OhoeHp6Vtum0Wi0K5eTk4Py8vI663327Fm89957eOmll+x6GWz779GjB7p37263TdvQ5aX7vxJr167F9ddfD19fX7t9xcbGwmw2Y8+ePXblS0tLq7XVbDbXuO2ioiLk5ORUG15oqIqKCuTk5CArKwtbt27Fjh07MGLEiCvapk19j/XWrVtRVFSE5557rtpnsqHfG9t3YtasWXbLZ8+eDQD48ccf7ZZfrv0bN26EWq3GU089VW17Qghs2rSpxnrU9Rmsi+0zkJGRgW+++QYHDx6s8Xg05vjPmTPH7vnMmTOhVqvl92Tbtm0wGAyYMWOG/PsOAKZMmQK9Xl/tvdNoNHjsscfk51qtFo899hiysrKQmJhoV9ZiseDBBx/E3r17sXHjRrtebqB5v5dtAYexqEbLli1D165dodFoEBwcjG7dutl92S0WC9555x188MEHSE1NtfsDZBvqAqzDX926dYNG49iPWpcuXeyeS5KEzp07y/NDUlJSAACTJk2qdRsFBQXw9fWVn+fk5FTb7qVSUlIghKi13KX
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Средняя цена в обучающей выборке: 373.7302916666667\n",
|
|||
|
"Средняя цена в контрольной выборке: 372.217\n",
|
|||
|
"Средняя цена в тестовой выборке: 372.217\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Оценка сбалансированности целевой переменной (цена)\n",
|
|||
|
"# Визуализация распределения цены в выборках (гистограмма)\n",
|
|||
|
"def plot_price_distribution(data, title):\n",
|
|||
|
" sns.histplot(data['price'], kde=True)\n",
|
|||
|
" plt.title(title)\n",
|
|||
|
" plt.xlabel('Цена')\n",
|
|||
|
" plt.ylabel('Частота')\n",
|
|||
|
" plt.show()\n",
|
|||
|
"\n",
|
|||
|
"plot_price_distribution(train_data, 'Распределение цены в обучающей выборке')\n",
|
|||
|
"plot_price_distribution(val_data, 'Распределение цены в контрольной выборке')\n",
|
|||
|
"plot_price_distribution(test_data, 'Распределение цены в тестовой выборке')\n",
|
|||
|
"\n",
|
|||
|
"# Оценка сбалансированности данных по целевой переменной (price)\n",
|
|||
|
"print(\"Средняя цена в обучающей выборке: \", train_data['price'].mean())\n",
|
|||
|
"print(\"Средняя цена в контрольной выборке: \", val_data['price'].mean())\n",
|
|||
|
"print(\"Средняя цена в тестовой выборке: \", test_data['price'].mean())"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 9,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAkQAAAHHCAYAAABeLEexAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABS7UlEQVR4nO3deVhO+f8/8Odd2leVNpLsW1lCsqQhJdkNg0b2NUwYfPqMsY4x9l2YQZbMYAxmLJEtW7bILsu3RmNahIpQqffvD7/Ox+0uKuWO83xc131dzjnv+5zXWZz72Tnvc98KIYQAERERkYxpqLsAIiIiInVjICIiIiLZYyAiIiIi2WMgIiIiItljICIiIiLZYyAiIiIi2WMgIiIiItljICIiIiLZYyAiIqIiSU9PR1xcHJ48eaLuUqiYPX36FLGxsUhPT1d3KR8NAxERfbKePXuGxYsXS8MpKSlYsWKF+gqSge3bt6NNmzYwMjKCoaEhKlasiLlz56q7rE9CaT5ehRBYs2YNmjZtCn19fRgbG8PBwQGbN29Wd2kfjYI/3VE8goODMWDAAGlYR0cHFStWhKenJ77//ntYWVmpsTqiz1N2djZMTEywevVquLm5YcGCBbh16xZCQ0PVXdpn6T//+Q/mzJmDzp07o1evXrCwsIBCoUD16tVhZ2en7vJKvdJ8vPbu3Rtbt25Fv3790KFDB5iYmEChUMDJyQnlypVTd3kfRRl1F/C5mTFjBhwcHPDy5UucPHkSQUFB2LdvH65duwZ9fX11l0f0WdHU1MT06dPh5+eHnJwcGBsbY+/eveou67MUHh6OOXPmYPbs2fjPf/6j7nI+SaX1eN24cSO2bt2KzZs3o0+fPuouR214haiY5F4hOn/+PBo1aiSNHz9+PBYuXIgtW7agd+/eaqyQ6PP1zz//IC4uDrVq1YKpqam6y/ksdezYEY8fP8apU6fUXconr7Qdr46OjnByckJISIi6S1Er9iEqYa1btwYAxMTEAAAeP36Mb7/9Fo6OjjA0NISxsTG8vb1x+fJllfe+fPkS06ZNQ/Xq1aGrqwsbGxt069YN9+7dAwDExsZCoVDk+3J3d5fmdezYMSgUCmzduhX//e9/YW1tDQMDA3Tq1AlxcXEqyz579izatWsHExMT6Ovro1WrVvmeCN3d3fNc/rRp01Tabt68Gc7OztDT04OZmRl69eqV5/LftW5vysnJweLFi1GnTh3o6urCysoKw4YNU+nkWalSJXTo0EFlOaNGjVKZZ161z5s3T2WbAkBGRgamTp2KqlWrQkdHB3Z2dpg4cSIyMjLy3FZvcnd3V5nfrFmzoKGhgS1btkjjTpw4gR49eqBixYrSMsaOHYsXL15Ibfr37//OY0GhUCA2NlZqv3//frRs2RIGBgYwMjKCj48Prl+/rlRLfvOsWrWqUruVK1eiTp060NHRga2tLfz9/ZGSkqKyrnXr1kVkZCSaNWsGPT09ODg4YNWqVUrtco/TY8eOKY338fFR2S/Tpk2T9l2FChXg6uqKMmXKwNraOs95vC33/cnJyUrjL1y4AIVCgeDgYKXxJXWsjRo1Kt8ag4ODVfZdXt7eV2XLloW7uztOnDjxzvflOnLkiHQ8mJqaonPnzrh586ZSmzNnzqBu3bro1asXzMzMoKenh8aNG2PXrl1Sm2fPnsHAwADffPONyjL++ecfaGpqYvbs2VLNlSpVUmn39n7++++/MXLkSNSoUQN6enowNzdHjx49VLZJXsfO+fPn0bZtWxgZGcHAwCDPbZK7jS9cuCCNS05OzvM80KFDhzxrLsj5sriO19yXkZERmjRporT9gf/9X8tP7rk19/hOT0/HtWvXYGdnBx8fHxgbG+e7rQDg//7v/9CjRw+YmZlBX18fTZs2VbnKVZjPm4KeB4HCfS4VBW+ZlbDc8GJubg7g9cG0a9cu9OjRAw4ODkhMTMTq1avRqlUr3LhxA7a2tgBe32vu0KEDDh8+jF69euGbb77B06dPERYWhmvXrqFKlSrSMnr37o327dsrLTcwMDDPembNmgWFQoFJkyYhKSkJixcvhoeHB6KioqCnpwfg9cnR29sbzs7OmDp1KjQ0NLB+/Xq0bt0aJ06cQJMmTVTmW6FCBelE9+zZM4wYMSLPZX///ffo2bMnBg8ejIcPH2LZsmVwc3PDpUuX8vxLaejQoWjZsiUA4I8//sDOnTuVpg8bNky6OjdmzBjExMRg+fLluHTpEk6dOgUtLa08t0NhpKSkSOv2ppycHHTq1AknT57E0KFDUatWLVy9ehWLFi3C7du3VU5U77N+/XpMnjwZCxYsULpsvX37djx//hwjRoyAubk5zp07h2XLluGff/7B9u3bAbzeDh4eHtJ7+vbti65du6Jbt27SuNx+AJs2bUK/fv3g5eWFOXPm4Pnz5wgKCkKLFi1w6dIlpRO+jo4OfvnlF6U6jYyMpH9PmzYN06dPh4eHB0aMGIHo6GgEBQXh/PnzKtv/yZMnaN++PXr27InevXtj27ZtGDFiBLS1tTFw4MB8t8vx48exb9++Am3DBQsWIDExsUBtC+tjHGsfwsLCAosWLQLwOnwsWbIE7du3R1xc3DuvQhw6dAje3t6oXLkypk2bhhcvXmDZsmVo3rw5Ll68KB0Pjx49wpo1a2BoaIgxY8agXLly2Lx5M7p164aQkBD07t0bhoaG6Nq1K7Zu3YqFCxdCU1NTWs6vv/4KIQR8fX0LtV7nz5/H6dOn0atXL1SoUAGxsbEICgqCu7s7bty4kW9XhLt378Ld3R36+vqYMGEC9PX18fPPP8PDwwNhYWFwc3MrVB35Kcr5MldRjtdNmzYBeB3aVq5ciR49euDatWuoUaNGkep/9OgRAGDOnDmwtrbGhAkToKurm+e2SkxMRLNmzfD8+XOMGTMG5ubm2LBhAzp16oTff/8dXbt2VZp3QT5v3pbfefBDtnOBCSoW69evFwDEoUOHxMOHD0VcXJz47bffhLm5udDT0xP//POPEEKIly9fiuzsbKX3xsTECB0dHTFjxgxp3Lp16wQAsXDhQpVl5eTkSO8DIObNm6fSpk6dOqJVq1bS8NGjRwUAUb58eZGWliaN37ZtmwAglixZIs27WrVqwsvLS1qOEEI8f/5cODg4iLZt26osq1mzZqJu3brS8MOHDwUAMXXqVGlcbGys0NTUFLNmzVJ679WrV0WZMmVUxt+5c0cAEBs2bJDGTZ06Vbx5yJ44cUIAECEhIUrvDQ0NVRlvb28vfHx8VGr39/cXb/83eLv2iRMnCktLS+Hs7Ky0TTdt2iQ0NDTEiRMnlN6/atUqAUCcOnVKZXlvatWqlTS/vXv3ijJlyojx48ertHv+/LnKuNmzZwuFQiH+/vvvPOf99jrkevr0qTA1NRVDhgxRGp+QkCBMTEyUxvfr108YGBjkW39SUpLQ1tYWnp6eSsf08uXLBQCxbt06pXUFIBYsWCCNy8jIEPXr1xeWlpYiMzNTCPG/4/To0aNSOxcXF+Ht7a2yTm8fD0lJScLIyEhq++Y88pL7/ocPHyqNP3/+vAAg1q9fL40ryWPN398/3xpzzysxMTHvXJd+/foJe3t7pXFr1qwRAMS5c+fe+d7cffDo0SNp3OXLl4WGhobw8/NTqhWAOHbsmDTu+fPnolatWsLa2lrahwcOHBAAxP79+5WW4+TkpPT/Z8CAAaJixYoq9by9n/M6/iMiIgQAsXHjRmnc28dO9+7dhaamprh27ZrUJjk5WZibmwtnZ2dpXO42Pn/+vDQur3OYEEL4+PgobefCnC+L63h908GDBwUAsW3bNmlcq1atRJ06dfKdT+7nRu7xnTusra0tbt++rbQN3t5WAQEBAoDSOe/p06fCwcFBVKpUSToPFPTzJrfe950Hi/K5VBS8ZVbMPDw8UK5cOdjZ2aFXr14wNDTEzp07Ub58eQCv/+LW0Hi92bOzs/Ho0SMYGhqiRo0auHjxojSfHTt2wMLCAqNHj1ZZxtuX3QvDz89P6S/8L7/8EjY2NtJf4FFRUbhz5w769OmDR48eITk5GcnJyUhPT0ebNm1w/Phx5OTkKM3
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAkQAAAHHCAYAAABeLEexAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABSuklEQVR4nO3deVxN+f8H8NettKtU2kgismWLSZY0RIixjZ3sa9kH375jCIOx78IMMjPZDcYuO8nWyC7Lt4YxKltFaP38/vDo/JxukZQb5/V8PO7j4XzO557zPvece3s553PuVQkhBIiIiIgUTEvTBRARERFpGgMRERERKR4DERERESkeAxEREREpHgMRERERKR4DERERESkeAxEREREpHgMRERERKR4DERERESkeAxERKcqLFy+wcOFCaTohIQHLli3TXEFERVxgYCBUKpWsrWzZsujTp49mCiokDEQaFBwcDJVKJT309fVRsWJF+Pv7Iy4uTtPlEX2RDAwMMHHiRISEhOD+/fsIDAzErl27NF0WEWmYjqYLIGDq1KlwdHTE69evcerUKQQFBWHv3r24evUqDA0NNV0e0RdFW1sbU6ZMga+vLzIzM2FiYoI9e/Zouiyiz0pUVBS0tL6scyoMREVAy5YtUadOHQDAgAEDYGFhgfnz52Pnzp3o1q2bhqsj+vKMHTsWXbp0wf3791G5cmWYmZlpuiT6gqWnpyMzMxO6urqaLqXA6OnpabqEAvdlxbsvRJMmTQAA0dHRAICnT5/iu+++g4uLC4yNjWFiYoKWLVvi0qVLas99/fo1AgMDUbFiRejr68PW1hYdOnTA3bt3AQAxMTGyy3TZH56entKyjh07BpVKhU2bNuG///0vbGxsYGRkhG+++Qb3799XW/fZs2fRokULmJqawtDQEI0bN0ZYWFiO2+jp6Znj+gMDA9X6/v7773B1dYWBgQHMzc3RtWvXHNf/rm17W2ZmJhYuXIiqVatCX18f1tbWGDx4MJ49eybrV7ZsWbRu3VptPf7+/mrLzKn2OXPmqL2mAJCSkoLJkyfDyckJenp6sLe3x/jx45GSkpLja/U2T09PteVNnz4dWlpaWL9+vdR28uRJdOrUCWXKlJHWMXr0aLx69Urq06dPn3ceCyqVCjExMVL/ffv2oVGjRjAyMkLx4sXh4+ODa9euyWrJbZlOTk6yfsuXL0fVqlWhp6cHOzs7+Pn5ISEhQW1bq1WrhoiICNSvXx8GBgZwdHTEihUrZP2yjtNjx47J2n18fNT2y9tjIUqXLg13d3fo6OjAxsYmx2Vkl/X8x48fy9ovXLgAlUqF4OBgWXthHWv+/v651ph1Kf7tfZeT9+3/7K/Fli1bpPehpaUlevbsiQcPHqgt9+bNm+jcuTNKliwJAwMDODs74/vvv1frV7Zs2TytNy/HXW7+97//oVOnTjA3N4ehoSHq1asnOxsYFxcHHR0dTJkyRe25UVFRUKlUWLp0qdSWkJCAUaNGwd7eHnp6enBycsKsWbOQmZkp9cn6HJo7dy4WLlyI8uXLQ09PD9evXwcALFmyBFWrVoWhoSFKlCiBOnXqyN67f//9N4YNGwZnZ2cYGBjAwsICnTp1UtufWfv51KlTGDFiBEqWLAkzMzMMHjwYqampSEhIgK+vL0qUKIESJUpg/PjxEELkWOeCBQvg4OAAAwMDNG7cGFevXn3va5t9DFFWPWFhYRgzZgxKliwJIyMjtG/fHo8ePZI9NzMzE4GBgbCzs4OhoSG+/vprXL9+XePjkniGqAjKCi8WFhYA3rypd+zYgU6dOsHR0RFxcXFYuXIlGjdujOvXr8POzg4AkJGRgdatW+Pw4cPo2rUrRo4ciefPnyM0NBRXr15F+fLlpXV069YNrVq1kq03ICAgx3qmT58OlUqFCRMmID4+HgsXLoSXlxciIyNhYGAAADhy5AhatmwJV1dXTJ48GVpaWli7di2aNGmCkydP4quvvlJbbunSpTFz5kwAbwa6Dh06NMd1//DDD+jcuTMGDBiAR48eYcmSJfDw8MDFixdz/J/9oEGD0KhRIwDAH3/8ge3bt8vmDx48GMHBwejbty9GjBiB6OhoLF26FBcvXkRYWBiKFSuW4+vwIRISEqRte1tmZia++eYbnDp1CoMGDULlypVx5coVLFiwALdu3cKOHTs+aD1r167FxIkTMW/ePHTv3l1q37JlC16+fImhQ4fCwsIC586dw5IlS/DPP/9gy5YtAN68Dl5eXtJzevXqhfbt26NDhw5SW8mSJQEAv/32G3r37g1vb2/MmjULL1++RFBQEBo2bIiLFy+ibNmy0nP09PTwyy+/yOosXry49O/AwEBMmTIFXl5eGDp0KKKiohAUFITz58+rvf7Pnj1Dq1at0LlzZ3Tr1g2bN2/G0KFDoauri379+uX6upw4cQJ79+7N02s4b968Qhuz9ymOtY+R0746f/48Fi9eLGvL2oa6deti5syZiIuLw6JFixAWFiZ7H16+fBmNGjVCsWLFMGjQIJQtWxZ3797Frl27MH36dLX1N2rUCIMGDQIA3LhxAzNmzJDN/5DjLru4uDjUr18fL1++xIgRI2BhYYF169bhm2++wdatW9G+fXtYW1ujcePG2Lx5MyZPnix7/qZNm6CtrY1OnToBAF6+fInGjRvjwYMHGDx4MMqUKYPTp08jICAADx8+lA3UB968N1+/fo1BgwZBT08P5ubm+PnnnzFixAh8++23GDlyJF6/fo3Lly/j7Nmz0vv3/PnzOH36NLp27YrSpUsjJiYGQUFB8PT0xPXr19WGUQwfPhw2NjaYMmUKzpw5g1WrVsHMzAynT59GmTJlMGPGDOzduxdz5sxBtWrV4OvrK3v+r7/+iufPn8PPzw+vX7/GokWL0KRJE1y5cgXW1ta5vr65GT58OEqUKIHJkycjJiYGCxcuhL+/PzZt2iT1CQgIwOzZs9GmTRt4e3vj0qVL8Pb2xuvXrz94fQVKkMasXbtWABCHDh0Sjx49Evfv3xcbN24UFhYWwsDAQPzzzz9CCCFev34tMjIyZM+Njo4Wenp6YurUqVLbmjVrBAAxf/58tXVlZmZKzwMg5syZo9anatWqonHjxtL00aNHBQBRqlQpkZSUJLVv3rxZABCLFi2Sll2hQgXh7e0trUcIIV6+fCkcHR1Fs2bN1NZVv359Ua1aNWn60aNHAoCYPHmy1BYTEyO0tbXF9OnTZc+9cuWK0NHRUWu/ffu2ACDWrVsntU2ePFm8fZifPHlSABAhISGy5+7fv1+t3cHBQfj4+KjV7ufnJ7K/dbLXPn78eGFlZSVcXV1lr+lvv/0mtLS0xMmTJ2XPX7FihQAgwsLC1Nb3tsaNG0vL27Nnj9DR0RFjx45V6/fy5Uu1tpkzZwqVSiX+/vvvHJedfRuyPH/+XJiZmYmBAwfK2mNjY4WpqamsvXfv3sLIyCjX+uPj44Wurq5o3ry57JheunSpACDWrFkj21YAYt68eVJbSkqKqFmzprCyshKpqalCiP8/To8ePSr1c3NzEy1btlTbpuzHQ3x8vChevLjU9+1l5CTr+Y8ePZK1nz9/XgAQa9euldoK81jz8/PLtcasz5Xo6Oh3bktu+2rLli2y1yI1NVVYWVmJatWqiVevXkn9du/eLQCISZMmSW0eHh6iePHiasfY258LWUqVKiX69u0rTWffjx9y3OVk1KhRAoDsvfb8+XPh6OgoypYtKx1/K1euFADElStXZM+vUqWKaNKkiTQ9bdo0YWRkJG7duiXr95///Edoa2uLe/fuCSH+/zPWxMRExMfHy/q2bdtWVK1a9Z115/TeDQ8PFwDEr7/+KrVl7efsn7vu7u5CpVKJIUOGSG3p6emidOnSss+irDrf/lsjhBBnz54VAMTo0aOltuzvGyHeHLO9e/dWq8fLy0tWz+jRo4W2trZISEgQQrzZfzo6OqJdu3ay5QUGBgoAsmV+arxkVgR4eXmhZMmSsLe3R9euXWFsbIzt27ejVKlSAN78Ly5r8FpGRgaePHkCY2NjODs746+//pKWs23bNlhaWmL48OFq68h+2v1D+Pr6yv6
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAkQAAAHHCAYAAABeLEexAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABQ4ElEQVR4nO3deVxN+f8H8Ndt3yO0kWRLyCAmWRIiNIZhNIjsaxnb4NuMIQwNY18mzCBjMpYxdiI7ydbIrsFkGNMiVITWz+8Pj86v2y0q1Y3zej4e9/HofM7nnvM+95xbr875nHsVQggBIiIiIhnTUHcBREREROrGQERERESyx0BEREREssdARERERLLHQERERESyx0BEREREssdARERERLLHQERERESyx0BEREREssdAREQfpOfPn2PJkiXSdFJSElauXKm+gui9FxwcDIVCgXv37qm7lHJBoVAgICBAmn7fXx8GojKQc5DkPPT09FC3bl34+fkhPj5e3eURfZD09fUxbdo0hISE4MGDBwgICMCePXvUXRYRlVNa6i5ATmbNmgU7Ozu8evUKp0+fRlBQEPbv349r167BwMBA3eURfVA0NTUxc+ZM+Pj4IDs7GyYmJti3b5+6yyL6YA0YMAB9+vSBrq6uukspFgaiMtSlSxc0a9YMADBs2DBUqlQJixYtwq5du9C3b181V0f04Zk0aRK++OILPHjwAA4ODqhQoYK6SyKSpKamwtDQUN1llBhNTU1oamqqu4xi4yUzNWrfvj0AICYmBgDw5MkTfPXVV3B0dISRkRFMTEzQpUsXXL58WeW5r169QkBAAOrWrQs9PT1YWVmhZ8+euHv3LgDg3r17Spfp8j7c3NykZR0/fhwKhQJbtmzB119/DUtLSxgaGuLTTz/FgwcPVNZ97tw5dO7cGaampjAwMEDbtm0RHh6e7za6ubnlu/7c151z/Prrr3BycoK+vj7MzMzQp0+ffNf/pm3LLTs7G0uWLEGDBg2gp6cHCwsLjBw5Ek+fPlXqV6NGDXzyyScq6/Hz81NZZn61//DDDyqvKQCkpaVhxowZqF27NnR1dWFjY4MpU6YgLS0t39cqNzc3N5XlzZkzBxoaGti0aZPUdurUKfTu3RvVq1eX1jFhwgS8fPlS6jNo0KA3Hgt5r/kfOHAAbdq0gaGhIYyNjeHp6Ynr168r1VLQMmvXrq3U78cff0SDBg2gq6sLa2tr+Pr6IikpSWVbGzZsiMjISLRs2RL6+vqws7PDqlWrlPrlHKfHjx9Xavf09FTZLwEBAdK+q1atGlxcXKClpQVLS8t8l5FXzvMTExOV2i9evAiFQoHg4GCl9tI61vz8/AqssbDjNd62//O+Ftu2bZPeh5UrV0b//v3x8OFDleXeunULXl5eqFKlCvT19WFvb49vvvlGpV+NGjUKtd7CHHf5yb2v3/b65Lz+p0+fxscffww9PT3UrFkTv/zyi8rzr1+/jvbt20NfXx/VqlXDd999h+zs7HxrKOx7xsjICHfv3kXXrl1hbGwMb29vAMDt27fRq1cvWFpaQk9PD9WqVUOfPn2QnJwsPX/9+vVo3749zM3Noauri/r16yMoKEillpxtPH78OJo1awZ9fX04OjpKr/cff/wBR0dH6OnpwcnJCZcuXcq3zr///hseHh4wNDSEtbU1Zs2aBSFE/juhhF7zK1euoG3btkqv+fr168tsXBLPEKlRTnipVKkSAODvv//Gzp070bt3b9jZ2SE+Ph6rV69G27ZtcePGDVhbWwMAsrKy8Mknn+DIkSPo06cPxo0bh2fPniEsLAzXrl1DrVq1pHX07dsXXbt2VVqvv79/vvXMmTMHCoUCU6dORUJCApYsWQJ3d3dERUVBX18fAHD06FF06dIFTk5OmDFjBjQ0NKQ36qlTp/Dxxx+rLLdatWoIDAwE8Hqg6+jRo/Nd97fffgsvLy8MGzYMjx49wvLly+Hq6opLly7l+5/9iBEj0KZNGwCv3+Q7duxQmj9y5EgEBwdj8ODB+PLLLxETE4MVK1bg0qVLCA8Ph7a2dr6vQ1EkJSVJ25ZbdnY2Pv30U5w+fRojRoyAg4MDrl69isWLF+Ovv/7Czp07i7Se9evXY9q0aVi4cCH69esntW/btg0vXrzA6NGjUalSJZw/fx7Lly/Hv//+i23btgF4/Tq4u7tLzxkwYAA+++wz9OzZU2qrUqUKAGDjxo0YOHAgPDw8MG/ePLx48QJBQUFo3bo1Ll26hBo1akjP0dXVxc8//6xUp7GxsfRzQEAAZs6cCXd3d4wePRrR0dEICgrChQsXVF7/p0+fomvXrvDy8kLfvn2xdetWjB49Gjo6OhgyZEiBr8vJkyexf//+Qr2GCxcuLLUxe2VxrL2L/PbVhQsXsGzZMqW2nG1o3rw5AgMDER8fj6VLlyI8PFzpfXjlyhW0adMG2traGDFiBGrUqIG7d+9iz549mDNnjsr627RpgxEjRgAAbt68iblz5yrNL8px967u3LmDzz//HEOHDsXAgQOxbt06DBo0CE5OTmjQoAEAIC4uDu3atUNmZib+97//wdDQEGvWrJF+Dxa39szMTHh4eKB169ZYsGABDAwMkJ6eDg8PD6SlpWHs2LGwtLTEw4cPsXfvXiQlJcHU1BQAEBQUhAYNGuDTTz+FlpYW9uzZgzFjxiA7Oxu+vr4q29ivXz+MHDkS/fv3x4IFC9CtWzesWrUKX3/9NcaMGQMACAwMhJeXF6Kjo6Gh8f/nR7KystC5c2e0aNEC8+fPR2hoKGbMmIHMzEzMmjWrVF7zhw8fol27dlAoFPD394ehoSF+/vnnsr38JqjUrV+/XgAQhw8fFo8ePRIPHjwQmzdvFpUqVRL6+vri33//FUII8erVK5GVlaX03JiYGKGrqytmzZolta1bt04AEIsWLVJZV3Z2tvQ8AOKHH35Q6dOgQQPRtm1bafrYsWMCgKhatapISUmR2rdu3SoAiKVLl0rLrlOnjvDw8JDWI4QQL168EHZ2dqJjx44q62rZsqVo2LChNP3o0SMBQMyYMUNqu3fvntDU1BRz5sxReu7Vq1eFlpaWSvvt27cFALFhwwapbcaMGSL34Xzq1CkBQISEhCg9NzQ0VKXd1tZWeHp6qtTu6+sr8r5F8tY+ZcoUYW5uLpycnJRe040bNwoNDQ1x6tQppeevWrVKABDh4eEq68utbdu20vL27dsntLS0xKRJk1T6vXjxQqUtMDBQKBQK8c8//+S77LzbkOPZs2eiQoUKYvjw4UrtcXFxwtTUVKl94MCBwtDQsMD6ExIShI6OjujUqZPSMb1ixQoBQKxbt05pWwGIhQsXSm1paWmicePGwtzcXKSnpwsh/v84PXbsmNTP2dlZdOnSRWWb8h4PCQkJwtjYWOqbexn5yXn+o0ePlNovXLggAIj169dLbaV5rPn6+hZYY87vlZiYmDduS0H7atu2bUqvRXp6ujA3NxcNGzYUL1++lPrt3btXABDTp0+X2lxdXYWxsbHKMZb790KOqlWrisGDB0vTefdjUY67/OTd1znye31sbW0FAHHy5EmpLSEhQejq6iq9v8aPHy8AiHPnzin1MzU1VVpmUd8zAMT//vc/pb6XLl0SAMS2bdveuJ35vdc9PDxEzZo1ldpytvHMmTNS28GDBwUAoa+vr7TPVq9erfJ+yKlz7NixUlt2drbw9PQUOjo6Su+JvO+7d3nNx44dKxQKhbh06ZLU9vjxY2FmZlao47wk8JJZGXJ3d0eVKlVgY2ODPn36wMjICDt27EDVqlUBvP4vLielZ2Vl4fHjxzAyMoK9vT3+/PNPaTnbt29H5cqVMXbsWJV15HfquLB8fHyU/sP//PPPYWVlJf0HHhUVhdu3b6Nfv354/PgxEhMTkZiYiNTUVHTo0AEnT55UOaX86tUr6OnpvXG9f/zxB7Kzs+Hl5SUtMzExEZaWlqhTpw6OHTum1D89PR0A3vifw7Zt22BqaoqOHTsqLdPJyQlGRkYqy8zIyFDql5iYiFevXr2x7ocPH2L58uX49ttvYWRkpLJ+BwcH1KtXT2mZOZdJ866/IOfPn4eXlxd
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Размер обучающей выборки после oversampling и undersampling: 12108\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"from sklearn.model_selection import train_test_split\n",
|
|||
|
"from imblearn.over_sampling import RandomOverSampler\n",
|
|||
|
"from imblearn.under_sampling import RandomUnderSampler\n",
|
|||
|
"\n",
|
|||
|
"# Преобразование целевой переменной (цены) в категориальные диапазоны с использованием квантилей\n",
|
|||
|
"train_data['price_category'] = pd.qcut(train_data['price'], q=4, labels=['low', 'medium', 'high', 'very_high'])\n",
|
|||
|
"\n",
|
|||
|
"# Визуализация распределения цен после преобразования в категории\n",
|
|||
|
"sns.countplot(x=train_data['price_category'])\n",
|
|||
|
"plt.title('Распределение категорий цены в обучающей выборке')\n",
|
|||
|
"plt.xlabel('Категория цены')\n",
|
|||
|
"plt.ylabel('Частота')\n",
|
|||
|
"plt.show()\n",
|
|||
|
"\n",
|
|||
|
"# Балансировка категорий с помощью RandomOverSampler (увеличение меньшинств)\n",
|
|||
|
"ros = RandomOverSampler(random_state=42)\n",
|
|||
|
"X_train = train_data.drop(columns=['price', 'price_category'])\n",
|
|||
|
"y_train = train_data['price_category']\n",
|
|||
|
"\n",
|
|||
|
"X_resampled, y_resampled = ros.fit_resample(X_train, y_train)\n",
|
|||
|
"\n",
|
|||
|
"# Визуализация распределения цен после oversampling\n",
|
|||
|
"sns.countplot(x=y_resampled)\n",
|
|||
|
"plt.title('Распределение категорий цены после oversampling')\n",
|
|||
|
"plt.xlabel('Категория цены')\n",
|
|||
|
"plt.ylabel('Частота')\n",
|
|||
|
"plt.show()\n",
|
|||
|
"\n",
|
|||
|
"# Применение RandomUnderSampler для уменьшения большего класса\n",
|
|||
|
"rus = RandomUnderSampler(random_state=42)\n",
|
|||
|
"X_resampled, y_resampled = rus.fit_resample(X_resampled, y_resampled)\n",
|
|||
|
"\n",
|
|||
|
"# Визуализация распределения цен после undersampling\n",
|
|||
|
"sns.countplot(x=y_resampled)\n",
|
|||
|
"plt.title('Распределение категорий цены после undersampling')\n",
|
|||
|
"plt.xlabel('Категория цен')\n",
|
|||
|
"plt.ylabel('Частота')\n",
|
|||
|
"plt.show()\n",
|
|||
|
"\n",
|
|||
|
"# Печать размеров выборки после балансировки\n",
|
|||
|
"print(\"Размер обучающей выборки после oversampling и undersampling: \", len(X_resampled))"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### Конструирование признаков\n",
|
|||
|
"\n",
|
|||
|
"**Процесс конструирования признаков для решения двух задач:**\n",
|
|||
|
"\n",
|
|||
|
"**Задача 1:** Оптимизация ассортимента товаров в онлайн-магазине. \n",
|
|||
|
"**Цель технического проекта:** Разработка модели для прогнозирования спроса на товары.\n",
|
|||
|
"\n",
|
|||
|
"**Задача 2:** Оптимизация ценовой политики. \n",
|
|||
|
"**Цель технического проекта:** Разработка модели для прогнозирования оптимальной цены товаров.\n",
|
|||
|
"\n",
|
|||
|
"**Унитарное кодирование** \n",
|
|||
|
"Унитарное кодирование категориальных признаков (one-hot encoding). Преобразование категориальных признаков в бинарные векторы.\n",
|
|||
|
"\n",
|
|||
|
"**Дискретизация числовых признаков** \n",
|
|||
|
"Процесс преобразования непрерывных числовых значений в дискретные категории или интервалы (бины)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 11,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Столбцы train_data_encoded: ['href', 'items', 'price', 'price_category', 'category_Groceries', 'sub_category_Dairy & Bakery', 'sub_category_Fruits & Vegetables', 'sub_category_Premium Fruits', 'sub_category_Snacks & Branded Foods', 'sub_category_Staples']\n",
|
|||
|
"Столбцы val_data_encoded: ['href', 'items', 'price', 'category_Groceries', 'sub_category_Dairy & Bakery', 'sub_category_Fruits & Vegetables', 'sub_category_Premium Fruits', 'sub_category_Snacks & Branded Foods', 'sub_category_Staples']\n",
|
|||
|
"Столбцы test_data_encoded: ['href', 'items', 'price', 'category_Groceries', 'sub_category_Dairy & Bakery', 'sub_category_Fruits & Vegetables', 'sub_category_Premium Fruits', 'sub_category_Snacks & Branded Foods', 'sub_category_Staples']\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Конструирование признаков\n",
|
|||
|
"# Унитарное кодирование категориальных признаков (применение one-hot encoding)\n",
|
|||
|
"\n",
|
|||
|
"# Пример категориальных признаков\n",
|
|||
|
"categorical_features = ['category', 'sub_category']\n",
|
|||
|
"\n",
|
|||
|
"# Применение one-hot encoding\n",
|
|||
|
"train_data_encoded = pd.get_dummies(train_data, columns=categorical_features)\n",
|
|||
|
"val_data_encoded = pd.get_dummies(val_data, columns=categorical_features)\n",
|
|||
|
"test_data_encoded = pd.get_dummies(test_data, columns=categorical_features)\n",
|
|||
|
"df_encoded = pd.get_dummies(df, columns=categorical_features)\n",
|
|||
|
"\n",
|
|||
|
"print(\"Столбцы train_data_encoded:\", train_data_encoded.columns.tolist())\n",
|
|||
|
"print(\"Столбцы val_data_encoded:\", val_data_encoded.columns.tolist())\n",
|
|||
|
"print(\"Столбцы test_data_encoded:\", test_data_encoded.columns.tolist())\n",
|
|||
|
"\n",
|
|||
|
"# Дискретизация числовых признаков (цены). Например, можно разделить цену на категории\n",
|
|||
|
"# Пример дискретизации признака 'price'\n",
|
|||
|
"train_data_encoded['price_category'] = pd.cut(train_data_encoded['price'], bins=5, labels=False)\n",
|
|||
|
"val_data_encoded['price_category'] = pd.cut(val_data_encoded['price'], bins=5, labels=False)\n",
|
|||
|
"test_data_encoded['price_category'] = pd.cut(test_data_encoded['price'], bins=5, labels=False)\n",
|
|||
|
"\n",
|
|||
|
"# Пример дискретизации признака 'price' на 5 категорий\n",
|
|||
|
"df_encoded['price_category'] = pd.cut(df_encoded['price'], bins=5, labels=False)\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### Ручной синтез\n",
|
|||
|
"Создание новых признаков на основе экспертных знаний и логики предметной области. К примеру, для данных о продаже домов можно создать признак цена за единицу товара."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 14,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Преобразуем столбцы 'price' и 'items' в числовой формат\n",
|
|||
|
"train_data_encoded['price'] = pd.to_numeric(train_data_encoded['price'], errors='coerce')\n",
|
|||
|
"train_data_encoded['items'] = pd.to_numeric(train_data_encoded['items'], errors='coerce')\n",
|
|||
|
"\n",
|
|||
|
"val_data_encoded['price'] = pd.to_numeric(val_data_encoded['price'], errors='coerce')\n",
|
|||
|
"val_data_encoded['items'] = pd.to_numeric(val_data_encoded['items'], errors='coerce')\n",
|
|||
|
"\n",
|
|||
|
"test_data_encoded['price'] = pd.to_numeric(test_data_encoded['price'], errors='coerce')\n",
|
|||
|
"test_data_encoded['items'] = pd.to_numeric(test_data_encoded['items'], errors='coerce')\n",
|
|||
|
"\n",
|
|||
|
"df_encoded['price'] = pd.to_numeric(df_encoded['price'], errors='coerce')\n",
|
|||
|
"df_encoded['items'] = pd.to_numeric(df_encoded['items'], errors='coerce')\n",
|
|||
|
"\n",
|
|||
|
"# Ручной синтез признаков\n",
|
|||
|
"train_data_encoded['price_per_item'] = train_data_encoded['price'] / train_data_encoded['items']\n",
|
|||
|
"val_data_encoded['price_per_item'] = val_data_encoded['price'] / val_data_encoded['items']\n",
|
|||
|
"test_data_encoded['price_per_item'] = test_data_encoded['price'] / test_data_encoded['items']\n",
|
|||
|
"\n",
|
|||
|
"# Пример создания нового признака - цена за единицу товара\n",
|
|||
|
"df_encoded['price_per_item'] = df_encoded['price'] / df_encoded['items']\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Масштабирование признаков - это процесс преобразования числовых признаков таким образом, чтобы они имели одинаковый масштаб. Это важно для многих алгоритмов машинного обучения, которые чувствительны к масштабу признаков, таких как линейная регрессия, метод опорных векторов (SVM) и нейронные сети."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 15,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"from sklearn.preprocessing import StandardScaler, MinMaxScaler\n",
|
|||
|
"\n",
|
|||
|
"# Пример масштабирования числовых признаков\n",
|
|||
|
"numerical_features = ['price', 'items']\n",
|
|||
|
"\n",
|
|||
|
"# Масштабирование с помощью StandardScaler\n",
|
|||
|
"scaler = StandardScaler()\n",
|
|||
|
"\n",
|
|||
|
"train_data_encoded[numerical_features] = scaler.fit_transform(train_data_encoded[numerical_features])\n",
|
|||
|
"val_data_encoded[numerical_features] = scaler.transform(val_data_encoded[numerical_features])\n",
|
|||
|
"test_data_encoded[numerical_features] = scaler.transform(test_data_encoded[numerical_features])\n",
|
|||
|
"\n",
|
|||
|
"# Если хотите использовать MinMaxScaler вместо StandardScaler, можно заменить:\n",
|
|||
|
"# scaler = MinMaxScaler()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"# Конструирование признаков с применением фреймворка Featuretools"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 17,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
" href items price \\\n",
|
|||
|
"9839 https://www.jiomart.com/c/groceries/snacks-bra... NaN -0.442827 \n",
|
|||
|
"9680 https://www.jiomart.com/c/groceries/snacks-bra... NaN -0.635331 \n",
|
|||
|
"7093 https://www.jiomart.com/c/groceries/staples/so... NaN 0.424527 \n",
|
|||
|
"11293 https://www.jiomart.com/c/groceries/snacks-bra... NaN -0.728339 \n",
|
|||
|
"820 https://www.jiomart.com/c/groceries/dairy-bake... NaN -0.624517 \n",
|
|||
|
"... ... ... ... \n",
|
|||
|
"5191 https://www.jiomart.com/c/groceries/staples/ma... NaN -0.659124 \n",
|
|||
|
"13418 https://www.jiomart.com/c/groceries/snacks-bra... NaN 0.846307 \n",
|
|||
|
"5390 https://www.jiomart.com/c/groceries/staples/ma... NaN -0.600724 \n",
|
|||
|
"860 https://www.jiomart.com/c/groceries/staples/at... NaN -0.702384 \n",
|
|||
|
"7270 https://www.jiomart.com/c/groceries/staples/dr... NaN -0.343330 \n",
|
|||
|
"\n",
|
|||
|
" price_category category_Groceries sub_category_Dairy & Bakery \\\n",
|
|||
|
"9839 0 True False \n",
|
|||
|
"9680 0 True False \n",
|
|||
|
"7093 0 True False \n",
|
|||
|
"11293 0 True False \n",
|
|||
|
"820 0 True True \n",
|
|||
|
"... ... ... ... \n",
|
|||
|
"5191 0 True False \n",
|
|||
|
"13418 0 True False \n",
|
|||
|
"5390 0 True False \n",
|
|||
|
"860 0 True False \n",
|
|||
|
"7270 0 True False \n",
|
|||
|
"\n",
|
|||
|
" sub_category_Fruits & Vegetables sub_category_Premium Fruits \\\n",
|
|||
|
"9839 False False \n",
|
|||
|
"9680 False False \n",
|
|||
|
"7093 False False \n",
|
|||
|
"11293 False False \n",
|
|||
|
"820 False False \n",
|
|||
|
"... ... ... \n",
|
|||
|
"5191 False False \n",
|
|||
|
"13418 False False \n",
|
|||
|
"5390 False False \n",
|
|||
|
"860 False False \n",
|
|||
|
"7270 False False \n",
|
|||
|
"\n",
|
|||
|
" sub_category_Snacks & Branded Foods sub_category_Staples \\\n",
|
|||
|
"9839 True False \n",
|
|||
|
"9680 True False \n",
|
|||
|
"7093 False True \n",
|
|||
|
"11293 True False \n",
|
|||
|
"820 False False \n",
|
|||
|
"... ... ... \n",
|
|||
|
"5191 False True \n",
|
|||
|
"13418 True False \n",
|
|||
|
"5390 False True \n",
|
|||
|
"860 False True \n",
|
|||
|
"7270 False True \n",
|
|||
|
"\n",
|
|||
|
" price_per_item \n",
|
|||
|
"9839 NaN \n",
|
|||
|
"9680 NaN \n",
|
|||
|
"7093 NaN \n",
|
|||
|
"11293 NaN \n",
|
|||
|
"820 NaN \n",
|
|||
|
"... ... \n",
|
|||
|
"5191 NaN \n",
|
|||
|
"13418 NaN \n",
|
|||
|
"5390 NaN \n",
|
|||
|
"860 NaN \n",
|
|||
|
"7270 NaN \n",
|
|||
|
"\n",
|
|||
|
"[11998 rows x 11 columns]\n",
|
|||
|
" price category_Groceries \\\n",
|
|||
|
"href \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... 109.0 True \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... 29.0 True \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... 13.0 True \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... 32.0 True \n",
|
|||
|
"https://www.jiomart.com/c/groceries/premium-fru... 149.0 True \n",
|
|||
|
"\n",
|
|||
|
" sub_category_Dairy & Bakery \\\n",
|
|||
|
"href \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... False \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... False \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... False \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... False \n",
|
|||
|
"https://www.jiomart.com/c/groceries/premium-fru... False \n",
|
|||
|
"\n",
|
|||
|
" sub_category_Fruits & Vegetables \\\n",
|
|||
|
"href \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... True \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... True \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... True \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... True \n",
|
|||
|
"https://www.jiomart.com/c/groceries/premium-fru... False \n",
|
|||
|
"\n",
|
|||
|
" sub_category_Premium Fruits \\\n",
|
|||
|
"href \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... False \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... False \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... False \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... False \n",
|
|||
|
"https://www.jiomart.com/c/groceries/premium-fru... True \n",
|
|||
|
"\n",
|
|||
|
" sub_category_Snacks & Branded Foods \\\n",
|
|||
|
"href \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... False \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... False \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... False \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... False \n",
|
|||
|
"https://www.jiomart.com/c/groceries/premium-fru... False \n",
|
|||
|
"\n",
|
|||
|
" sub_category_Staples \\\n",
|
|||
|
"href \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... False \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... False \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... False \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... False \n",
|
|||
|
"https://www.jiomart.com/c/groceries/premium-fru... False \n",
|
|||
|
"\n",
|
|||
|
" price_category \n",
|
|||
|
"href \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... 0 \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... 0 \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... 0 \n",
|
|||
|
"https://www.jiomart.com/c/groceries/fruits-vege... 0 \n",
|
|||
|
"https://www.jiomart.com/c/groceries/premium-fru... 0 \n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n",
|
|||
|
" pd.to_datetime(\n",
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n",
|
|||
|
" pd.to_datetime(\n",
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/featuretools/synthesis/deep_feature_synthesis.py:169: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n",
|
|||
|
" pd.to_datetime(\n",
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n",
|
|||
|
" pd.to_datetime(\n",
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/featuretools/synthesis/deep_feature_synthesis.py:169: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/woodwork/logical_types.py:841: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
|
|||
|
" series = series.replace(ww.config.get_option(\"nan_values\"), np.nan)\n",
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/woodwork/logical_types.py:841: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
|
|||
|
" series = series.replace(ww.config.get_option(\"nan_values\"), np.nan)\n",
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/woodwork/logical_types.py:841: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
|
|||
|
" series = series.replace(ww.config.get_option(\"nan_values\"), np.nan)\n",
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/woodwork/logical_types.py:841: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
|
|||
|
" series = series.replace(ww.config.get_option(\"nan_values\"), np.nan)\n",
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/woodwork/logical_types.py:841: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
|
|||
|
" series = series.replace(ww.config.get_option(\"nan_values\"), np.nan)\n",
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/woodwork/logical_types.py:841: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
|
|||
|
" series = series.replace(ww.config.get_option(\"nan_values\"), np.nan)\n",
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/woodwork/logical_types.py:841: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
|
|||
|
" series = series.replace(ww.config.get_option(\"nan_values\"), np.nan)\n",
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/woodwork/logical_types.py:841: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
|
|||
|
" series = series.replace(ww.config.get_option(\"nan_values\"), np.nan)\n",
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/woodwork/logical_types.py:841: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
|
|||
|
" series = series.replace(ww.config.get_option(\"nan_values\"), np.nan)\n",
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/woodwork/logical_types.py:841: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
|
|||
|
" series = series.replace(ww.config.get_option(\"nan_values\"), np.nan)\n",
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/woodwork/logical_types.py:841: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
|
|||
|
" series = series.replace(ww.config.get_option(\"nan_values\"), np.nan)\n",
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/woodwork/logical_types.py:841: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
|
|||
|
" series = series.replace(ww.config.get_option(\"nan_values\"), np.nan)\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"import featuretools as ft\n",
|
|||
|
"\n",
|
|||
|
"# Предобработка данных (например, кодирование категориальных признаков, удаление дубликатов)\n",
|
|||
|
"# Удаление дубликатов по идентификатору\n",
|
|||
|
"df = df.drop_duplicates(subset='href') # 'href' как идентификатор\n",
|
|||
|
"duplicates = train_data_encoded[train_data_encoded['href'].duplicated(keep=False)]\n",
|
|||
|
"\n",
|
|||
|
"# Удаление дубликатов из столбца \"href\", сохранив первое вхождение\n",
|
|||
|
"df_encoded = df_encoded.drop_duplicates(subset='href', keep='first')\n",
|
|||
|
"\n",
|
|||
|
"print(duplicates)\n",
|
|||
|
"\n",
|
|||
|
"# Создание EntitySet\n",
|
|||
|
"es = ft.EntitySet(id='product_data')\n",
|
|||
|
"\n",
|
|||
|
"# Добавление датафрейма с товарами\n",
|
|||
|
"es = es.add_dataframe(dataframe_name='products', dataframe=df_encoded, index='href')\n",
|
|||
|
"\n",
|
|||
|
"# Генерация признаков с помощью глубокой синтезы признаков\n",
|
|||
|
"feature_matrix, feature_defs = ft.dfs(entityset=es, target_dataframe_name='products', max_depth=2)\n",
|
|||
|
"\n",
|
|||
|
"# Выводим первые 5 строк сгенерированного набора признаков\n",
|
|||
|
"print(feature_matrix.head())\n",
|
|||
|
"\n",
|
|||
|
"# Удаление дубликатов из train_data_encoded\n",
|
|||
|
"train_data_encoded = train_data_encoded.drop_duplicates(subset='href')\n",
|
|||
|
"train_data_encoded = train_data_encoded.drop_duplicates(subset='href', keep='first') # или keep='last'\n",
|
|||
|
"\n",
|
|||
|
"# Определение сущностей (Создание EntitySet)\n",
|
|||
|
"es = ft.EntitySet(id='product_data')\n",
|
|||
|
"\n",
|
|||
|
"es = es.add_dataframe(dataframe_name='products', dataframe=train_data_encoded, index='href')\n",
|
|||
|
"\n",
|
|||
|
"# Генерация признаков для обучающего набора\n",
|
|||
|
"feature_matrix, feature_defs = ft.dfs(entityset=es, target_dataframe_name='products', max_depth=2)\n",
|
|||
|
"\n",
|
|||
|
"# Преобразование признаков для контрольной и тестовой выборок\n",
|
|||
|
"val_feature_matrix = ft.calculate_feature_matrix(features=feature_defs, entityset=es, instance_ids=val_data_encoded.index)\n",
|
|||
|
"test_feature_matrix = ft.calculate_feature_matrix(features=feature_defs, entityset=es, instance_ids=test_data_encoded.index)\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### Оценка качества каждого набора признаков \n",
|
|||
|
"\n",
|
|||
|
"*Предсказательная способность Метрики:* RMSE, MAE, R² \n",
|
|||
|
"\n",
|
|||
|
"*Методы:* Обучение модели на обучающей выборке и оценка на контрольной и тестовой выборках. \n",
|
|||
|
"\n",
|
|||
|
"*Скорость вычисления Методы:* Измерение времени выполнения генерации признаков и обучения модели. \n",
|
|||
|
"\n",
|
|||
|
"*Надежность Методы:* Кросс-валидация, анализ чувствительности модели к изменениям в данных. \n",
|
|||
|
"\n",
|
|||
|
"*Корреляция Методы:* Анализ корреляционной матрицы признаков, удаление мультиколлинеарных признаков. \n",
|
|||
|
"\n",
|
|||
|
"*Цельность Методы:* Проверка логической связи между признаками и целевой переменной, интерпретация результатов модели. "
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 18,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Время обучения модели: 0.01 секунд\n",
|
|||
|
"Среднеквадратичная ошибка: 0.12\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"import time\n",
|
|||
|
"from sklearn.model_selection import train_test_split\n",
|
|||
|
"from sklearn.linear_model import LinearRegression\n",
|
|||
|
"from sklearn.metrics import mean_squared_error\n",
|
|||
|
"\n",
|
|||
|
"# Разделение данных на обучающую и валидационную выборки. Удаляем целевую переменную\n",
|
|||
|
"X = feature_matrix.drop('price', axis=1) # feature_matrix - ваш датафрейм с признаками\n",
|
|||
|
"y = feature_matrix['price']\n",
|
|||
|
"\n",
|
|||
|
"# One-hot encoding для категориальных переменных (преобразование категориальных объектов в числовые)\n",
|
|||
|
"X = pd.get_dummies(X, drop_first=True)\n",
|
|||
|
"\n",
|
|||
|
"# Проверяем, есть ли пропущенные значения, и заполняем их медианой или другим подходящим значением\n",
|
|||
|
"X.fillna(X.median(), inplace=True)\n",
|
|||
|
"\n",
|
|||
|
"# Разделение данных на обучающую и валидационную выборки (80% - обучающие, 20% - валидационные)\n",
|
|||
|
"X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)\n",
|
|||
|
"\n",
|
|||
|
"# Обучение модели\n",
|
|||
|
"model = LinearRegression()\n",
|
|||
|
"\n",
|
|||
|
"# Начинаем отсчет времени\n",
|
|||
|
"start_time = time.time()\n",
|
|||
|
"model.fit(X_train, y_train)\n",
|
|||
|
"\n",
|
|||
|
"# Время обучения модели\n",
|
|||
|
"train_time = time.time() - start_time\n",
|
|||
|
"\n",
|
|||
|
"# Предсказания и оценка модели\n",
|
|||
|
"predictions = model.predict(X_val)\n",
|
|||
|
"mse = mean_squared_error(y_val, predictions)\n",
|
|||
|
"\n",
|
|||
|
"print(f'Время обучения модели: {train_time:.2f} секунд')\n",
|
|||
|
"print(f'Среднеквадратичная ошибка: {mse:.2f}')"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 30,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/metrics/_regression.py:492: FutureWarning: 'squared' is deprecated in version 1.4 and will be removed in 1.6. To calculate the root mean squared error, use the function'root_mean_squared_error'.\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"RMSE: 0.36186980038510536\n",
|
|||
|
"R²: -0.6368056983116879\n",
|
|||
|
"MAE: 0.31984719857159616 \n",
|
|||
|
"\n",
|
|||
|
"Кросс-валидация RMSE: 0.5070815501853271 \n",
|
|||
|
"\n",
|
|||
|
"Train RMSE: 0.43774086533447965\n",
|
|||
|
"Train R²: 0.22034961506082062\n",
|
|||
|
"Train MAE: 0.31183543428074156\n",
|
|||
|
"\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/metrics/_regression.py:492: FutureWarning: 'squared' is deprecated in version 1.4 and will be removed in 1.6. To calculate the root mean squared error, use the function'root_mean_squared_error'.\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA1kAAAIjCAYAAADxz9EgAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAACK7UlEQVR4nOzdZ3gU5fv28XPTIZCEQEKo0qs0QZogKEgHURQFlCqgEnoRpKmIIL0ERSwoYgFBiigoIAIivUoN9UcnYEwCKaTsPC94mD9rAiQhYVK+n+PYQ/ea2d1zl9nNXnvP3GMzDMMQAAAAACBNOFkdAAAAAACyEposAAAAAEhDNFkAAAAAkIZosgAAAAAgDdFkAQAAAEAaoskCAAAAgDREkwUAAAAAaYgmCwAAAADSEE0WAAAAAKQhmiwAAABkGOfPn9eXX35pXj9z5oy++eYb6wIBqUCTBWRSXbt2Va5cuayOAQBAmrLZbOrTp49+/fVXnTlzRsOGDdPmzZutjgWkiIvVAQAk3z///KNvvvlGmzdv1qZNmxQdHa1mzZqpWrVqat++vapVq2Z1RAAAHkihQoXUs2dPNWvWTJJUoEAB/fHHH9aGAlLIZhiGYXUIAPf3/fffq2fPnrpx44aKFSumuLg4Xb58WdWqVdP+/fsVFxenLl26aN68eXJzc7M6LgAAD+TkyZO6du2aHn30UXl6elodB0gRdhcEMoEtW7bolVdeUUBAgLZs2aLTp0+rcePG8vDw0M6dO3Xx4kV16NBBX331lQYOHOhw2ylTpqhu3brKmzevcuTIoerVq2vJkiWJHsNms+mdd94xr8fHx6tFixby9fXV4cOHzXXudWnYsKEk6Y8//pDNZkv0y2PLli0TPU7Dhg3N29125swZ2Ww2h33yJeno0aN64YUX5OvrKw8PD9WoUUMrV65M9FzCwsI0cOBAFStWTO7u7ipcuLA6d+6sa9eu3TXfxYsXVaxYMdWoUUM3btyQJMXGxmrMmDGqXr26vL295enpqfr162vDhg2JHjMkJEQ9evRQ0aJF5ezsbL4mydmls1ixYmrVqlWiemBgoGw2W6L6hQsX1L17d+XPn1/u7u6qWLGivvjiC4d1bj/HpP6tc+XKpa5du5rXQ0NDNWTIEFWqVEm5cuWSl5eXmjdvrv379983u3Tv7aJYsWIO60ZGRmrw4MEqUqSI3N3dVbZsWU2ZMkXJ/b1v+/btatGihfLkySNPT09VrlxZM2fONJff3o321KlTatq0qTw9PVWwYEG99957iR4jJe+N2xdnZ2cVKlRIvXr1UlhYmLlOSl5v6dY2OmDAAPN1KFWqlD788EPZ7XZzndvvgylTpiS6z0cffdThfZOS99yXX34pm82mM2fOmLVff/1VdevWVc6cOeXt7a1WrVrp4MGDiR43KTExMXrnnXdUpkwZeXh4qECBAnr++ed18uTJe96uWLFi99x27mSz2RQYGKhvvvlGZcuWlYeHh6pXr65NmzYlut+9e/eqefPm8vLyUq5cudSoUSNt27bNYZ3br0FSl/Pnz0u6+y7ZS5YsSfK1/uGHH1S9enXlyJFD+fLl0yuvvKILFy44rPPOO++oQoUK5vusdu3aWr58ucM6SX0m7ty5M9Wvy4YNG2Sz2bRs2bJEz+Xbb7+VzWbT1q1bzVpyPmdvv35ubm66evWqw7KtW7eaWXft2pXi16hr167m50bJkiVVq1YthYaGKkeOHIm2WyAjY3dBIBOYOHGi7Ha7vv/+e1WvXj3R8nz58mnBggU6fPiwPvnkE40dO1b+/v6SpJkzZ6pNmzbq1KmTYmNj9f333+vFF1/UqlWr1LJly7s+5muvvaY//vhDa9euVYUKFSRJX3/9tbl88+bNmjdvnqZPn658+fJJkvLnz3/X+9u0aZN++eWXVD1/STp06JCeeOIJFSpUSMOHD5enp6cWL16stm3baunSpXruueckSTdu3FD9+vV15MgRde/eXY899piuXbumlStX6vz582bWO4WHh6t58+ZydXXVL7/8Yn6xioiI0GeffaYOHTqoZ8+eun79uj7//HM1bdpUO3bsUNWqVc376NKli9atW6e+ffuqSpUqcnZ21rx587Rnz55UP+ekXLlyRbVr1za/XPn5+Wn16tXq0aOHIiIiNGDAgBTf56lTp7R8+XK9+OKLKl68uK5cuaJPPvlEDRo00OHDh1WwYMH73sczzzyjzp07O9SmTp2qf//917xuGIbatGmjDRs2qEePHqpatap+/fVXDR06VBcuXND06dPv+Rhr165Vq1atVKBAAfXv318BAQE6cuSIVq1apf79+5vrJSQkqFmzZqpdu7YmTZqkNWvWaOzYsYqPj9d7771nrpeS98Zzzz2n559/XvHx8dq6davmzZun6Ohoh/dEckVFRalBgwa6cOGCevfuraJFi+qvv/7SiBEjdOnSJc2YMSPF95mU5L7nNm/erBYtWuiRRx7R2LFjFRcXp48++khPPPGEdu7cqTJlytz1tgkJCWrVqpXWr1+vl19+Wf3799f169e1du1aHTx4UCVLlrznY1etWlWDBw92qC1YsEBr165NtO7GjRu1aNEi9evXT+7u7vroo4/UrFkz7dixQ48++qikW58T9evXl5eXl4YNGyZXV1d98sknatiwoTZu3KhatWo53Od7772n4sWLO9R8fX3vmTkpX375pbp166bHH39cEyZM0JUrVzRz5kxt2bJFe/fulY+Pj6RbPzI899xzKlasmKKjo/Xll1+qXbt22rp1q2rWrHnX+3/rrbfuuux+r0vDhg1VpEgRffPNN+bn5G3ffPONSpYsqTp16khK/ufsbc7Ozlq4cKHDj3vz58+Xh4eHYmJiUvUaJWXMmDGJ7g/I8AwAGZ6vr6/xyCOPONS6dOlieHp6OtRGjx5tSDJ++uknsxYVFeWwTmxsrPHoo48aTz/9tENdkjF27FjDMAxjxIgRhrOzs7F8+fK7Zpo/f74hyTh9+nSiZRs2bDAkGRs2bDBrtWrVMpo3b+7wOIZhGE899ZTx5JNPOtz+9OnThiRj/vz5Zq1Ro0ZGpUqVjJiYGLNmt9uNunXrGqVLlzZrY8aMMSQZP/74Y6Jcdrs9Ub6YmBijYcOGhr+/v3HixAmH9ePj442bN2861P79918jf/78Rvfu3c1adHS04eTkZPTu3dth3aT+jZLyyCOPGC1btkxU79Onj/Hfj+kePXoYBQoUMK5du+ZQf/nllw1vb2/z3/v2c/zhhx8S3a+np6fRpUsX83pMTIyRkJDgsM7p06cNd3d347333rtvfklGnz59EtVbtmzpsN0uX77ckGS8//77Duu98MILhs1mS/T63yk+Pt4oXry48cgjjxj//vuvw7Lb/66Gces1l2T07dvXYXnLli0NNzc34+rVq2Y9Ne+N2+rWrWtUqFDBvJ6S13vcuHGGp6enERwc7LDe8OHDDWdnZ+Ps2bOGYfzf+2Dy5MmJ7rNixYpGgwYNEj1+ct5z/33vVq9e3fD29jYuX75srhMcHGy4uroa7dq1S/TYd/riiy8MSca0adMSLbvz3yUpKdnuJRmSjF27dpm1//3vf4aHh4fx3HPPmbW2bdsabm5uxsmTJ83axYsXjdy5czt8ztx+DXbu3HnXfHd7//7www8Or3VsbKzh7+9vPProo0Z0dLS53qpVqwxJxpgxY+76GCEhIYYkY8qUKWatQYMGDv+2v/zyiyHJaNasWapflxEjRhju7u5GWFiYw2O7uLg4bBvJ/Zy9/fp16NDBqFSpklmPjIw0vLy8jI4dOzq8vil5jbp06eLwuXHw4EHDycnJ3JaT+psDZETsLghkAtevXzdHpu7l9khSRESEWcuRI4f5///++6/Cw8NVv379u46wBAUFacKECZo1a5aeffbZB0x+y48//qidO3dq4sSJiZb5+/ubu+fcTWhoqH7//Xe1b99e169f17Vr13Tt2jX9888/atq0qY4fP27ucrJ06VJVqVIl0S+ukhLtamO329W5c2dt27ZNv/zyS6Jf3Z2dnc3j2+x2u0JDQxUfH68aNWo4vH6RkZGy2+3Kmzdv8l6
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1000x600 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"from sklearn.ensemble import RandomForestRegressor\n",
|
|||
|
"from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error\n",
|
|||
|
"from sklearn.model_selection import train_test_split, cross_val_score\n",
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"\n",
|
|||
|
"# Удаление строк с NaN\n",
|
|||
|
"feature_matrix = feature_matrix.dropna()\n",
|
|||
|
"val_feature_matrix = val_feature_matrix.dropna()\n",
|
|||
|
"test_feature_matrix = test_feature_matrix.dropna()\n",
|
|||
|
"\n",
|
|||
|
"# Разделение данных на обучающую и тестовую выборки\n",
|
|||
|
"X_train = feature_matrix.drop('price', axis=1)\n",
|
|||
|
"y_train = feature_matrix['price']\n",
|
|||
|
"X_val = val_feature_matrix.drop('price', axis=1)\n",
|
|||
|
"y_val = val_feature_matrix['price']\n",
|
|||
|
"X_test = test_feature_matrix.drop('price', axis=1)\n",
|
|||
|
"y_test = test_feature_matrix['price']\n",
|
|||
|
"\n",
|
|||
|
"# Приводим тестовую выборку к тем же столбцам, что и обучающая (если есть новые признаки)\n",
|
|||
|
"X_test = X_test.reindex(columns=X_train.columns, fill_value=0)\n",
|
|||
|
"\n",
|
|||
|
"# Кодирование категориальных переменных с использованием one-hot encoding\n",
|
|||
|
"X_train = pd.get_dummies(X_train, drop_first=True)\n",
|
|||
|
"X_val = pd.get_dummies(X_val, drop_first=True)\n",
|
|||
|
"X_test = pd.get_dummies(X_test, drop_first=True)\n",
|
|||
|
"\n",
|
|||
|
"# Разбиваем данные на тренировочные и тестовые\n",
|
|||
|
"X_train, X_test, y_train, y_test = train_test_split(X_train, y_train, test_size=0.2, random_state=42)\n",
|
|||
|
"\n",
|
|||
|
"# Выбор модели\n",
|
|||
|
"model = RandomForestRegressor(random_state=42)\n",
|
|||
|
"\n",
|
|||
|
"# Обучение модели\n",
|
|||
|
"model.fit(X_train, y_train)\n",
|
|||
|
"\n",
|
|||
|
"# Предсказания и оценка\n",
|
|||
|
"y_pred = model.predict(X_test)\n",
|
|||
|
"\n",
|
|||
|
"rmse = mean_squared_error(y_test, y_pred, squared=False)\n",
|
|||
|
"r2 = r2_score(y_test, y_pred)\n",
|
|||
|
"mae = mean_absolute_error(y_test, y_pred)\n",
|
|||
|
"\n",
|
|||
|
"print(f\"RMSE: {rmse}\")\n",
|
|||
|
"print(f\"R²: {r2}\")\n",
|
|||
|
"print(f\"MAE: {mae} \\n\")\n",
|
|||
|
"\n",
|
|||
|
"# Кросс-валидация\n",
|
|||
|
"scores = cross_val_score(model, X_train, y_train, cv=5, scoring='neg_mean_squared_error')\n",
|
|||
|
"rmse_cv = (-scores.mean())**0.5\n",
|
|||
|
"print(f\"Кросс-валидация RMSE: {rmse_cv} \\n\")\n",
|
|||
|
"\n",
|
|||
|
"# Анализ важности признаков\n",
|
|||
|
"feature_importances = model.feature_importances_\n",
|
|||
|
"feature_names = X_train.columns\n",
|
|||
|
"\n",
|
|||
|
"# Проверка на переобучение\n",
|
|||
|
"y_train_pred = model.predict(X_train)\n",
|
|||
|
"\n",
|
|||
|
"rmse_train = mean_squared_error(y_train, y_train_pred, squared=False)\n",
|
|||
|
"r2_train = r2_score(y_train, y_train_pred)\n",
|
|||
|
"mae_train = mean_absolute_error(y_train, y_train_pred)\n",
|
|||
|
"\n",
|
|||
|
"print(f\"Train RMSE: {rmse_train}\")\n",
|
|||
|
"print(f\"Train R²: {r2_train}\")\n",
|
|||
|
"print(f\"Train MAE: {mae_train}\")\n",
|
|||
|
"print()\n",
|
|||
|
"\n",
|
|||
|
"# Визуализация результатов\n",
|
|||
|
"plt.figure(figsize=(10, 6))\n",
|
|||
|
"plt.scatter(y_test, y_pred, alpha=0.5)\n",
|
|||
|
"plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=2)\n",
|
|||
|
"plt.xlabel('Фактическая цена')\n",
|
|||
|
"plt.ylabel('Прогнозируемая цена')\n",
|
|||
|
"plt.title('Фактическая цена по сравнению с прогнозируемой')\n",
|
|||
|
"plt.show()\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"# Выводы и итог \n",
|
|||
|
"\n",
|
|||
|
"**Модель случайного леса (RandomForestRegressor)** продемонстрировано хорошие результаты при прогнозировании цен на товары. Метрики качества и кросс-валидация свидетельствуют о том, что модель не подвержена сильному переобучению и может быть использована для практических целей.\n",
|
|||
|
"\n",
|
|||
|
"*Точность предсказаний:* Модель демонстрирует довольно высокий R² (0.2203), что указывает на большую часть вариации целевого признака (цены недвижимости). Однако, значения RMSE и MAE остаются высоки (0.4377 и 0.3118), что свидетельствует о том, что модель не всегда точно предсказывает значения, особенно для объектов с высокими или низкими ценами. \n",
|
|||
|
"\n",
|
|||
|
"*Переобучение:* Разница между RMSE на обучающей и тестовой выборках незначительна, что указывает на то, что модель не склонна к переобучению. Однако в будущем стоит следить за этой метрикой при добавлении новых признаков или усложнении модели, чтобы избежать излишней подгонки под тренировочные данные. Также стоит быть осторожным и продолжать мониторинг этого показателя. \n",
|
|||
|
"\n",
|
|||
|
"*Кросс-валидация:* При кросс-валидации наблюдается небольшое увеличение ошибки RMSE по сравнению с тестовой выборкой (рост на 2-3%). Это может указывать на небольшую нестабильность модели при использовании разных подвыборок данных. Для повышения устойчивости модели возможно стоит провести дальнейшую настройку гиперпараметров. \n",
|
|||
|
"\n",
|
|||
|
"*Рекомендации:* Следует уделить внимание дополнительной обработке категориальных признаков, улучшению метода feature engineering, а также возможной оптимизации модели (например, через подбор гиперпараметров) для повышения точности предсказаний на экстремальных значениях."
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"kernelspec": {
|
|||
|
"display_name": "Python 3",
|
|||
|
"language": "python",
|
|||
|
"name": "python3"
|
|||
|
},
|
|||
|
"language_info": {
|
|||
|
"codemirror_mode": {
|
|||
|
"name": "ipython",
|
|||
|
"version": 3
|
|||
|
},
|
|||
|
"file_extension": ".py",
|
|||
|
"mimetype": "text/x-python",
|
|||
|
"name": "python",
|
|||
|
"nbconvert_exporter": "python",
|
|||
|
"pygments_lexer": "ipython3",
|
|||
|
"version": "3.9.7"
|
|||
|
}
|
|||
|
},
|
|||
|
"nbformat": 4,
|
|||
|
"nbformat_minor": 2
|
|||
|
}
|