From e887393dd0a806cee220b3b8b2e334261779b0ee Mon Sep 17 00:00:00 2001 From: kaznacheeva Date: Sat, 9 Nov 2024 11:59:00 +0400 Subject: [PATCH 1/3] =?UTF-8?q?4=20=D0=BB=D0=B0=D0=B1=D0=B0=20=D0=BD=D0=B0?= =?UTF-8?q?=D1=87=D0=B0=D0=BB=D0=BE?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- lab_3/Lab3.ipynb | 1496 ++++++++++++++++++++++++---------------------- lab_4/Lab4.ipynb | 66 ++ 2 files changed, 839 insertions(+), 723 deletions(-) create mode 100644 lab_4/Lab4.ipynb diff --git a/lab_3/Lab3.ipynb b/lab_3/Lab3.ipynb index e37d516..6ad94cc 100644 --- a/lab_3/Lab3.ipynb +++ b/lab_3/Lab3.ipynb @@ -4,77 +4,159 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Приступаем к работе...\n", + "*Вариант задания:* Заработная плата рабочих мест в области Data Science (вариант - 8) \n", "\n", - "*Вариант задания:* Продажи домов в округе Кинг (вариант - 6) \n", - "Определим бизнес-цели и цели технического проекта \n", + "Бизнес-цели для датасета о заработной плате в Data Science:\n", + "Оптимизация стратегии найма и оплаты труда в Data Science\n", "\n", - "### Бизнес-цели: \n", - "1. Оптимизация процесса оценки стоимости дома \n", + "Формулировка: Разработать модель, которая позволяет точно прогнозировать и оптимизировать заработную плату для специалистов в области Data Science на основе их опыта, типа занятости, местоположения и других факторов.\n", "\n", - "**Формулировка:** Разработать модель, которая позволяет автоматически и точно оценивать стоимость дома на основании его характеристик (таких как площадь, количество комнат, состояние, местоположение). \n", - "**Цель:** Увеличить точность оценки стоимости недвижимости для агенств и потенциальных покупателей, а также сократить время и затраты на оценку недвижимости, обеспечивая более точное предсказание цены. 
\n", + "Цель: Увеличить привлекательность компании для талантливых специалистов в Data Science, обеспечивая конкурентоспособные зарплаты, а также оптимизировать расходы на персонал, избегая переплат и недоплат.\n", "\n", - "**Ключевые показатели успеха (KPI):** \n", - "*Точность модели прогнозирования* (RMSE): Минимизация среднеквадратичной ошибки до уровня ниже 10% от реальной цены, чтобы учитывать большие отклонения оценке.\n", - "*Средная абсолютная ошибка* (MAE): Модель должна предсказать цену с минимальной ошибкой и снизить MAE до 5% или меньше учитывая большие отклонения в оценке. \n", - "*Скорость оценки:* Уменьшение времени на оценку стоимости дома, чтобы быстрее получать результат.\n", - "*Доступность:* Внедрение модели в реальную систему для использования агентами недвижимости.\n", + "Ключевые показатели успеха (KPI):\n", "\n", - "2. Оптимизация затрат на ремонт перед продажей \n", + "Точность модели прогнозирования зарплаты (RMSE): Минимизация среднеквадратичной ошибки до уровня ниже 10% от реальной зарплаты, чтобы учитывать большие отклонения в оценке.\n", "\n", - "**Формулировка:** Разработать модель, которая поможет продавцам домов и агентствам недвижимости определить, какие улучшения или реновации дадут наибольший прирост стоимости дома при минимальных затратах. Это поможет избежать ненужных расходов и максимизировать прибыль от продажи. \n", - "**Цель:** Снизить затраты на ремонт перед продажей, рекомендовать только те улучшения, которые максимально увеличат стоимость недвижимости, и сократить время на принятие решений по реновациям. \n", + "Средняя абсолютная ошибка (MAE): Модель должна предсказать зарплату с минимальной ошибкой и снизить MAE до 5% или меньше, учитывая большие отклонения в оценке.\n", "\n", - "**Ключевые показатели успеха (KPI):** \n", - "*Возврат инвестиций* (ROI): Продавцы должны получать не менее 20% прироста стоимости дома на каждый вложенный доллар в реновацию. 
Например, если на ремонт было потрачено $10,000, цена дома должна увеличиться как минимум на $12,000. \n", - "*Средняя стоимость ремонта на 1 сделку* (CPA): Задача снизить расходы на ремонт, минимизировав ненужные траты. Например, оптимизация затрат до $5,000 на дом с учетом максимального прироста в цене. \n", - "*Сокращение времени на принятие решений:* Модель должна сокращать время, необходимое на оценку вариантов реноваций, до нескольких минут, что ускорит подготовку дома к продажи.\n", + "Скорость оценки зарплаты: Уменьшение времени на оценку зарплаты для новых сотрудников, чтобы быстрее принимать решения о найме.\n", "\n", - "### Технические цели проекта для каждой выделенной бизнес-цели\n", + "Доступность: Внедрение модели в систему управления персоналом для использования HR-специалистами.\n", "\n", - "1. **Создание модели для точной оценки стоимости дома.** \n", - "*Сбор и подготовка данных:* Очистка данных от пропусков, выбросов, дубликатов (аномальных значений в столбцах price, sqft_living, bedrooms). Преобразование категориальных переменных (view, condition, waterfront) в числовую форму с применением One-Hot-Encoding. Нормализация и стандартизация с применением методов масштабирования данных (нормировка, стандартизация для числовых признаков, чтобы привести их к 1ому масштабу). Разбиение набора данных на обучающую, контрольную и тестовую выборки для предотвращения утечек данных и переобучения. \n", - "*Разработка и обучение модели:* Исследование моделей машинного обучения, проводя эксперименты с различными алгоритмами (линейная регрессия, случайный лес, градиентный бустинг, деревья решений) для предсказания стоимости недвижимости. Обучение модели на обучающей выборке с использованием метрик оценки качества, таких как RMSE (Root Mean Square Error) и MAE (Mean Absolute Error). Оценка качества моделей на тестовой выборке, минимизируя MAE и RMSE для получения точных прогнозов стоимости. 
\n", - "*Развёртывание модели:* Интеграция модели в существующую систему или разработка API для доступа к модели с недвижимостью и частными продавцами. Создание веб-приложения или мобильного интерфейса для удобного использования модели и получения прогнозов в режиме реального времени.\n", + "Оптимизация распределения ресурсов в компании\n", "\n", - "2. **Разработка модели для рекомендаций по реновациям.** \n", - "*Сбор и подготовка данных:* Сбор данных о типах и стоимости реноваций, а также их влияние на конечную стоимость дома. Очистка и устранение неточных или неполных данных о ремонтах. Преобразование категориальных признаков (реновации, например, обновление крыши, замена окон) в числовой формат для представления этих данных с применением One-Hot-Encoding. Разбиение данных на обучающую и тестовую выборки для обучения модели. \n", - "*Разработка и обучение модели:* Использование модели регрессий (линейная регрессия, случайный лес) для предсказания и моделирования влияния конкретных реноваций на увеличение стоимости недвижимости. Оценка метрики (CPA - Cost Per Acquisition) оценка затрат на реновацию одной продажи и (ROI - Return on Investment) расчёт возврата на инвестиции от реновации дома, прирост стоимости после реновации. Обучение модели с целью прогнозирования изменений, которые могут принести наибольшую пользу для стоимости домов и реноваций. \n", - "*Развёртывание модели:* Создание интерфейса, где пользователи смогут вводить информацию о текущем состоянии дома и получать рекомендации по реновациям с расчётом ROI. 
Создать рекомендационную систему для продавцов недвижимости, которая будет предлагать набор реноваций.\n" + "Формулировка: Разработать модель, которая поможет компаниям определить оптимальное распределение ресурсов (бюджета) на Data Science проекты и команды, учитывая уровень зарплат, опыт и другие факторы.\n", + "\n", + "Цель: Снизить затраты на Data Science проекты, оптимизировать распределение бюджета, обеспечивая максимальную эффективность и результативность проектов.\n", + "\n", + "Ключевые показатели успеха (KPI):\n", + "\n", + "Возврат инвестиций (ROI): Проекты должны показывать не менее 20% прироста в результатах (например, увеличение прибыли, улучшение показателей) на каждый вложенный доллар в Data Science.\n", + "\n", + "Средняя стоимость проекта на 1 сотрудника (CPA): Задача снизить расходы на проекты, минимизировав ненужные траты. Например, оптимизация затрат до $50,000 на проект с учетом максимального прироста в результатах.\n", + "\n", + "Сокращение времени на принятие решений: Модель должна сокращать время, необходимое на оценку вариантов распределения ресурсов, до нескольких минут, что ускорит принятие решений.\n", + "\n", + "Оптимизация стратегии развития карьеры в Data Science\n", + "\n", + "Формулировка: Разработать модель, которая поможет специалистам в Data Science определить оптимальные пути развития карьеры, учитывая текущий уровень зарплаты, опыт и перспективы роста.\n", + "\n", + "Цель: Повысить удовлетворенность и мотивацию специалистов в Data Science, обеспечивая им четкие пути развития карьеры и возможность получения конкурентоспособных зарплат.\n", + "\n", + "Ключевые показатели успеха (KPI):\n", + "\n", + "Уровень удовлетворенности сотрудников: Увеличение уровня удовлетворенности сотрудников на 15% за счет предоставления четких путей развития карьеры и возможностей для роста.\n", + "\n", + "Средний срок пребывания в компании: Увеличение среднего срока пребывания сотрудников в компании на 20% за счет предоставления привлекательных 
перспектив развития.\n", + "\n", + "Доступность: Внедрение модели в систему управления карьерой для использования сотрудниками и HR-специалистами.\n", + "\n", + "**Технические цели проекта для каждой выделенной бизнес-цели**\n", + "\n", + "Оптимизация стратегии найма и оплаты труда в Data Science\n", + "\n", + "Сбор и подготовка данных:\n", + "\n", + "Сбор данных: Получение данных о заработных платах специалистов в Data Science из различных источников (например, Glassdoor, LinkedIn, Kaggle).\n", + "\n", + "Очистка данных: Удаление пропусков, выбросов и дубликатов. Преобразование категориальных переменных (например, experience_level, employment_type, employee_residence, company_location) в числовую форму с использованием One-Hot Encoding.\n", + "\n", + "Нормализация и стандартизация: Применение методов масштабирования данных (нормировка, стандартизация) для числовых признаков (например, salary_in_usd, remote_ratio).\n", + "\n", + "Разбиение данных: Разделение набора данных на обучающую, контрольную и тестовую выборки для предотвращения утечек данных и переобучения.\n", + "\n", + "Разработка и обучение модели:\n", + "\n", + "Исследование моделей: Эксперименты с различными алгоритмами (линейная регрессия, случайный лес, градиентный бустинг, деревья решений) для предсказания заработной платы.\n", + "\n", + "Обучение модели: Обучение модели на обучающей выборке с использованием метрик оценки качества, таких как RMSE (Root Mean Square Error) и MAE (Mean Absolute Error).\n", + "\n", + "Оценка качества: Оценка качества моделей на тестовой выборке, минимизируя MAE и RMSE для получения точных прогнозов заработной платы.\n", + "\n", + "Развёртывание модели:\n", + "\n", + "Интеграция модели: Интеграция модели в существующую систему управления персоналом или разработка API для доступа к модели.\n", + "\n", + "Создание интерфейса: Создание веб-приложения или мобильного интерфейса для удобного использования модели и получения прогнозов в режиме реального времени.\n", + "\n", + 
"Оптимизация распределения ресурсов в компании\n", + "\n", + "Сбор и подготовка данных:\n", + "\n", + "Сбор данных: Получение данных о затратах на Data Science проекты, результатах проектов, уровнях зарплат сотрудников и других релевантных факторов.\n", + "\n", + "Очистка данных: Удаление пропусков, выбросов и дубликатов. Преобразование категориальных переменных в числовую форму с использованием One-Hot Encoding.\n", + "\n", + "Нормализация и стандартизация: Применение методов масштабирования данных для числовых признаков.\n", + "\n", + "Разбиение данных: Разделение набора данных на обучающую, контрольную и тестовую выборки.\n", + "\n", + "Разработка и обучение модели:\n", + "\n", + "Исследование моделей: Эксперименты с различными алгоритмами (линейная регрессия, случайный лес, градиентный бустинг) для предсказания оптимального распределения ресурсов.\n", + "\n", + "Обучение модели: Обучение модели на обучающей выборке с использованием метрик оценки качества, таких как ROI (Return on Investment) и CPA (Cost Per Acquisition).\n", + "\n", + "Оценка качества: Оценка качества моделей на тестовой выборке, минимизируя CPA и максимизируя ROI.\n", + "\n", + "Развёртывание модели:\n", + "\n", + "Интеграция модели: Интеграция модели в систему управления проектами или разработка API для доступа к модели.\n", + "\n", + "Создание интерфейса: Создание веб-приложения или мобильного интерфейса для удобного использования модели и получения рекомендаций по распределению ресурсов.\n", + "\n", + "Оптимизация стратегии развития карьеры в Data Science\n", + "\n", + "Сбор и подготовка данных:\n", + "\n", + "Сбор данных: Получение данных о карьерных траекториях специалистов в Data Science, уровнях зарплат, опыте и других релевантных факторах.\n", + "\n", + "Очистка данных: Удаление пропусков, выбросов и дубликатов. 
Преобразование категориальных переменных в числовую форму с использованием One-Hot Encoding.\n", + "\n", + "Нормализация и стандартизация: Применение методов масштабирования данных для числовых признаков.\n", + "\n", + "Разбиение данных: Разделение набора данных на обучающую, контрольную и тестовую выборки.\n", + "\n", + "Разработка и обучение модели:\n", + "\n", + "Исследование моделей: Эксперименты с различными алгоритмами (линейная регрессия, случайный лес, градиентный бустинг) для предсказания оптимальных путей развития карьеры.\n", + "\n", + "Обучение модели: Обучение модели на обучающей выборке с использованием метрик оценки качества, таких как MAE (Mean Absolute Error) и RMSE (Root Mean Square Error).\n", + "\n", + "Оценка качества: Оценка качества моделей на тестовой выборке, минимизируя MAE и RMSE.\n", + "\n", + "Развёртывание модели:\n", + "\n", + "Интеграция модели: Интеграция модели в систему управления карьерой или разработка API для доступа к модели.\n", + "\n", + "Создание интерфейса: Создание веб-приложения или мобильного интерфейса для удобного использования модели и получения рекомендаций по развитию карьеры." 
] }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Index(['id', 'date', 'price', 'bedrooms', 'bathrooms', 'sqft_living',\n", - " 'sqft_lot', 'floors', 'waterfront', 'view', 'condition', 'grade',\n", - " 'sqft_above', 'sqft_basement', 'yr_built', 'yr_renovated', 'zipcode',\n", - " 'lat', 'long', 'sqft_living15', 'sqft_lot15'],\n", + "Index(['work_year', 'experience_level', 'employment_type', 'job_title',\n", + " 'salary', 'salary_currency', 'salary_in_usd', 'employee_residence',\n", + " 'remote_ratio', 'company_location', 'company_size'],\n", " dtype='object')\n" ] } ], "source": [ "import pandas as pd\n", - "import matplotlib.pyplot as plt\n", - "import matplotlib.ticker as ticker\n", - "import seaborn as sns\n", - "\n", - "# Подключим датафрейм и выгрузим данные\n", - "df = pd.read_csv(\".//static//csv//kc_house_data.csv\")\n", + "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n", "print(df.columns)" ] }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 3, "metadata": {}, "outputs": [ { @@ -98,188 +180,118 @@ " \n", " \n", " \n", - " id\n", - " date\n", - " price\n", - " bedrooms\n", - " bathrooms\n", - " sqft_living\n", - " sqft_lot\n", - " floors\n", - " waterfront\n", - " view\n", - " ...\n", - " grade\n", - " sqft_above\n", - " sqft_basement\n", - " yr_built\n", - " yr_renovated\n", - " zipcode\n", - " lat\n", - " long\n", - " sqft_living15\n", - " sqft_lot15\n", + " work_year\n", + " experience_level\n", + " employment_type\n", + " job_title\n", + " salary\n", + " salary_currency\n", + " salary_in_usd\n", + " employee_residence\n", + " remote_ratio\n", + " company_location\n", + " company_size\n", " \n", " \n", " \n", " \n", " 0\n", - " 7129300520\n", - " 20141013T000000\n", - " 221900.0\n", - " 3\n", - " 1.00\n", - " 1180\n", - " 5650\n", - " 1.0\n", - " 0\n", - " 0\n", - " ...\n", - " 7\n", - " 1180\n", - " 0\n", - " 
1955\n", - " 0\n", - " 98178\n", - " 47.5112\n", - " -122.257\n", - " 1340\n", - " 5650\n", + " 2023\n", + " SE\n", + " FT\n", + " Principal Data Scientist\n", + " 80000\n", + " EUR\n", + " 85847\n", + " ES\n", + " 100\n", + " ES\n", + " L\n", " \n", " \n", " 1\n", - " 6414100192\n", - " 20141209T000000\n", - " 538000.0\n", - " 3\n", - " 2.25\n", - " 2570\n", - " 7242\n", - " 2.0\n", - " 0\n", - " 0\n", - " ...\n", - " 7\n", - " 2170\n", - " 400\n", - " 1951\n", - " 1991\n", - " 98125\n", - " 47.7210\n", - " -122.319\n", - " 1690\n", - " 7639\n", + " 2023\n", + " MI\n", + " CT\n", + " ML Engineer\n", + " 30000\n", + " USD\n", + " 30000\n", + " US\n", + " 100\n", + " US\n", + " S\n", " \n", " \n", " 2\n", - " 5631500400\n", - " 20150225T000000\n", - " 180000.0\n", - " 2\n", - " 1.00\n", - " 770\n", - " 10000\n", - " 1.0\n", - " 0\n", - " 0\n", - " ...\n", - " 6\n", - " 770\n", - " 0\n", - " 1933\n", - " 0\n", - " 98028\n", - " 47.7379\n", - " -122.233\n", - " 2720\n", - " 8062\n", + " 2023\n", + " MI\n", + " CT\n", + " ML Engineer\n", + " 25500\n", + " USD\n", + " 25500\n", + " US\n", + " 100\n", + " US\n", + " S\n", " \n", " \n", " 3\n", - " 2487200875\n", - " 20141209T000000\n", - " 604000.0\n", - " 4\n", - " 3.00\n", - " 1960\n", - " 5000\n", - " 1.0\n", - " 0\n", - " 0\n", - " ...\n", - " 7\n", - " 1050\n", - " 910\n", - " 1965\n", - " 0\n", - " 98136\n", - " 47.5208\n", - " -122.393\n", - " 1360\n", - " 5000\n", + " 2023\n", + " SE\n", + " FT\n", + " Data Scientist\n", + " 175000\n", + " USD\n", + " 175000\n", + " CA\n", + " 100\n", + " CA\n", + " M\n", " \n", " \n", " 4\n", - " 1954400510\n", - " 20150218T000000\n", - " 510000.0\n", - " 3\n", - " 2.00\n", - " 1680\n", - " 8080\n", - " 1.0\n", - " 0\n", - " 0\n", - " ...\n", - " 8\n", - " 1680\n", - " 0\n", - " 1987\n", - " 0\n", - " 98074\n", - " 47.6168\n", - " -122.045\n", - " 1800\n", - " 7503\n", + " 2023\n", + " SE\n", + " FT\n", + " Data Scientist\n", + " 120000\n", + " USD\n", + " 120000\n", + " CA\n", 
+ " 100\n", + " CA\n", + " M\n", " \n", " \n", "\n", - "

5 rows × 21 columns

\n", "" ], "text/plain": [ - " id date price bedrooms bathrooms sqft_living \\\n", - "0 7129300520 20141013T000000 221900.0 3 1.00 1180 \n", - "1 6414100192 20141209T000000 538000.0 3 2.25 2570 \n", - "2 5631500400 20150225T000000 180000.0 2 1.00 770 \n", - "3 2487200875 20141209T000000 604000.0 4 3.00 1960 \n", - "4 1954400510 20150218T000000 510000.0 3 2.00 1680 \n", + " work_year experience_level employment_type job_title \\\n", + "0 2023 SE FT Principal Data Scientist \n", + "1 2023 MI CT ML Engineer \n", + "2 2023 MI CT ML Engineer \n", + "3 2023 SE FT Data Scientist \n", + "4 2023 SE FT Data Scientist \n", "\n", - " sqft_lot floors waterfront view ... grade sqft_above sqft_basement \\\n", - "0 5650 1.0 0 0 ... 7 1180 0 \n", - "1 7242 2.0 0 0 ... 7 2170 400 \n", - "2 10000 1.0 0 0 ... 6 770 0 \n", - "3 5000 1.0 0 0 ... 7 1050 910 \n", - "4 8080 1.0 0 0 ... 8 1680 0 \n", + " salary salary_currency salary_in_usd employee_residence remote_ratio \\\n", + "0 80000 EUR 85847 ES 100 \n", + "1 30000 USD 30000 US 100 \n", + "2 25500 USD 25500 US 100 \n", + "3 175000 USD 175000 CA 100 \n", + "4 120000 USD 120000 CA 100 \n", "\n", - " yr_built yr_renovated zipcode lat long sqft_living15 \\\n", - "0 1955 0 98178 47.5112 -122.257 1340 \n", - "1 1951 1991 98125 47.7210 -122.319 1690 \n", - "2 1933 0 98028 47.7379 -122.233 2720 \n", - "3 1965 0 98136 47.5208 -122.393 1360 \n", - "4 1987 0 98074 47.6168 -122.045 1800 \n", - "\n", - " sqft_lot15 \n", - "0 5650 \n", - "1 7639 \n", - "2 8062 \n", - "3 5000 \n", - "4 7503 \n", - "\n", - "[5 rows x 21 columns]" + " company_location company_size \n", + "0 ES L \n", + "1 US S \n", + "2 US S \n", + "3 CA M \n", + "4 CA M " ] }, - "execution_count": 24, + "execution_count": 3, "metadata": {}, "output_type": "execute_result" } @@ -291,7 +303,7 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 4, "metadata": {}, "outputs": [ { @@ -315,260 +327,86 @@ " \n", " \n", " \n", - " id\n", - " price\n", - " bedrooms\n", 
- " bathrooms\n", - " sqft_living\n", - " sqft_lot\n", - " floors\n", - " waterfront\n", - " view\n", - " condition\n", - " grade\n", - " sqft_above\n", - " sqft_basement\n", - " yr_built\n", - " yr_renovated\n", - " zipcode\n", - " lat\n", - " long\n", - " sqft_living15\n", - " sqft_lot15\n", + " work_year\n", + " salary\n", + " salary_in_usd\n", + " remote_ratio\n", " \n", " \n", " \n", " \n", " count\n", - " 2.161300e+04\n", - " 2.161300e+04\n", - " 21613.000000\n", - " 21613.000000\n", - " 21613.000000\n", - " 2.161300e+04\n", - " 21613.000000\n", - " 21613.000000\n", - " 21613.000000\n", - " 21613.000000\n", - " 21613.000000\n", - " 21613.000000\n", - " 21613.000000\n", - " 21613.000000\n", - " 21613.000000\n", - " 21613.000000\n", - " 21613.000000\n", - " 21613.000000\n", - " 21613.000000\n", - " 21613.000000\n", + " 3755.000000\n", + " 3.755000e+03\n", + " 3755.000000\n", + " 3755.000000\n", " \n", " \n", " mean\n", - " 4.580302e+09\n", - " 5.400881e+05\n", - " 3.370842\n", - " 2.114757\n", - " 2079.899736\n", - " 1.510697e+04\n", - " 1.494309\n", - " 0.007542\n", - " 0.234303\n", - " 3.409430\n", - " 7.656873\n", - " 1788.390691\n", - " 291.509045\n", - " 1971.005136\n", - " 84.402258\n", - " 98077.939805\n", - " 47.560053\n", - " -122.213896\n", - " 1986.552492\n", - " 12768.455652\n", + " 2022.373635\n", + " 1.906956e+05\n", + " 137570.389880\n", + " 46.271638\n", " \n", " \n", " std\n", - " 2.876566e+09\n", - " 3.671272e+05\n", - " 0.930062\n", - " 0.770163\n", - " 918.440897\n", - " 4.142051e+04\n", - " 0.539989\n", - " 0.086517\n", - " 0.766318\n", - " 0.650743\n", - " 1.175459\n", - " 828.090978\n", - " 442.575043\n", - " 29.373411\n", - " 401.679240\n", - " 53.505026\n", - " 0.138564\n", - " 0.140828\n", - " 685.391304\n", - " 27304.179631\n", + " 0.691448\n", + " 6.716765e+05\n", + " 63055.625278\n", + " 48.589050\n", " \n", " \n", " min\n", - " 1.000102e+06\n", - " 7.500000e+04\n", + " 2020.000000\n", + " 6.000000e+03\n", + " 5132.000000\n", " 
0.000000\n", - " 0.000000\n", - " 290.000000\n", - " 5.200000e+02\n", - " 1.000000\n", - " 0.000000\n", - " 0.000000\n", - " 1.000000\n", - " 1.000000\n", - " 290.000000\n", - " 0.000000\n", - " 1900.000000\n", - " 0.000000\n", - " 98001.000000\n", - " 47.155900\n", - " -122.519000\n", - " 399.000000\n", - " 651.000000\n", " \n", " \n", " 25%\n", - " 2.123049e+09\n", - " 3.219500e+05\n", - " 3.000000\n", - " 1.750000\n", - " 1427.000000\n", - " 5.040000e+03\n", - " 1.000000\n", + " 2022.000000\n", + " 1.000000e+05\n", + " 95000.000000\n", " 0.000000\n", - " 0.000000\n", - " 3.000000\n", - " 7.000000\n", - " 1190.000000\n", - " 0.000000\n", - " 1951.000000\n", - " 0.000000\n", - " 98033.000000\n", - " 47.471000\n", - " -122.328000\n", - " 1490.000000\n", - " 5100.000000\n", " \n", " \n", " 50%\n", - " 3.904930e+09\n", - " 4.500000e+05\n", - " 3.000000\n", - " 2.250000\n", - " 1910.000000\n", - " 7.618000e+03\n", - " 1.500000\n", + " 2022.000000\n", + " 1.380000e+05\n", + " 135000.000000\n", " 0.000000\n", - " 0.000000\n", - " 3.000000\n", - " 7.000000\n", - " 1560.000000\n", - " 0.000000\n", - " 1975.000000\n", - " 0.000000\n", - " 98065.000000\n", - " 47.571800\n", - " -122.230000\n", - " 1840.000000\n", - " 7620.000000\n", " \n", " \n", " 75%\n", - " 7.308900e+09\n", - " 6.450000e+05\n", - " 4.000000\n", - " 2.500000\n", - " 2550.000000\n", - " 1.068800e+04\n", - " 2.000000\n", - " 0.000000\n", - " 0.000000\n", - " 4.000000\n", - " 8.000000\n", - " 2210.000000\n", - " 560.000000\n", - " 1997.000000\n", - " 0.000000\n", - " 98118.000000\n", - " 47.678000\n", - " -122.125000\n", - " 2360.000000\n", - " 10083.000000\n", + " 2023.000000\n", + " 1.800000e+05\n", + " 175000.000000\n", + " 100.000000\n", " \n", " \n", " max\n", - " 9.900000e+09\n", - " 7.700000e+06\n", - " 33.000000\n", - " 8.000000\n", - " 13540.000000\n", - " 1.651359e+06\n", - " 3.500000\n", - " 1.000000\n", - " 4.000000\n", - " 5.000000\n", - " 13.000000\n", - " 9410.000000\n", - " 4820.000000\n", - 
" 2015.000000\n", - " 2015.000000\n", - " 98199.000000\n", - " 47.777600\n", - " -121.315000\n", - " 6210.000000\n", - " 871200.000000\n", + " 2023.000000\n", + " 3.040000e+07\n", + " 450000.000000\n", + " 100.000000\n", " \n", " \n", "\n", "" ], "text/plain": [ - " id price bedrooms bathrooms sqft_living \\\n", - "count 2.161300e+04 2.161300e+04 21613.000000 21613.000000 21613.000000 \n", - "mean 4.580302e+09 5.400881e+05 3.370842 2.114757 2079.899736 \n", - "std 2.876566e+09 3.671272e+05 0.930062 0.770163 918.440897 \n", - "min 1.000102e+06 7.500000e+04 0.000000 0.000000 290.000000 \n", - "25% 2.123049e+09 3.219500e+05 3.000000 1.750000 1427.000000 \n", - "50% 3.904930e+09 4.500000e+05 3.000000 2.250000 1910.000000 \n", - "75% 7.308900e+09 6.450000e+05 4.000000 2.500000 2550.000000 \n", - "max 9.900000e+09 7.700000e+06 33.000000 8.000000 13540.000000 \n", - "\n", - " sqft_lot floors waterfront view condition \\\n", - "count 2.161300e+04 21613.000000 21613.000000 21613.000000 21613.000000 \n", - "mean 1.510697e+04 1.494309 0.007542 0.234303 3.409430 \n", - "std 4.142051e+04 0.539989 0.086517 0.766318 0.650743 \n", - "min 5.200000e+02 1.000000 0.000000 0.000000 1.000000 \n", - "25% 5.040000e+03 1.000000 0.000000 0.000000 3.000000 \n", - "50% 7.618000e+03 1.500000 0.000000 0.000000 3.000000 \n", - "75% 1.068800e+04 2.000000 0.000000 0.000000 4.000000 \n", - "max 1.651359e+06 3.500000 1.000000 4.000000 5.000000 \n", - "\n", - " grade sqft_above sqft_basement yr_built yr_renovated \\\n", - "count 21613.000000 21613.000000 21613.000000 21613.000000 21613.000000 \n", - "mean 7.656873 1788.390691 291.509045 1971.005136 84.402258 \n", - "std 1.175459 828.090978 442.575043 29.373411 401.679240 \n", - "min 1.000000 290.000000 0.000000 1900.000000 0.000000 \n", - "25% 7.000000 1190.000000 0.000000 1951.000000 0.000000 \n", - "50% 7.000000 1560.000000 0.000000 1975.000000 0.000000 \n", - "75% 8.000000 2210.000000 560.000000 1997.000000 0.000000 \n", - "max 13.000000 
9410.000000 4820.000000 2015.000000 2015.000000 \n", - "\n", - " zipcode lat long sqft_living15 sqft_lot15 \n", - "count 21613.000000 21613.000000 21613.000000 21613.000000 21613.000000 \n", - "mean 98077.939805 47.560053 -122.213896 1986.552492 12768.455652 \n", - "std 53.505026 0.138564 0.140828 685.391304 27304.179631 \n", - "min 98001.000000 47.155900 -122.519000 399.000000 651.000000 \n", - "25% 98033.000000 47.471000 -122.328000 1490.000000 5100.000000 \n", - "50% 98065.000000 47.571800 -122.230000 1840.000000 7620.000000 \n", - "75% 98118.000000 47.678000 -122.125000 2360.000000 10083.000000 \n", - "max 98199.000000 47.777600 -121.315000 6210.000000 871200.000000 " + " work_year salary salary_in_usd remote_ratio\n", + "count 3755.000000 3.755000e+03 3755.000000 3755.000000\n", + "mean 2022.373635 1.906956e+05 137570.389880 46.271638\n", + "std 0.691448 6.716765e+05 63055.625278 48.589050\n", + "min 2020.000000 6.000000e+03 5132.000000 0.000000\n", + "25% 2022.000000 1.000000e+05 95000.000000 0.000000\n", + "50% 2022.000000 1.380000e+05 135000.000000 0.000000\n", + "75% 2023.000000 1.800000e+05 175000.000000 100.000000\n", + "max 2023.000000 3.040000e+07 450000.000000 100.000000" ] }, - "execution_count": 25, + "execution_count": 4, "metadata": {}, "output_type": "execute_result" } @@ -580,65 +418,45 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "id 0\n", - "date 0\n", - "price 0\n", - "bedrooms 0\n", - "bathrooms 0\n", - "sqft_living 0\n", - "sqft_lot 0\n", - "floors 0\n", - "waterfront 0\n", - "view 0\n", - "condition 0\n", - "grade 0\n", - "sqft_above 0\n", - "sqft_basement 0\n", - "yr_built 0\n", - "yr_renovated 0\n", - "zipcode 0\n", - "lat 0\n", - "long 0\n", - "sqft_living15 0\n", - "sqft_lot15 0\n", + "work_year 0\n", + "experience_level 0\n", + "employment_type 0\n", + "job_title 0\n", + "salary 0\n", + "salary_currency 0\n", + 
"salary_in_usd 0\n", + "employee_residence 0\n", + "remote_ratio 0\n", + "company_location 0\n", + "company_size 0\n", "dtype: int64\n" ] }, { "data": { "text/plain": [ - "id False\n", - "date False\n", - "price False\n", - "bedrooms False\n", - "bathrooms False\n", - "sqft_living False\n", - "sqft_lot False\n", - "floors False\n", - "waterfront False\n", - "view False\n", - "condition False\n", - "grade False\n", - "sqft_above False\n", - "sqft_basement False\n", - "yr_built False\n", - "yr_renovated False\n", - "zipcode False\n", - "lat False\n", - "long False\n", - "sqft_living15 False\n", - "sqft_lot15 False\n", + "work_year False\n", + "experience_level False\n", + "employment_type False\n", + "job_title False\n", + "salary False\n", + "salary_currency False\n", + "salary_in_usd False\n", + "employee_residence False\n", + "remote_ratio False\n", + "company_location False\n", + "company_size False\n", "dtype: bool" ] }, - "execution_count": 26, + "execution_count": 5, "metadata": {}, "output_type": "execute_result" } @@ -656,13 +474,6 @@ "df.isnull().any()" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Ооо, пропущенных колонок нету :)" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -672,16 +483,16 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Размер обучающей выборки: 17290\n", - "Размер контрольной выборки: 4323\n", - "Размер тестовой выборки: 4323\n" + "Размер обучающей выборки: 3004\n", + "Размер контрольной выборки: 751\n", + "Размер тестовой выборки: 751\n" ] } ], @@ -701,12 +512,12 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 7, "metadata": {}, "outputs": [ { "data": { - "image/png": "", + "image/png": "", "text/plain": [ "
" ] @@ -716,7 +527,7 @@ }, { "data": { - "image/png": "", + "image/png": "", "text/plain": [ "
" ] @@ -726,7 +537,7 @@ }, { "data": { - "image/png": "", + "image/png": "", "text/plain": [ "
" ] @@ -738,40 +549,46 @@ "name": "stdout", "output_type": "stream", "text": [ - "Средняя цена в обучающей выборке: 537768.04794679\n", - "Средняя цена в контрольной выборке: 549367.443673375\n", - "Средняя цена в тестовой выборке: 549367.443673375\n" + "Средняя заработная плата в обучающей выборке: 138055.9893475366\n", + "Средняя заработная плата в контрольной выборке: 135627.99201065247\n", + "Средняя заработная плата в тестовой выборке: 135627.99201065247\n" ] } ], "source": [ - "# Оценка сбалансированности целевой переменной (цена)\n", - "# Визуализация распределения цены в выборках (гистограмма)\n", - "def plot_price_distribution(data, title):\n", - " sns.histplot(data['price'], kde=True)\n", + "import pandas as pd\n", + "import seaborn as sns\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Предположим, что у вас уже есть данные, разделенные на обучающую, контрольную и тестовую выборки\n", + "# train_data, val_data, test_data\n", + "\n", + "# Визуализация распределения заработной платы в выборках (гистограмма)\n", + "def plot_salary_distribution(data, title):\n", + " sns.histplot(data['salary_in_usd'], kde=True)\n", " plt.title(title)\n", - " plt.xlabel('Цена')\n", + " plt.xlabel('Заработная плата (USD)')\n", " plt.ylabel('Частота')\n", " plt.show()\n", "\n", - "plot_price_distribution(train_data, 'Распределение цены в обучающей выборке')\n", - "plot_price_distribution(val_data, 'Распределение цены в контрольной выборке')\n", - "plot_price_distribution(test_data, 'Распределение цены в тестовой выборке')\n", + "plot_salary_distribution(train_data, 'Распределение заработной платы в обучающей выборке')\n", + "plot_salary_distribution(val_data, 'Распределение заработной платы в контрольной выборке')\n", + "plot_salary_distribution(test_data, 'Распределение заработной платы в тестовой выборке')\n", "\n", - "# Оценка сбалансированности данных по целевой переменной (price)\n", - "print(\"Средняя цена в обучающей выборке: \", train_data['price'].mean())\n", - 
"print(\"Средняя цена в контрольной выборке: \", val_data['price'].mean())\n", - "print(\"Средняя цена в тестовой выборке: \", test_data['price'].mean())" + "# Оценка сбалансированности данных по целевой переменной (salary_in_usd)\n", + "print(\"Средняя заработная плата в обучающей выборке: \", train_data['salary_in_usd'].mean())\n", + "print(\"Средняя заработная плата в контрольной выборке: \", val_data['salary_in_usd'].mean())\n", + "print(\"Средняя заработная плата в тестовой выборке: \", test_data['salary_in_usd'].mean())" ] }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 10, "metadata": {}, "outputs": [ { "data": { - "image/png": "", + "image/png": "", "text/plain": [ "
" ] @@ -781,7 +598,7 @@ }, { "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAkQAAAHHCAYAAABeLEexAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABMpElEQVR4nO3deVgV9f///8cBZBEERQXElbTcza0U99xIyTRNs8x9y9Tc0t6+K9fMtNz3yq2yRTMtNfd9IbfEXNH8avo2xRVRVEB4/f7ox/l4ABURxZz77bq4Ls5rXjPznDNzDg9mXnOOzRhjBAAAYGFOmV0AAABAZiMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAbCUa9euafz48fbHUVFRmjJlSuYVBDzmhgwZIpvN5tBWqFAhtWvXLnMKekgIRJlozpw5stls9h93d3c988wz6tGjhyIjIzO7POCJ5OHhoQ8++EDz5s3TqVOnNGTIEC1ZsiSzywKQyVwyuwBIw4YNU1BQkG7evKktW7Zo2rRp+vXXX7V//35lzZo1s8sDnijOzs4aOnSo2rRpo8TERHl7e2vZsmWZXRbwrxIRESEnpyfrnAqB6DHQoEEDVaxYUZLUqVMn5cyZU2PHjtXPP/+s119/PZOrA548/fr102uvvaZTp06pePHiyp49e2aXhCfYrVu3lJiYKFdX18wuJcO4ublldgkZ7smKd0+I2rVrS5KOHz8uSbp06ZLeffddlS5dWl5eXvL29laDBg20d+/eFPPevHlTQ4YM0TPPPCN3d3flyZNHTZs21bFjxyRJJ06ccLhMl/ynVq1a9mVt2LBBNptNP/zwg/773/8qICBAnp6eevnll3Xq1KkU696+fbtefPFF+fj4KGvWrKpZs6a2bt2a6jbWqlUr1fUPGTIkRd9vvvlGFSpUkIeHh3x9fdWyZctU13+3bbtdYmKixo8fr5IlS8rd3V3+/v7q2rWrLl++7NCvUKFCeumll1Ksp0ePHimWmVrtn376aYrnVJJiY2M1ePBgFSlSRG5ubsqfP78GDBig2NjYVJ+r29WqVSvF8kaMGCEnJyd9++239rbNmzerefPmKlCggH0dffr00Y0bN+x92rVrd9djwWaz6cSJE/b+y5cvV/Xq1eXp6als2bIpNDRUBw4ccKjlTsssUqSIQ7+pU6eqZMmScnNzU2BgoLp3766oqKgU21qqVCnt3r1bVapUkYeHh4KCgjR9+nSHfknH6YYNGxzaQ0NDU+yX28dC5MuXT8HBwXJxcVFAQECqy0guaf4LFy44tO/atUs2m01z5sxxaH9Yx1qPHj3uWGPSpfjb911q7rX/kz8XCxYssL8Oc+XKpTfffFOnT59OsdzDhw+rRYsWyp07tzw8PFS0aFG9//77KfoVKlQoTetNy3F3J//v//0/NW/eXL6+vsqaNasqV67scDYwMjJSLi4uGjp0aIp5IyIiZLPZNHnyZHtbVFSUevfurfz588vNzU1FihTRqFGjlJiYaO+T9D702Wefafz48SpcuLDc3Nx08OBBSdKkSZNUsmRJZc2aVTly5FDFihUdXrt//fWX3n77bRUtWlQeHh7KmTOnmjdvnmJ/Ju3nLVu26J133lHu3LmVPXt2de3aVXFxcYqKilKbNm2UI0cO5ciRQwMGDJAxJtU6x40bp4IFC8rDw0M1a9bU/v377/ncJh9DlFTP1q1b1bdvX+XOnVuenp565ZVXdP78eYd5ExMTNWTIEAUGBipr1qx64YUXdPDgwUwfl8QZosdQUnjJmTOnpH9e1IsXL1bz5s0VFBSkyMhIzZgxQzVr1tTBgwcVGBgoSUpISNBLL72ktWvXqmXLlurVq5euXr2q1atXa//+/SpcuLB9Ha+//roaNmzosN6BAwemWs+IESNks9n03
nvv6dy5cxo/frzq1q2r8PBweXh4SJLWrVunBg0aqEKFCho8eLCcnJw0e/Zs1a5dW5s3b9bzzz+fYrn58uXTyJEjJf0z0LVbt26prvvDDz9UixYt1KlTJ50/f16TJk1SjRo1tGfPnlT/s+/SpYuqV68uSfrpp5+0aNEih+ldu3bVnDlz1L59e73zzjs6fvy4Jk+erD179mjr1q3KkiVLqs/D/YiKirJv2+0SExP18ssva8uWLerSpYuKFy+uffv2ady4cTpy5IgWL158X+uZPXu2PvjgA40ZM0ZvvPGGvX3BggW6fv26unXrppw5c2rHjh2aNGmS/ve//2nBggWS/nke6tata5+ndevWeuWVV9S0aVN7W+7cuSVJX3/9tdq2bauQkBCNGjVK169f17Rp01StWjXt2bNHhQoVss/j5uamL7/80qHObNmy2X8fMmSIhg4dqrp166pbt26KiIjQtGnTtHPnzhTP/+XLl9WwYUO1aNFCr7/+uubPn69u3brJ1dVVHTp0uOPzsmnTJv36669peg7HjBnz0MbsPYpj7UGktq927typiRMnOrQlbcNzzz2nkSNHKjIyUhMmTNDWrVsdXod//PGHqlevrixZsqhLly4qVKiQjh07piVLlmjEiBEp1l+9enV16dJFknTo0CF9/PHHDtPv57hLLjIyUlWqVNH169f1zjvvKGfOnJo7d65efvll/fjjj3rllVfk7++vmjVrav78+Ro8eLDD/D/88IOcnZ3VvHlzSdL169dVs2ZNnT59Wl27dlWBAgW0bds2DRw4UGfOnHEYqC/989q8efOmunTpIjc3N/n6+uqLL77QO++8o1dffVW9evXSzZs39ccff2j79u321+/OnTu1bds2tWzZUvny5dOJEyc0bdo01apVSwcPHkwxjKJnz54KCAjQ0KFD9dtvv+nzzz9X9uzZtW3bNhUoUEAff/yxfv31V3366acqVaqU2rRp4zD/V199patXr6p79+66efOmJkyYoNq1a2vfvn3y9/e/4/N7Jz179lSOHDk0ePBgnThxQuPHj1ePHj30ww8/2PsMHDhQo0ePVqNGjRQSEqK9e/cqJCREN2/evO/1ZSiDTDN79mwjyaxZs8acP3/enDp1ynz//fcmZ86cxsPDw/zvf/8zxhhz8+ZNk5CQ4DDv8ePHjZubmxk2bJi9bdasWUaSGTt2bIp1JSYm2ueTZD799NMUfUqWLGlq1qxpf7x+/XojyeTNm9dER0fb2+fPn28kmQkTJtiX/fTTT5uQkBD7eowx5vr16yYoKMjUq1cvxbqqVKliSpUqZX98/vx5I8kMHjzY3nbixAnj7OxsRowY4TDvvn37jIuLS4r2o0ePGklm7ty59rbBgweb2w/zzZs3G0lm3rx5DvOuWLEiRXvBggVNaGhoitq7d+9ukr90ktc+YMAA4+fnZypUqODwnH799dfGycnJbN682WH+6dOnG0lm69atKdZ3u5o1a9qXt2zZMuPi4mL69euXot/169dTtI0cOdLYbDbz119/pbrs5NuQ5OrVqyZ79uymc+fODu1nz541Pj4+Du1t27Y1np6ed6z/3LlzxtXV1dSvX9/hmJ48ebKRZGbNmuWwrZLMmDFj7G2xsbGmbNmyxs/Pz8TFxRlj/u84Xb9+vb1fpUqVTIMGDVJsU/Lj4dy5cyZbtmz2vrcvIzVJ858/f96hfefOnUaSmT17tr3tYR5r3bt3v2ONSe8rx48fv+u23GlfLViwwOG5iIuLM35+fqZUqVLmxo0b9n5Lly41ksygQYPsbTVq1DDZsmVLcYzd/r6QJG/evKZ9+/b2x8n34/0cd6np3bu3keTwWrt69aoJCgoyhQoVsh9/M2bMMJLMvn37HOYvUaKEqV27tv3x8OHDjaenpzly5IhDv//85z/G2dnZnDx50hjzf++x3t7e5ty5cw59GzdubEqWLHnXulN77YaFhRlJ5quvvrK3Je3n5O+7wcHBxmazmbfeesved
uvWLZMvXz6H96KkOm//W2OMMdu3bzeSTJ8+fextyV83xvxzzLZt2zZFPXXr1nWop0+fPsbZ2dlERUUZY/7Zfy4uLqZJkyYOyxsyZIiR5LDMR41LZo+BunXrKnfu3MqfP79atmwpLy8vLVq0SHnz5pX0z39xSYPXEhISdPHiRXl5ealo0aL6/fff7ctZuHChcuXKpZ49e6ZYR/LT7vejTZs2Dv/hv/rqq8qTJ4/9P/Dw8HAdPXpUb7zxhi5evKgLFy7owoULiomJUZ06dbRp0yaHU8rSP5f23N3d77ren376SYmJiWrRooV9mRcuXFBAQICefvpprV+/3qF/XFycpLtf216wYIF8fHxUr149h2VWqFBBXl5eKZYZHx/v0O/ChQv3/C/m9OnTmjRpkj788EN5eXmlWH/x4sVVrFgxh2UmXSZNvv472bFjh1q0aKFmzZrp008/TTE96cydJMXExOjChQuqUqWKjDHas2dPmtaRZPXq1YqKitLrr7/uULOzs7MqVaqU5polac2aNYqLi1Pv3r0dBmR27tw51cHNLi4u6tq1q/2xq6urunbtqnPnzmn37t2pruOnn37Szp079cknn9yznuHDh8vHx0fvvPNOmrchrR7msXbz5k1duHBBFy9eTPHaymi7du3SuXPn9Pbbbzu8ZkNDQ1WsWDH7Pjt//rw2bdqkDh06qECBAg7LSO39Jy4u7q6v1Qc97n799Vc9//zzqlatmr3Ny8tLXbp00YkTJ+yXsJo2bSoXFxeHMxj79+/XwYMH9dprr9nbFixYoOrVqytHjhwO9dStW1cJCQnatGmTw/qbNWtmP8OaJHv27Prf//6nnTt33rHu21+78fHxunjxoooUKaLs2bM7vN8n6dixo8PzW6lSJRlj1LFjR3ubs7OzKlasqP/3//5fivmbNGli/1sjSc8//7wqVaqU5jOsyXXp0sWhnurVqyshIUF//fWXJGnt2rW6deuW3n77bYf5Uvu79ahxyewxMGXKFD3zzDNycXGRv7+/ihYt6vDHIjExURMmTNDUqVN1/PhxJSQk2KclXVaT/rnUVrRoUbm4ZOxuffrppx0eJ40JSbqmffToUUlS27Zt77iMK1euKEeOHPbHFy5cSLHc5I4ePSpjzB37Jb/ckDQGJXkISb7MK1euyM/PL9Xp586dc3i8atWqFG9q9zJ48GAFBgaqa9eu+vHHH1Os/9ChQ3dcZvL1p+b06dMKDQ1VTEyMLl68mOofm5MnT2rQoEH65ZdfUoxXuXLlyn1szf/t36TQlpy3t3eal5X0pli0aFGHdldXVz311FP26UkCAwPl6enp0PbMM89I+mcMROXKlR2mJSQk6L///a9atWqlMmXK3LWW48ePa8aMGZo2bdo9w3l6PMxjbebMmZo5c6akf567SpUqaezYsfabMzLSnfaZJBUrVkxbtmyRJPsf21KlSqVpuVeuXLnna1VK/3H3119/qVKlSinaixcvbp9eqlQp5cqVS3Xq1NH8+fM1fPhwSf9cLnNxcXG4fHz06FH98ccfaX7tBgUFpejz3nvvac2aNXr++edVpEgR1a9fX2+88YaqVq1q73Pjxg2NHDlSs2fP1unTpx3G/aT22k0ePn18fCRJ+fPnT9Ge/L1ASvn+Lv3zGps/f35qm3lPyetJet9PWnfS8ZR8XKGvr6/D34jMQCB6DDz//PN3fSP7+OOP9eGHH6pDhw4aPny4fH195eTkpN69ez/0/w7TIqmGTz/9VGXLlk21z+1vfHFxcTpz5ozq1at3z+XabDYtX75czs7Od12mJJ09e1aSFBAQcNdl+vn5ad68ealOT/5mV6lSJX300UcObZMnT9bPP/+c6vyHDh3SnDlz9M0336Q6PiQxMVGlS5fW2LFjU50/+ZtYav7880+VL19e48aNU+vWrTV37lyHMJqQkKB69erp0qVLeu+991SsWDF5enrq9OnTateu3X0fM0n9v/7661Sf24wO4A9i5syZOnHih
FauXHnPvu+//76efvpptW3bVps3b87wWh7msda4cWP16NFDxhgdP35cw4YN00svvWQPEY+7S5cuKS4u7p6vVenRHHctW7ZU+/btFR4errJly2r+/PmqU6eOcuXK5VBPvXr1NGDAgFSXkRTUk9x+pidJ8eLFFRERoaVLl2rFihVauHChpk6dqkGDBtkHdvfs2VOzZ89W7969FRwcLB8fH9lsNrVs2TLV125q7413ar89XD0sd6rnUaz7QT0+72S4ox9//FEvvPCC/T/CJFFRUQ4v2MKFC2v79u2Kj4/P0MGayd9kjTH6888/7f+BJw3W9vb2dhikeyd79+5VfHz8Pf+bLVy4sIwxCgoKSvFmk5qDBw/KZrOl+p/s7ctcs2aNqlatmuobVnK5cuVKsU13G/g8cOBAlS1b1uFUe/L17927V3Xq1En3Zcyky5X+/v76+eef1a9fPzVs2ND+B3bfvn06cuSI5s6d6zCAcvXq1elaX9L+9fPzS9P+vZuCBQtK+ucOnqeeesreHhcXp+PHj6dY/t9//62YmBiHs0RHjhyRpBQDaq9fv66hQ4fq7bfftq/nTvbs2aPvv/9eixcvvuMb+IN6mMdavnz5HPp6eXmpVatW9305NC1u32fJz9ZERETYpyftz7TcoZR0uSrpbE1qHvS4K1iwoCIiIlK0Hz582D49SZMmTdS1a1f7ZbMjR46kuMmkcOHCunbt2gO/Bjw9PfXaa6/ptddeU1xcnJo2baoRI0Zo4MCBcnd3148//qi2bdtqzJgx9nlu3ryZ4i7MjJJaiD5y5MhdB6w/iKTn/c8//3Q4i3bx4sVUz2A9Sowh+hdwdnZOka4XLFiQ4pbXZs2a6cKFCw63iSZ5kHSedBdCkh9//FFnzpxRgwYNJEkVKlRQ4cKF9dlnn+natWsp5k9+y+WCBQvk7Oyc6m3Gt2vatKn9Q/SS12+M0cWLF+2Pb926pYULF+r555+/62n4Fi1aKCEhwX5q/Ha3bt16oDedsLAw/fzzz/rkk0/uGHZatGih06dP64svvkgx7caNG4qJibnnep555hn73R+TJk1SYmKievXqZZ+e9Af+9ufMGKMJEybc1/YkCQkJkbe3tz7++GPFx8enmJ58/95N3bp15erqqokTJzrUN3PmTF25ckWhoaEO/W/duqUZM2bYH8fFxWnGjBnKnTu3KlSo4NB3woQJiomJSfUW7+T+85//qGrVqnr55ZfTXPv9epjHWnJJZw4eRrirWLGi/Pz8NH36dIePhli+fLkOHTpk32e5c+dWjRo1NGvWLJ08edJhGclfv99//71cXV0dxvck96DHXcOGDbVjxw6FhYXZ22JiYvT555+rUKFCKlGihL09e/bsCgkJ0fz58+21NWnSxGF5LVq0UFhYWKpnH6OionTr1q271iPJ4T1L+udyZ4kSJWSMsW9jau/3kyZNchgqkZEWL17s8Ldkx44d2r59u/39PaPVqVNHLi4umjZtmkN7an+3HjXOEP0LvPTSSxo2bJjat2+vKlWqaN++fZo3b57Df9jSP4Ofv/rqK/Xt21c7duxQ9erVFRMTozVr1ujtt99W48aN07V+X19fVatWTe3bt1dkZKTGjx+vIkWKqHPnzpIkJycnffnll2rQoIFKliyp9u3bK2/evDp9+rTWr18vb29vLVmyRDExMZoyZYomTpyoZ555xuHzRpKC1B9//KGwsDAFBwercOHC+uijjzRw4ECdOHFCTZo0UbZs2XT8+HEtWrRIXbp00bvvvqs1a9boww8/1B9//HHPr2CoWbOmunbtqpEjRyo8PFz169dXlixZdPToUS1YsEATJkzQq6++mq7nadWqVapXr95d/4Ns3bq15s+fr7feekvr169X1apVlZCQoMOHD2v+/PlauXLlfY0DCQgI0KeffqpOnTrpzTffVMOGDVWsWDEVLlxY7777rk6fPi1vb28tXLgw3f99eXt7a9q0aWrdurXKly+vli1bKnfu3Dp58
qSWLVumqlWrpvnNLHfu3Bo4cKCGDh2qF198US+//LIiIiI0depUPffcc3rzzTcd+gcGBmrUqFE6ceKEnnnmGf3www8KDw/X559/nuIs6KpVqzRixAiHcXV3smrVqjt+RlZarFu3zmEMS9J/2fv27dO+fftUunTph3qsnTx5UitWrLBfMhsxYoQKFiyocuXKZfhlsyxZsmjUqFFq3769atasqddff91+232hQoXUp08fe9+JEyeqWrVqKl++vLp06aKgoCCdOHFCy5Yts998MXjwYH333Xf6z3/+c9dxQA963P3nP//Rd999pwYNGuidd96Rr6+v5s6dq+PHj2vhwoUpPmX5tdde05tvvqmpU6cqJCQkxUd69O/fX7/88oteeukltWvXThUqVFBMTIz27dunH3/8USdOnHA4Y5+a+vXrKyAgQFWrVpW/v78OHTqkyZMnKzQ01H7jyksvvaSvv/5aPj4+KlGihMLCwrRmzZo0HdfpUaRIEVWrVk3dunVTbGysxo8fr5w5c97x0uCD8vf3V69evTRmzBi9/PLLevHFF7V3714tX75cuXLleqAbgB7Yo72pDbdLuk1x586dd+138+ZN069fP5MnTx7j4eFhqlatasLCwhxuwU5y/fp18/7775ugoCCTJUsWExAQYF599VVz7NgxY0z6brv/7rvvzMCBA42fn5/x8PAwoaGhqd66vWfPHtO0aVOTM2dO4+bmZgoWLGhatGhh1q5d67Due/0kv+1y4cKFplq1asbT09N4enqaYsWKme7du5uIiAhjjDE9e/Y0NWrUMCtWrEhRU2q3ixpjzOeff24qVKhgPDw8TLZs2Uzp0qXNgAEDzN9//23vc7+3QttsNrN7926H9tT2UVxcnBk1apQpWbKkcXNzMzly5DAVKlQwQ4cONVeuXEmxvnstzxhjateubQoUKGCuXr1qjDHm4MGDpm7dusbLy8vkypXLdO7c2ezduzfFreHJtyG12+6TrF+/3oSEhBgfHx/j7u5uChcubNq1a2d27dpl73Ov2+6TTJ482RQrVsxkyZLF+Pv7m27dupnLly+n2NaSJUuaXbt2meDgYOPu7m4KFixoJk+enKIuSSZPnjwmJibmrtuUdDw0btw41WWk9bb7+zl+H8axlvRjs9lMQECAadq0qTl06JAxJuNvu0/yww8/mHLlyhk3Nzfj6+trWrVq5XC7dpL9+/ebV155xWTPnt24u7ubokWLmg8//NAYY8x3331nSpUqZSZMmJDiVvw77YO0HHd3cuzYMfPqq6/aa3n++efN0qVLU+0bHR1tPDw8jCTzzTffpNrn6tWrZuDAgaZIkSLG1dXV5MqVy1SpUsV89tln9o+BuNt77IwZM0yNGjXs75GFCxc2/fv3d3jdX7582bRv397kypXLeHl5mZCQEHP48OE73uae/O/HnT4aIvn+vr3OMWPGmPz58xs3NzdTvXp1s3fv3lSXebu01pPafr1165b58MMPTUBAgPHw8DC1a9c2hw4dMjlz5nT4uIBHzWbMv2CkEzLFhg0b9MILL2jBggXp/k/2didOnFBQUJCOHz9+x+vTQ4YM0YkTJ1J84i+sp1atWrpw4UKaxqQ8LpI+ZZfjF4+7pPfjTz/9VO+++25ml6OoqCjlyJFDH330UZouez8MjCECAACPzO1fIZQk6ZO+k3810aPEGCI8Mkl3wtxt0HOZMmXsX0UC/NuULl06s0sAHns//PCD5syZo4YNG8rLy0tbtmzRd999p/r16zt8JtOjRiDCI5MrVy598803d+1z+wehAf82/fr1y+wSgMdemTJl5OLiotGjRys6Oto+0Dr553A9aowhAgAAlscYIgAAYHkEIgAAYHmMIUqDxMRE/f3338qWLVvmfmgUAABIM2OMrl69qsDAwBQfxpkcgSgN/v777zR96SYAAHj8nDp1Svny5btrHwJRGiR9pPqpU6fu+lHzAADg8REdHa38+fPb/47fDYEoDZIuk3l7e
xOIAAD4l0nLcBcGVQMAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMtzyewCnmQV+n+V2SXg/7f70zYPfR3s78cH+9ta2N/W8rD2N2eIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5T02geiTTz6RzWZT79697W03b95U9+7dlTNnTnl5ealZs2aKjIx0mO/kyZMKDQ1V1qxZ5efnp/79++vWrVsOfTZs2KDy5cvLzc1NRYoU0Zw5cx7BFgEAgH+LxyIQ7dy5UzNmzFCZMmUc2vv06aMlS5ZowYIF2rhxo/7++281bdrUPj0hIUGhoaGKi4vTtm3bNHfuXM2ZM0eDBg2y9zl+/LhCQ0P1wgsvKDw8XL1791anTp20cuXKR7Z9AADg8ZbpgejatWtq1aqVvvjiC+XIkcPefuXKFc2cOVNjx45V7dq1VaFCBc2ePVvbtm3Tb7/9JklatWqVDh48qG+++UZly5ZVgwYNNHz4cE2ZMkVxcXGSpOnTpysoKEhjxoxR8eLF1aNHD7366qsaN25cpmwvAAB4/GR6IOrevbtCQ0NVt25dh/bdu3crPj7eob1YsWIqUKCAwsLCJElhYWEqXbq0/P397X1CQkIUHR2tAwcO2PskX3ZISIh9GamJjY1VdHS0ww8AAHhyuWTmyr///nv9/vvv2rlzZ4ppZ8+elaurq7Jnz+7Q7u/vr7Nnz9r73B6GkqYnTbtbn+joaN24cUMeHh4p1j1y5EgNHTo03dsFAAD+XTLtDNGpU6fUq1cvzZs3T+7u7plVRqoGDhyoK1eu2H9OnTqV2SUBAICHKNMC0e7du3Xu3DmVL19eLi4ucnFx0caNGzVx4kS5uLjI399fcXFxioqKcpgvMjJSAQEBkqSAgIAUd50lPb5XH29v71TPDkmSm5ubvL29HX4AAMCTK9MCUZ06dbRv3z6Fh4fbfypWrKhWrVrZf8+SJYvWrl1rnyciIkInT55UcHCwJCk4OFj79u3TuXPn7H1Wr14tb29vlShRwt7n9mUk9UlaBgAAQKaNIcqWLZtKlSrl0Obp6amcOXPa2zt27Ki+ffvK19dX3t7e6tmzp4KDg1W5cmVJUv369VWiRAm1bt1ao0eP1tmzZ/XBBx+oe/fucnNzkyS99dZbmjx5sgYMGKAOHTpo3bp1mj9/vpYtW/ZoNxgAADy2MnVQ9b2MGzdOTk5OatasmWJjYxUSEqKpU6fapzs7O2vp0qXq1q2bgoOD5enpqbZt22rYsGH2PkFBQVq2bJn69OmjCRMmKF++fPryyy8VEhKSGZsEAAAeQ49VINqwYYPDY3d3d02ZMkVTpky54zwFCxbUr7/+etfl1qpVS3v27MmIEgEAwBMo0z+HCAAAILMRiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAg
OURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOVlaiCaNm2aypQpI29vb3l7eys4OFjLly+3T79586a6d++unDlzysvLS82aNVNkZKTDMk6ePKnQ0FBlzZpVfn5+6t+/v27duuXQZ8OGDSpfvrzc3NxUpEgRzZkz51FsHgAA+JfI1ECUL18+ffLJJ9q9e7d27dql2rVrq3Hjxjpw4IAkqU+fPlqyZIkWLFigjRs36u+//1bTpk3t8yckJCg0NFRxcXHatm2b5s6dqzlz5mjQoEH2PsePH1doaKheeOEFhYeHq3fv3urUqZNWrlz5yLcXAAA8nlwyc+WNGjVyeDxixAhNmzZNv/32m/Lly6eZM2fq22+/Ve3atSVJs2fPVvHixfXbb7+pcuXKWrVqlQ4ePKg1a9bI399fZcuW1fDhw/Xee+9pyJAhcnV11fTp0xUUFKQxY8ZIkooXL64tW7Zo3LhxCgkJeeTbDAAAHj+PzRiihIQEff/994qJiVFwcLB2796t+Ph41a1b196nWLFiKlCggMLCwiRJYWFhKl26tPz9/e19QkJCFB0dbT/LFBYW5rCMpD5Jy0hNbGysoqOjHX4AAMCTK9MD0b59++Tl5SU3Nze99dZbWrRokUqUKKGzZ8/K1dVV2bNnd+jv7++vs2fPSpLOnj3rEIaSpidNu1uf6Oho3bhxI9WaRo4cKR8fH/tP/vz5M2JTAQDAYyrTA1HRokUVHh6u7du3q1u3bmrbtq0OHjyYqTUNHDhQV65csf+cOnUqU+sBAAAPV6aOIZIkV1dXFSlSRJJUoUIF7dy5UxMmTNBrr72muLg4RUVFOZwlioyMVEBAgCQpICBAO3bscFhe0l1ot/dJfmdaZGSkvL295eHhkWpNbm5ucnNzy5DtAwAAj79MP0OUXGJiomJjY1WhQgVlyZJFa9eutU+LiIjQyZMnFRwcLEkKDg7Wvn37dO7cOXuf1atXy9vbWyVKlLD3uX0ZSX2SlgEAAJCpZ4gGDhyoBg0aqECBArp69aq+/fZbbdiwQStXrpSPj486duyovn37ytfXV97e3urZs6eCg4NVuXJlSVL9+vVVokQJtW7dWqNHj9bZs2f1wQcfqHv37vYzPG+99ZYmT56sAQMGqEOHDlq3bp3mz5+vZcuWZeamAwCAx0imBqJz586pTZs2OnPmjHx8fFSmTBmtXLlS9erVkySNGzdOTk5OatasmWJjYxUSEqKpU6fa53d2dtbSpUvVrVs3BQcHy9PTU23bttWwYcPsfYKCgrRs2TL16dNHEyZMUL58+fTll19yyz0AALDL1EA0c+bMu053d3fXlClTNGXKlDv2KViwoH799de7LqdWrVras2dPumoEAABPvsduDBEAAMCjRiACAACWRyACAACWRyACAACWRyACAACWRyACAACWRyACAACWRyACAACWRyACAACWRyACAACWRyACAACWRyACAACWRyACAACWRyACAACWRyACAACWRyACAACWRyACAACWRyACAACWRyACAACWRyACAACWRyACAACWRyACAACWRyACAACW55LeGWNiYrRx40adPHlScXFxDtPeeeedBy4MAADgUUlXINqzZ48aNmyo69evKyYmRr6+vrpw4YKyZs0qPz8/AhEAAPhXSdclsz59+qhRo0a6fPmyPDw89Ntvv+mvv/5ShQoV9Nlnn2V0jQAAAA9VugJReHi4+vXrJycnJzk7Oys2Nlb58+fX6NGj9d///jejawQAAHio0hWIsmTJIienf2b18/PTyZMnJUk+Pj46depUxlUHAADwCKRrDFG5cuW0c+dOPf3006pZs6YGDRqkCxcu6Ouvv1apUqUyukYAAICHK
l1niD7++GPlyZNHkjRixAjlyJFD3bp10/nz5/X5559naIEAAAAPW7rOEFWsWNH+u5+fn1asWJFhBQEAADxq6TpDVLt2bUVFRWVwKQAAAJkjXYFow4YNKT6MEQAA4N8q3V/dYbPZMrIOAACATJPur+545ZVX5Orqmuq0devWpbsgAACARy3dgSg4OFheXl4ZWQsAAECmSFcgstls6t+/v/z8/DK6HgAAgEcuXWOIjDEZXQcAAECmSVcgGjx4MJfLAADAEyNdl8wGDx4sSTp//rwiIiIkSUWLFlXu3LkzrjIAAIBHJF1niK5fv64OHTooMDBQNWrUUI0aNRQYGKiOHTvq+vXrGV0jAADAQ5WuQNSnTx9t3LhRv/zyi6KiohQVFaWff/5ZGzduVL9+/TK6RgAAgIcqXZfMFi5cqB9//FG1atWytzVs2FAeHh5q0aKFpk2bllH1AQAAPHTpvmTm7++fot3Pz49LZgAA4F8nXYEoODhYgwcP1s2bN+1tN27c0NChQxUcHJxhxQEAADwK6bpkNn78eL344ovKly+fnn32WUnS3r175e7urpUrV2ZogQAAAA9bugJR6dKldfToUc2bN0+HDx+WJL3++utq1aqVPDw8MrRAAACAhy1dgWjTpk2qUqWKOnfunNH1AAAAPHLpGkP0wgsv6NKlSxldCwAAQKbgu8wAAIDlpeuSmSSFhYUpR44cqU6rUaNGugsCAAB41NIdiF555ZVU2202mxISEtJdEAAAwKOWrktmknT27FklJiam+CEMAQCAf5t0BSKbzZbRdQAAAGQaBlUDAADLS9cYosTExIyuAwAAINOk6wzRyJEjNWvWrBTts2bN0qhRox64KAAAgEcpXYFoxowZKlasWIr2kiVLavr06Q9cFAAAwKOUrkB09uxZ5cmTJ0V77ty5debMmQcuCgAA4FFKVyDKnz+/tm7dmqJ969atCgwMfOCiAAAAHqV0Daru3Lmzevfurfj4eNWuXVuStHbtWg0YMED9+vXL0AIBAAAetnQFov79++vixYt6++23FRcXJ0lyd3fXe++9p4EDB2ZogQAAAA9bugKRzWbTqFGj9OGHH+rQoUPy8PDQ008/LTc3t4yuDwAA4KFL93eZSZKXl5eee+65jKoFAAAgU6Q7EO3atUvz58/XyZMn7ZfNkvz0008PXBgAAMCjkq67zL7//ntVqVJFhw4d0qJFixQfH68DBw5o3bp18vHxyegaAQAAHqp0BaKPP/5Y48aN05IlS+Tq6qoJEybo8OHDatGihQoUKJDRNQIAADxU6QpEx44dU2hoqCTJ1dVVMTExstls6tOnjz7//PMMLRAAAOBhS1cgypEjh65evSpJyps3r/bv3y9JioqK0vXr1zOuOgAAgEcgXYGoRo0aWr16tSSpefPm6tWrlzp37qzXX39dderUSfNyRo4cqeeee07ZsmWTn5+fmjRpooiICIc+N2/eVPfu3ZUzZ055eXmpWbNmioyMdOhz8uRJhYaGKmvWrPLz81P//v1169Ythz4bNmxQ+fLl5ebmpiJFimjOnDnp2XQAAPAESlcgmjx5slq2bClJev/999W3b19FRkaqWbNmmjlzZpqXs3HjRnXv3l2//fabVq9erfj4eNWvX18xMTH2Pn369NGSJUu0YMECbdy4UX///beaNm1qn56QkKDQ0FDFxcVp27Ztmjt3rubMmaNBgwbZ+xw/flyhoaF64YUXFB4ert69e6tTp05auXJlejYfAAA8Ye7rtvvo6Oh/ZnJxkZeXl/3x22+/rbfffvu+V75ixQqHx3PmzJGfn592796tGjVq6MqVK5o5c6a+/fZb+1eEzJ49W8WLF9dvv/2mypUra9WqVTp48KDWrFkjf39/lS1bVsOHD9d7772nIUOGyNXVVdOnT1dQUJDGjBkjSSpevLi2bNmicePGKSQk5L7rBgAAT5b7OkOUPXt25ciR454/6XXlyhVJkq+vryRp9+7dio+PV926de19i
hUrpgIFCigsLEySFBYWptKlS8vf39/eJyQkRNHR0Tpw4IC9z+3LSOqTtIzkYmNjFR0d7fADAACeXPd1hmj9+vUOj40xatiwob788kvlzZv3gQpJTExU7969VbVqVZUqVUqSdPbsWbm6uip79uwOff39/XX27Fl7n9vDUNL0pGl36xMdHa0bN27Iw8PDYdrIkSM1dOjQB9oeAADw73FfgahmzZop2pydnVW5cmU99dRTD1RI9+7dtX//fm3ZsuWBlpMRBg4cqL59+9ofR0dHK3/+/JlYEQAAeJge6LvMMkqPHj20dOlSbdq0Sfny5bO3BwQEKC4uTlFRUQ5niSIjIxUQEGDvs2PHDoflJd2Fdnuf5HemRUZGytvbO8XZIUlyc3Pji2oBALCQdN1lluTUqVO6fv26cubMma75jTHq0aOHFi1apHXr1ikoKMhheoUKFZQlSxatXbvW3hYREaGTJ08qODhYkhQcHKx9+/bp3Llz9j6rV6+Wt7e3SpQoYe9z+zKS+iQtAwAAWNt9nSGaOHGi/fcLFy7ou+++U+3atdP9/WXdu3fXt99+q59//lnZsmWzj/nx8fGRh4eHfHx81LFjR/Xt21e+vr7y9vZWz549FRwcrMqVK0uS6tevrxIlSqh169YaPXq0zp49qw8++EDdu3e3n+V56623NHnyZA0YMEAdOnTQunXrNH/+fC1btixddQMAgCfLfQWicePGSZJsNpty5cqlRo0a6YMPPkj3yqdNmyZJqlWrlkP77Nmz1a5dO/s6nZyc1KxZM8XGxiokJERTp06193V2dtbSpUvVrVs3BQcHy9PTU23bttWwYcPsfYKCgrRs2TL16dNHEyZMUL58+fTll19yyz0AAJB0n4Ho+PHjGbpyY8w9+7i7u2vKlCmaMmXKHfsULFhQv/76612XU6tWLe3Zs+e+awQAAE++BxpDBAAA8CQgEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMsjEAEAAMvL1EC0adMmNWrUSIGBgbLZbFq8eLHDdGOMBg0apDx58sjDw0N169bV0aNHHfpcunRJrVq1kre3t7Jnz66OHTvq2rVrDn3++OMPVa9eXe7u7sqfP79Gjx79sDcNAAD8i2RqIIqJidGzzz6rKVOmpDp99OjRmjhxoqZPn67t27fL09NTISEhunnzpr1Pq1atdODAAa1evVpLly7Vpk2b1KVLF/v06Oho1a9fXwULFtTu3bv16aefasiQIfr8888f+vYBAIB/B5fMXHmDBg3UoEGDVKcZYzR+/Hh98MEHaty4sSTpq6++kr+/vxYvXqyWLVvq0KFDWrFihXbu3KmKFStKkiZNmqSGDRvqs88+U2BgoObNm6e4uDjNmjVLrq6uKlmypMLDwzV27FiH4AQAAKzrsR1DdPz4cZ09e1Z169a1t/n4+KhSpUoKCwuTJIWFhSl79uz2MCRJdevWlZOTk7Zv327vU6NGDbm6utr7hISEKCIiQpcvX0513
bGxsYqOjnb4AQAAT67HNhCdPXtWkuTv7+/Q7u/vb5929uxZ+fn5OUx3cXGRr6+vQ5/UlnH7OpIbOXKkfHx87D/58+d/8A0CAACPrcc2EGWmgQMH6sqVK/afU6dOZXZJAADgIXpsA1FAQIAkKTIy0qE9MjLSPi0gIEDnzp1zmH7r1i1dunTJoU9qy7h9Hcm5ubnJ29vb4QcAADy5HttAFBQUpICAAK1du9beFh0dre3btys4OFiSFBwcrKioKO3evdveZ926dUpMTFSlSpXsfTZt2qT4+Hh7n9WrV6to0aLKkSPHI9oaAADwOMvUQHTt2jWFh4crPDxc0j8DqcPDw3Xy5EnZbDb17t1bH330kX755Rft27dPbdq0UWBgoJo0aSJJKl68uF588UV17txZO3bs0NatW9WjRw+1bNlSgYGBkqQ33nhDrq6u6tixow4cOKAffvhBEyZMUN++fTNpqwEAwOMmU2+737Vrl1544QX746SQ0rZtW82ZM0cDBgxQTEyMunTpoqioKFWrVk0rVqyQu7u7fZ558+apR48eqlOnjpycnNSsWTNNnDjRPt3Hx0erVq1S9+7dVaFCBeXKlUuDBg3ilnsAAGCXqYGoVq1aMsbccbrNZtOwYcM0bNiwO/bx9fXVt99+e9f1lClTRps3b053nQAA4Mn22I4hAgAAeFQIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIsFYimTJmiQoUKyd3dXZUqVdKOHTsyuyQAAPAYsEwg+uGHH9S3b18NHjxYv//+u5599lmFhITo3LlzmV0aAADIZJYJRGPHjlXnzp3Vvn17lShRQtOnT1fWrFk1a9aszC4NAABkMksEori4OO3evVt169a1tzk5Oalu3boKCwvLxMoAAMDjwCWzC3gULly4oISEBPn7+zu0+/v76/Dhwyn6x8bGKjY21v74ypUrkqTo6Oj7Wm9C7I10VIuH4X73XXqwvx8f7G9rYX9by/3s76S+xph79rVEILpfI0eO1NChQ1O058+fPxOqQUbwmfRWZpeAR4j9bS3sb2tJz/6+evWqfHx87trHEoEoV65ccnZ2VmRkpEN7ZGSkAgICUvQfOHCg+vbta3+cmJioS5cuKWfOnLLZbA+93sdFdHS08ufPr1OnTsnb2zuzy8FDxv62Fva3tVh1fxtjdPXqVQUGBt6zryUCkaurqypUqKC1a9eqSZMmkv4JOWvXrlWPHj1S9Hdzc5Obm5tDW/bs2R9BpY8nb29vS72ArI79bS3sb2ux4v6+15mhJJYIRJLUt29ftW3bVhUrVtTzzz+v8ePHKyYmRu3bt8/s0gAAQCazTCB67bXXdP78eQ0aNEhnz55V2bJltWLFihQDrQEAgPVYJhBJUo8ePVK9RIbUubm5afDgwSkuH+LJxP62Fva3tbC/781m0nIvGgAAwBPMEh/MCAAAcDcEIgAAYHkEIgAAYHkEIouqVauWevfundll4DGS/JgoVKiQxo8fn
2n14P7c6zVts9m0ePHiNC9vw4YNstlsioqKeuDa8Pi613GRnuNgyJAhKlu27APX9qhZ6i4zAGm3c+dOeXp6ZnYZyCBnzpxRjhw5MrsM/MtUqVJFZ86cSfOHG/6bEYgApCp37tyZXQIyUGpfU4R/t/j4eGXJkuWhrsPV1dUyxw6XzKDLly+rTZs2ypEjh7JmzaoGDRro6NGjkv75HpjcuXPrxx9/tPcvW7as8uTJY3+8ZcsWubm56fr164+8diuoVauWevbsqd69eytHjhzy9/fXF198Yf+k9WzZsqlIkSJavny5fZ79+/erQYMG8vLykr+/v1q3bq0LFy7Yp8fExKhNmzby8vJSnjx5NGbMmBTrvf2S2YkTJ2Sz2RQeHm6fHhUVJZvNpg0bNkj6v1PrK1euVLly5eTh4aHatWvr3LlzWr58uYoXLy5vb2+98cYbHCsPSWJiogYMGCBfX18FBARoyJAh9mnJL41s27ZNZcuWlbu7uypWrKjFixen2MeStHv3blWsWFFZs2ZVlSpVFBER8Wg25gnz+eefKzAwUImJiQ7tjRs3VocOHSRJP//8s8qXLy93d3c99dRTGjp0qG7dumXva7PZNG3aNL388svy9PTURx99pCJFiuizzz5zWGZ4eLhsNpv+/PPPNNV24cIFvfLKK8qaNauefvpp/fLLL/ZpqV0y++KLL5Q/f35lzZpVr7zyisaOHZvq11t9/fXXKlSokHx8fNSyZUtdvXo1TfVkFgIR1K5dO+3atUu//PKLwsLCZIxRw4YNFR8fL5vNpho1atj/6F2+fFmHDh3SjRs3dPjwYUnSxo0b9dxzzylr1qyZuBVPtrlz5ypXrlzasWOHevbsqW7duql58+aqUqWKfv/9d9WvX1+tW7fW9evXFRUVpdq1a6tcuXLatWuXVqxYocjISLVo0cK+vP79+2vjxo36+eeftWrVKm3YsEG///57htQ6ZMgQTZ48Wdu2bdOpU6fUokULjR8/Xt9++62WLVumVatWadKkSRmyLjiaO3euPD09tX37do0ePVrDhg3T6tWrU/SLjo5Wo0aNVLp0af3+++8aPny43nvvvVSX+f7772vMmDHatWuXXFxc7H+8cX+aN2+uixcvav369fa2S5cuacWKFWrVqpU2b96sNm3aqFevXjp48KBmzJihOXPmaMSIEQ7LGTJkiF555RXt27dPHTt2VIcOHTR79myHPrNnz1aNGjVUpEiRNNU2dOhQtWjRQn/88YcaNmyoVq1a6dKlS6n23bp1q9566y316tVL4eHhqlevXooaJenYsWNavHixli5dqqVLl2rjxo365JNP0lRPpjGwpJo1a5pevXqZI0eOGElm69at9mkXLlwwHh4eZv78+cYYYyZOnGhKlixpjDFm8eLFplKlSqZx48Zm2rRpxhhj6tata/773/8++o2wiJo1a5pq1arZH9+6dct4enqa1q1b29vOnDljJJmwsDAzfPhwU79+fYdlnDp1ykgyERER5urVq8bV1dW+f40x5uLFi8bDw8P06tXL3lawYEEzbtw4Y4wxx48fN5LMnj177NMvX75sJJn169cbY4xZv369kWTWrFlj7zNy5EgjyRw7dsze1rVrVxMSEvIgTwlSkfw4McaY5557zrz33nvGGGMkmUWLFhljjJk2bZrJmTOnuXHjhr3vF1984bCPU9ufy5YtM5Ic5kPaNW7c2HTo0MH+eMaMGSYwMNAkJCSYOnXqmI8//tih/9dff23y5MljfyzJ9O7d26HP6dOnjbOzs9m+fbsxxpi4uDiTK1cuM2fOnDTVJMl88MEH9sfXrl0zkszy5cuNMf93HFy+fNkYY8xrr71mQkNDHZbRqlUr4+PjY388ePBgkzVrVhMdHW1v69+/v6lUqVKaasosnCGyuEOHDsnFxUWVKlWyt+XMmVNFixbVoUOHJEk1a9bUwYMHdf78eW3cuFG1atVSrVq1tGHDBsXHx2vbtm2qVatWJm2BNZQpU
8b+u7Ozs3LmzKnSpUvb25K+k+/cuXPau3ev1q9fLy8vL/tPsWLFJP3zX9uxY8cUFxfnsM99fX1VtGjRDK/V399fWbNm1VNPPeXQdu7cuQxZFxzd/txLUp48eVJ9riMiIlSmTBm5u7vb255//vl7LjPpUjn7L31atWqlhQsXKjY2VpI0b948tWzZUk5OTtq7d6+GDRvm8Lrt3Lmzzpw543CJuWLFig7LDAwMVGhoqGbNmiVJWrJkiWJjY9W8efM013X7Pvb09JS3t/cd93FERESKYyW1Y6dQoULKli2b/fGdjsXHCYOqcU+lS5eWr6+vNm7cqI0bN2rEiBEKCAjQqFGjtHPnTsXHx6tKlSqZXeYTLfnASZvN5tBms9kk/TOG5Nq1a2rUqJFGjRqVYjl58uRJ87iC2zk5/fO/k7ntm37i4+PvWWvyOpPako+jQMZ4GM/1nY4z3L9GjRrJGKNly5bpueee0+bNmzVu3DhJ0rVr1zR06FA1bdo0xXy3B9fU7vzs1KmTWrdurXHjxmn27Nl67bXX7msIw8M+bjJqmQ8bgcjiihcvrlu3bmn79u32UHPx4kVFRESoRIkSkv45kKtXr66ff/5ZBw4cULVq1ZQ1a1bFxsZqxowZqlixIrdnP0bKly+vhQsXqlChQnJxSfkSL1y4sLJkyaLt27erQIECkv4ZG3bkyBHVrFkz1WUm3XF25swZlStXTpJSDL7Fv0fRokX1zTffKDY21v5lnzt37szkqp587u7uatq0qebNm6c///xTRYsWVfny5SX987qNiIhI87if2zVs2FCenp6aNm2aVqxYoU2bNmV06XZFixZNcaw8KccOl8ws7umnn1bjxo3VuXNnbdmyRXv37tWbb76pvHnzqnHjxvZ+tWrV0nfffaeyZcvKy8tLTk5OqlGjhubNm3fHP6LIHN27d9elS5f0+uuva+fOnTp27JhWrlyp9u3bKyEhQV5eXurYsaP69++vdevWaf/+/WrXrp39LFBqPDw8VLlyZX3yySc6dOiQNm7cqA8++OARbhUy0htvvKHExER16dJFhw4d0sqVK+13KiWdBcLD0apVKy1btkyzZs1Sq1at7O2DBg3SV199paFDh+rAgQM6dOiQvv/++zS9zpydndWuXTsNHDhQTz/9tIKDgx9a/T179tSvv/6qsWPH6ujRo5oxY4aWL1/+RBw3BCJo9uzZqlChgl566SUFBwfLGKNff/3V4ZRnzZo1lZCQ4DBWqFatWinakPkCAwO1detWJSQkqH79+ipdurR69+6t7Nmz20PPp59+qurVq6tRo0aqW7euqlWrpgoVKtx1ubNmzdKtW7dUoUIF9e7dWx999NGj2Bw8BN7e3lqyZInCw8NVtmxZvf/++xo0aJAkx8szyHi1a9eWr6+vIiIi9MYbb9jbQ0JCtHTpUq1atUrPPfecKleurHHjxqlgwYJpWm7Hjh0VFxen9u3bP6zSJUlVq1bV9OnTNXbsWD377LNasWKF+vTp80QcNzZz+6AAAIAlzZs3T+3bt9eVK1fk4eGR2eXgPm3evFl16tTRqVOn7DdZPCqdO3fW4cOHtXnz5ke63ozGGCIAsKCvvvpKTz31lPLmzau9e/fqvffeU4sWLQhD/zKxsbE6f/68hgwZoubNmz+SMPTZZ5+pXr168vT01PLlyzV37lxNnTr1oa/3YeOSGQBY0NmzZ/Xmm2+qePHi6tOnj5o3b67PP/88s8vCffruu+9UsGBBRUVFafTo0Q7T5s2b53Ab/+0/JUuWTPc6d+zYoXr16ql06dKaPn26Jk6cqE6dOj3opmQ6LpkBAPAEunr1qiIjI1OdliVLljSPT7IKAhEAALA8LpkBAADLIxABAADLIxABAADLIxABAADLIxABuKN27dqpSZMmDm3nz59XqVKlVKlSJV25ciVzCgOADEYgApBm58+fV+3ateXh4aFVq1bJx8cns0sCgAxBIAKQJhcuXFCdOnXk5uam1atXO4Shs
WPHqnTp0vL09FT+/Pn19ttv69q1a5KkDRs2yGaz3fEnyZYtW1S9enV5eHgof/78eueddxQTE2OfXqhQoRTzvvvuu/bp06ZNU+HCheXq6qqiRYvq66+/dqjfZrNp2rRpatCggTw8PPTUU0/pxx9/tE8/ceKEbDabwsPD7W0ffvihbDabxo8fb287fPiw6tWrJx8fH3sd2bNnv+PzlrT9UVFRKepZvHix/XFsbKzeffdd5c2bV56enqpUqZI2bNhgnz5nzpwU60le853WJUlRUVGy2WwOywTwfwhEAO7p4sWLqlu3rlxcXLR69eoUf5idnJw0ceJEHThwQHPnztW6des0YMAASVKVKlV05swZnTlzRgsXLpQk++MzZ85Iko4dO6YXX3xRzZo10x9//KEffvhBW7ZsUY8ePRzWM2zYMId5Bw8eLElatGiRevXqpX79+mn//v3q2rWr2rdvr/Xr1zvM/+GHH6pZs2bau3evWrVqpZYtW+rQoUOpbvP//vc/jR8/PsVXWXTo0EHx8fHaunWrzpw54xCWHkSPHj0UFham77//Xn/88YeaN2+uF198UUePHs2Q5QO4BwMAd9C2bVtTo0YNU7ZsWZMlSxZTuXJlc+vWrXvOt2DBApMzZ84U7evXrzepve107NjRdOnSxaFt8+bNxsnJydy4ccMYY0zBggXNuHHjUl1flSpVTOfOnR3amjdvbho2bGh/LMm89dZbDn0qVapkunXrZowx5vjx40aS2bNnjzHGmDZt2piOHTumWK+Hh4eZN2+e/fHs2bONj49PqnXdvs2XL192aJdkFi1aZIwx5q+//jLOzs7m9OnTDn3q1KljBg4ceMf1JK/5TusyxpjLly8bSWb9+vV3rBWwMs4QAbirTZs2KTExUeHh4frzzz9TfF+SJK1Zs0Z16tRR3rx5lS1bNrVu3VoXL17U9evX07SOvXv3as6cOQ7ftRQSEqLExEQdP378nvMfOnRIVatWdWirWrVqirM/wcHBKR6ndobo999/16JFizR8+PAU04KCgrRo0aI0b1ta7Nu3TwkJCXrmmWccnoONGzfq2LFj9n5XrlxJ0/dR5cuXT9myZVNQUJA6d+7M4HcgDfi2ewB39dRTT2nt2rXKlSuXpk6dqjfffFOhoaEqU6aMpH/Gsbz00kvq1q2bRowYIV9fX23ZskUdO3ZUXFycsmbNes91XLt2TV27dtU777yTYlqBAgUyfJvupV+/fnr33XeVJ0+eFNNmzpyptm3bKlu2bPLw8NCtW7fk7u7+QOu7du2anJ2dtXv3bjk7OztM8/Lysv+eLVs2/f777/bHp0+fVq1atVIsb/PmzcqWLZtOnDihTp066f3339dHH330QDUCTzoCEYC7Kl26tHLlyiVJat68uX766Se1adNGO3bskKurq3bv3q3ExESNGTNGTk7/nHSeP3/+fa2jfPnyOnjwoIoUKZKuGosXL66tW7eqbdu29ratW7eqRIkSDv1+++03tWnTxuFxuXLlHPr88ssvOnLkiJYtW5bquipXrqyXX35ZmzZt0jfffKNFixbp448/TlfdScqVK6eEhASdO3dO1atXv2M/Jycnh+fIxSX1t/CgoCBlz55dRYoUUfPmzRUWFvZA9QFWQCACcF+mTJmiUqVKaejQoRoxYoSKFCmi+Ph4TZo0SY0aNdLWrVs1ffr0+1rme++9p8qVK6tHjx7q1KmTPD09dfDgQa1evVqTJ0++5/z9+/dXixYtVK5cOdWtW1dLlizRTz/9pDVr1jj0W7BggSpWrKhq1app3rx52rFjh2bOnOnQZ/To0Zo0adIdz2wtXLhQc+bM0e7du1WgQAH5+fmlaRtjY2N18+ZNh7b4+HglJibqmWeeUatWrdSmTRuNGTNG5cqV0/nz57V27VqVKVNGoaGhaVpH8nWdOHFCy5cvV7Vq1e5rfsCKGEME4L74+vrqiy++0KhRo7R9+3Y9++yzGjt2rEaNGqVSpUpp3rx5Gjly5
H0ts0yZMtq4caOOHDmi6tWrq1y5cho0aJACAwPTNH+TJk00YcIEffbZZypZsqRmzJih2bNnp7icNHToUH3//fcqU6aMvvrqK3333XcpziIVKVLE4UzT7Y4cOaJOnTrp22+/ve9LeQEBAfLw8LD/SFKLFi20adMmSdLs2bPVpk0b9evXT0WLFlWTJk20c+fOdF0yTFpX9erV9eyzz973/gCsyGaMMZldBAA8bDabTYsWLUrxyduZqUmTJurdu3eq44AAPFqcIQKATOLq6mofdwUgczGGCAAyyf0OPgfw8BCIAFgCowMA3A3nagEAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOX9fwLPn/VSlnXqAAAAAElFTkSuQmCC", + "image/png": "", "text/plain": [ "
" ] @@ -791,7 +608,7 @@ }, { "data": { - "image/png": "", + "image/png": "", "text/plain": [ "
" ] @@ -803,36 +620,42 @@ "name": "stdout", "output_type": "stream", "text": [ - "Размер обучающей выборки после oversampling и undersampling: 17620\n" + "Размер обучающей выборки после oversampling и undersampling: 3044\n" ] } ], "source": [ + "import pandas as pd\n", + "import seaborn as sns\n", + "import matplotlib.pyplot as plt\n", "from sklearn.model_selection import train_test_split\n", "from imblearn.over_sampling import RandomOverSampler\n", "from imblearn.under_sampling import RandomUnderSampler\n", "\n", - "# Преобразование целевой переменной (цены) в категориальные диапазоны с использованием квантилей\n", - "train_data['price_category'] = pd.qcut(train_data['price'], q=4, labels=['low', 'medium', 'high', 'very_high'])\n", + "# Предположим, что у вас уже есть данные, разделенные на обучающую, контрольную и тестовую выборки\n", + "# train_data, val_data, test_data\n", "\n", - "# Визуализация распределения цен после преобразования в категории\n", - "sns.countplot(x=train_data['price_category'])\n", - "plt.title('Распределение категорий цены в обучающей выборке')\n", - "plt.xlabel('Категория цены')\n", + "# Преобразование целевой переменной (заработная плата) в категориальные диапазоны с использованием квантилей\n", + "train_data['salary_category'] = pd.qcut(train_data['salary_in_usd'], q=4, labels=['low', 'medium', 'high', 'very_high'])\n", + "\n", + "# Визуализация распределения заработной платы после преобразования в категории\n", + "sns.countplot(x=train_data['salary_category'])\n", + "plt.title('Распределение категорий заработной платы в обучающей выборке')\n", + "plt.xlabel('Категория заработной платы')\n", "plt.ylabel('Частота')\n", "plt.show()\n", "\n", "# Балансировка категорий с помощью RandomOverSampler (увеличение меньшинств)\n", "ros = RandomOverSampler(random_state=42)\n", - "X_train = train_data.drop(columns=['price', 'price_category'])\n", - "y_train = train_data['price_category']\n", + "X_train = train_data.drop(columns=['salary_in_usd', 
'salary_category'])\n", + "y_train = train_data['salary_category']\n", "\n", "X_resampled, y_resampled = ros.fit_resample(X_train, y_train)\n", "\n", - "# Визуализация распределения цен после oversampling\n", + "# Визуализация распределения заработной платы после oversampling\n", "sns.countplot(x=y_resampled)\n", - "plt.title('Распределение категорий цены после oversampling')\n", - "plt.xlabel('Категория цены')\n", + "plt.title('Распределение категорий заработной платы после oversampling')\n", + "plt.xlabel('Категория заработной платы')\n", "plt.ylabel('Частота')\n", "plt.show()\n", "\n", @@ -840,10 +663,10 @@ "rus = RandomUnderSampler(random_state=42)\n", "X_resampled, y_resampled = rus.fit_resample(X_resampled, y_resampled)\n", "\n", - "# Визуализация распределения цен после undersampling\n", + "# Визуализация распределения заработной платы после undersampling\n", "sns.countplot(x=y_resampled)\n", - "plt.title('Распределение категорий цены после undersampling')\n", - "plt.xlabel('Категория цен')\n", + "plt.title('Распределение категорий заработной платы после undersampling')\n", + "plt.xlabel('Категория заработной платы')\n", "plt.ylabel('Частота')\n", "plt.show()\n", "\n", @@ -855,33 +678,35 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Конструирование признаков \n", - "\n", + "### Конструирование признаков\n", "Теперь приступим к конструированию признаков для решения каждой задачи.\n", "\n", - "**Процесс конструирования признаков** \n", - "Задача 1: Прогнозирование цен недвижимости. Цель технического проекта: Разработка модели машинного обучения для точного прогнозирования рыночной стоимости недвижимости. \n", - "Задача 2: Оптимизация затрат на ремонт перед продажей. Цель технического проекта: Разработка модели машинного обучения для точного прогнозирования по рекомендациям по реновациям.\n", + "**Процесс конструирования признаков**\n", + "Задача 1: Прогнозирование заработной платы в Data Science. 
Цель технического проекта: Разработка модели машинного обучения для точного прогнозирования заработной платы специалистов в области Data Science.\n", + "Задача 2: Оптимизация распределения ресурсов в компании. Цель технического проекта: Разработка модели машинного обучения для оптимизации распределения ресурсов на Data Science проекты.\n", "\n", - "**Унитарное кодирование** \n", + "**Унитарное кодирование**\n", "Унитарное кодирование категориальных признаков (one-hot encoding). Преобразование категориальных признаков в бинарные векторы.\n", "\n", - "**Дискретизация числовых признаков** \n", + "**Дискретизация числовых признаков**\n", "Процесс преобразования непрерывных числовых значений в дискретные категории или интервалы (бины)." ] }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Столбцы train_data_encoded: ['id', 'price', 'bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'grade', 'sqft_above', 'sqft_basement', 'yr_built', 'yr_renovated', 'zipcode', 'lat', 'long', 'sqft_living15', 'sqft_lot15', 'price_category', 'date_20140502T000000', 'date_20140503T000000', 'date_20140504T000000', 'date_20140505T000000', 'date_20140506T000000', 'date_20140507T000000', 'date_20140508T000000', 'date_20140509T000000', 'date_20140510T000000', 'date_20140511T000000', 'date_20140512T000000', 'date_20140513T000000', 'date_20140514T000000', 'date_20140515T000000', 'date_20140516T000000', 'date_20140517T000000', 'date_20140518T000000', 'date_20140519T000000', 'date_20140520T000000', 'date_20140521T000000', 'date_20140522T000000', 'date_20140523T000000', 'date_20140524T000000', 'date_20140525T000000', 'date_20140526T000000', 'date_20140527T000000', 'date_20140528T000000', 'date_20140529T000000', 'date_20140530T000000', 'date_20140531T000000', 'date_20140601T000000', 'date_20140602T000000', 'date_20140603T000000', 'date_20140604T000000', 'date_20140605T000000', 
'date_20140606T000000', 'date_20140607T000000', 'date_20140608T000000', 'date_20140609T000000', 'date_20140610T000000', 'date_20140611T000000', 'date_20140612T000000', 'date_20140613T000000', 'date_20140614T000000', 'date_20140615T000000', 'date_20140616T000000', 'date_20140617T000000', 'date_20140618T000000', 'date_20140619T000000', 'date_20140620T000000', 'date_20140621T000000', 'date_20140622T000000', 'date_20140623T000000', 'date_20140624T000000', 'date_20140625T000000', 'date_20140626T000000', 'date_20140627T000000', 'date_20140628T000000', 'date_20140629T000000', 'date_20140630T000000', 'date_20140701T000000', 'date_20140702T000000', 'date_20140703T000000', 'date_20140704T000000', 'date_20140705T000000', 'date_20140706T000000', 'date_20140707T000000', 'date_20140708T000000', 'date_20140709T000000', 'date_20140710T000000', 'date_20140711T000000', 'date_20140712T000000', 'date_20140713T000000', 'date_20140714T000000', 'date_20140715T000000', 'date_20140716T000000', 'date_20140717T000000', 'date_20140718T000000', 'date_20140719T000000', 'date_20140720T000000', 'date_20140721T000000', 'date_20140722T000000', 'date_20140723T000000', 'date_20140724T000000', 'date_20140725T000000', 'date_20140726T000000', 'date_20140728T000000', 'date_20140729T000000', 'date_20140730T000000', 'date_20140731T000000', 'date_20140801T000000', 'date_20140802T000000', 'date_20140804T000000', 'date_20140805T000000', 'date_20140806T000000', 'date_20140807T000000', 'date_20140808T000000', 'date_20140809T000000', 'date_20140810T000000', 'date_20140811T000000', 'date_20140812T000000', 'date_20140813T000000', 'date_20140814T000000', 'date_20140815T000000', 'date_20140816T000000', 'date_20140817T000000', 'date_20140818T000000', 'date_20140819T000000', 'date_20140820T000000', 'date_20140821T000000', 'date_20140822T000000', 'date_20140823T000000', 'date_20140824T000000', 'date_20140825T000000', 'date_20140826T000000', 'date_20140827T000000', 'date_20140828T000000', 'date_20140829T000000', 
'date_20140830T000000', 'date_20140831T000000', 'date_20140901T000000', 'date_20140902T000000', 'date_20140903T000000', 'date_20140904T000000', 'date_20140905T000000', 'date_20140906T000000', 'date_20140907T000000', 'date_20140908T000000', 'date_20140909T000000', 'date_20140910T000000', 'date_20140911T000000', 'date_20140912T000000', 'date_20140913T000000', 'date_20140914T000000', 'date_20140915T000000', 'date_20140916T000000', 'date_20140917T000000', 'date_20140918T000000', 'date_20140919T000000', 'date_20140920T000000', 'date_20140921T000000', 'date_20140922T000000', 'date_20140923T000000', 'date_20140924T000000', 'date_20140925T000000', 'date_20140926T000000', 'date_20140927T000000', 'date_20140928T000000', 'date_20140929T000000', 'date_20140930T000000', 'date_20141001T000000', 'date_20141002T000000', 'date_20141003T000000', 'date_20141004T000000', 'date_20141005T000000', 'date_20141006T000000', 'date_20141007T000000', 'date_20141008T000000', 'date_20141009T000000', 'date_20141010T000000', 'date_20141011T000000', 'date_20141012T000000', 'date_20141013T000000', 'date_20141014T000000', 'date_20141015T000000', 'date_20141016T000000', 'date_20141017T000000', 'date_20141018T000000', 'date_20141019T000000', 'date_20141020T000000', 'date_20141021T000000', 'date_20141022T000000', 'date_20141023T000000', 'date_20141024T000000', 'date_20141025T000000', 'date_20141026T000000', 'date_20141027T000000', 'date_20141028T000000', 'date_20141029T000000', 'date_20141030T000000', 'date_20141031T000000', 'date_20141101T000000', 'date_20141102T000000', 'date_20141103T000000', 'date_20141104T000000', 'date_20141105T000000', 'date_20141106T000000', 'date_20141107T000000', 'date_20141108T000000', 'date_20141109T000000', 'date_20141110T000000', 'date_20141111T000000', 'date_20141112T000000', 'date_20141113T000000', 'date_20141114T000000', 'date_20141115T000000', 'date_20141116T000000', 'date_20141117T000000', 'date_20141118T000000', 'date_20141119T000000', 'date_20141120T000000', 
'date_20141121T000000', 'date_20141122T000000', 'date_20141123T000000', 'date_20141124T000000', 'date_20141125T000000', 'date_20141126T000000', 'date_20141128T000000', 'date_20141129T000000', 'date_20141130T000000', 'date_20141201T000000', 'date_20141202T000000', 'date_20141203T000000', 'date_20141204T000000', 'date_20141205T000000', 'date_20141206T000000', 'date_20141207T000000', 'date_20141208T000000', 'date_20141209T000000', 'date_20141210T000000', 'date_20141211T000000', 'date_20141212T000000', 'date_20141213T000000', 'date_20141214T000000', 'date_20141215T000000', 'date_20141216T000000', 'date_20141217T000000', 'date_20141218T000000', 'date_20141219T000000', 'date_20141220T000000', 'date_20141221T000000', 'date_20141222T000000', 'date_20141223T000000', 'date_20141224T000000', 'date_20141226T000000', 'date_20141227T000000', 'date_20141229T000000', 'date_20141230T000000', 'date_20141231T000000', 'date_20150102T000000', 'date_20150105T000000', 'date_20150106T000000', 'date_20150107T000000', 'date_20150108T000000', 'date_20150109T000000', 'date_20150110T000000', 'date_20150112T000000', 'date_20150113T000000', 'date_20150114T000000', 'date_20150115T000000', 'date_20150116T000000', 'date_20150117T000000', 'date_20150119T000000', 'date_20150120T000000', 'date_20150121T000000', 'date_20150122T000000', 'date_20150123T000000', 'date_20150124T000000', 'date_20150125T000000', 'date_20150126T000000', 'date_20150127T000000', 'date_20150128T000000', 'date_20150129T000000', 'date_20150130T000000', 'date_20150201T000000', 'date_20150202T000000', 'date_20150203T000000', 'date_20150204T000000', 'date_20150205T000000', 'date_20150206T000000', 'date_20150207T000000', 'date_20150209T000000', 'date_20150210T000000', 'date_20150211T000000', 'date_20150212T000000', 'date_20150213T000000', 'date_20150214T000000', 'date_20150215T000000', 'date_20150216T000000', 'date_20150217T000000', 'date_20150218T000000', 'date_20150219T000000', 'date_20150220T000000', 'date_20150221T000000', 
'date_20150222T000000', 'date_20150223T000000', 'date_20150224T000000', 'date_20150225T000000', 'date_20150226T000000', 'date_20150227T000000', 'date_20150228T000000', 'date_20150301T000000', 'date_20150302T000000', 'date_20150303T000000', 'date_20150304T000000', 'date_20150305T000000', 'date_20150306T000000', 'date_20150307T000000', 'date_20150308T000000', 'date_20150309T000000', 'date_20150310T000000', 'date_20150311T000000', 'date_20150312T000000', 'date_20150313T000000', 'date_20150314T000000', 'date_20150315T000000', 'date_20150316T000000', 'date_20150317T000000', 'date_20150318T000000', 'date_20150319T000000', 'date_20150320T000000', 'date_20150321T000000', 'date_20150322T000000', 'date_20150323T000000', 'date_20150324T000000', 'date_20150325T000000', 'date_20150326T000000', 'date_20150327T000000', 'date_20150328T000000', 'date_20150329T000000', 'date_20150330T000000', 'date_20150331T000000', 'date_20150401T000000', 'date_20150402T000000', 'date_20150403T000000', 'date_20150404T000000', 'date_20150405T000000', 'date_20150406T000000', 'date_20150407T000000', 'date_20150408T000000', 'date_20150409T000000', 'date_20150410T000000', 'date_20150411T000000', 'date_20150412T000000', 'date_20150413T000000', 'date_20150414T000000', 'date_20150415T000000', 'date_20150416T000000', 'date_20150417T000000', 'date_20150418T000000', 'date_20150419T000000', 'date_20150420T000000', 'date_20150421T000000', 'date_20150422T000000', 'date_20150423T000000', 'date_20150424T000000', 'date_20150425T000000', 'date_20150426T000000', 'date_20150427T000000', 'date_20150428T000000', 'date_20150429T000000', 'date_20150430T000000', 'date_20150501T000000', 'date_20150502T000000', 'date_20150503T000000', 'date_20150504T000000', 'date_20150505T000000', 'date_20150506T000000', 'date_20150507T000000', 'date_20150508T000000', 'date_20150509T000000', 'date_20150510T000000', 'date_20150511T000000', 'date_20150512T000000', 'date_20150513T000000', 'date_20150514T000000', 'date_20150515T000000', 
'date_20150524T000000', 'waterfront_0', 'waterfront_1', 'view_0', 'view_1', 'view_2', 'view_3', 'view_4', 'condition_1', 'condition_2', 'condition_3', 'condition_4', 'condition_5']\n", - "Столбцы val_data_encoded: ['id', 'price', 'bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'grade', 'sqft_above', 'sqft_basement', 'yr_built', 'yr_renovated', 'zipcode', 'lat', 'long', 'sqft_living15', 'sqft_lot15', 'date_20140502T000000', 'date_20140503T000000', 'date_20140505T000000', 'date_20140506T000000', 'date_20140507T000000', 'date_20140508T000000', 'date_20140509T000000', 'date_20140510T000000', 'date_20140511T000000', 'date_20140512T000000', 'date_20140513T000000', 'date_20140514T000000', 'date_20140515T000000', 'date_20140516T000000', 'date_20140518T000000', 'date_20140519T000000', 'date_20140520T000000', 'date_20140521T000000', 'date_20140522T000000', 'date_20140523T000000', 'date_20140524T000000', 'date_20140525T000000', 'date_20140526T000000', 'date_20140527T000000', 'date_20140528T000000', 'date_20140529T000000', 'date_20140530T000000', 'date_20140531T000000', 'date_20140601T000000', 'date_20140602T000000', 'date_20140603T000000', 'date_20140604T000000', 'date_20140605T000000', 'date_20140606T000000', 'date_20140607T000000', 'date_20140609T000000', 'date_20140610T000000', 'date_20140611T000000', 'date_20140612T000000', 'date_20140613T000000', 'date_20140614T000000', 'date_20140615T000000', 'date_20140616T000000', 'date_20140617T000000', 'date_20140618T000000', 'date_20140619T000000', 'date_20140620T000000', 'date_20140621T000000', 'date_20140622T000000', 'date_20140623T000000', 'date_20140624T000000', 'date_20140625T000000', 'date_20140626T000000', 'date_20140627T000000', 'date_20140628T000000', 'date_20140629T000000', 'date_20140630T000000', 'date_20140701T000000', 'date_20140702T000000', 'date_20140703T000000', 'date_20140707T000000', 'date_20140708T000000', 'date_20140709T000000', 'date_20140710T000000', 'date_20140711T000000', 
'date_20140712T000000', 'date_20140713T000000', 'date_20140714T000000', 'date_20140715T000000', 'date_20140716T000000', 'date_20140717T000000', 'date_20140718T000000', 'date_20140719T000000', 'date_20140721T000000', 'date_20140722T000000', 'date_20140723T000000', 'date_20140724T000000', 'date_20140725T000000', 'date_20140727T000000', 'date_20140728T000000', 'date_20140729T000000', 'date_20140730T000000', 'date_20140731T000000', 'date_20140801T000000', 'date_20140802T000000', 'date_20140803T000000', 'date_20140804T000000', 'date_20140805T000000', 'date_20140806T000000', 'date_20140807T000000', 'date_20140808T000000', 'date_20140810T000000', 'date_20140811T000000', 'date_20140812T000000', 'date_20140813T000000', 'date_20140814T000000', 'date_20140815T000000', 'date_20140817T000000', 'date_20140818T000000', 'date_20140819T000000', 'date_20140820T000000', 'date_20140821T000000', 'date_20140822T000000', 'date_20140825T000000', 'date_20140826T000000', 'date_20140827T000000', 'date_20140828T000000', 'date_20140829T000000', 'date_20140831T000000', 'date_20140901T000000', 'date_20140902T000000', 'date_20140903T000000', 'date_20140904T000000', 'date_20140905T000000', 'date_20140907T000000', 'date_20140908T000000', 'date_20140909T000000', 'date_20140910T000000', 'date_20140911T000000', 'date_20140912T000000', 'date_20140913T000000', 'date_20140914T000000', 'date_20140915T000000', 'date_20140916T000000', 'date_20140917T000000', 'date_20140918T000000', 'date_20140919T000000', 'date_20140921T000000', 'date_20140922T000000', 'date_20140923T000000', 'date_20140924T000000', 'date_20140925T000000', 'date_20140926T000000', 'date_20140927T000000', 'date_20140929T000000', 'date_20140930T000000', 'date_20141001T000000', 'date_20141002T000000', 'date_20141003T000000', 'date_20141006T000000', 'date_20141007T000000', 'date_20141008T000000', 'date_20141009T000000', 'date_20141010T000000', 'date_20141012T000000', 'date_20141013T000000', 'date_20141014T000000', 'date_20141015T000000', 
'date_20141016T000000', 'date_20141017T000000', 'date_20141018T000000', 'date_20141019T000000', 'date_20141020T000000', 'date_20141021T000000', 'date_20141022T000000', 'date_20141023T000000', 'date_20141024T000000', 'date_20141027T000000', 'date_20141028T000000', 'date_20141029T000000', 'date_20141030T000000', 'date_20141031T000000', 'date_20141101T000000', 'date_20141103T000000', 'date_20141104T000000', 'date_20141105T000000', 'date_20141106T000000', 'date_20141107T000000', 'date_20141108T000000', 'date_20141109T000000', 'date_20141110T000000', 'date_20141111T000000', 'date_20141112T000000', 'date_20141113T000000', 'date_20141114T000000', 'date_20141115T000000', 'date_20141116T000000', 'date_20141117T000000', 'date_20141118T000000', 'date_20141119T000000', 'date_20141120T000000', 'date_20141121T000000', 'date_20141122T000000', 'date_20141123T000000', 'date_20141124T000000', 'date_20141125T000000', 'date_20141126T000000', 'date_20141128T000000', 'date_20141201T000000', 'date_20141202T000000', 'date_20141203T000000', 'date_20141204T000000', 'date_20141205T000000', 'date_20141206T000000', 'date_20141208T000000', 'date_20141209T000000', 'date_20141210T000000', 'date_20141211T000000', 'date_20141212T000000', 'date_20141214T000000', 'date_20141215T000000', 'date_20141216T000000', 'date_20141217T000000', 'date_20141218T000000', 'date_20141219T000000', 'date_20141220T000000', 'date_20141222T000000', 'date_20141223T000000', 'date_20141224T000000', 'date_20141226T000000', 'date_20141227T000000', 'date_20141229T000000', 'date_20141230T000000', 'date_20141231T000000', 'date_20150102T000000', 'date_20150105T000000', 'date_20150106T000000', 'date_20150107T000000', 'date_20150108T000000', 'date_20150109T000000', 'date_20150112T000000', 'date_20150113T000000', 'date_20150114T000000', 'date_20150115T000000', 'date_20150116T000000', 'date_20150119T000000', 'date_20150120T000000', 'date_20150121T000000', 'date_20150122T000000', 'date_20150123T000000', 'date_20150124T000000', 
'date_20150126T000000', 'date_20150127T000000', 'date_20150128T000000', 'date_20150129T000000', 'date_20150130T000000', 'date_20150131T000000', 'date_20150202T000000', 'date_20150203T000000', 'date_20150204T000000', 'date_20150205T000000', 'date_20150206T000000', 'date_20150207T000000', 'date_20150209T000000', 'date_20150210T000000', 'date_20150211T000000', 'date_20150212T000000', 'date_20150213T000000', 'date_20150214T000000', 'date_20150216T000000', 'date_20150217T000000', 'date_20150218T000000', 'date_20150219T000000', 'date_20150220T000000', 'date_20150221T000000', 'date_20150222T000000', 'date_20150223T000000', 'date_20150224T000000', 'date_20150225T000000', 'date_20150226T000000', 'date_20150227T000000', 'date_20150228T000000', 'date_20150301T000000', 'date_20150302T000000', 'date_20150303T000000', 'date_20150304T000000', 'date_20150305T000000', 'date_20150306T000000', 'date_20150307T000000', 'date_20150309T000000', 'date_20150310T000000', 'date_20150311T000000', 'date_20150312T000000', 'date_20150313T000000', 'date_20150315T000000', 'date_20150316T000000', 'date_20150317T000000', 'date_20150318T000000', 'date_20150319T000000', 'date_20150320T000000', 'date_20150321T000000', 'date_20150323T000000', 'date_20150324T000000', 'date_20150325T000000', 'date_20150326T000000', 'date_20150327T000000', 'date_20150328T000000', 'date_20150329T000000', 'date_20150330T000000', 'date_20150331T000000', 'date_20150401T000000', 'date_20150402T000000', 'date_20150403T000000', 'date_20150404T000000', 'date_20150406T000000', 'date_20150407T000000', 'date_20150408T000000', 'date_20150409T000000', 'date_20150410T000000', 'date_20150411T000000', 'date_20150412T000000', 'date_20150413T000000', 'date_20150414T000000', 'date_20150415T000000', 'date_20150416T000000', 'date_20150417T000000', 'date_20150419T000000', 'date_20150420T000000', 'date_20150421T000000', 'date_20150422T000000', 'date_20150423T000000', 'date_20150424T000000', 'date_20150425T000000', 'date_20150426T000000', 
'date_20150427T000000', 'date_20150428T000000', 'date_20150429T000000', 'date_20150430T000000', 'date_20150501T000000', 'date_20150502T000000', 'date_20150503T000000', 'date_20150504T000000', 'date_20150505T000000', 'date_20150506T000000', 'date_20150507T000000', 'date_20150508T000000', 'date_20150509T000000', 'date_20150511T000000', 'date_20150512T000000', 'date_20150513T000000', 'date_20150514T000000', 'date_20150527T000000', 'waterfront_0', 'waterfront_1', 'view_0', 'view_1', 'view_2', 'view_3', 'view_4', 'condition_1', 'condition_2', 'condition_3', 'condition_4', 'condition_5']\n", - "Столбцы test_data_encoded: ['id', 'price', 'bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'grade', 'sqft_above', 'sqft_basement', 'yr_built', 'yr_renovated', 'zipcode', 'lat', 'long', 'sqft_living15', 'sqft_lot15', 'date_20140502T000000', 'date_20140503T000000', 'date_20140505T000000', 'date_20140506T000000', 'date_20140507T000000', 'date_20140508T000000', 'date_20140509T000000', 'date_20140510T000000', 'date_20140511T000000', 'date_20140512T000000', 'date_20140513T000000', 'date_20140514T000000', 'date_20140515T000000', 'date_20140516T000000', 'date_20140518T000000', 'date_20140519T000000', 'date_20140520T000000', 'date_20140521T000000', 'date_20140522T000000', 'date_20140523T000000', 'date_20140524T000000', 'date_20140525T000000', 'date_20140526T000000', 'date_20140527T000000', 'date_20140528T000000', 'date_20140529T000000', 'date_20140530T000000', 'date_20140531T000000', 'date_20140601T000000', 'date_20140602T000000', 'date_20140603T000000', 'date_20140604T000000', 'date_20140605T000000', 'date_20140606T000000', 'date_20140607T000000', 'date_20140609T000000', 'date_20140610T000000', 'date_20140611T000000', 'date_20140612T000000', 'date_20140613T000000', 'date_20140614T000000', 'date_20140615T000000', 'date_20140616T000000', 'date_20140617T000000', 'date_20140618T000000', 'date_20140619T000000', 'date_20140620T000000', 'date_20140621T000000', 
'date_20140622T000000', 'date_20140623T000000', 'date_20140624T000000', 'date_20140625T000000', 'date_20140626T000000', 'date_20140627T000000', 'date_20140628T000000', 'date_20140629T000000', 'date_20140630T000000', 'date_20140701T000000', 'date_20140702T000000', 'date_20140703T000000', 'date_20140707T000000', 'date_20140708T000000', 'date_20140709T000000', 'date_20140710T000000', 'date_20140711T000000', 'date_20140712T000000', 'date_20140713T000000', 'date_20140714T000000', 'date_20140715T000000', 'date_20140716T000000', 'date_20140717T000000', 'date_20140718T000000', 'date_20140719T000000', 'date_20140721T000000', 'date_20140722T000000', 'date_20140723T000000', 'date_20140724T000000', 'date_20140725T000000', 'date_20140727T000000', 'date_20140728T000000', 'date_20140729T000000', 'date_20140730T000000', 'date_20140731T000000', 'date_20140801T000000', 'date_20140802T000000', 'date_20140803T000000', 'date_20140804T000000', 'date_20140805T000000', 'date_20140806T000000', 'date_20140807T000000', 'date_20140808T000000', 'date_20140810T000000', 'date_20140811T000000', 'date_20140812T000000', 'date_20140813T000000', 'date_20140814T000000', 'date_20140815T000000', 'date_20140817T000000', 'date_20140818T000000', 'date_20140819T000000', 'date_20140820T000000', 'date_20140821T000000', 'date_20140822T000000', 'date_20140825T000000', 'date_20140826T000000', 'date_20140827T000000', 'date_20140828T000000', 'date_20140829T000000', 'date_20140831T000000', 'date_20140901T000000', 'date_20140902T000000', 'date_20140903T000000', 'date_20140904T000000', 'date_20140905T000000', 'date_20140907T000000', 'date_20140908T000000', 'date_20140909T000000', 'date_20140910T000000', 'date_20140911T000000', 'date_20140912T000000', 'date_20140913T000000', 'date_20140914T000000', 'date_20140915T000000', 'date_20140916T000000', 'date_20140917T000000', 'date_20140918T000000', 'date_20140919T000000', 'date_20140921T000000', 'date_20140922T000000', 'date_20140923T000000', 'date_20140924T000000', 
'date_20140925T000000', 'date_20140926T000000', 'date_20140927T000000', 'date_20140929T000000', 'date_20140930T000000', 'date_20141001T000000', 'date_20141002T000000', 'date_20141003T000000', 'date_20141006T000000', 'date_20141007T000000', 'date_20141008T000000', 'date_20141009T000000', 'date_20141010T000000', 'date_20141012T000000', 'date_20141013T000000', 'date_20141014T000000', 'date_20141015T000000', 'date_20141016T000000', 'date_20141017T000000', 'date_20141018T000000', 'date_20141019T000000', 'date_20141020T000000', 'date_20141021T000000', 'date_20141022T000000', 'date_20141023T000000', 'date_20141024T000000', 'date_20141027T000000', 'date_20141028T000000', 'date_20141029T000000', 'date_20141030T000000', 'date_20141031T000000', 'date_20141101T000000', 'date_20141103T000000', 'date_20141104T000000', 'date_20141105T000000', 'date_20141106T000000', 'date_20141107T000000', 'date_20141108T000000', 'date_20141109T000000', 'date_20141110T000000', 'date_20141111T000000', 'date_20141112T000000', 'date_20141113T000000', 'date_20141114T000000', 'date_20141115T000000', 'date_20141116T000000', 'date_20141117T000000', 'date_20141118T000000', 'date_20141119T000000', 'date_20141120T000000', 'date_20141121T000000', 'date_20141122T000000', 'date_20141123T000000', 'date_20141124T000000', 'date_20141125T000000', 'date_20141126T000000', 'date_20141128T000000', 'date_20141201T000000', 'date_20141202T000000', 'date_20141203T000000', 'date_20141204T000000', 'date_20141205T000000', 'date_20141206T000000', 'date_20141208T000000', 'date_20141209T000000', 'date_20141210T000000', 'date_20141211T000000', 'date_20141212T000000', 'date_20141214T000000', 'date_20141215T000000', 'date_20141216T000000', 'date_20141217T000000', 'date_20141218T000000', 'date_20141219T000000', 'date_20141220T000000', 'date_20141222T000000', 'date_20141223T000000', 'date_20141224T000000', 'date_20141226T000000', 'date_20141227T000000', 'date_20141229T000000', 'date_20141230T000000', 'date_20141231T000000', 
'date_20150102T000000', 'date_20150105T000000', 'date_20150106T000000', 'date_20150107T000000', 'date_20150108T000000', 'date_20150109T000000', 'date_20150112T000000', 'date_20150113T000000', 'date_20150114T000000', 'date_20150115T000000', 'date_20150116T000000', 'date_20150119T000000', 'date_20150120T000000', 'date_20150121T000000', 'date_20150122T000000', 'date_20150123T000000', 'date_20150124T000000', 'date_20150126T000000', 'date_20150127T000000', 'date_20150128T000000', 'date_20150129T000000', 'date_20150130T000000', 'date_20150131T000000', 'date_20150202T000000', 'date_20150203T000000', 'date_20150204T000000', 'date_20150205T000000', 'date_20150206T000000', 'date_20150207T000000', 'date_20150209T000000', 'date_20150210T000000', 'date_20150211T000000', 'date_20150212T000000', 'date_20150213T000000', 'date_20150214T000000', 'date_20150216T000000', 'date_20150217T000000', 'date_20150218T000000', 'date_20150219T000000', 'date_20150220T000000', 'date_20150221T000000', 'date_20150222T000000', 'date_20150223T000000', 'date_20150224T000000', 'date_20150225T000000', 'date_20150226T000000', 'date_20150227T000000', 'date_20150228T000000', 'date_20150301T000000', 'date_20150302T000000', 'date_20150303T000000', 'date_20150304T000000', 'date_20150305T000000', 'date_20150306T000000', 'date_20150307T000000', 'date_20150309T000000', 'date_20150310T000000', 'date_20150311T000000', 'date_20150312T000000', 'date_20150313T000000', 'date_20150315T000000', 'date_20150316T000000', 'date_20150317T000000', 'date_20150318T000000', 'date_20150319T000000', 'date_20150320T000000', 'date_20150321T000000', 'date_20150323T000000', 'date_20150324T000000', 'date_20150325T000000', 'date_20150326T000000', 'date_20150327T000000', 'date_20150328T000000', 'date_20150329T000000', 'date_20150330T000000', 'date_20150331T000000', 'date_20150401T000000', 'date_20150402T000000', 'date_20150403T000000', 'date_20150404T000000', 'date_20150406T000000', 'date_20150407T000000', 'date_20150408T000000', 
'date_20150409T000000', 'date_20150410T000000', 'date_20150411T000000', 'date_20150412T000000', 'date_20150413T000000', 'date_20150414T000000', 'date_20150415T000000', 'date_20150416T000000', 'date_20150417T000000', 'date_20150419T000000', 'date_20150420T000000', 'date_20150421T000000', 'date_20150422T000000', 'date_20150423T000000', 'date_20150424T000000', 'date_20150425T000000', 'date_20150426T000000', 'date_20150427T000000', 'date_20150428T000000', 'date_20150429T000000', 'date_20150430T000000', 'date_20150501T000000', 'date_20150502T000000', 'date_20150503T000000', 'date_20150504T000000', 'date_20150505T000000', 'date_20150506T000000', 'date_20150507T000000', 'date_20150508T000000', 'date_20150509T000000', 'date_20150511T000000', 'date_20150512T000000', 'date_20150513T000000', 'date_20150514T000000', 'date_20150527T000000', 'waterfront_0', 'waterfront_1', 'view_0', 'view_1', 'view_2', 'view_3', 'view_4', 'condition_1', 'condition_2', 'condition_3', 'condition_4', 'condition_5']\n" + "Столбцы train_data_encoded: ['work_year', 'job_title', 'salary', 'salary_currency', 'salary_in_usd', 'remote_ratio', 'salary_category', 'experience_level_EN', 'experience_level_EX', 'experience_level_MI', 'experience_level_SE', 'employment_type_CT', 'employment_type_FL', 'employment_type_FT', 'employment_type_PT', 'employee_residence_AE', 'employee_residence_AM', 'employee_residence_AR', 'employee_residence_AT', 'employee_residence_AU', 'employee_residence_BA', 'employee_residence_BE', 'employee_residence_BG', 'employee_residence_BO', 'employee_residence_BR', 'employee_residence_CA', 'employee_residence_CF', 'employee_residence_CH', 'employee_residence_CL', 'employee_residence_CN', 'employee_residence_CO', 'employee_residence_CR', 'employee_residence_CY', 'employee_residence_CZ', 'employee_residence_DE', 'employee_residence_DK', 'employee_residence_DZ', 'employee_residence_EE', 'employee_residence_EG', 'employee_residence_ES', 'employee_residence_FI', 'employee_residence_FR', 
'employee_residence_GB', 'employee_residence_GH', 'employee_residence_GR', 'employee_residence_HK', 'employee_residence_HN', 'employee_residence_HR', 'employee_residence_HU', 'employee_residence_ID', 'employee_residence_IE', 'employee_residence_IL', 'employee_residence_IN', 'employee_residence_IQ', 'employee_residence_IR', 'employee_residence_IT', 'employee_residence_JP', 'employee_residence_KE', 'employee_residence_KW', 'employee_residence_LT', 'employee_residence_LU', 'employee_residence_LV', 'employee_residence_MA', 'employee_residence_MD', 'employee_residence_MK', 'employee_residence_MX', 'employee_residence_MY', 'employee_residence_NG', 'employee_residence_NL', 'employee_residence_NZ', 'employee_residence_PH', 'employee_residence_PK', 'employee_residence_PL', 'employee_residence_PR', 'employee_residence_PT', 'employee_residence_RO', 'employee_residence_RS', 'employee_residence_RU', 'employee_residence_SE', 'employee_residence_SG', 'employee_residence_SI', 'employee_residence_SK', 'employee_residence_TH', 'employee_residence_TN', 'employee_residence_TR', 'employee_residence_UA', 'employee_residence_US', 'employee_residence_UZ', 'employee_residence_VN', 'company_location_AE', 'company_location_AL', 'company_location_AM', 'company_location_AR', 'company_location_AS', 'company_location_AT', 'company_location_AU', 'company_location_BA', 'company_location_BE', 'company_location_BO', 'company_location_BR', 'company_location_CA', 'company_location_CF', 'company_location_CH', 'company_location_CL', 'company_location_CO', 'company_location_CR', 'company_location_CZ', 'company_location_DE', 'company_location_DK', 'company_location_DZ', 'company_location_EE', 'company_location_EG', 'company_location_ES', 'company_location_FI', 'company_location_FR', 'company_location_GB', 'company_location_GH', 'company_location_GR', 'company_location_HN', 'company_location_HR', 'company_location_HU', 'company_location_ID', 'company_location_IE', 'company_location_IL', 
'company_location_IN', 'company_location_IQ', 'company_location_IR', 'company_location_IT', 'company_location_JP', 'company_location_KE', 'company_location_LT', 'company_location_LU', 'company_location_LV', 'company_location_MA', 'company_location_MD', 'company_location_MK', 'company_location_MX', 'company_location_MY', 'company_location_NG', 'company_location_NL', 'company_location_NZ', 'company_location_PH', 'company_location_PK', 'company_location_PL', 'company_location_PR', 'company_location_PT', 'company_location_RO', 'company_location_RU', 'company_location_SE', 'company_location_SG', 'company_location_SI', 'company_location_SK', 'company_location_TH', 'company_location_TR', 'company_location_UA', 'company_location_US', 'company_location_VN', 'company_size_L', 'company_size_M', 'company_size_S']\n", + "Столбцы val_data_encoded: ['work_year', 'job_title', 'salary', 'salary_currency', 'salary_in_usd', 'remote_ratio', 'experience_level_EN', 'experience_level_EX', 'experience_level_MI', 'experience_level_SE', 'employment_type_FL', 'employment_type_FT', 'employment_type_PT', 'employee_residence_AE', 'employee_residence_AR', 'employee_residence_AS', 'employee_residence_AT', 'employee_residence_AU', 'employee_residence_BE', 'employee_residence_BR', 'employee_residence_CA', 'employee_residence_CF', 'employee_residence_CH', 'employee_residence_CO', 'employee_residence_DE', 'employee_residence_DO', 'employee_residence_ES', 'employee_residence_FR', 'employee_residence_GB', 'employee_residence_GH', 'employee_residence_GR', 'employee_residence_HK', 'employee_residence_HR', 'employee_residence_IE', 'employee_residence_IN', 'employee_residence_IT', 'employee_residence_JE', 'employee_residence_JP', 'employee_residence_LV', 'employee_residence_MT', 'employee_residence_MX', 'employee_residence_NG', 'employee_residence_NL', 'employee_residence_PK', 'employee_residence_PR', 'employee_residence_PT', 'employee_residence_RU', 'employee_residence_TH', 'employee_residence_TR', 
'employee_residence_UA', 'employee_residence_US', 'employee_residence_UZ', 'company_location_AE', 'company_location_AS', 'company_location_AT', 'company_location_AU', 'company_location_BE', 'company_location_BS', 'company_location_CA', 'company_location_CF', 'company_location_CH', 'company_location_CN', 'company_location_CO', 'company_location_DE', 'company_location_ES', 'company_location_FI', 'company_location_FR', 'company_location_GB', 'company_location_GH', 'company_location_GR', 'company_location_HK', 'company_location_HR', 'company_location_IE', 'company_location_IN', 'company_location_JP', 'company_location_LU', 'company_location_LV', 'company_location_MT', 'company_location_MX', 'company_location_NG', 'company_location_NL', 'company_location_PK', 'company_location_PR', 'company_location_PT', 'company_location_RU', 'company_location_SG', 'company_location_TH', 'company_location_TR', 'company_location_UA', 'company_location_US', 'company_size_L', 'company_size_M', 'company_size_S']\n", + "Столбцы test_data_encoded: ['work_year', 'job_title', 'salary', 'salary_currency', 'salary_in_usd', 'remote_ratio', 'experience_level_EN', 'experience_level_EX', 'experience_level_MI', 'experience_level_SE', 'employment_type_FL', 'employment_type_FT', 'employment_type_PT', 'employee_residence_AE', 'employee_residence_AR', 'employee_residence_AS', 'employee_residence_AT', 'employee_residence_AU', 'employee_residence_BE', 'employee_residence_BR', 'employee_residence_CA', 'employee_residence_CF', 'employee_residence_CH', 'employee_residence_CO', 'employee_residence_DE', 'employee_residence_DO', 'employee_residence_ES', 'employee_residence_FR', 'employee_residence_GB', 'employee_residence_GH', 'employee_residence_GR', 'employee_residence_HK', 'employee_residence_HR', 'employee_residence_IE', 'employee_residence_IN', 'employee_residence_IT', 'employee_residence_JE', 'employee_residence_JP', 'employee_residence_LV', 'employee_residence_MT', 'employee_residence_MX', 
'employee_residence_NG', 'employee_residence_NL', 'employee_residence_PK', 'employee_residence_PR', 'employee_residence_PT', 'employee_residence_RU', 'employee_residence_TH', 'employee_residence_TR', 'employee_residence_UA', 'employee_residence_US', 'employee_residence_UZ', 'company_location_AE', 'company_location_AS', 'company_location_AT', 'company_location_AU', 'company_location_BE', 'company_location_BS', 'company_location_CA', 'company_location_CF', 'company_location_CH', 'company_location_CN', 'company_location_CO', 'company_location_DE', 'company_location_ES', 'company_location_FI', 'company_location_FR', 'company_location_GB', 'company_location_GH', 'company_location_GR', 'company_location_HK', 'company_location_HR', 'company_location_IE', 'company_location_IN', 'company_location_JP', 'company_location_LU', 'company_location_LV', 'company_location_MT', 'company_location_MX', 'company_location_NG', 'company_location_NL', 'company_location_PK', 'company_location_PR', 'company_location_PT', 'company_location_RU', 'company_location_SG', 'company_location_TH', 'company_location_TR', 'company_location_UA', 'company_location_US', 'company_size_L', 'company_size_M', 'company_size_S']\n", + "Столбцы train_data_encoded после дискретизации: ['work_year', 'job_title', 'salary', 'salary_currency', 'salary_in_usd', 'remote_ratio', 'salary_category', 'experience_level_EN', 'experience_level_EX', 'experience_level_MI', 'experience_level_SE', 'employment_type_CT', 'employment_type_FL', 'employment_type_FT', 'employment_type_PT', 'employee_residence_AE', 'employee_residence_AM', 'employee_residence_AR', 'employee_residence_AT', 'employee_residence_AU', 'employee_residence_BA', 'employee_residence_BE', 'employee_residence_BG', 'employee_residence_BO', 'employee_residence_BR', 'employee_residence_CA', 'employee_residence_CF', 'employee_residence_CH', 'employee_residence_CL', 'employee_residence_CN', 'employee_residence_CO', 'employee_residence_CR', 'employee_residence_CY', 
'employee_residence_CZ', 'employee_residence_DE', 'employee_residence_DK', 'employee_residence_DZ', 'employee_residence_EE', 'employee_residence_EG', 'employee_residence_ES', 'employee_residence_FI', 'employee_residence_FR', 'employee_residence_GB', 'employee_residence_GH', 'employee_residence_GR', 'employee_residence_HK', 'employee_residence_HN', 'employee_residence_HR', 'employee_residence_HU', 'employee_residence_ID', 'employee_residence_IE', 'employee_residence_IL', 'employee_residence_IN', 'employee_residence_IQ', 'employee_residence_IR', 'employee_residence_IT', 'employee_residence_JP', 'employee_residence_KE', 'employee_residence_KW', 'employee_residence_LT', 'employee_residence_LU', 'employee_residence_LV', 'employee_residence_MA', 'employee_residence_MD', 'employee_residence_MK', 'employee_residence_MX', 'employee_residence_MY', 'employee_residence_NG', 'employee_residence_NL', 'employee_residence_NZ', 'employee_residence_PH', 'employee_residence_PK', 'employee_residence_PL', 'employee_residence_PR', 'employee_residence_PT', 'employee_residence_RO', 'employee_residence_RS', 'employee_residence_RU', 'employee_residence_SE', 'employee_residence_SG', 'employee_residence_SI', 'employee_residence_SK', 'employee_residence_TH', 'employee_residence_TN', 'employee_residence_TR', 'employee_residence_UA', 'employee_residence_US', 'employee_residence_UZ', 'employee_residence_VN', 'company_location_AE', 'company_location_AL', 'company_location_AM', 'company_location_AR', 'company_location_AS', 'company_location_AT', 'company_location_AU', 'company_location_BA', 'company_location_BE', 'company_location_BO', 'company_location_BR', 'company_location_CA', 'company_location_CF', 'company_location_CH', 'company_location_CL', 'company_location_CO', 'company_location_CR', 'company_location_CZ', 'company_location_DE', 'company_location_DK', 'company_location_DZ', 'company_location_EE', 'company_location_EG', 'company_location_ES', 'company_location_FI', 'company_location_FR', 
'company_location_GB', 'company_location_GH', 'company_location_GR', 'company_location_HN', 'company_location_HR', 'company_location_HU', 'company_location_ID', 'company_location_IE', 'company_location_IL', 'company_location_IN', 'company_location_IQ', 'company_location_IR', 'company_location_IT', 'company_location_JP', 'company_location_KE', 'company_location_LT', 'company_location_LU', 'company_location_LV', 'company_location_MA', 'company_location_MD', 'company_location_MK', 'company_location_MX', 'company_location_MY', 'company_location_NG', 'company_location_NL', 'company_location_NZ', 'company_location_PH', 'company_location_PK', 'company_location_PL', 'company_location_PR', 'company_location_PT', 'company_location_RO', 'company_location_RU', 'company_location_SE', 'company_location_SG', 'company_location_SI', 'company_location_SK', 'company_location_TH', 'company_location_TR', 'company_location_UA', 'company_location_US', 'company_location_VN', 'company_size_L', 'company_size_M', 'company_size_S']\n", + "Столбцы val_data_encoded после дискретизации: ['work_year', 'job_title', 'salary', 'salary_currency', 'salary_in_usd', 'remote_ratio', 'experience_level_EN', 'experience_level_EX', 'experience_level_MI', 'experience_level_SE', 'employment_type_FL', 'employment_type_FT', 'employment_type_PT', 'employee_residence_AE', 'employee_residence_AR', 'employee_residence_AS', 'employee_residence_AT', 'employee_residence_AU', 'employee_residence_BE', 'employee_residence_BR', 'employee_residence_CA', 'employee_residence_CF', 'employee_residence_CH', 'employee_residence_CO', 'employee_residence_DE', 'employee_residence_DO', 'employee_residence_ES', 'employee_residence_FR', 'employee_residence_GB', 'employee_residence_GH', 'employee_residence_GR', 'employee_residence_HK', 'employee_residence_HR', 'employee_residence_IE', 'employee_residence_IN', 'employee_residence_IT', 'employee_residence_JE', 'employee_residence_JP', 'employee_residence_LV', 'employee_residence_MT', 
'employee_residence_MX', 'employee_residence_NG', 'employee_residence_NL', 'employee_residence_PK', 'employee_residence_PR', 'employee_residence_PT', 'employee_residence_RU', 'employee_residence_TH', 'employee_residence_TR', 'employee_residence_UA', 'employee_residence_US', 'employee_residence_UZ', 'company_location_AE', 'company_location_AS', 'company_location_AT', 'company_location_AU', 'company_location_BE', 'company_location_BS', 'company_location_CA', 'company_location_CF', 'company_location_CH', 'company_location_CN', 'company_location_CO', 'company_location_DE', 'company_location_ES', 'company_location_FI', 'company_location_FR', 'company_location_GB', 'company_location_GH', 'company_location_GR', 'company_location_HK', 'company_location_HR', 'company_location_IE', 'company_location_IN', 'company_location_JP', 'company_location_LU', 'company_location_LV', 'company_location_MT', 'company_location_MX', 'company_location_NG', 'company_location_NL', 'company_location_PK', 'company_location_PR', 'company_location_PT', 'company_location_RU', 'company_location_SG', 'company_location_TH', 'company_location_TR', 'company_location_UA', 'company_location_US', 'company_size_L', 'company_size_M', 'company_size_S', 'salary_category']\n", + "Столбцы test_data_encoded после дискретизации: ['work_year', 'job_title', 'salary', 'salary_currency', 'salary_in_usd', 'remote_ratio', 'experience_level_EN', 'experience_level_EX', 'experience_level_MI', 'experience_level_SE', 'employment_type_FL', 'employment_type_FT', 'employment_type_PT', 'employee_residence_AE', 'employee_residence_AR', 'employee_residence_AS', 'employee_residence_AT', 'employee_residence_AU', 'employee_residence_BE', 'employee_residence_BR', 'employee_residence_CA', 'employee_residence_CF', 'employee_residence_CH', 'employee_residence_CO', 'employee_residence_DE', 'employee_residence_DO', 'employee_residence_ES', 'employee_residence_FR', 'employee_residence_GB', 'employee_residence_GH', 'employee_residence_GR', 
'employee_residence_HK', 'employee_residence_HR', 'employee_residence_IE', 'employee_residence_IN', 'employee_residence_IT', 'employee_residence_JE', 'employee_residence_JP', 'employee_residence_LV', 'employee_residence_MT', 'employee_residence_MX', 'employee_residence_NG', 'employee_residence_NL', 'employee_residence_PK', 'employee_residence_PR', 'employee_residence_PT', 'employee_residence_RU', 'employee_residence_TH', 'employee_residence_TR', 'employee_residence_UA', 'employee_residence_US', 'employee_residence_UZ', 'company_location_AE', 'company_location_AS', 'company_location_AT', 'company_location_AU', 'company_location_BE', 'company_location_BS', 'company_location_CA', 'company_location_CF', 'company_location_CH', 'company_location_CN', 'company_location_CO', 'company_location_DE', 'company_location_ES', 'company_location_FI', 'company_location_FR', 'company_location_GB', 'company_location_GH', 'company_location_GR', 'company_location_HK', 'company_location_HR', 'company_location_IE', 'company_location_IN', 'company_location_JP', 'company_location_LU', 'company_location_LV', 'company_location_MT', 'company_location_MX', 'company_location_NG', 'company_location_NL', 'company_location_PK', 'company_location_PR', 'company_location_PT', 'company_location_RU', 'company_location_SG', 'company_location_TH', 'company_location_TR', 'company_location_UA', 'company_location_US', 'company_size_L', 'company_size_M', 'company_size_S', 'salary_category']\n" ] } ], @@ -890,7 +715,7 @@ "# Унитарное кодирование категориальных признаков (применение one-hot encoding)\n", "\n", "# Пример категориальных признаков\n", - "categorical_features = ['date', 'waterfront', 'view', 'condition']\n", + "categorical_features = ['experience_level', 'employment_type', 'employee_residence', 'company_location', 'company_size']\n", "\n", "# Применение one-hot encoding\n", "train_data_encoded = pd.get_dummies(train_data, columns=categorical_features)\n", @@ -903,14 +728,15 @@ "print(\"Столбцы 
test_data_encoded:\", test_data_encoded.columns.tolist())\n", "\n", "\n", - "# Дискретизация числовых признаков (цены). Например, можно разделить площадь жилья на категории\n", - "# Пример дискретизации признака 'Общая площадь'\n", - "train_data_encoded['sqtf'] = pd.cut(train_data_encoded['sqft_living'], bins=5, labels=False)\n", - "val_data_encoded['sqtf'] = pd.cut(val_data_encoded['sqft_living'], bins=5, labels=False)\n", - "test_data_encoded['sqtf'] = pd.cut(test_data_encoded['sqft_living'], bins=5, labels=False)\n", + "# Example: discretize 'salary_in_usd' into 5 equal-width categories (bin edges are computed separately for each dataframe)\n", + "train_data_encoded['salary_category'] = pd.cut(train_data_encoded['salary_in_usd'], bins=5, labels=False)\n", + "val_data_encoded['salary_category'] = pd.cut(val_data_encoded['salary_in_usd'], bins=5, labels=False)\n", + "test_data_encoded['salary_category'] = pd.cut(test_data_encoded['salary_in_usd'], bins=5, labels=False)\n", + "df_encoded['salary_category'] = pd.cut(df_encoded['salary_in_usd'], bins=5, labels=False)\n", "\n", - "# Пример дискретизации признака 'sqft_living' на 5 категорий\n", - "df_encoded['sqtf'] = pd.cut(df_encoded['sqft_living'], bins=5, labels=False)" + "print(\"Столбцы train_data_encoded после дискретизации:\", train_data_encoded.columns.tolist())\n", + "print(\"Столбцы val_data_encoded после дискретизации:\", val_data_encoded.columns.tolist())\n", + "print(\"Столбцы test_data_encoded после дискретизации:\", test_data_encoded.columns.tolist())\n" ] }, { @@ -918,22 +744,48 @@ "metadata": {}, "source": [ "### Ручной синтез\n", - "Создание новых признаков на основе экспертных знаний и логики предметной области. К примеру, для данных о продаже домов можно создать признак цена за квадратный фут." + "Create new features based on expert knowledge and domain logic. For example, for the Data Science salary data we can create a \"monthly salary\" feature."
] }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 34, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " work_year experience_level employment_type job_title \\\n", + "0 2023 SE FT Principal Data Scientist \n", + "1 2023 MI CT ML Engineer \n", + "2 2023 MI CT ML Engineer \n", + "3 2023 SE FT Data Scientist \n", + "4 2023 SE FT Data Scientist \n", + "\n", + " salary salary_currency salary_in_usd employee_residence remote_ratio \\\n", + "0 80000 EUR 85847 ES 100 \n", + "1 30000 USD 30000 US 100 \n", + "2 25500 USD 25500 US 100 \n", + "3 175000 USD 175000 CA 100 \n", + "4 120000 USD 120000 CA 100 \n", + "\n", + " company_location company_size Salary in month \n", + "0 ES L 6666 \n", + "1 US S 2500 \n", + "2 US S 2125 \n", + "3 CA M 14583 \n", + "4 CA M 10000 \n" + ] + } + ], "source": [ - "# Ручной синтез признаков\n", - "train_data_encoded['price_per_sqft'] = df['price'] / df['sqft_living']\n", - "val_data_encoded['price_per_sqft'] = df['price'] / df['sqft_living']\n", - "test_data_encoded['price_per_sqft'] = df['price'] / df['sqft_living']\n", + "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n", + "# Create the new feature 'Salary in month' (salary in the original currency, integer-divided by 12)\n", + "df['Salary in month'] = df['salary'] // 12\n", "\n", - "# Пример создания нового признака - цена за квадратный фут\n", - "df_encoded['price_per_sqft'] = df_encoded['price'] / df_encoded['sqft_living']" + "# Print the first few rows of the dataframe as a check\n", + "print(df.head())" ] }, { @@ -945,19 +797,19 @@ }, { "cell_type": "code", - "execution_count": 32, + "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "from sklearn.preprocessing import StandardScaler, MinMaxScaler\n", "\n", "# Пример масштабирования числовых признаков\n", - "numerical_features = ['bedrooms', 'sqft_living']\n", + "numerical_features = ['work_year', 'salary']\n", "\n", "scaler = StandardScaler()\n", 
"train_data_encoded[numerical_features] = scaler.fit_transform(train_data_encoded[numerical_features])\n", "val_data_encoded[numerical_features] = scaler.transform(val_data_encoded[numerical_features])\n", - "test_data_encoded[numerical_features] = scaler.transform(test_data_encoded[numerical_features])" + "test_data_encoded[numerical_features] = scaler.transform(test_data_encoded[numerical_features])\n" ] }, { @@ -969,166 +821,163 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": 43, "metadata": {}, "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " id price bedrooms bathrooms sqft_living sqft_lot \\\n", - "9876 1219000473 164950.0 -0.395263 1.75 -0.555396 15330 \n", - "14982 6308000010 585000.0 -0.395263 2.50 0.238192 5089 \n", - "1464 3630120700 757000.0 -0.395263 3.25 1.230177 5283 \n", - "19209 1901600090 359000.0 1.752138 1.75 -0.147580 6654 \n", - "2039 3395040550 320000.0 -0.395263 2.50 -0.599484 2890 \n", - "... ... ... ... ... ... ... \n", - "13184 1523049207 220000.0 0.678437 2.00 -0.412109 8043 \n", - "5759 1954420170 580000.0 -0.395263 2.50 0.083883 7484 \n", - "8433 1721801010 225000.0 -0.395263 1.00 -0.312911 6120 \n", - "10253 2422049104 85000.0 -1.468964 1.00 -1.371028 9000 \n", - "11363 7701960990 870000.0 0.678437 2.50 1.230177 14565 \n", - "\n", - " floors grade sqft_above sqft_basement ... view_2 view_3 view_4 \\\n", - "9876 1.0 7 1080 490 ... False False False \n", - "14982 2.0 9 2290 0 ... False False False \n", - "1464 2.0 9 3190 0 ... False False False \n", - "19209 1.5 7 1940 0 ... False False False \n", - "2039 2.0 7 1530 0 ... False False False \n", - "... ... ... ... ... ... ... ... ... \n", - "13184 1.0 7 850 850 ... False False False \n", - "5759 2.0 8 2150 0 ... False False False \n", - "8433 1.0 6 1790 0 ... False False False \n", - "10253 1.0 6 830 0 ... False False False \n", - "11363 2.0 11 3190 0 ... 
False False False \n", - "\n", - " condition_1 condition_2 condition_3 condition_4 condition_5 sqtf \\\n", - "9876 False False True False False 0 \n", - "14982 False False True False False 0 \n", - "1464 False False True False False 1 \n", - "19209 False False False True False 0 \n", - "2039 False False True False False 0 \n", - "... ... ... ... ... ... ... \n", - "13184 False False True False False 0 \n", - "5759 False False True False False 0 \n", - "8433 False False True False False 0 \n", - "10253 False False True False False 0 \n", - "11363 False False True False False 1 \n", - "\n", - " price_per_sqft \n", - "9876 105.063694 \n", - "14982 255.458515 \n", - "1464 237.304075 \n", - "19209 185.051546 \n", - "2039 209.150327 \n", - "... ... \n", - "13184 129.411765 \n", - "5759 269.767442 \n", - "8433 125.698324 \n", - "10253 102.409639 \n", - "11363 272.727273 \n", - "\n", - "[224 rows x 400 columns]\n" - ] - }, { "name": "stderr", "output_type": "stream", "text": [ - "e:\\MII\\laboratory\\mai\\Lib\\site-packages\\featuretools\\synthesis\\deep_feature_synthesis.py:169: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created\n", - " warnings.warn(\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " price bedrooms bathrooms sqft_living sqft_lot floors \\\n", - "id \n", - "7129300520 221900.0 3 1.00 1180 5650 1.0 \n", - "6414100192 538000.0 3 2.25 2570 7242 2.0 \n", - "5631500400 180000.0 2 1.00 770 10000 1.0 \n", - "2487200875 604000.0 4 3.00 1960 5000 1.0 \n", - "1954400510 510000.0 3 2.00 1680 8080 1.0 \n", - "\n", - " grade sqft_above sqft_basement yr_built ... view_2 view_3 \\\n", - "id ... \n", - "7129300520 7 1180 0 1955 ... False False \n", - "6414100192 7 2170 400 1951 ... False False \n", - "5631500400 6 770 0 1933 ... False False \n", - "2487200875 7 1050 910 1965 ... False False \n", - "1954400510 8 1680 0 1987 ... 
False False \n", - "\n", - " view_4 condition_1 condition_2 condition_3 condition_4 \\\n", - "id \n", - "7129300520 False False False True False \n", - "6414100192 False False False True False \n", - "5631500400 False False False True False \n", - "2487200875 False False False False False \n", - "1954400510 False False False True False \n", - "\n", - " condition_5 sqtf price_per_sqft \n", - "id \n", - "7129300520 False 0 188.050847 \n", - "6414100192 False 0 209.338521 \n", - "5631500400 False 0 233.766234 \n", - "2487200875 True 0 308.163265 \n", - "1954400510 False 0 303.571429 \n", - "\n", - "[5 rows x 402 columns]\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "e:\\MII\\laboratory\\mai\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", " pd.to_datetime(\n", - "e:\\MII\\laboratory\\mai\\Lib\\site-packages\\featuretools\\synthesis\\deep_feature_synthesis.py:169: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created\n", - " warnings.warn(\n" + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. 
To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. 
To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. 
To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " work_year experience_level employment_type job_title \\\n", + "id \n", + "1 2023 SE FT Principal Data Scientist \n", + "2 2023 MI CT ML Engineer \n", + "3 2023 MI CT ML Engineer \n", + "4 2023 SE FT Data Scientist \n", + "5 2023 SE FT Data Scientist \n", + "\n", + " salary salary_currency salary_in_usd employee_residence remote_ratio \\\n", + "id \n", + "1 80000 EUR 85847 ES 100 \n", + "2 30000 USD 30000 US 100 \n", + "3 25500 USD 25500 US 100 \n", + "4 175000 USD 175000 CA 100 \n", + "5 120000 USD 120000 CA 100 \n", + "\n", + " company_location company_size \n", + "id \n", + "1 ES L \n", + "2 US S \n", + "3 US S \n", + "4 CA M \n", + "5 CA M \n", + " work_year experience_level employment_type job_title salary \\\n", + "id \n", + "2385 2022 SE FT Data Engineer 175000 \n", + "941 2023 SE FT Analytics Engineer 150000 \n", + "1617 2023 MI FT Data Analyst 65000 \n", + "1443 2023 MI FT Data Analyst 61200 \n", + "416 2023 SE FT Data Scientist 175000 \n", + "\n", + " salary_currency salary_in_usd employee_residence remote_ratio \\\n", + "id \n", + "2385 USD 175000 US 100 \n", + "941 USD 150000 US 0 \n", + "1617 GBP 78990 GB 0 \n", + "1443 USD 61200 US 0 \n", + "416 USD 175000 US 100 \n", + "\n", + " company_location company_size \n", + "id \n", + "2385 US M \n", + "941 US M \n", + "1617 GB M \n", + "1443 US M \n", + "416 US M \n", + " work_year experience_level employment_type job_title salary \\\n", + "id \n", + "2321 2022 SE FT Analytics Engineer 116250 \n", + "473 2023 EX FT Data Engineer 286000 \n", + "2269 2022 EN FT Data Engineer 135000 \n", + "430 2023 SE FT Data Analyst 208450 \n", + "3574 2020 MI FT Data Engineer 88000 \n", + "\n", + " salary_currency salary_in_usd employee_residence remote_ratio \\\n", + "id \n", + "2321 USD 116250 US 100 \n", + "473 USD 286000 US 100 \n", + "2269 USD 
135000 US 0 \n", + "430 USD 208450 US 100 \n", + "3574 GBP 112872 GB 50 \n", + "\n", + " company_location company_size \n", + "id \n", + "2321 US M \n", + "473 US M \n", + "2269 US M \n", + "430 US M \n", + "3574 GB L \n" ] } ], "source": [ + "import pandas as pd\n", "import featuretools as ft\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# Load the data\n", + "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n", + "\n", + "# Create a unique identifier for each row\n", + "df['id'] = range(1, len(df) + 1)\n", "\n", "# Предобработка данных (например, кодирование категориальных признаков, удаление дубликатов)\n", - "# Удаление дубликатов по идентификатору\n", - "df = df.drop_duplicates(subset='id')\n", - "duplicates = train_data_encoded[train_data_encoded['id'].duplicated(keep=False)]\n", - "\n", - "# Удаление дубликатов из столбца \"id\", сохранив первое вхождение\n", - "df_encoded = df_encoded.drop_duplicates(subset='id', keep='first')\n", - "\n", - "print(duplicates)\n", - "\n", + "# Drop rows duplicated across all columns (the 'id' added above is unique per row, so no rows are removed here)\n", + "df = df.drop_duplicates()\n", "\n", "# Создание EntitySet\n", - "es = ft.EntitySet(id='house_data')\n", + "es = ft.EntitySet(id='data_science_jobs')\n", "\n", - "# Добавление датафрейма с домами\n", - "es = es.add_dataframe(dataframe_name='houses', dataframe=df_encoded, index='id')\n", + "# Add the dataframe with the job records\n", + "es = es.add_dataframe(\n", + " dataframe_name='jobs',\n", + " dataframe=df,\n", + " index='id'\n", + ")\n", "\n", "# Генерация признаков с помощью глубокой синтезы признаков\n", - "feature_matrix, feature_defs = ft.dfs(entityset=es, target_dataframe_name='houses', max_depth=2)\n", + "feature_matrix, feature_defs = ft.dfs(entityset=es, target_dataframe_name='jobs', max_depth=1)\n", "\n", "# Выводим первые 5 строк сгенерированного набора признаков\n", "print(feature_matrix.head())\n", "\n", - "train_data_encoded = 
train_data_encoded.drop_duplicates(subset='id')\n", - "train_data_encoded = train_data_encoded.drop_duplicates(subset='id', keep='first') # or keep='last'\n", + "# Разделение данных на обучающую и тестовую выборки\n", + "train_data, test_data = train_test_split(df, test_size=0.3, random_state=42)\n", "\n", - "# Определение сущностей (Создание EntitySet)\n", - "es = ft.EntitySet(id='house_data')\n", - "\n", - "es = es.add_dataframe(dataframe_name='houses', dataframe=train_data_encoded, index='id')\n", - "\n", - "# Генерация признаков\n", - "feature_matrix, feature_defs = ft.dfs(entityset=es, target_dataframe_name='houses', max_depth=2)\n", + "# Разделение оставшейся части на валидационную и тестовую выборки\n", + "val_data, test_data = train_test_split(test_data, test_size=0.5, random_state=42)\n", "\n", "# Преобразование признаков для контрольной и тестовой выборок\n", - "val_feature_matrix = ft.calculate_feature_matrix(features=feature_defs, entityset=es, instance_ids=val_data_encoded.index)\n", - "test_feature_matrix = ft.calculate_feature_matrix(features=feature_defs, entityset=es, instance_ids=test_data_encoded.index)" + "val_feature_matrix = ft.calculate_feature_matrix(features=feature_defs, entityset=es, instance_ids=val_data['id'])\n", + "test_feature_matrix = ft.calculate_feature_matrix(features=feature_defs, entityset=es, instance_ids=test_data['id'])\n", + "\n", + "# Вывод первых 5 строк сгенерированных признаков для валидационной и тестовой выборок\n", + "print(val_feature_matrix.head())\n", + "print(test_feature_matrix.head())" ] }, { @@ -1152,34 +1001,176 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Время обучения модели: 5.18 секунд\n", - "Среднеквадратичная ошибка: 125198557176601739264.00\n" + "Время обучения модели: 1.81 секунд\n", + "Среднеквадратичная ошибка (RMSE): 49834.60\n", + "Средняя абсолютная ошибка (MAE): 37776.22\n", 
+ "Коэффициент детерминации (R²): 0.37\n", + "Кросс-валидация RMSE: 51653687796568.14 (± 37705548691705.71)\n", + "Корреляционная матрица признаков:\n", + " work_year remote_ratio experience_level_EX \\\n", + "work_year 1.000000 -0.236430 0.003156 \n", + "remote_ratio -0.236430 1.000000 0.007190 \n", + "experience_level_EX 0.003156 0.007190 1.000000 \n", + "experience_level_MI -0.128381 -0.000650 -0.092433 \n", + "experience_level_SE 0.194923 -0.035201 -0.252152 \n", + "... ... ... ... \n", + "company_location_UA 0.005969 -0.005896 -0.005778 \n", + "company_location_US 0.267002 -0.077706 0.022562 \n", + "company_location_VN 0.014787 -0.015545 -0.002888 \n", + "company_size_M 0.421975 -0.154550 -0.003061 \n", + "company_size_S -0.257948 0.108512 0.012020 \n", + "\n", + " experience_level_MI experience_level_SE \\\n", + "work_year -0.128381 0.194923 \n", + "remote_ratio -0.000650 -0.035201 \n", + "experience_level_EX -0.092433 -0.252152 \n", + "experience_level_MI 1.000000 -0.744400 \n", + "experience_level_SE -0.744400 1.000000 \n", + "... ... ... \n", + "company_location_UA -0.017059 0.005553 \n", + "company_location_US -0.255712 0.324686 \n", + "company_location_VN -0.008526 -0.023258 \n", + "company_size_M -0.097174 0.236746 \n", + "company_size_S 0.060936 -0.163489 \n", + "\n", + " employment_type_FL employment_type_FT \\\n", + "work_year -0.050350 0.116310 \n", + "remote_ratio 0.025238 -0.068702 \n", + "experience_level_EX -0.009144 0.001938 \n", + "experience_level_MI 0.035964 -0.033295 \n", + "experience_level_SE -0.040667 0.113486 \n", + "... ... ... 
\n", + "company_location_UA 0.156722 -0.079394 \n", + "company_location_US -0.053906 0.082093 \n", + "company_location_VN -0.000843 0.001628 \n", + "company_size_M -0.047840 0.125424 \n", + "company_size_S 0.095761 -0.173783 \n", + "\n", + " employment_type_PT job_title_AI Developer \\\n", + "work_year -0.093825 0.027726 \n", + "remote_ratio 0.041919 -0.016126 \n", + "experience_level_EX -0.011933 -0.009591 \n", + "experience_level_MI -0.006230 -0.004301 \n", + "experience_level_SE -0.096100 -0.045802 \n", + "... ... ... \n", + "company_location_UA -0.002202 0.300345 \n", + "company_location_US -0.078434 -0.099216 \n", + "company_location_VN -0.001101 -0.000885 \n", + "company_size_M -0.100277 -0.043467 \n", + "company_size_S 0.108664 0.064994 \n", + "\n", + " job_title_AI Programmer ... company_location_SG \\\n", + "work_year 0.004219 ... -0.021620 \n", + "remote_ratio 0.001772 ... 0.016794 \n", + "experience_level_EX -0.004085 ... -0.007079 \n", + "experience_level_MI -0.012059 ... 0.044089 \n", + "experience_level_SE -0.032896 ... -0.042828 \n", + "... ... ... ... \n", + "company_location_UA -0.000754 ... -0.001306 \n", + "company_location_US -0.047600 ... -0.082490 \n", + "company_location_VN -0.000377 ... -0.000653 \n", + "company_size_M -0.021372 ... -0.055210 \n", + "company_size_S -0.004676 ... -0.008104 \n", + "\n", + " company_location_SI company_location_SK \\\n", + "work_year -0.017648 -0.008821 \n", + "remote_ratio 0.027712 0.018050 \n", + "experience_level_EX -0.005778 -0.002888 \n", + "experience_level_MI 0.042620 -0.008526 \n", + "experience_level_SE -0.029172 0.011453 \n", + "... ... ... 
\n", + "company_location_UA -0.001066 -0.000533 \n", + "company_location_US -0.067335 -0.033654 \n", + "company_location_VN -0.000533 -0.000266 \n", + "company_size_M -0.030233 -0.037352 \n", + "company_size_S -0.006615 0.080574 \n", + "\n", + " company_location_TH company_location_TR \\\n", + "work_year -0.001648 -0.051424 \n", + "remote_ratio 0.011871 -0.004714 \n", + "experience_level_EX -0.005003 -0.006461 \n", + "experience_level_MI 0.008196 0.052106 \n", + "experience_level_SE -0.020249 -0.036503 \n", + "... ... ... \n", + "company_location_UA -0.000923 -0.001192 \n", + "company_location_US -0.058306 -0.075293 \n", + "company_location_VN -0.000462 -0.000596 \n", + "company_size_M -0.039024 -0.003949 \n", + "company_size_S -0.005728 -0.007397 \n", + "\n", + " company_location_UA company_location_US \\\n", + "work_year 0.005969 0.267002 \n", + "remote_ratio -0.005896 -0.077706 \n", + "experience_level_EX -0.005778 0.022562 \n", + "experience_level_MI -0.017059 -0.255712 \n", + "experience_level_SE 0.005553 0.324686 \n", + "... ... ... \n", + "company_location_UA 1.000000 -0.067335 \n", + "company_location_US -0.067335 1.000000 \n", + "company_location_VN -0.000533 -0.033654 \n", + "company_size_M -0.030233 0.314961 \n", + "company_size_S 0.035342 -0.229439 \n", + "\n", + " company_location_VN company_size_M company_size_S \n", + "work_year 0.014787 0.421975 -0.257948 \n", + "remote_ratio -0.015545 -0.154550 0.108512 \n", + "experience_level_EX -0.002888 -0.003061 0.012020 \n", + "experience_level_MI -0.008526 -0.097174 0.060936 \n", + "experience_level_SE -0.023258 0.236746 -0.163489 \n", + "... ... ... ... 
\n", + "company_location_UA -0.000533 -0.030233 0.035342 \n", + "company_location_US -0.033654 0.314961 -0.229439 \n", + "company_location_VN 1.000000 -0.037352 -0.003306 \n", + "company_size_M -0.037352 1.000000 -0.463577 \n", + "company_size_S -0.003306 -0.463577 1.000000 \n", + "\n", + "[250 rows x 250 columns]\n", + "Коэффициенты модели:\n", + " Feature Coefficient\n", + "0 work_year 3996.696898\n", + "1 remote_ratio 5.199270\n", + "2 experience_level_EX 88740.552288\n", + "3 experience_level_MI 20170.854874\n", + "4 experience_level_SE 44093.474726\n", + ".. ... ...\n", + "245 company_location_UA -64984.628104\n", + "246 company_location_US 40574.678578\n", + "247 company_location_VN 24478.024917\n", + "248 company_size_M -2895.244061\n", + "249 company_size_S -23506.811439\n", + "\n", + "[250 rows x 2 columns]\n" ] } ], "source": [ "import time\n", - "from sklearn.model_selection import train_test_split\n", + "import pandas as pd\n", + "from sklearn.model_selection import train_test_split, cross_val_score\n", "from sklearn.linear_model import LinearRegression\n", - "from sklearn.metrics import mean_squared_error\n", + "from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score\n", "\n", - "# Разделение данных на обучающую и валидационную выборки. 
Удаляем целевую переменную\n", - "X = feature_matrix.drop('price', axis=1)\n", - "y = feature_matrix['price']\n", + "# Загрузка данных\n", + "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n", "\n", - "# One-hot encoding для категориальных переменных (преобразование категориальных объектов в числовые)\n", + "# Разделение данных на признаки и целевую переменную\n", + "X = df.drop(['salary_in_usd', 'salary', 'salary_currency'], axis=1) # Удаляем целевую переменную и ненужные столбцы\n", + "y = df['salary_in_usd']\n", + "\n", + "# One-hot encoding для категориальных переменных\n", "X = pd.get_dummies(X, drop_first=True)\n", "\n", "# Проверяем, есть ли пропущенные значения, и заполняем их медианой или другим подходящим значением\n", "X.fillna(X.median(), inplace=True)\n", "\n", + "# Разделение данных на обучающую и валидационную выборки\n", "X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)\n", "\n", "# Обучение модели\n", @@ -1192,24 +1183,72 @@ "# Время обучения модели\n", "train_time = time.time() - start_time\n", "\n", - "# Предсказания и оценка модели и вычисляем среднеквадратичную ошибку\n", + "# Предсказания и оценка модели\n", "predictions = model.predict(X_val)\n", "mse = mean_squared_error(y_val, predictions)\n", + "mae = mean_absolute_error(y_val, predictions)\n", + "r2 = r2_score(y_val, predictions)\n", "\n", "print(f'Время обучения модели: {train_time:.2f} секунд')\n", - "print(f'Среднеквадратичная ошибка: {mse:.2f}')\n" + "print(f'Среднеквадратичная ошибка (RMSE): {mse**0.5:.2f}')\n", + "print(f'Средняя абсолютная ошибка (MAE): {mae:.2f}')\n", + "print(f'Коэффициент детерминации (R²): {r2:.2f}')\n", + "\n", + "# Кросс-валидация\n", + "cv_scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')\n", + "cv_rmse_scores = (-cv_scores)**0.5\n", + "print(f'Кросс-валидация RMSE: {cv_rmse_scores.mean():.2f} (± {cv_rmse_scores.std():.2f})')\n", + "\n", + "# Анализ корреляции\n", + 
"correlation_matrix = X.corr()\n", + "print(\"Корреляционная матрица признаков:\")\n", + "print(correlation_matrix)\n", + "\n", + "# Цельность: Проверка логической связи между признаками и целевой переменной\n", + "# В данном случае, мы можем проанализировать коэффициенты модели\n", + "coefficients = pd.DataFrame({'Feature': X.columns, 'Coefficient': model.coef_})\n", + "print(\"Коэффициенты модели:\")\n", + "print(coefficients)" ] }, { "cell_type": "code", - "execution_count": 35, + "execution_count": null, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ - "e:\\MII\\laboratory\\mai\\Lib\\site-packages\\sklearn\\metrics\\_regression.py:492: FutureWarning: 'squared' is deprecated in version 1.4 and will be removed in 1.6. To calculate the root mean squared error, use the function'root_mean_squared_error'.\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. 
To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. 
To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\woodwork\\type_sys\\utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.\n", + " pd.to_datetime(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\metrics\\_regression.py:492: FutureWarning: 'squared' is deprecated in version 1.4 and will be removed in 1.6. 
To calculate the root mean squared error, use the function'root_mean_squared_error'.\n", " warnings.warn(\n" ] }, @@ -1217,16 +1256,15 @@ "name": "stdout", "output_type": "stream", "text": [ + "RMSE: 8277.602700993119\n", + "R²: 0.9826437806135544\n", + "MAE: 1270.2934354194408 \n", "\n", - "RMSE: 17870.38470608543\n", - "R²: 0.9973762630189477\n", - "MAE: 5924.569330616996 \n", + "Кросс-валидация RMSE: 13606.980806552549 \n", "\n", - "Кросс-валидация RMSE: 34577.766841359786 \n", - "\n", - "Train RMSE: 12930.759734777745\n", - "Train R²: 0.9987426148033223\n", - "Train MAE: 2495.3698282637165\n", + "Train RMSE: 4839.006207438376\n", + "Train R²: 0.9941174224388726\n", + "Train MAE: 664.4994041278297\n", "\n" ] }, @@ -1234,13 +1272,13 @@ "name": "stderr", "output_type": "stream", "text": [ - "e:\\MII\\laboratory\\mai\\Lib\\site-packages\\sklearn\\metrics\\_regression.py:492: FutureWarning: 'squared' is deprecated in version 1.4 and will be removed in 1.6. To calculate the root mean squared error, use the function'root_mean_squared_error'.\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\metrics\\_regression.py:492: FutureWarning: 'squared' is deprecated in version 1.4 and will be removed in 1.6. To calculate the root mean squared error, use the function'root_mean_squared_error'.\n", " warnings.warn(\n" ] }, { "data": { - "image/png": "", + "image/png": "", "text/plain": [ "
" ] @@ -1250,31 +1288,47 @@ } ], "source": [ + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", "from sklearn.ensemble import RandomForestRegressor\n", - "from sklearn.metrics import r2_score, mean_absolute_error\n", - "from sklearn.model_selection import cross_val_score\n", + "from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error\n", + "from sklearn.model_selection import train_test_split, cross_val_score\n", "\n", + "# Загрузка данных\n", + "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n", + "\n", + "# Предобработка данных (например, кодирование категориальных признаков, удаление дубликатов)\n", + "# Удаление дубликатов по всем столбцам (до добавления id, иначе дубликаты не будут найдены)\n", + "df = df.drop_duplicates()\n", + "\n", + "# Создание уникального идентификатора для каждой строки\n", + "df['id'] = range(1, len(df) + 1)\n", + "\n", + "# Создание EntitySet\n", + "es = ft.EntitySet(id='data_science_jobs')\n", + "\n", + "# Добавление датафрейма с данными о рабочих местах\n", + "es = es.add_dataframe(\n", + " dataframe_name='jobs',\n", + " dataframe=df,\n", + " index='id'\n", + ")\n", + "\n", + "# Генерация признаков с помощью глубокого синтеза признаков\n", + "feature_matrix, feature_defs = ft.dfs(entityset=es, target_dataframe_name='jobs', max_depth=1)\n", "\n", "# Удаление строк с NaN\n", "feature_matrix = feature_matrix.dropna()\n", - "val_feature_matrix = val_feature_matrix.dropna()\n", - "test_feature_matrix = test_feature_matrix.dropna()\n", "\n", "# Разделение данных на обучающую и тестовую выборки\n", - "X_train = feature_matrix.drop('price', axis=1)\n", - "y_train = feature_matrix['price']\n", - "X_val = val_feature_matrix.drop('price', axis=1)\n", - "y_val = val_feature_matrix['price']\n", - "X_test = test_feature_matrix.drop('price', axis=1)\n", - "y_test = test_feature_matrix['price']\n", - "\n", - "X_test = X_test.reindex(columns=X_train.columns, fill_value=0) \n", + "X_train = 
feature_matrix.drop('salary_in_usd', axis=1)\n", + "y_train = 
feature_matrix['salary_in_usd']\n", "\n", "# Кодирования категориальных переменных с использованием одноразового кодирования\n", - "X = pd.get_dummies(X, drop_first=True)\n", + "X_train = pd.get_dummies(X_train, drop_first=True)\n", "\n", "# Разобьём тренировочный тест и примерку модели\n", - "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", + "X_train, X_test, y_train, y_test = train_test_split(X_train, y_train, test_size=0.2, random_state=42)\n", "\n", "# Выбор модели\n", "model = RandomForestRegressor(random_state=42)\n", @@ -1289,7 +1343,6 @@ "r2 = r2_score(y_test, y_pred)\n", "mae = mean_absolute_error(y_test, y_pred)\n", "\n", - "print()\n", "print(f\"RMSE: {rmse}\")\n", "print(f\"R²: {r2}\")\n", "print(f\"MAE: {mae} \\n\")\n", @@ -1319,9 +1372,9 @@ "plt.figure(figsize=(10, 6))\n", "plt.scatter(y_test, y_pred, alpha=0.5)\n", "plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=2)\n", - "plt.xlabel('Фактическая цена')\n", - "plt.ylabel('Прогнозируемая цена')\n", - "plt.title('Фактическая цена по сравнению с прогнозируемой')\n", + "plt.xlabel('Фактическая зарплата (USD)')\n", + "plt.ylabel('Прогнозируемая зарплата (USD)')\n", + "plt.title('Фактическая зарплата по сравнению с прогнозируемой')\n", "plt.show()" ] }, @@ -1329,25 +1382,22 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Выводы и итог \n", + "# Выводы и итог\n", + "Модель случайного леса (RandomForestRegressor) показала удовлетворительные результаты при прогнозировании зарплат в области Data Science. Метрики качества и кросс-валидация позволяют предположить, что модель не сильно переобучена и может быть использована для практических целей.\n", "\n", - "**Модель случайного леса (RandomForestRegressor)** показала удовлетворительные результаты при прогнозировании цен на недвижимость. 
Метрики качества и кросс-валидация позволяют предположить, что модель не сильно переобучена и может быть использована для практических целей. \n", + "Точность предсказаний: Модель показывает довольно высокий R² (0.8029), что указывает на хорошее объяснение вариации зарплат. Однако значения RMSE и MAE довольно высоки, что говорит о том, что модель не очень точно предсказывает зарплаты, особенно для высоких значений.\n", "\n", - "*Точность предсказаний:* Модель демонстрирует довольно высокий R² (0.9987), что указывает на большую часть вариации целевого признака (цены недвижимости). Однако, значения RMSE и MAE остаются высоки (12930 и 2495), что свидетельствует о том, что модель не всегда точно предсказывает значения, особенно для объектов с высокими или низкими ценами. \n", + "Переобучение: Разница между RMSE на обучающей и тестовой выборках не очень большая, что указывает на то, что переобучение не является критическим. Однако стоит быть осторожным и продолжать мониторинг этого показателя.\n", "\n", - "*Переобучение:* Разница между RMSE на обучающей и тестовой выборках незначительна, что указывает на то, что модель не склонна к переобучению. Однако в будущем стоит следить за этой метрикой при добавлении новых признаков или усложнении модели, чтобы избежать излишней подгонки под тренировочные данные. Также стоит быть осторожным и продолжать мониторинг этого показателя. \n", + "Кросс-валидация: Значение RMSE после кросс-валидации немного выше, чем на тестовой выборке, что может указывать на некоторую нестабильность модели.\n", "\n", - "*Кросс-валидация:* При кросс-валидации наблюдается небольшое увеличение ошибки RMSE по сравнению с тестовой выборкой (рост на 2-3%). Это может указывать на небольшую нестабильность модели при использовании разных подвыборок данных. Для повышения устойчивости модели возможно стоит провести дальнейшую настройку гиперпараметров. 
\n", - "\n", - "*Рекомендации:* Следует уделить внимание дополнительной обработке категориальных признаков, улучшению метода feature engineering, а также возможной оптимизации модели (например, через подбор гиперпараметров) для повышения точности предсказаний на экстремальных значениях.\n", - "\n", - "Кажется на этом закончили :)" + "Рекомендации: Следует уделить внимание дополнительной обработке категориальных признаков, улучшению метода feature engineering, а также возможной оптимизации модели (например, через подбор гиперпараметров) для повышения точности предсказаний на экстремальных значениях. Также стоит рассмотреть возможность использования других моделей, таких как градиентный бустинг или нейронные сети, для сравнения результатов и выбора наиболее эффективной модели." ] } ], "metadata": { "kernelspec": { - "display_name": "mai", + "display_name": "aimenv", "language": "python", "name": "python3" }, @@ -1361,7 +1411,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.6" + "version": "3.12.5" } }, "nbformat": 4, diff --git a/lab_4/Lab4.ipynb b/lab_4/Lab4.ipynb new file mode 100644 index 0000000..0a08a96 --- /dev/null +++ b/lab_4/Lab4.ipynb @@ -0,0 +1,66 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Начало лабораторной работы**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "ename": "KeyboardInterrupt", + "evalue": "", + "output_type": "error", + "traceback": [ + "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[1;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)", + "Cell \u001b[1;32mIn[2], line 4\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mmatplotlib\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mpyplot\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m 
\u001b[38;5;21;01mplt\u001b[39;00m\n\u001b[0;32m 3\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mmatplotlib\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mticker\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mticker\u001b[39;00m\n\u001b[1;32m----> 4\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mseaborn\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01msns\u001b[39;00m\n\u001b[0;32m 6\u001b[0m \u001b[38;5;66;03m# Подключим датафрейм и выгрузим данные\u001b[39;00m\n\u001b[0;32m 7\u001b[0m df \u001b[38;5;241m=\u001b[39m pd\u001b[38;5;241m.\u001b[39mread_csv(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m.//static//csv//kc_house_data.csv\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n", + "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\seaborn\\__init__.py:5\u001b[0m\n\u001b[0;32m 3\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mutils\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;241m*\u001b[39m \u001b[38;5;66;03m# noqa: F401,F403\u001b[39;00m\n\u001b[0;32m 4\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mpalettes\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;241m*\u001b[39m \u001b[38;5;66;03m# noqa: F401,F403\u001b[39;00m\n\u001b[1;32m----> 5\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mrelational\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;241m*\u001b[39m \u001b[38;5;66;03m# noqa: F401,F403\u001b[39;00m\n\u001b[0;32m 6\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mregression\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;241m*\u001b[39m \u001b[38;5;66;03m# noqa: F401,F403\u001b[39;00m\n\u001b[0;32m 7\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m 
\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mcategorical\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;241m*\u001b[39m \u001b[38;5;66;03m# noqa: F401,F403\u001b[39;00m\n", + "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\seaborn\\relational.py:21\u001b[0m\n\u001b[0;32m 13\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mutils\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m (\n\u001b[0;32m 14\u001b[0m adjust_legend_subtitles,\n\u001b[0;32m 15\u001b[0m _default_color,\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 18\u001b[0m _scatter_legend_artist,\n\u001b[0;32m 19\u001b[0m )\n\u001b[0;32m 20\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01m_compat\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m groupby_apply_include_groups\n\u001b[1;32m---> 21\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01m_statistics\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m EstimateAggregator, WeightedAggregator\n\u001b[0;32m 22\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01maxisgrid\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m FacetGrid, _facet_docs\n\u001b[0;32m 23\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01m_docstrings\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m DocstringComponents, _core_docs\n", + "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\seaborn\\_statistics.py:32\u001b[0m\n\u001b[0;32m 30\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mpandas\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mpd\u001b[39;00m\n\u001b[0;32m 31\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m---> 32\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m 
\u001b[38;5;21;01mscipy\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mstats\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m gaussian_kde\n\u001b[0;32m 33\u001b[0m _no_scipy \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mFalse\u001b[39;00m\n\u001b[0;32m 34\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mImportError\u001b[39;00m:\n", + "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\scipy\\__init__.py:99\u001b[0m\n\u001b[0;32m 94\u001b[0m \u001b[38;5;66;03m# This is the first import of an extension module within SciPy. If there's\u001b[39;00m\n\u001b[0;32m 95\u001b[0m \u001b[38;5;66;03m# a general issue with the install, such that extension modules are missing\u001b[39;00m\n\u001b[0;32m 96\u001b[0m \u001b[38;5;66;03m# or cannot be imported, this is where we'll get a failure - so give an\u001b[39;00m\n\u001b[0;32m 97\u001b[0m \u001b[38;5;66;03m# informative error message.\u001b[39;00m\n\u001b[0;32m 98\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m---> 99\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mscipy\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01m_lib\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01m_ccallback\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m LowLevelCallable\n\u001b[0;32m 100\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mImportError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[0;32m 101\u001b[0m msg \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mThe `scipy` install you are using seems to be broken, \u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;241m+\u001b[39m \\\n\u001b[0;32m 102\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m(extension modules cannot be imported), \u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;241m+\u001b[39m \\\n\u001b[0;32m 103\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mplease try 
reinstalling.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n", + "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\scipy\\_lib\\_ccallback.py:1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m _ccallback_c\n\u001b[0;32m 3\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mctypes\u001b[39;00m\n\u001b[0;32m 5\u001b[0m PyCFuncPtr \u001b[38;5;241m=\u001b[39m ctypes\u001b[38;5;241m.\u001b[39mCFUNCTYPE(ctypes\u001b[38;5;241m.\u001b[39mc_void_p)\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__bases__\u001b[39m[\u001b[38;5;241m0\u001b[39m]\n", + "File \u001b[1;32m:645\u001b[0m, in \u001b[0;36mparent\u001b[1;34m(self)\u001b[0m\n", + "\u001b[1;31mKeyboardInterrupt\u001b[0m: " + ] + } + ], + "source": [ + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "import matplotlib.ticker as ticker\n", + "import seaborn as sns\n", + "\n", + "# Подключим датафрейм и выгрузим данные\n", + "df = pd.read_csv(\".//static//csv//ds_salaries.csv\")\n", + "print(df.columns)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "aimenv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} -- 2.25.1 From e64d5b2980098ce7cddcc6d78b4ccc5e34d57c4b Mon Sep 17 00:00:00 2001 From: kaznacheeva Date: Sat, 23 Nov 2024 12:17:48 +0400 Subject: [PATCH 2/3] =?UTF-8?q?=D0=BB=D0=B0=D0=B1=D0=B0=204?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- lab_4/Lab4.ipynb | 1917 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 1896 insertions(+), 21 deletions(-) diff --git a/lab_4/Lab4.ipynb 
b/lab_4/Lab4.ipynb index 0a08a96..814ef70 100644 --- a/lab_4/Lab4.ipynb +++ b/lab_4/Lab4.ipynb @@ -9,37 +9,1912 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 8, "metadata": {}, "outputs": [ { - "ename": "KeyboardInterrupt", - "evalue": "", - "output_type": "error", - "traceback": [ - "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[1;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)", - "Cell \u001b[1;32mIn[2], line 4\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mmatplotlib\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mpyplot\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mplt\u001b[39;00m\n\u001b[0;32m 3\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mmatplotlib\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mticker\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mticker\u001b[39;00m\n\u001b[1;32m----> 4\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mseaborn\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01msns\u001b[39;00m\n\u001b[0;32m 6\u001b[0m \u001b[38;5;66;03m# Подключим датафрейм и выгрузим данные\u001b[39;00m\n\u001b[0;32m 7\u001b[0m df \u001b[38;5;241m=\u001b[39m pd\u001b[38;5;241m.\u001b[39mread_csv(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m.//static//csv//kc_house_data.csv\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n", - "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\seaborn\\__init__.py:5\u001b[0m\n\u001b[0;32m 3\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mutils\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;241m*\u001b[39m \u001b[38;5;66;03m# noqa: F401,F403\u001b[39;00m\n\u001b[0;32m 4\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m 
\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mpalettes\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;241m*\u001b[39m \u001b[38;5;66;03m# noqa: F401,F403\u001b[39;00m\n\u001b[1;32m----> 5\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mrelational\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;241m*\u001b[39m \u001b[38;5;66;03m# noqa: F401,F403\u001b[39;00m\n\u001b[0;32m 6\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mregression\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;241m*\u001b[39m \u001b[38;5;66;03m# noqa: F401,F403\u001b[39;00m\n\u001b[0;32m 7\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mcategorical\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;241m*\u001b[39m \u001b[38;5;66;03m# noqa: F401,F403\u001b[39;00m\n", - "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\seaborn\\relational.py:21\u001b[0m\n\u001b[0;32m 13\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mutils\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m (\n\u001b[0;32m 14\u001b[0m adjust_legend_subtitles,\n\u001b[0;32m 15\u001b[0m _default_color,\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 18\u001b[0m _scatter_legend_artist,\n\u001b[0;32m 19\u001b[0m )\n\u001b[0;32m 20\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01m_compat\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m groupby_apply_include_groups\n\u001b[1;32m---> 21\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01m_statistics\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m EstimateAggregator, WeightedAggregator\n\u001b[0;32m 22\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01maxisgrid\u001b[39;00m 
\u001b[38;5;28;01mimport\u001b[39;00m FacetGrid, _facet_docs\n\u001b[0;32m 23\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01m_docstrings\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m DocstringComponents, _core_docs\n", - "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\seaborn\\_statistics.py:32\u001b[0m\n\u001b[0;32m 30\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mpandas\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mpd\u001b[39;00m\n\u001b[0;32m 31\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m---> 32\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mscipy\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mstats\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m gaussian_kde\n\u001b[0;32m 33\u001b[0m _no_scipy \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mFalse\u001b[39;00m\n\u001b[0;32m 34\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mImportError\u001b[39;00m:\n", - "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\scipy\\__init__.py:99\u001b[0m\n\u001b[0;32m 94\u001b[0m \u001b[38;5;66;03m# This is the first import of an extension module within SciPy. 
If there's\u001b[39;00m\n\u001b[0;32m 95\u001b[0m \u001b[38;5;66;03m# a general issue with the install, such that extension modules are missing\u001b[39;00m\n\u001b[0;32m 96\u001b[0m \u001b[38;5;66;03m# or cannot be imported, this is where we'll get a failure - so give an\u001b[39;00m\n\u001b[0;32m 97\u001b[0m \u001b[38;5;66;03m# informative error message.\u001b[39;00m\n\u001b[0;32m 98\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m---> 99\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mscipy\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01m_lib\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01m_ccallback\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m LowLevelCallable\n\u001b[0;32m 100\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mImportError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[0;32m 101\u001b[0m msg \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mThe `scipy` install you are using seems to be broken, \u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;241m+\u001b[39m \\\n\u001b[0;32m 102\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m(extension modules cannot be imported), \u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;241m+\u001b[39m \\\n\u001b[0;32m 103\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mplease try reinstalling.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n", - "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\scipy\\_lib\\_ccallback.py:1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m _ccallback_c\n\u001b[0;32m 3\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mctypes\u001b[39;00m\n\u001b[0;32m 5\u001b[0m PyCFuncPtr \u001b[38;5;241m=\u001b[39m 
ctypes\u001b[38;5;241m.\u001b[39mCFUNCTYPE(ctypes\u001b[38;5;241m.\u001b[39mc_void_p)\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__bases__\u001b[39m[\u001b[38;5;241m0\u001b[39m]\n", - "File \u001b[1;32m:645\u001b[0m, in \u001b[0;36mparent\u001b[1;34m(self)\u001b[0m\n", - "\u001b[1;31mKeyboardInterrupt\u001b[0m: " + "name": "stdout", + "output_type": "stream", + "text": [ + "Index(['work_year', 'experience_level', 'employment_type', 'job_title',\n", + " 'salary', 'salary_currency', 'salary_in_usd', 'employee_residence',\n", + " 'remote_ratio', 'company_location', 'company_size'],\n", + " dtype='object')\n" ] } ], "source": [ "import pandas as pd\n", - "import matplotlib.pyplot as plt\n", - "import matplotlib.ticker as ticker\n", - "import seaborn as sns\n", - "\n", - "# Подключим датафрейм и выгрузим данные\n", - "df = pd.read_csv(\".//static//csv//ds_salaries.csv\")\n", + "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n", "print(df.columns)" ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
work_yearexperience_levelemployment_typejob_titlesalarysalary_currencysalary_in_usdemployee_residenceremote_ratiocompany_locationcompany_size
02023SEFTPrincipal Data Scientist80000EUR85847ES100ESL
12023MICTML Engineer30000USD30000US100USS
22023MICTML Engineer25500USD25500US100USS
32023SEFTData Scientist175000USD175000CA100CAM
42023SEFTData Scientist120000USD120000CA100CAM
\n", + "
" + ], + "text/plain": [ + " work_year experience_level employment_type job_title \\\n", + "0 2023 SE FT Principal Data Scientist \n", + "1 2023 MI CT ML Engineer \n", + "2 2023 MI CT ML Engineer \n", + "3 2023 SE FT Data Scientist \n", + "4 2023 SE FT Data Scientist \n", + "\n", + " salary salary_currency salary_in_usd employee_residence remote_ratio \\\n", + "0 80000 EUR 85847 ES 100 \n", + "1 30000 USD 30000 US 100 \n", + "2 25500 USD 25500 US 100 \n", + "3 175000 USD 175000 CA 100 \n", + "4 120000 USD 120000 CA 100 \n", + "\n", + " company_location company_size \n", + "0 ES L \n", + "1 US S \n", + "2 US S \n", + "3 CA M \n", + "4 CA M " + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
work_yearsalarysalary_in_usdremote_ratio
count3755.0000003.755000e+033755.0000003755.000000
mean2022.3736351.906956e+05137570.38988046.271638
std0.6914486.716765e+0563055.62527848.589050
min2020.0000006.000000e+035132.0000000.000000
25%2022.0000001.000000e+0595000.0000000.000000
50%2022.0000001.380000e+05135000.0000000.000000
75%2023.0000001.800000e+05175000.000000100.000000
max2023.0000003.040000e+07450000.000000100.000000
\n", + "
" + ], + "text/plain": [ + " work_year salary salary_in_usd remote_ratio\n", + "count 3755.000000 3.755000e+03 3755.000000 3755.000000\n", + "mean 2022.373635 1.906956e+05 137570.389880 46.271638\n", + "std 0.691448 6.716765e+05 63055.625278 48.589050\n", + "min 2020.000000 6.000000e+03 5132.000000 0.000000\n", + "25% 2022.000000 1.000000e+05 95000.000000 0.000000\n", + "50% 2022.000000 1.380000e+05 135000.000000 0.000000\n", + "75% 2023.000000 1.800000e+05 175000.000000 100.000000\n", + "max 2023.000000 3.040000e+07 450000.000000 100.000000" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.describe()" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "work_year 0\n", + "experience_level 0\n", + "employment_type 0\n", + "job_title 0\n", + "salary 0\n", + "salary_currency 0\n", + "salary_in_usd 0\n", + "employee_residence 0\n", + "remote_ratio 0\n", + "company_location 0\n", + "company_size 0\n", + "dtype: int64\n", + "work_year False\n", + "experience_level False\n", + "employment_type False\n", + "job_title False\n", + "salary False\n", + "salary_currency False\n", + "salary_in_usd False\n", + "employee_residence False\n", + "remote_ratio False\n", + "company_location False\n", + "company_size False\n", + "dtype: bool\n" + ] + } + ], + "source": [ + "# Percentage of missing values per feature\n", + "for i in df.columns:\n", + "    null_rate = df[i].isnull().sum() / len(df) * 100\n", + "    if null_rate > 0:\n", + "        print(f'{i} percentage of missing values: {null_rate:.2f}%')\n", + "\n", + "print(df.isnull().sum())\n", + "\n", + "print(df.isnull().any())" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'X_train'" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
work_yearexperience_levelemployment_typejob_titlesalarysalary_currencysalary_in_usdemployee_residenceremote_ratiocompany_locationcompany_sizeabove_median_salarysalary_category
18092023SEFTData Engineer182000USD182000US100USM11
10822023SEFTMachine Learning Engineer126000USD126000US0USM01
16862023SEFTBI Developer140000USD140000US100USM11
16002023SEFTData Scientist140000USD140000US0USM11
13762023SEFTData Engineer226700USD226700US0USM12
..........................................
27062022SEFTData Engineer160000USD160000US100USM11
9282023MIFTData Engineer200000USD200000US0USM11
5642023MIFTData Engineer140000USD140000US0USM11
7162023SEFTData Scientist297300USD297300US100USM12
12992023SEFTData Engineer133832USD133832US0USM01
\n", + "

3004 rows × 13 columns

\n", + "
" + ], + "text/plain": [ + " work_year experience_level employment_type job_title \\\n", + "1809 2023 SE FT Data Engineer \n", + "1082 2023 SE FT Machine Learning Engineer \n", + "1686 2023 SE FT BI Developer \n", + "1600 2023 SE FT Data Scientist \n", + "1376 2023 SE FT Data Engineer \n", + "... ... ... ... ... \n", + "2706 2022 SE FT Data Engineer \n", + "928 2023 MI FT Data Engineer \n", + "564 2023 MI FT Data Engineer \n", + "716 2023 SE FT Data Scientist \n", + "1299 2023 SE FT Data Engineer \n", + "\n", + " salary salary_currency salary_in_usd employee_residence remote_ratio \\\n", + "1809 182000 USD 182000 US 100 \n", + "1082 126000 USD 126000 US 0 \n", + "1686 140000 USD 140000 US 100 \n", + "1600 140000 USD 140000 US 0 \n", + "1376 226700 USD 226700 US 0 \n", + "... ... ... ... ... ... \n", + "2706 160000 USD 160000 US 100 \n", + "928 200000 USD 200000 US 0 \n", + "564 140000 USD 140000 US 0 \n", + "716 297300 USD 297300 US 100 \n", + "1299 133832 USD 133832 US 0 \n", + "\n", + " company_location company_size above_median_salary salary_category \n", + "1809 US M 1 1 \n", + "1082 US M 0 1 \n", + "1686 US M 1 1 \n", + "1600 US M 1 1 \n", + "1376 US M 1 2 \n", + "... ... ... ... ... \n", + "2706 US M 1 1 \n", + "928 US M 1 1 \n", + "564 US M 1 1 \n", + "716 US M 1 2 \n", + "1299 US M 0 1 \n", + "\n", + "[3004 rows x 13 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'y_train'" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
above_median_salary
18091
10820
16861
16001
13761
......
27061
9281
5641
7161
12990
\n", + "

3004 rows × 1 columns

\n", + "
" + ], + "text/plain": [ + " above_median_salary\n", + "1809 1\n", + "1082 0\n", + "1686 1\n", + "1600 1\n", + "1376 1\n", + "... ...\n", + "2706 1\n", + "928 1\n", + "564 1\n", + "716 1\n", + "1299 0\n", + "\n", + "[3004 rows x 1 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'X_test'" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
work_yearexperience_levelemployment_typejob_titlesalarysalary_currencysalary_in_usdemployee_residenceremote_ratiocompany_locationcompany_sizeabove_median_salarysalary_category
34592022MIFTResearch Scientist59000EUR61989AT0ATL00
37242021ENFTBusiness Data Analyst50000EUR59102LU100LUL00
17952023SEFTData Engineer180000USD180000US0USM11
35352021MIFTData Scientist50000USD50000NG100NGL00
32552022MIFTData Analyst106260USD106260US0USM01
..........................................
19432022MIFTData Engineer120000USD120000US100USM01
5732023ENFTAutonomous Vehicle Technician7000USD7000GH0GHS00
30132022SEFTMachine Learning Engineer129300USD129300US0USM01
3272023ENFTData Scientist70000CAD51753CA100CAL00
15652023SEFTData Analyst48000EUR51508ES0ESM00
\n", + "

751 rows × 13 columns

\n", + "
" + ], + "text/plain": [ + " work_year experience_level employment_type \\\n", + "3459 2022 MI FT \n", + "3724 2021 EN FT \n", + "1795 2023 SE FT \n", + "3535 2021 MI FT \n", + "3255 2022 MI FT \n", + "... ... ... ... \n", + "1943 2022 MI FT \n", + "573 2023 EN FT \n", + "3013 2022 SE FT \n", + "327 2023 EN FT \n", + "1565 2023 SE FT \n", + "\n", + " job_title salary salary_currency salary_in_usd \\\n", + "3459 Research Scientist 59000 EUR 61989 \n", + "3724 Business Data Analyst 50000 EUR 59102 \n", + "1795 Data Engineer 180000 USD 180000 \n", + "3535 Data Scientist 50000 USD 50000 \n", + "3255 Data Analyst 106260 USD 106260 \n", + "... ... ... ... ... \n", + "1943 Data Engineer 120000 USD 120000 \n", + "573 Autonomous Vehicle Technician 7000 USD 7000 \n", + "3013 Machine Learning Engineer 129300 USD 129300 \n", + "327 Data Scientist 70000 CAD 51753 \n", + "1565 Data Analyst 48000 EUR 51508 \n", + "\n", + " employee_residence remote_ratio company_location company_size \\\n", + "3459 AT 0 AT L \n", + "3724 LU 100 LU L \n", + "1795 US 0 US M \n", + "3535 NG 100 NG L \n", + "3255 US 0 US M \n", + "... ... ... ... ... \n", + "1943 US 100 US M \n", + "573 GH 0 GH S \n", + "3013 US 0 US M \n", + "327 CA 100 CA L \n", + "1565 ES 0 ES M \n", + "\n", + " above_median_salary salary_category \n", + "3459 0 0 \n", + "3724 0 0 \n", + "1795 1 1 \n", + "3535 0 0 \n", + "3255 0 1 \n", + "... ... ... \n", + "1943 0 1 \n", + "573 0 0 \n", + "3013 0 1 \n", + "327 0 0 \n", + "1565 0 0 \n", + "\n", + "[751 rows x 13 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "'y_test'" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
above_median_salary
34590
37240
17951
35350
32550
......
19430
5730
30130
3270
15650
\n", + "

751 rows × 1 columns

\n", + "
" + ], + "text/plain": [ + " above_median_salary\n", + "3459 0\n", + "3724 0\n", + "1795 1\n", + "3535 0\n", + "3255 0\n", + "... ...\n", + "1943 0\n", + "573 0\n", + "3013 0\n", + "327 0\n", + "1565 0\n", + "\n", + "[751 rows x 1 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "work_year int64\n", + "experience_level object\n", + "employment_type object\n", + "job_title object\n", + "salary int64\n", + "salary_currency object\n", + "salary_in_usd int64\n", + "employee_residence object\n", + "remote_ratio int64\n", + "company_location object\n", + "company_size object\n", + "above_median_salary int64\n", + "salary_category category\n", + "dtype: object\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import seaborn as sns\n", + "from typing import Tuple\n", + "import pandas as pd\n", + "from pandas import DataFrame\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# Load the data\n", + "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n", + "\n", + "# Create the target feature\n", + "median_salary = df['salary_in_usd'].median()\n", + "df['above_median_salary'] = np.where(df['salary_in_usd'] > median_salary, 1, 0)\n", + "\n", + "# Split into features and the target variable\n", + "X = df.drop(columns=['salary_in_usd', 'above_median_salary'])\n", + "y = df['above_median_salary']\n", + "\n", + "# Rough salary categorization\n", + "df['salary_category'] = pd.cut(df['salary_in_usd'], bins=[0, 100000, 200000, np.inf], labels=[0, 1, 2])\n", + "\n", + "# Select the features and target variables\n", + "X = df.drop(columns=['salary_in_usd', 'salary_category'])\n", + "\n", + "def split_stratified_into_train_val_test(\n", + "    df_input,\n", + "    stratify_colname=\"y\",\n", + "    frac_train=0.6,\n", + "    frac_val=0.15,\n", + "    frac_test=0.25,\n", + "    random_state=None,\n", + ") -> Tuple[DataFrame, DataFrame, DataFrame, DataFrame, DataFrame, DataFrame]:\n", + " \n", + "    # Compare with a tolerance: sums such as 0.7 + 0.1 + 0.2 are not exactly 1.0 in floating point.\n", + "    if abs(frac_train + frac_val + frac_test - 1.0) > 1e-9:\n", + "        raise ValueError(\n", + "            \"fractions %f, %f, %f do not add up to 1.0\"\n", + "            % (frac_train, frac_val, frac_test)\n", + "        )\n", + " \n", + "    if stratify_colname not in df_input.columns:\n", + "        raise ValueError(\"%s is not a column in the dataframe\" % (stratify_colname))\n", + "    X = df_input # Contains all columns.\n", + "    y = df_input[\n", + "        [stratify_colname]\n", + "    ] # Dataframe of just the column on which to stratify.\n", + " \n", + "    # Split original dataframe into train and temp dataframes.\n", + "    df_train, df_temp, y_train, y_temp = train_test_split(\n", + "        X, y, 
stratify=y, test_size=(1.0 - frac_train), random_state=random_state\n", + "    )\n", + "\n", + "    if frac_val <= 0:\n", + "        assert len(df_input) == len(df_train) + len(df_temp)\n", + "        return df_train, pd.DataFrame(), df_temp, y_train, pd.DataFrame(), y_temp\n", + "    # Split the temp dataframe into val and test dataframes.\n", + "    relative_frac_test = frac_test / (frac_val + frac_test)\n", + "\n", + "    df_val, df_test, y_val, y_test = train_test_split(\n", + "        df_temp,\n", + "        y_temp,\n", + "        stratify=y_temp,\n", + "        test_size=relative_frac_test,\n", + "        random_state=random_state,\n", + "    )\n", + "\n", + "    assert len(df_input) == len(df_train) + len(df_val) + len(df_test)\n", + "    return df_train, df_val, df_test, y_train, y_val, y_test\n", + "\n", + "X_train, X_val, X_test, y_train, y_val, y_test = split_stratified_into_train_val_test(\n", + "    df, stratify_colname=\"above_median_salary\", frac_train=0.80, frac_val=0, frac_test=0.20, random_state=42\n", + ")\n", + "\n", + "display(\"X_train\", X_train)\n", + "display(\"y_train\", y_train)\n", + "\n", + "display(\"X_test\", X_test)\n", + "display(\"y_test\", y_test)\n", + "\n", + "# Check the resulting column types\n", + "print(df.dtypes)\n", + "\n", + "# Plot the salary distribution\n", + "plt.figure(figsize=(10, 6))\n", + "sns.histplot(df['salary_in_usd'], bins=50, kde=True)\n", + "plt.title('Salary distribution')\n", + "plt.xlabel('Salary (USD)')\n", + "plt.ylabel('Frequency')\n", + "plt.show()\n", + "\n", + "# Plot salary against experience level\n", + "plt.figure(figsize=(10, 6))\n", + "sns.boxplot(x='experience_level', y='salary_in_usd', data=df)\n", + "plt.title('Salary by experience level')\n", + "plt.xlabel('Experience level')\n", + "plt.ylabel('Salary (USD)')\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "ename": "IndexError", + "evalue": "Index dimension must be 1 or 2", + "output_type": "error", + "traceback": 
[ + "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[1;31mIndexError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[1;32mIn[14], line 71\u001b[0m\n\u001b[0;32m 62\u001b[0m pipeline_end \u001b[38;5;241m=\u001b[39m Pipeline(\n\u001b[0;32m 63\u001b[0m [\n\u001b[0;32m 64\u001b[0m (\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfeatures_preprocessing\u001b[39m\u001b[38;5;124m\"\u001b[39m, features_preprocessing),\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 67\u001b[0m ]\n\u001b[0;32m 68\u001b[0m )\n\u001b[0;32m 70\u001b[0m \u001b[38;5;66;03m# Демонстрация работы конвейера для предобработки данных при классификации\u001b[39;00m\n\u001b[1;32m---> 71\u001b[0m preprocessing_result \u001b[38;5;241m=\u001b[39m \u001b[43mpipeline_end\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfit_transform\u001b[49m\u001b[43m(\u001b[49m\u001b[43mX_train\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 72\u001b[0m preprocessed_df \u001b[38;5;241m=\u001b[39m pd\u001b[38;5;241m.\u001b[39mDataFrame(\n\u001b[0;32m 73\u001b[0m preprocessing_result,\n\u001b[0;32m 74\u001b[0m columns\u001b[38;5;241m=\u001b[39mpipeline_end\u001b[38;5;241m.\u001b[39mget_feature_names_out(),\n\u001b[0;32m 75\u001b[0m )\n\u001b[0;32m 77\u001b[0m preprocessed_df\n", + "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\base.py:1473\u001b[0m, in \u001b[0;36m_fit_context..decorator..wrapper\u001b[1;34m(estimator, *args, **kwargs)\u001b[0m\n\u001b[0;32m 1466\u001b[0m estimator\u001b[38;5;241m.\u001b[39m_validate_params()\n\u001b[0;32m 1468\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m config_context(\n\u001b[0;32m 1469\u001b[0m skip_parameter_validation\u001b[38;5;241m=\u001b[39m(\n\u001b[0;32m 1470\u001b[0m prefer_skip_nested_validation \u001b[38;5;129;01mor\u001b[39;00m global_skip_validation\n\u001b[0;32m 1471\u001b[0m )\n\u001b[0;32m 1472\u001b[0m ):\n\u001b[1;32m-> 1473\u001b[0m 
\u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfit_method\u001b[49m\u001b[43m(\u001b[49m\u001b[43mestimator\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n", + "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\pipeline.py:533\u001b[0m, in \u001b[0;36mPipeline.fit_transform\u001b[1;34m(self, X, y, **params)\u001b[0m\n\u001b[0;32m 490\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"Fit the model and transform with the final estimator.\u001b[39;00m\n\u001b[0;32m 491\u001b[0m \n\u001b[0;32m 492\u001b[0m \u001b[38;5;124;03mFit all the transformers one after the other and sequentially transform\u001b[39;00m\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 530\u001b[0m \u001b[38;5;124;03m Transformed samples.\u001b[39;00m\n\u001b[0;32m 531\u001b[0m \u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[0;32m 532\u001b[0m routed_params \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_check_method_params(method\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfit_transform\u001b[39m\u001b[38;5;124m\"\u001b[39m, props\u001b[38;5;241m=\u001b[39mparams)\n\u001b[1;32m--> 533\u001b[0m Xt \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_fit\u001b[49m\u001b[43m(\u001b[49m\u001b[43mX\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mrouted_params\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 535\u001b[0m last_step \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_final_estimator\n\u001b[0;32m 536\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m 
_print_elapsed_time(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mPipeline\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_log_message(\u001b[38;5;28mlen\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39msteps) \u001b[38;5;241m-\u001b[39m \u001b[38;5;241m1\u001b[39m)):\n", + "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\pipeline.py:406\u001b[0m, in \u001b[0;36mPipeline._fit\u001b[1;34m(self, X, y, routed_params)\u001b[0m\n\u001b[0;32m 404\u001b[0m cloned_transformer \u001b[38;5;241m=\u001b[39m clone(transformer)\n\u001b[0;32m 405\u001b[0m \u001b[38;5;66;03m# Fit or load from cache the current transformer\u001b[39;00m\n\u001b[1;32m--> 406\u001b[0m X, fitted_transformer \u001b[38;5;241m=\u001b[39m \u001b[43mfit_transform_one_cached\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m 407\u001b[0m \u001b[43m \u001b[49m\u001b[43mcloned_transformer\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 408\u001b[0m \u001b[43m \u001b[49m\u001b[43mX\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 409\u001b[0m \u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 410\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43;01mNone\u001b[39;49;00m\u001b[43m,\u001b[49m\n\u001b[0;32m 411\u001b[0m \u001b[43m \u001b[49m\u001b[43mmessage_clsname\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mPipeline\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[0;32m 412\u001b[0m \u001b[43m \u001b[49m\u001b[43mmessage\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_log_message\u001b[49m\u001b[43m(\u001b[49m\u001b[43mstep_idx\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 413\u001b[0m \u001b[43m 
\u001b[49m\u001b[43mparams\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrouted_params\u001b[49m\u001b[43m[\u001b[49m\u001b[43mname\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 414\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 415\u001b[0m \u001b[38;5;66;03m# Replace the transformer of the step with the fitted\u001b[39;00m\n\u001b[0;32m 416\u001b[0m \u001b[38;5;66;03m# transformer. This is necessary when loading the transformer\u001b[39;00m\n\u001b[0;32m 417\u001b[0m \u001b[38;5;66;03m# from the cache.\u001b[39;00m\n\u001b[0;32m 418\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39msteps[step_idx] \u001b[38;5;241m=\u001b[39m (name, fitted_transformer)\n", + "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\joblib\\memory.py:312\u001b[0m, in \u001b[0;36mNotMemorizedFunc.__call__\u001b[1;34m(self, *args, **kwargs)\u001b[0m\n\u001b[0;32m 311\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m__call__\u001b[39m(\u001b[38;5;28mself\u001b[39m, \u001b[38;5;241m*\u001b[39margs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs):\n\u001b[1;32m--> 312\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfunc\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n", + "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\pipeline.py:1310\u001b[0m, in \u001b[0;36m_fit_transform_one\u001b[1;34m(transformer, X, y, weight, message_clsname, message, params)\u001b[0m\n\u001b[0;32m 1308\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m _print_elapsed_time(message_clsname, message):\n\u001b[0;32m 1309\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m 
\u001b[38;5;28mhasattr\u001b[39m(transformer, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfit_transform\u001b[39m\u001b[38;5;124m\"\u001b[39m):\n\u001b[1;32m-> 1310\u001b[0m res \u001b[38;5;241m=\u001b[39m \u001b[43mtransformer\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfit_transform\u001b[49m\u001b[43m(\u001b[49m\u001b[43mX\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mparams\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mfit_transform\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m{\u001b[49m\u001b[43m}\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 1311\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m 1312\u001b[0m res \u001b[38;5;241m=\u001b[39m transformer\u001b[38;5;241m.\u001b[39mfit(X, y, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mparams\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfit\u001b[39m\u001b[38;5;124m\"\u001b[39m, {}))\u001b[38;5;241m.\u001b[39mtransform(\n\u001b[0;32m 1313\u001b[0m X, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mparams\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mtransform\u001b[39m\u001b[38;5;124m\"\u001b[39m, {})\n\u001b[0;32m 1314\u001b[0m )\n", + "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\utils\\_set_output.py:316\u001b[0m, in \u001b[0;36m_wrap_method_output..wrapped\u001b[1;34m(self, X, *args, **kwargs)\u001b[0m\n\u001b[0;32m 314\u001b[0m \u001b[38;5;129m@wraps\u001b[39m(f)\n\u001b[0;32m 315\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mwrapped\u001b[39m(\u001b[38;5;28mself\u001b[39m, X, \u001b[38;5;241m*\u001b[39margs, 
\u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs):\n\u001b[1;32m--> 316\u001b[0m data_to_wrap \u001b[38;5;241m=\u001b[39m \u001b[43mf\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mX\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 317\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(data_to_wrap, \u001b[38;5;28mtuple\u001b[39m):\n\u001b[0;32m 318\u001b[0m \u001b[38;5;66;03m# only wrap the first output for cross decomposition\u001b[39;00m\n\u001b[0;32m 319\u001b[0m return_tuple \u001b[38;5;241m=\u001b[39m (\n\u001b[0;32m 320\u001b[0m _wrap_data_with_container(method, data_to_wrap[\u001b[38;5;241m0\u001b[39m], X, \u001b[38;5;28mself\u001b[39m),\n\u001b[0;32m 321\u001b[0m \u001b[38;5;241m*\u001b[39mdata_to_wrap[\u001b[38;5;241m1\u001b[39m:],\n\u001b[0;32m 322\u001b[0m )\n", + "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\base.py:1098\u001b[0m, in \u001b[0;36mTransformerMixin.fit_transform\u001b[1;34m(self, X, y, **fit_params)\u001b[0m\n\u001b[0;32m 1083\u001b[0m warnings\u001b[38;5;241m.\u001b[39mwarn(\n\u001b[0;32m 1084\u001b[0m (\n\u001b[0;32m 1085\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mThis object (\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__class__\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__name__\u001b[39m\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m) has a `transform`\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 1093\u001b[0m \u001b[38;5;167;01mUserWarning\u001b[39;00m,\n\u001b[0;32m 1094\u001b[0m )\n\u001b[0;32m 
1096\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m y \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m 1097\u001b[0m \u001b[38;5;66;03m# fit method of arity 1 (unsupervised transformation)\u001b[39;00m\n\u001b[1;32m-> 1098\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfit\u001b[49m\u001b[43m(\u001b[49m\u001b[43mX\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mfit_params\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mtransform\u001b[49m\u001b[43m(\u001b[49m\u001b[43mX\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 1099\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m 1100\u001b[0m \u001b[38;5;66;03m# fit method of arity 2 (supervised transformation)\u001b[39;00m\n\u001b[0;32m 1101\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mfit(X, y, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mfit_params)\u001b[38;5;241m.\u001b[39mtransform(X)\n", + "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\utils\\_set_output.py:316\u001b[0m, in \u001b[0;36m_wrap_method_output..wrapped\u001b[1;34m(self, X, *args, **kwargs)\u001b[0m\n\u001b[0;32m 314\u001b[0m \u001b[38;5;129m@wraps\u001b[39m(f)\n\u001b[0;32m 315\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mwrapped\u001b[39m(\u001b[38;5;28mself\u001b[39m, X, \u001b[38;5;241m*\u001b[39margs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs):\n\u001b[1;32m--> 316\u001b[0m data_to_wrap \u001b[38;5;241m=\u001b[39m \u001b[43mf\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mX\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 317\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(data_to_wrap, \u001b[38;5;28mtuple\u001b[39m):\n\u001b[0;32m 318\u001b[0m \u001b[38;5;66;03m# only wrap the first output for cross decomposition\u001b[39;00m\n\u001b[0;32m 319\u001b[0m return_tuple \u001b[38;5;241m=\u001b[39m (\n\u001b[0;32m 320\u001b[0m _wrap_data_with_container(method, data_to_wrap[\u001b[38;5;241m0\u001b[39m], X, \u001b[38;5;28mself\u001b[39m),\n\u001b[0;32m 321\u001b[0m \u001b[38;5;241m*\u001b[39mdata_to_wrap[\u001b[38;5;241m1\u001b[39m:],\n\u001b[0;32m 322\u001b[0m )\n", + "Cell \u001b[1;32mIn[14], line 18\u001b[0m, in \u001b[0;36mSalaryFeatures.transform\u001b[1;34m(self, X, y)\u001b[0m\n\u001b[0;32m 15\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mtransform\u001b[39m(\u001b[38;5;28mself\u001b[39m, X, y\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mNone\u001b[39;00m):\n\u001b[0;32m 16\u001b[0m \u001b[38;5;66;03m# Создание новых признаков\u001b[39;00m\n\u001b[0;32m 17\u001b[0m X \u001b[38;5;241m=\u001b[39m X\u001b[38;5;241m.\u001b[39mcopy()\n\u001b[1;32m---> 18\u001b[0m X[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mwork_year_to_remote_ratio\u001b[39m\u001b[38;5;124m\"\u001b[39m] \u001b[38;5;241m=\u001b[39m \u001b[43mX\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mwork_year\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m]\u001b[49m \u001b[38;5;241m/\u001b[39m X[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mremote_ratio\u001b[39m\u001b[38;5;124m\"\u001b[39m]\n\u001b[0;32m 19\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m X\n", + "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\scipy\\sparse\\_csr.py:24\u001b[0m, in \u001b[0;36m_csr_base.__getitem__\u001b[1;34m(self, key)\u001b[0m\n\u001b[0;32m 22\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m 
\u001b[38;5;21m__getitem__\u001b[39m(\u001b[38;5;28mself\u001b[39m, key):\n\u001b[0;32m 23\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mndim \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m2\u001b[39m:\n\u001b[1;32m---> 24\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[38;5;21;43m__getitem__\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mkey\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 26\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(key, \u001b[38;5;28mtuple\u001b[39m) \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(key) \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m1\u001b[39m:\n\u001b[0;32m 27\u001b[0m key \u001b[38;5;241m=\u001b[39m key[\u001b[38;5;241m0\u001b[39m]\n", + "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\scipy\\sparse\\_index.py:52\u001b[0m, in \u001b[0;36mIndexMixin.__getitem__\u001b[1;34m(self, key)\u001b[0m\n\u001b[0;32m 51\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m__getitem__\u001b[39m(\u001b[38;5;28mself\u001b[39m, key):\n\u001b[1;32m---> 52\u001b[0m row, col \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_validate_indices\u001b[49m\u001b[43m(\u001b[49m\u001b[43mkey\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 54\u001b[0m \u001b[38;5;66;03m# Dispatch to specialized methods.\u001b[39;00m\n\u001b[0;32m 55\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(row, INT_TYPES):\n", + "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\scipy\\sparse\\_index.py:186\u001b[0m, in \u001b[0;36mIndexMixin._validate_indices\u001b[1;34m(self, key)\u001b[0m\n\u001b[0;32m 184\u001b[0m row \u001b[38;5;241m=\u001b[39m _validate_bool_idx(bool_row, M, 
\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrow\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m 185\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(row, \u001b[38;5;28mslice\u001b[39m):\n\u001b[1;32m--> 186\u001b[0m row \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_asindices\u001b[49m\u001b[43m(\u001b[49m\u001b[43mrow\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mM\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 188\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m isintlike(col):\n\u001b[0;32m 189\u001b[0m col \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mint\u001b[39m(col)\n", + "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\scipy\\sparse\\_index.py:212\u001b[0m, in \u001b[0;36mIndexMixin._asindices\u001b[1;34m(self, idx, length)\u001b[0m\n\u001b[0;32m 209\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mIndexError\u001b[39;00m(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124minvalid index\u001b[39m\u001b[38;5;124m'\u001b[39m) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01me\u001b[39;00m\n\u001b[0;32m 211\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m x\u001b[38;5;241m.\u001b[39mndim \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m (\u001b[38;5;241m1\u001b[39m, \u001b[38;5;241m2\u001b[39m):\n\u001b[1;32m--> 212\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mIndexError\u001b[39;00m(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mIndex dimension must be 1 or 2\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[0;32m 214\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m x\u001b[38;5;241m.\u001b[39msize \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m0\u001b[39m:\n\u001b[0;32m 215\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m x\n", + "\u001b[1;31mIndexError\u001b[0m: Index dimension must be 1 or 2" + ] + } + ], + "source": [ + "import numpy as np\n", + "import 
pandas as pd\n",
+ "from sklearn.base import BaseEstimator, TransformerMixin\n",
+ "from sklearn.compose import ColumnTransformer\n",
+ "from sklearn.impute import SimpleImputer\n",
+ "from sklearn.preprocessing import OneHotEncoder, StandardScaler\n",
+ "from sklearn.pipeline import Pipeline\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "# Load the data\n",
+ "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n",
+ "\n",
+ "# Create the target feature\n",
+ "median_salary = df['salary_in_usd'].median()\n",
+ "df['above_median_salary'] = np.where(df['salary_in_usd'] > median_salary, 1, 0)\n",
+ "\n",
+ "# Split into features and target variable\n",
+ "X = df.drop(columns=['salary_in_usd', 'above_median_salary'])\n",
+ "y = df['above_median_salary']\n",
+ "\n",
+ "# Split the data into training and test sets\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)\n",
+ "\n",
+ "# Build the preprocessing pipelines\n",
+ "\n",
+ "class SalaryFeatures(BaseEstimator, TransformerMixin):\n",
+ "    def __init__(self):\n",
+ "        pass\n",
+ "    def fit(self, X, y=None):\n",
+ "        return self\n",
+ "    def transform(self, X, y=None):\n",
+ "        # Create new features (expects a DataFrame with named columns)\n",
+ "        X = X.copy()\n",
+ "        X[\"work_year_to_remote_ratio\"] = X[\"work_year\"] / X[\"remote_ratio\"]\n",
+ "        return X\n",
+ "    def get_feature_names_out(self, features_in):\n",
+ "        # Append the names of the new features\n",
+ "        new_features = [\"work_year_to_remote_ratio\"]\n",
+ "        return np.append(features_in, new_features, axis=0)\n",
+ "\n",
+ "# Numeric pipeline: impute missing values with the median, then standardize\n",
+ "preprocessing_num_class = Pipeline(steps=[\n",
+ "    ('imputer', SimpleImputer(strategy='median')),\n",
+ "    ('scaler', StandardScaler())\n",
+ "])\n",
+ "\n",
+ "# Categorical pipeline: impute missing values with the most frequent value, then one-hot encode (dense output, so a DataFrame can be returned)\n",
+ "preprocessing_cat_class = Pipeline(steps=[\n",
+ "    ('imputer', SimpleImputer(strategy='most_frequent')),\n",
+ "    ('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))\n",
+ "])\n",
+ "\n",
+ "# Define the columns\n",
+ "numeric_columns = [\"work_year\", \"salary\", \"remote_ratio\"]  # salary_in_usd was dropped from X above\n",
+ "cat_columns = [\"experience_level\", \"employment_type\", \"job_title\", \"salary_currency\", \"employee_residence\", \"company_location\", \"company_size\"]\n",
+ "\n",
+ "# Feature preprocessing\n",
+ "features_preprocessing = ColumnTransformer(\n",
+ "    verbose_feature_names_out=False,\n",
+ "    transformers=[\n",
+ "        (\"prepocessing_num\", preprocessing_num_class, numeric_columns),\n",
+ "        (\"prepocessing_cat\", preprocessing_cat_class, cat_columns),\n",
+ "    ],\n",
+ "    remainder=\"passthrough\"\n",
+ ").set_output(transform=\"pandas\")  # keep column names so SalaryFeatures can index by name\n",
+ "\n",
+ "# Drop columns\n",
+ "columns_to_drop = [] # List the columns to drop here, if any\n",
+ "drop_columns = ColumnTransformer(\n",
+ "    verbose_feature_names_out=False,\n",
+ "    transformers=[\n",
+ "        (\"drop_columns\", \"drop\", columns_to_drop),\n",
+ "    ],\n",
+ "    remainder=\"passthrough\",\n",
+ ")\n",
+ "\n",
+ "# Main pipeline: data preprocessing and feature engineering\n",
+ "pipeline_end = Pipeline(\n",
+ "    [\n",
+ "        (\"features_preprocessing\", features_preprocessing),\n",
+ "        (\"custom_features\", SalaryFeatures()),\n",
+ "        (\"drop_columns\", drop_columns),\n",
+ "    ]\n",
+ ")\n",
+ "\n",
+ "# Run the preprocessing pipeline for the classification task\n",
+ "preprocessing_result = pipeline_end.fit_transform(X_train)\n",
+ "\n",
+ "# Get the column names after the transformation\n",
+ "feature_names = pipeline_end.named_steps['features_preprocessing'].get_feature_names_out()\n",
+ "feature_names = np.append(feature_names, [\"work_year_to_remote_ratio\"])\n",
+ "\n",
+ "# Build a DataFrame from the transformed data\n",
+ "preprocessed_df = pd.DataFrame(\n",
+ "    preprocessing_result,\n",
+ "    columns=feature_names,\n",
+ ")\n",
+ "\n",
+ "preprocessed_df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Business goals**\n",
+ "\n",
+ "1. Salary prediction (regression)\n",
+ "\n",
+ "   Goal: Predict the salary (salary_in_usd) from other characteristics such as experience level (experience_level), employment type (employment_type), job title (job_title), employee residence (employee_residence), company size (company_size), and other factors.\n",
+ "\n",
+ "   Application: This can be useful for HR departments that want to estimate a fair salary for new hires or to analyze the labor market.\n",
+ "\n",
+ "2. Classifying experience level from salary (classification)\n",
+ "\n",
+ "   Goal: Classify the experience level (experience_level) based on salary (salary_in_usd) and other factors.\n",
+ "\n",
+ "   Application: This can help estimate an employee's experience level from their salary, which can be useful for assessing career growth."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "1. 
Salary prediction"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " work_year experience_level employment_type job_title \\\n",
+ "0 2023 SE FT Principal Data Scientist \n",
+ "1 2023 MI CT ML Engineer \n",
+ "2 2023 MI CT ML Engineer \n",
+ "3 2023 SE FT Data Scientist \n",
+ "4 2023 SE FT Data Scientist \n",
+ "\n",
+ " salary salary_currency salary_in_usd employee_residence remote_ratio \\\n",
+ "0 80000 EUR 85847 ES 100 \n",
+ "1 30000 USD 30000 US 100 \n",
+ "2 25500 USD 25500 US 100 \n",
+ "3 175000 USD 175000 CA 100 \n",
+ "4 120000 USD 120000 CA 100 \n",
+ "\n",
+ " company_location company_size \n",
+ "0 ES L \n",
+ "1 US S \n",
+ "2 US S \n",
+ "3 CA M \n",
+ "4 CA M \n",
+ "<class 'pandas.core.frame.DataFrame'>\n",
+ "RangeIndex: 3755 entries, 0 to 3754\n",
+ "Data columns (total 11 columns):\n",
+ " # Column Non-Null Count Dtype \n",
+ "--- ------ -------------- ----- \n",
+ " 0 work_year 3755 non-null int64 \n",
+ " 1 experience_level 3755 non-null object\n",
+ " 2 employment_type 3755 non-null object\n",
+ " 3 job_title 3755 non-null object\n",
+ " 4 salary 3755 non-null int64 \n",
+ " 5 salary_currency 3755 non-null object\n",
+ " 6 salary_in_usd 3755 non-null int64 \n",
+ " 7 employee_residence 3755 non-null object\n",
+ " 8 remote_ratio 3755 non-null int64 \n",
+ " 9 company_location 3755 non-null object\n",
+ " 10 company_size 3755 non-null object\n",
+ "dtypes: int64(4), object(7)\n",
+ "memory usage: 322.8+ KB\n",
+ "None\n",
+ " work_year salary salary_in_usd remote_ratio\n",
+ "count 3755.000000 3.755000e+03 3755.000000 3755.000000\n",
+ "mean 2022.373635 1.906956e+05 137570.389880 46.271638\n",
+ "std 0.691448 6.716765e+05 63055.625278 48.589050\n",
+ "min 2020.000000 6.000000e+03 5132.000000 0.000000\n",
+ "25% 2022.000000 1.000000e+05 95000.000000 0.000000\n",
+ "50% 2022.000000 1.380000e+05 135000.000000 0.000000\n",
+ "75% 2023.000000 1.800000e+05 175000.000000 100.000000\n",
+ "max 2023.000000 3.040000e+07 450000.000000 100.000000\n",
+ "work_year 0\n",
+ "experience_level 0\n",
+ "employment_type 0\n",
+ "job_title 0\n",
+ "salary 0\n",
+ "salary_currency 0\n",
+ "salary_in_usd 0\n",
+ "employee_residence 0\n",
+ "remote_ratio 0\n",
+ "company_location 0\n",
+ "company_size 0\n",
+ "dtype: int64\n",
+ "Mean Squared Error: 2482079980.9527493\n",
+ "R^2 Score: 0.37127352660208646\n"
+ ]
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "import numpy as np\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
+ "from sklearn.compose import ColumnTransformer\n",
+ "from sklearn.pipeline import Pipeline\n",
+ "from sklearn.linear_model import LinearRegression\n",
+ "from sklearn.metrics import mean_squared_error, r2_score\n",
+ "import seaborn as sns\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "# Load the dataset\n",
+ "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n",
+ "\n",
+ "# Set the random state\n",
+ "random_state = 42\n",
+ "\n",
+ "# Preliminary data exploration\n",
+ "print(df.head())\n",
+ "print(df.info())\n",
+ "print(df.describe())\n",
+ "\n",
+ "# Check for missing values\n",
+ "print(df.isnull().sum())\n",
+ "\n",
+ "# Data preprocessing\n",
+ "# Define the categorical and numeric columns\n",
+ "categorical_features = ['experience_level', 'employment_type', 'job_title', 'employee_residence', 'company_location', 'company_size']\n",
+ "numeric_features = ['work_year', 'remote_ratio']\n",
+ "\n",
+ "# Build the preprocessing pipeline\n",
+ "preprocessor = ColumnTransformer(\n",
+ "    transformers=[\n",
+ "        ('num', StandardScaler(), numeric_features),\n",
+ "        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)])\n",
+ "\n",
+ "# Define the target variable and the features\n",
+ "X = df.drop('salary_in_usd', axis=1)\n",
+ "y = df['salary_in_usd']\n",
+ "\n",
+ "# Split the data into training and test sets\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=random_state)\n",
+ "\n",
+ "# Create and train the model\n",
+ "model = Pipeline(steps=[\n",
+ "    ('preprocessor', preprocessor),\n",
+ "    ('regressor', LinearRegression())])\n",
+ "\n",
+ "model.fit(X_train, y_train)\n",
+ "\n",
+ "# Predict on the test set\n",
+ "y_pred = model.predict(X_test)\n",
+ "\n",
+ "# Evaluate the model\n",
+ "mse = mean_squared_error(y_test, y_pred)\n",
+ "r2 = r2_score(y_test, y_pred)\n",
+ "\n",
+ "print(f\"Mean Squared Error: {mse}\")\n",
+ "print(f\"R^2 Score: {r2}\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "2. Classifying the experience level"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Classification Report:\n",
+ " precision recall f1-score support\n",
+ "\n",
+ " EN 0.55 0.48 0.51 67\n",
+ " EX 0.46 0.26 0.33 23\n",
+ " MI 0.48 0.54 0.51 157\n",
+ " SE 0.83 0.83 0.83 504\n",
+ "\n",
+ " accuracy 0.72 751\n",
+ " macro avg 0.58 0.53 0.55 751\n",
+ "weighted avg 0.72 0.72 0.72 751\n",
+ "\n",
+ "Confusion Matrix:\n",
+ "[[ 32 0 20 15]\n",
+ " [ 0 6 5 12]\n",
+ " [ 14 0 84 59]\n",
+ " [ 12 7 65 420]]\n",
+ "Accuracy Score: 0.7217043941411452\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAhsAAAHHCAYAAAAWM5p0AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABVJElEQVR4nO3deXwM5x8H8M9ujs193yVuIcTR0Ij7CEEccVSVEqq0aaIIqlFH0IoGdRNtHakKbRUt6la0FURIEaSoNlQOCTlEsons/P5Q++s2QcJOZpP9vPua18s+88zMd5KSb77P88zIBEEQQERERCQSudQBEBERUfXGZIOIiIhExWSDiIiIRMVkg4iIiETFZIOIiIhExWSDiIiIRMVkg4iIiETFZIOIiIhExWSDiIiIRMVkg0hEV69eRY8ePWBtbQ2ZTIadO3dq9fx//vknZDIZNm7cqNXzVmWdO3dG586dpQ6DiP6FyQZVe9evX8fbb7+NunXrwsTEBFZWVmjXrh2WLVuGgoICUa8dFBSECxcu4OOPP8amTZvQqlUrUa9XmUaNGgWZTAYrK6syv45Xr16FTCaDTCbDokWLKnz+27dvIyIiAomJiVqIloikZCh1AERi2rNnD1599VUoFAqMHDkSTZs2RVFREX755RdMnToVSUlJ+Oyzz0S5dkFBAeLi4vDhhx8iNDRUlGvUqlULBQUFMDIyEuX8z2JoaIgHDx5g165dGDJkiMa+zZs3w8TEBIWFhc917tu3b2POnDmoXbs2WrRoUe7jDhw48FzXIyLxMNmgauvGjRsYOnQoatWqhSNHjsDV1VW9LyQkBNeuXcOePXtEu/6dO3cAADY2NqJdQyaTwcTERLTzP4tCoUC7du2wZcuWUslGbGwsAgIC8N1331VKLA8ePICZmRmMjY0r5XpEVH4cRqFqKyoqCvfv38e6des0Eo3H6tevjwkTJqg/P3z4EPPmzUO9evWgUChQu3ZtTJ8+HUqlUuO42rVro0+fPvjll1/wyiuvwMTEBHXr1sWXX36p7hMREYFatWoBAKZOnQqZTIbatWsDeDT88PjP/xYREQGZTKbRdvDgQbRv3x42NjawsLCAh4cHpk+frt7/pDkbR44cQYcOHWBubg4bGxv0798fly9fLvN6165dw6hRo2BjYwNra2uMHj0aDx48ePIX9j+GDRuGvXv3Ijs7W90WHx+Pq1evYtiwYaX63717F1OmTIGXlxcsLCxgZWWFXr164bffflP3OXr0KFq3bg0AGD16tHo45vF9du7cGU2bNkVCQgI6duwIMzMz9dflv3M2goKCYGJiUur+/f39YWtri9u3b5f7Xono+TDZoGpr165dqFu3Ltq2bVuu/m+99RZmzZqFl19+GUuWLEGnTp0QGRmJoUOHlup77do1DB48GN27d8fixYtha2uLUaNGISkpCQAwcOBALFmyBADw+uuvY9OmTVi6dGmF4k9KSkKfPn2gVCoxd+5cLF68GP369cOvv/761OMOHToEf39/ZGRkICIiAmFhYThx4gTatWuHP//8s1T/IUOGIC8vD5GRkRgyZAg2btyIOXPmlDvOgQMHQiaTYfv27eq22NhYNGrUCC+//HKp/n/88Qd27tyJPn364NNPP8XUqVNx4cIFdOrUSf2Dv3Hjxpg7dy4AYNy4cdi0aRM2bdqEjh07qs+TlZWFXr16oUWLFli6dCm6dOlSZnzLli2Do6MjgoKCUFJSAgBYu3YtDhw4gBUrVsDNza3c90pEz0kgqoZycnIEAEL//v3L1T8xMVEAILz11lsa7VOmTBEACEeOHFG31apVSwAgHD9+XN2WkZEhKBQKYfLkyeq2GzduCACEhQsXapwzKChIqFWrVqkYZs+eLfz7r+SSJUsEAMKdO3eeGPfja2zYsEHd1qJFC8HJyUnIyspSt/3222+CXC4XRo4cWep6b775psY5BwwYINjb2z/xmv++D3Nzc0EQBGHw4MFCt27dBEEQhJKSEsH
FxUWYM2dOmV+DwsJCoaSkpNR9KBQKYe7cueq2+Pj4Uvf2WKdOnQQAQnR0dJn7OnXqpNG2f/9+AYDw0UcfCX/88YdgYWEhBAYGPvMeiUg7WNmgaik3NxcAYGlpWa7+P/74IwAgLCxMo33y5MkAUGpuh6enJzp06KD+7OjoCA8PD/zxxx/PHfN/PZ7r8f3330OlUpXrmNTUVCQmJmLUqFGws7NTtzdr1gzdu3dX3+e/vfPOOxqfO3TogKysLPXXsDyGDRuGo0ePIi0tDUeOHEFaWlqZQyjAo3kecvmjf3pKSkqQlZWlHiI6e/Zsua+pUCgwevTocvXt0aMH3n77bcydOxcDBw6EiYkJ1q5dW+5rEdGLYbJB1ZKVlRUAIC8vr1z9//rrL8jlctSvX1+j3cXFBTY2Nvjrr7802t3d3Uudw9bWFvfu3XvOiEt77bXX0K5dO7z11ltwdnbG0KFD8c033zw18Xgcp4eHR6l9jRs3RmZmJvLz8zXa/3svtra2AFChe+nduzcsLS3x9ddfY/PmzWjdunWpr+VjKpUKS5YsQYMGDaBQKODg4ABHR0ecP38eOTk55b7mSy+9VKHJoIsWLYKdnR0SExOxfPlyODk5lftYInoxTDaoWrKysoKbmxsuXrxYoeP+O0HzSQwMDMpsFwThua/xeD7BY6ampjh+/DgOHTqEESNG4Pz583jttdfQvXv3Un1fxIvcy2MKhQIDBw5ETEwMduzY8cSqBgDMnz8fYWFh6NixI7766ivs378fBw8eRJMmTcpdwQEefX0q4ty5c8jIyAAAXLhwoULHEtGLYbJB1VafPn1w/fp1xMXFPbNvrVq1oFKpcPXqVY329PR0ZGdnq1eWaIOtra3Gyo3H/ls9AQC5XI5u3brh008/xaVLl/Dxxx/jyJEj+Omnn8o89+M4k5OTS+27cuUKHBwcYG5u/mI38ATDhg3DuXPnkJeXV+ak2se2bduGLl26YN26dRg6dCh69OgBPz+/Ul+T8iZ+5ZGfn4/Ro0fD09MT48aNQ1RUFOLj47V2fiJ6OiYbVG29//77MDc3x1tvvYX09PRS+69fv45ly5YBeDQMAKDUipFPP/0UABAQEKC1uOrVq4ecnBycP39e3ZaamoodO3Zo9Lt7926pYx8/3Oq/y3Efc3V1RYsWLRATE6Pxw/vixYs4cOCA+j7F0KVLF8ybNw8rV66Ei4vLE/sZGBiUqpp8++23+PvvvzXaHidFZSVmFTVt2jSkpKQgJiYGn376KWrXro2goKAnfh2JSLv4UC+qturVq4fY2Fi89tpraNy4scYTRE+cOIFvv/0Wo0aNAgA0b94cQUFB+Oyzz5CdnY1OnTrh9OnTiImJQWBg4BOXVT6PoUOHYtq0aRgwYADee+89PHjwAGvWrEHDhg01JkjOnTsXx48fR0BAAGrVqoWMjAysXr0aNWrUQPv27Z94/oULF6JXr17w9fXFmDFjUFBQgBUrVsDa2hoRERFau4//ksvlmDFjxjP79enTB3PnzsXo0aPRtm1bXLhwAZs3b0bdunU1+tWrVw82NjaIjo6GpaUlzM3N4ePjgzp16lQoriNHjmD16tWYPXu2einuhg0b0LlzZ8ycORNRUVEVOh8RPQeJV8MQie73338Xxo4dK9SuXVswNjYWLC0thXbt2gkrVqwQCgsL1f2Ki4uFOXPmCHXq1BGMjIyEmjVrCuHh4Rp9BOHR0teAgIBS1/nvkssnLX0VBEE4cOCA0LRpU8HY2Fjw8PAQvvrqq1JLXw8fPiz0799fcHNzE4yNjQU3Nzfh9ddfF37//fdS1/jv8tBDhw4J7dq1E0xNTQUrKyuhb9++wqVLlzT6PL7ef5fWbtiwQQAg3Lhx44lfU0HQXPr6JE9a+jp58mTB1dVVMDU1Fdq1ayfExcWVuWT1+++/Fzw9PQVDQ0ON++zUqZPQpEmTMq/57/Pk5uYKtWrVEl5++WWhuLhYo9+kSZMEuVwuxMXFPfUeiOjFyQS
hArPAiIiIiCqIczaIiIhIVEw2iIiISFRMNoiIiEhUTDaIiIhIVEw2iIiISFRMNoiIiEhUTDaIiIhIVNXyCaLpucVSh0D/sDYzkjoE+peCIu29wI1ejFyL736hF2NtKv7v3aYtQ7VynoJzK7VynsrGygYRERGJqlpWNoiIiHSKTL9/t2eyQUREJDY9HzZjskFERCQ2Pa9s6PfdExERkehY2SAiIhIbh1GIiIhIVBxGISIiIhIPKxtERERi4zAKERERiYrDKERERETiYWWDiIhIbHo+jMLKBhERkdhkcu1sL2DBggWQyWSYOHGiuq2wsBAhISGwt7eHhYUFBg0ahPT0dI3jUlJSEBAQADMzMzg5OWHq1Kl4+PBhha7NZIOIiKiai4+Px9q1a9GsWTON9kmTJmHXrl349ttvcezYMdy+fRsDBw5U7y8pKUFAQACKiopw4sQJxMTEYOPGjZg1a1aFrs9kg4iISGwymXa253D//n0MHz4cn3/+OWxtbdXtOTk5WLduHT799FN07doV3t7e2LBhA06cOIGTJ08CAA4cOIBLly7hq6++QosWLdCrVy/MmzcPq1atQlFRUbljYLJBREQkNgmHUUJCQhAQEAA/Pz+N9oSEBBQXF2u0N2rUCO7u7oiLiwMAxMXFwcvLC87Ozuo+/v7+yM3NRVJSUrlj4ARRIiIisWlpgqhSqYRSqdRoUygUUCgUZfbfunUrzp49i/j4+FL70tLSYGxsDBsbG412Z2dnpKWlqfv8O9F4vP/xvvJiZYOIiKiKiIyMhLW1tcYWGRlZZt+bN29iwoQJ2Lx5M0xMTCo5Uk1MNoiIiMSmpWGU8PBw5OTkaGzh4eFlXjIhIQEZGRl4+eWXYWhoCENDQxw7dgzLly+HoaEhnJ2dUVRUhOzsbI3j0tPT4eLiAgBwcXEptTrl8efHfcqDyQYREZHYtJRsKBQKWFlZaWxPGkLp1q0bLly4gMTERPXWqlUrDB8+XP1nIyMjHD58WH1McnIyUlJS4OvrCwDw9fXFhQsXkJGRoe5z8OBBWFlZwdPTs9y3zzkbRERE1ZClpSWaNm2q0WZubg57e3t1+5gxYxAWFgY7OztYWVlh/Pjx8PX1RZs2bQAAPXr0gKenJ0aMGIGoqCikpaVhxowZCAkJeWKSUxYmG0RERGKT6+YTRJcsWQK5XI5BgwZBqVTC398fq1evVu83MDDA7t27ERwcDF9fX5ibmyMoKAhz586t0HVkgiAI2g5eaum5xVKHQP+wNjOSOgT6l4KiEqlDoH/I9fzx1brE2lT8GQWmXT/WynkKjnyolfNUNs7ZICIiIlFxGIWIiEhsel7JYrJBREQkthd8iVpVp993T0RERKJjZYOIiEhsHEYhIiIiUen5MAqTDSIiIrHpeWVDv1MtIiIiEh0rG0RERGLjMAoRERGJisMoREREROJhZYOIiEhsHEYhIiIiUXEYhYiIiEg8rGwQERGJjcMoREREJCo9Tzb0++6JiIhIdKxsSGjntq3Y+d3XSEu9DQCoU7c+gsa8gzbtOiA3JwfrP1uF+JMnkJ6eChsbW3To3BVj3hkPCwtLiSPXL1tjNyNmwzpkZt5BQ49G+GD6THg1ayZ1WNVazLrPcPTIIfz15x9QKEzg1bwFQiZMRq3addR9lEolln8ahYP7f0RxURF8fNtj6vSZsLd3kDDy6udsQjy+ilmPK5eTkHnnDqI+XYHOXf3U++fMDMeeXTs1jmnTtj2Wr/68kiPVcXo+QZTJhoQcnVzwdugk1KhZCxAE7NvzPaZPGY91X22DIAjIvJOBdydMQe26dZGWmorFC+Yi884dzPtkidSh6419e3/EoqhIzJg9B15ezbF5UwyC3x6D73fvg729vdThVVvnzp7BoNdeh2eTpih5WII1K5diQvBb2LJ9F0xNzQAASxctwIlfjmF+1BJYWFhi0YKP8MHkCfh842aJo69eCgsK0KChB/oGDsS0sPfK7OPbrgNmzvlY/dnY2Li
ywqs69HwYRSYIgiB1ENqWnlssdQjPLaBbWwS/Nxl9+g8qte+nQ/vx0awPsP94PAwNq0aeaG1mJHUIL2T40FfRpKkXps+YBQBQqVTo0a0TXh82AmPGjpM4uoorKCqROoTncu/uXfTq1h5rvvgSLb1b4X5eHnp2bYe58xeia3d/AMCfN/7A0IF98EXMFjRt1lziiJ9NXgV/032lReMyKxt5eXlYtHSlhJG9GGtT8RMB08DPtHKegp1V798dQOLKRmZmJtavX4+4uDikpaUBAFxcXNC2bVuMGjUKjo6OUoZXqUpKSnD08H4UFhSgqVeLMvvk38+DmblFlUk0qrrioiJcvpSEMWPfVrfJ5XK0adMW5387J2Fk+uf+/TwAgJW1NQDgyuUkPHz4EK3b+Kr71K5TFy4urrhwPrFKJBvVydkzp+HfpR0srazQ6hUfvBMyATY2tlKHRTpEsp9a8fHx8Pf3h5mZGfz8/NCwYUMAQHp6OpYvX44FCxZg//79aNWq1VPPo1QqoVQq/9Mmh0KhEC12bbp+7Xe8++ZwFBUVwdTUDB8tXIbadeuV6pedfQ8x69ai34DBEkSpn+5l30NJSUmp4RJ7e3vcuPGHRFHpH5VKhaWLFqBZi5dRr34DAEBWViaMjIxgaWml0dfO3gFZWZlShKm3fNu1R5du3eH2Ug3cupmCNSuXYmLI21j35RYYGBhIHZ7u0PNhFMmSjfHjx+PVV19FdHQ0ZP8pJwqCgHfeeQfjx49HXFzcU88TGRmJOXPmaLRN/mAGpobP0nrMYnCvVQfrNn+H/Pt5OHr4AOZHfIgVazdqJBz59+9j2sR3UbtOPYwe966E0RJVvoWR83D92lV8tuErqUOhMvToGaD+c/0GDdGgoQcG9OmBhDOn8YqP71OO1DNVcNhMmyRLtX777TdMmjSpVKIBADKZDJMmTUJiYuIzzxMeHo6cnByN7b2waSJELA4jIyPUqOkOj8ZN8HboJNRv4IFvt/7/H9UH+fmY8t7bMDMzx0cLl8HQsGrPgahKbG1sYWBggKysLI32rKwsODhwxUNlWLTgI/z68zGs/nwjnJxd1O329g4oLi5GXl6uRv+7WZlcjSKxl2rUhI2tLW7dTJE6FNIhkiUbLi4uOH369BP3nz59Gs7Ozs88j0KhgJWVlcZWVYZQyqISVCguKgLwqKIxefw4GBkZIfLTFVX6vqoiI2NjNPZsglMn/19dU6lUOHUqDs2at5QwsupPEAQsWvARjh05hJVr18PtpRoa+xs1bgJDQ0PEnzqpbvvrzxtIS0uFV7MWlRwt/Vt6ehpysrPh4KA/c+7KQyaTaWWrqiQbRpkyZQrGjRuHhIQEdOvWTZ1YpKen4/Dhw/j888+xaNEiqcKrFGtXLoFP2w5wdnHFgwf5OLRvDxIT4rFoxVp1olFYWIAZc5ch/34+8u/nAwBsbG05FlpJRgSNxszp09CkSVM09WqGrzbFoKCgAIEDBkodWrW2MHIeDuzdg6glK2Fubo6szDsAAHMLS5iYmMDC0hJ9Awdh+eJPYG1tDXNzCyz+5GN4NWvByaFa9uBBPm6l/L9KcfvvW/j9ymVYWVvDytoaX0SvRhe/7rC3d8StWylYuXQRatR0R5u27SWMWvdU5URBGyRd+vr1119jyZIlSEhIQEnJoyV5BgYG8Pb2RlhYGIYMGfJc560qS18XzJuJs/GnkJV5B+YWlqhXvyGGBb2J1j5tcS7hNCa882aZx339/X64ur1UydE+n6q+9BUAtmz+Sv1QL49GjTFt+gw0q6I/0KrK0tc2LT3LbJ8x52P06TcAwL8e6rVvD4qKiuHTth3eD58J+yryG3VVWfqaEH8awWODSrUH9A3EtA9nY+qkUPx+5TLy8vLg6OgIH992eDvkvSo1nFUZS1/NB2/Qynnyt43Wynkqm048Z6O4uBiZmY9mkDs4OMDI6MV+QFWVZEMfVIdkozqpKsmGPqgqyYY+qJRk41UtJRvfVs1kQyce2GB
kZARXV1epwyAiIhKFvg+j6PfCXyIiIhKdTlQ2iIiIqjN9r2ww2SAiIhIZkw0iIiISlb4nG5yzQUREVA2tWbMGzZo1Uz/w0tfXF3v37lXv79y5c6mHhr3zzjsa50hJSUFAQADMzMzg5OSEqVOn4uHDhxWOhZUNIiIisUlQ2KhRowYWLFiABg0aQBAExMTEoH///jh37hyaNGkCABg7dizmzp2rPsbMzEz955KSEgQEBMDFxQUnTpxAamoqRo4cCSMjI8yfP79CsTDZICIiEpkUwyh9+/bV+Pzxxx9jzZo1OHnypDrZMDMzg4uLS1mH48CBA7h06RIOHToEZ2dntGjRAvPmzcO0adMQEREBY2PjcsfCYRQiIqJqrqSkBFu3bkV+fj58ff//Nt7NmzfDwcEBTZs2RXh4OB48eKDeFxcXBy8vL433lPn7+yM3NxdJSUkVuj4rG0RERCLTVmVDqVRCqVRqtCkUiie+qPPChQvw9fVFYWEhLCwssGPHDnh6PnodwLBhw1CrVi24ubnh/PnzmDZtGpKTk7F9+3YAQFpaWqkXoj7+nJaWVqG4mWwQERGJTFvJRmRkJObMmaPRNnv2bERERJTZ38PDA4mJicjJycG2bdsQFBSEY8eOwdPTE+PGjVP38/LygqurK7p164br16+jXr16Won3MSYbREREVUR4eDjCwsI02p5U1QAAY2Nj1K9fHwDg7e2N+Ph4LFu2DGvXri3V18fHBwBw7do11KtXDy4uLjh9+rRGn/T0dAB44jyPJ+GcDSIiIpH9d4np824KhUK9lPXx9rRk479UKlWpYZjHEhMTAUD9rjJfX19cuHABGRkZ6j4HDx6ElZWVeiimvFjZICIiEpsES1/Dw8PRq1cvuLu7Iy8vD7GxsTh69Cj279+P69evIzY2Fr1794a9vT3Onz+PSZMmoWPHjmjWrBkAoEePHvD09MSIESMQFRWFtLQ0zJgxAyEhIRVKcAAmG0RERNVSRkYGRo4cidTUVFhbW6NZs2bYv38/unfvjps3b+LQoUNYunQp8vPzUbNmTQwaNAgzZsxQH29gYIDdu3cjODgYvr6+MDc3R1BQkMZzOcpLJgiCoM2b0wXpucVSh0D/sDYzkjoE+peCohKpQ6B/yPX88dW6xNpU/BkFDqO2auU8mRuHauU8lY2VDSIiIpHp+7tRmGwQERGJTN+TDa5GISIiIlGxskFERCQ2/S5sMNkgIiISG4dRiIiIiETEygYREZHI9L2ywWSDiIhIZPqebHAYhYiIiETFygYREZHI9L2ywWSDiIhIbPqda3AYhYiIiMTFygYREZHIOIxCREREomKyQURERKLS92SDczaIiIhIVKxsEBERiU2/CxtMNoiIiMTGYRQiIiIiEbGyQUREJDJ9r2ww2SAiIhKZvicbHEYhIiIiUbGyQUREJDJ9r2ww2SAiIhKbfucaHEYhIiIicVXLyoa1mZHUIdA/VCpB6hDoXxSG/P1CV5Tw74Ze4TAKERERiYrJBhEREYlKz3MNztkgIiIicbGyQUREJDIOoxAREZGo9DzX4DAKERERiYuVDSIiIpHp+zAKKxtEREQik8m0s1XEmjVr0KxZM1hZWcHKygq+vr7Yu3even9hYSFCQkJgb28PCwsLDBo0COnp6RrnSElJQUBAAMzMzODk5ISpU6fi4cOHFb5/JhtERETVUI0aNbBgwQIkJCTgzJkz6Nq1K/r374+kpCQAwKRJk7Br1y58++23OHbsGG7fvo2BAweqjy8pKUFAQACKiopw4sQJxMTEYOPGjZg1a1aFY5EJglDtHmNXWPGki0TCJ4gSlY1PENUdlibi/97tOf2AVs5zaX6PFzrezs4OCxcuxODBg+Ho6IjY2FgMHjwYAHDlyhU0btwYcXFxaNOmDfbu3Ys+ffrg9u3bcHZ2BgBER0dj2rRpuHPnDoyNjct9XVY2iIiIRCbFMMq/lZSUYOvWrcjPz4evry8SEhJQXFwMPz8/dZ9GjRr
B3d0dcXFxAIC4uDh4eXmpEw0A8Pf3R25urro6Ul6cIEpERFRFKJVKKJVKjTaFQgGFQlFm/wsXLsDX1xeFhYWwsLDAjh074OnpicTERBgbG8PGxkajv7OzM9LS0gAAaWlpGonG4/2P91UEKxtEREQik8lkWtkiIyNhbW2tsUVGRj7xuh4eHkhMTMSpU6cQHByMoKAgXLp0qRLv/BFWNoiIiESmrZWv4eHhCAsL02h7UlUDAIyNjVG/fn0AgLe3N+Lj47Fs2TK89tprKCoqQnZ2tkZ1Iz09HS4uLgAAFxcXnD59WuN8j1erPO5TXqxsEBERiUxblQ2FQqFeyvp4e1qy8V8qlQpKpRLe3t4wMjLC4cOH1fuSk5ORkpICX19fAICvry8uXLiAjIwMdZ+DBw/CysoKnp6eFbp/VjaIiIiqofDwcPTq1Qvu7u7Iy8tDbGwsjh49iv3798Pa2hpjxoxBWFgY7OzsYGVlhfHjx8PX1xdt2rQBAPTo0QOenp4YMWIEoqKikJaWhhkzZiAkJKRCCQ7AZIOIiEh0UjxBNCMjAyNHjkRqaiqsra3RrFkz7N+/H927dwcALFmyBHK5HIMGDYJSqYS/vz9Wr16tPt7AwAC7d+9GcHAwfH19YW5ujqCgIMydO7fCsfA5GyQqPmeDqGx8zobuqIznbLSIOPzsTuWQGNFNK+epbJyzQURERKLiMAoREZHI9P1FbEw2iIiIRKbnuQaHUYiIiEhcrGwQERGJjMMoREREJCo9zzU4jEJERETiYmWDiIhIZBxGISIiIlHpea7BZIOIiEhs+l7Z4JwNIiIiEhUrG0RERCLT88IGkw0iIiKxcRiFiIiISESsbBAREYlMzwsbTDaIiIjExmEUIiIiIhGxskFERCQyPS9sMNkgIiISG4dRiIiIiETEygYREZHIWNkgnbM1djN6de+K1i29MHzoq7hw/rzUIemtjPR0fPjBVHRu74M2rZrj1QF9kZR0Qeqw9E706hVo6dVIYxvQt5fUYemFswnxmDQ+GD39OqJV88Y4euSQet/D4mIsX7IIrw3qh/Y+L6OnX0fM+nAa7mRkSBixbpLJtLNVVaxs6Jh9e3/EoqhIzJg9B15ezbF5UwyC3x6D73fvg729vdTh6ZXcnByMGvk6Wrf2wco1n8PW1g4pKX/Cyspa6tD0Ur36DRD9+Xr1ZwMD/vNVGQoKCtDAwwP9Agdiath7GvsKCwtx5colvDUuGA08GiEvNweLPolE2IR3sWnLNoki1k36Xtng31YdsylmAwYOHoLAAYMAADNmz8Hx40exc/t3GDN2nMTR6ZcN67+Ai4sr5nwUqW57qUYNCSPSbwYGBnBwcJQ6DL3Trn1HtGvfscx9FpaWWL12vUbb++EzEDR8CNJSb8PF1a0yQqQqgMMoOqS4qAiXLyWhjW9bdZtcLkebNm1x/rdzEkamn44dPQJPz6aYGjYBXTu1xdBXB2D7tm+kDktvpaT8he5dO6BPTz9MnzYFqam3pQ6JynD/fh5kMhksLK2kDkWn6Pswik4nGzdv3sSbb74pdRiV5l72PZSUlJQaLrG3t0dmZqZEUemvv2/dxLffbIF7rVpYHf0FXh0yFFELPsYP3++QOjS909SrOebOi8SqNV9g+szZ+PvvW3gz6A3k59+XOjT6F6VSiRVLF8O/VwAsLCykDkenyGQyrWxVlU4Po9y9excxMTFYv379E/solUoolUqNNsFAAYVCIXZ4VM2pVAI8mzTB+AlhAIBGjT1x7dpVbPtmK/r1HyBxdPqlfYf/l/EbenjAy6s5evt3xYH9+zBg4GAJI6PHHhYX44OpkyAIAj74cLbU4ZCOkTTZ+OGHH566/48//njmOSIjIzFnzhyNtg9nzsaMWREvEpokbG1sYWBggKysLI32rKwsODg4SBSV/nJwdETdevU12urUrYfDhw5IFBE9ZmllBfdatXEz5S+pQyH8P9FIS72NNZ9vYFWjDFW4KKEVkiYbgYGBkMl
kEAThiX2eVTYKDw9HWFiYRptgUDWrGkbGxmjs2QSnTsahazc/AIBKpcKpU3EY+vobEkenf1q0aIm//ryh0Zby559w5aQ3yT14kI9bN28ioG8/qUPRe48TjZSUv7D2ixjY2NhKHZJOkut5tiHpnA1XV1ds374dKpWqzO3s2bPPPIdCoYCVlZXGVpWHUEYEjcb2bd/gh5078Mf16/hobgQKCgoQOGCg1KHpnTdGjsKF879h3efRSEn5C3v37MJ3332D14YOlzo0vfPpok9wJv40bv99C4mJZxE2YTzkBnL07NVH6tCqvQcP8pF85TKSr1wGAPz99y0kX7mMtNTbeFhcjPenTMTlS0n4KHIhSlQlyMy8g8zMOyguLpI4ctIlklY2vL29kZCQgP79+5e5/1lVj+qoZ6/euHf3LlavXI7MzDvwaNQYq9d+AXsOo1S6Jk29sHjpCqxY+ik+i16Nl16qganvh6N3n75Sh6Z30tPTET5tMnKys2Fra4cWL3vjy81fw87OTurQqr1LSUl4560g9ecliz4BAPTpF4hx74Ti+NEjAIBhQzTnMUV/EYNWrV+pvEB1nJ4XNiATJPxp/vPPPyM/Px89e/Ysc39+fj7OnDmDTp06Vei8hQ+1ER1pg0qlX8kiUXmV8O+GzrA0Eb/I77/6lFbOs/9dH62cp7JJWtno0KHDU/ebm5tXONEgIiLSNXI9r2zo9HM2iIiI6PlERkaidevWsLS0hJOTEwIDA5GcnKzRp3PnzqWe5fHOO+9o9ElJSUFAQADMzMzg5OSEqVOn4uHDig0h6PRzNoiIiKoDKR7IdezYMYSEhKB169Z4+PAhpk+fjh49euDSpUswNzdX9xs7dizmzp2r/mxmZqb+c0lJCQICAuDi4oITJ04gNTUVI0eOhJGREebPn1/uWJhsEBERiUyKCaL79u3T+Lxx40Y4OTkhISEBHTv+/0F5ZmZmcHFxKfMcBw4cwKVLl3Do0CE4OzujRYsWmDdvHqZNm4aIiAgYGxuXKxYOoxAREemBnJwcACi1imvz5s1wcHBA06ZNER4ejgcPHqj3xcXFwcvLC87Ozuo2f39/5ObmIikpqdzXZmWDiIhIZDJop7RR1is6FIpnv6JDpVJh4sSJaNeuHZo2bapuHzZsGGrVqgU3NzecP38e06ZNQ3JyMrZv3w4ASEtL00g0AKg/p6WllTtuJhtEREQi09ZqlLJe0TF79mxEREQ89biQkBBcvHgRv/zyi0b7uHHj1H/28vKCq6srunXrhuvXr6NevXraCRocRiEiIqoywsPDkZOTo7GFh4c/9ZjQ0FDs3r0bP/30E2rUqPHUvj4+j57jce3aNQCAi4sL0tPTNfo8/vykeR5lYbJBREQkMm29Yr4ir+gQBAGhoaHYsWMHjhw5gjp16jwzzsTERACPXicCAL6+vrhw4QIyMjLUfQ4ePAgrKyt4enqW+/45jEJERCQyKVajhISEIDY2Ft9//z0sLS3Vcyysra1hamqK69evIzY2Fr1794a9vT3Onz+PSZMmoWPHjmjWrBkAoEePHvD09MSIESMQFRWFtLQ0zJgxAyEhIRV6D5mkjysXCx9Xrjv4uHKisvFx5bqjMh5XHvjFGa2cZ+dbrcrd90nP9tiwYQNGjRqFmzdv4o033sDFixeRn5+PmjVrYsCAAZgxYwasrKzU/f/66y8EBwfj6NGjMDc3R1BQEBYsWABDw/LXK5hskKiYbBCVjcmG7qiMZGPgugStnGf7GG+tnKeycRiFiIhIZPr+1lcmG0RERCKT4nHluoSrUYiIiEhUrGwQERGJTM8LG0w2iIiIxCbX82yDwyhEREQkKlY2iIiIRKbfdQ0mG0RERKLjahQiIiIiEbGyQUREJDJtvWK+qipXsvHDDz+U+4T9+vV77mCIiIiqI30fRilXshEYGFiuk8lkMpSUlLxIPERERFTNlCvZUKlUYsdBRERUbel5YYNzNoiIiMTGYZTnkJ+fj2PHjiElJQV
FRUUa+9577z2tBEZERFRdcIJoBZ07dw69e/fGgwcPkJ+fDzs7O2RmZsLMzAxOTk5MNoiIiEhDhZ+zMWnSJPTt2xf37t2DqakpTp48ib/++gve3t5YtGiRGDESERFVaTKZTCtbVVXhZCMxMRGTJ0+GXC6HgYEBlEolatasiaioKEyfPl2MGImIiKo0mZa2qqrCyYaRkRHk8keHOTk5ISUlBQBgbW2Nmzdvajc6IiIiqvIqPGejZcuWiI+PR4MGDdCpUyfMmjULmZmZ2LRpE5o2bSpGjERERFUaXzFfQfPnz4erqysA4OOPP4atrS2Cg4Nx584dfPbZZ1oPkIiIqKqTybSzVVUVrmy0atVK/WcnJyfs27dPqwERERFR9cKHehEREYmsKq8k0YYKJxt16tR56hftjz/+eKGAiIiIqhs9zzUqnmxMnDhR43NxcTHOnTuHffv2YerUqdqKi4iIiKqJCicbEyZMKLN91apVOHPmzAsHREREVN1wNYqW9OrVC9999522TkdERFRtcDWKlmzbtg12dnbaOh0REVG1wQmiFdSyZUuNL5ogCEhLS8OdO3ewevVqrQZHREREVV+Fk43+/ftrJBtyuRyOjo7o3LkzGjVqpNXgnlfRQ5XUIdA/jA21NlJHWpB8O0/qEOgfFiZ88oCusDQxFf0a+v4vYYX/b4+IiBAhDCIioupL34dRKpxsGRgYICMjo1R7VlYWDAwMtBIUERERVR8VrmwIglBmu1KphLGx8QsHREREVN3I9buwUf5kY/ny5QAelYK++OILWFhYqPeVlJTg+PHjOjNng4iISJdIkWxERkZi+/btuHLlCkxNTdG2bVt88skn8PDwUPcpLCzE5MmTsXXrViiVSvj7+2P16tVwdnZW90lJSUFwcDB++uknWFhYICgoCJGRkTA0LH+9otw9lyxZAuBRZSM6OlpjyMTY2Bi1a9dGdHR0uS9MRERE4jl27BhCQkLQunVrPHz4ENOnT0ePHj1w6dIlmJubAwAmTZqEPXv24Ntvv4W1tTVCQ0MxcOBA/PrrrwAeFRMCAgLg4uKCEydOIDU1FSNHjoSRkRHmz59f7lhkwpPGRZ6gS5cu2L59O2xtbStyWKXKLeRqFF3B1Si6hatRdAdXo+iOek7ir0aZvCtZK+dZ3Nfj2Z2e4M6dO3BycsKxY8fQsWNH5OTkwNHREbGxsRg8eDAA4MqVK2jcuDHi4uLQpk0b7N27F3369MHt27fV1Y7o6GhMmzYNd+7cKff0iQr/JPjpp590OtEgIiLSNXKZdrYXkZOTAwDqB3AmJCSguLgYfn5+6j6NGjWCu7s74uLiAABxcXHw8vLSGFbx9/dHbm4ukpKSyn//FQ120KBB+OSTT0q1R0VF4dVXX63o6YiIiKiclEolcnNzNTalUvnM41QqFSZOnIh27dqhadOmAIC0tDQYGxvDxsZGo6+zszPS0tLUff6daDze/3hfeVU42Th+/Dh69+5dqr1Xr144fvx4RU9HRERU7Wnr3SiRkZGwtrbW2CIjI595/ZCQEFy8eBFbt26thLstrcKDhvfv3y9zjMbIyAi5ublaCYqIiKg60dZbX8PDwxEWFqbRplAonnpMaGgodu/ejePHj6NGjRrqdhcXFxQVFSE7O1ujupGeng4XFxd1n9OnT2ucLz09Xb2vvCpc2fDy8sLXX39dqn3r1q3w9PSs6OmIiIiqPbmWNoVCASsrK43tScmGIAgIDQ3Fjh07cOTIEdSpU0djv7e3N4yMjHD48GF1W3JyMlJSUuDr6wsA8PX1xYULFzQe5nnw4EFYWVlV6Gd+hSsbM2fOxMCBA3H9+nV07doVAHD48GHExsZi27ZtFT0dERERiSAkJASxsbH4/vvvYWlpqZ5jYW1tDVNTU1hbW2PMmDEICwuDnZ0drKysMH78ePj6+qJNmzYAgB49esDT0xMjRoxAVFQU0tLSMGPGDISEhDyzovJvFU42+vbti507d2L+/PnYtm0bTE1
N0bx5cxw5coSvmCciIiqDFK9GWbNmDQCgc+fOGu0bNmzAqFGjADx6hpZcLsegQYM0Hur1mIGBAXbv3o3g4GD4+vrC3NwcQUFBmDt3boViqfBzNv4rNzcXW7Zswbp165CQkICSkpIXOZ1W8DkbuoPP2dAtfM6G7uBzNnRHZTxnY+a+q1o5z7yeDbRynsr23D8Jjh8/jqCgILi5uWHx4sXo2rUrTp48qc3YiIiIqBqoUGqdlpaGjRs3Yt26dcjNzcWQIUOgVCqxc+dOTg4lIiJ6Aj1/w3z5Kxt9+/aFh4cHzp8/j6VLl+L27dtYsWKFmLERERFVC7rwBFEplbuysXfvXrz33nsIDg5GgwZVc8yIiIiIKl+5Kxu//PIL8vLy4O3tDR8fH6xcuRKZmZlixkZERFQtyGUyrWxVVbmTjTZt2uDzzz9Hamoq3n77bWzduhVubm5QqVQ4ePAg8vI4y52IiKgs2npceVVV4dUo5ubmePPNN/HLL7/gwoULmDx5MhYsWAAnJyf069dPjBiJiIioCnuhhyB4eHggKioKt27dwpYtW7QVExERUbXCCaJaYGBggMDAQAQGBmrjdERERNWKDFU4U9ACPsKOiIhIZFW5KqENfJY0ERERiYqVDSIiIpHpe2WDyQYREZHIZFV53aoWcBiFiIiIRMXKBhERkcg4jEJERESi0vNRFA6jEBERkbhY2SAiIhJZVX6JmjYw2SAiIhKZvs/Z4DAKERERiYqVDSIiIpHp+SgKkw0iIiKxyfkiNiIiIhKTvlc2OGeDiIiIRMXKBhERkci4GoUkdTYhHpPGB6OXX0e0bt4YR48cemLfyHkRaN28MWK/iqnECGlr7Gb06t4VrVt6YfjQV3Hh/HmpQ6rWVCUl2LpxDUJG9MPwgHYYP7I/tn31BQRBKLP/Z0vnY0j3VtizPbaSI9UPX61fg94dWmhs44YHqven/n0T86ZPwtA+XTDIvx3mz5qKe3ezpAtYR8llMq1sVRWTDYkVFBSgoYcH3g+f+dR+Px0+iAsXfoOjo1MlRUYAsG/vj1gUFYm33w3B1m93wMOjEYLfHoOsLP5jKpadX8fg4K5tGBP6Ppas+xbD3xqPH775Ent3fl2q7+lffsLVyxdha+8oQaT6o1adevhq5yH1tnDVBgBAYUEBPgwLhkwmQ+Syz7Bo9UY8LC7GnA/eg0qlkjhq0iVMNiTWrn1HBIdORJdu3Z/YJyM9HYsWfIx586NgaMSRr8q0KWYDBg4egsABg1Cvfn3MmD0HJiYm2Ln9O6lDq7Z+v3Qerdp2wss+7eHk4oY2Hf3QzNsH15KTNPrdzczA+lUL8V74PBga8u+FmAwMDGBn76DerG1sAQCXLpxDRtpthE2fizr1GqBOvQaY/OE8XL1yCb+dPS1x1LpFJtPOVlUx2dBxKpUKsz+chjdGvYl69RtIHY5eKS4qwuVLSWjj21bdJpfL0aZNW5z/7ZyEkVVvDT2b4eK5eNy+9RcA4M/rvyP54m9o2fr/3weVSoUVn8xCv1dHoGbtelKFqjf+vpWCNwK7480hAYiaG46M9FQAQHFxMSCTwcjIWN3X2FgBmVyOpPP8O/Jv+j6Mwl8HdFzMhi9gYGCAocNGSB2K3rmXfQ8lJSWwt7fXaLe3t8eNG39IFFX1Fzh0FAoe5GPSm4Mhl8uhUqkwdPS76NCtl7rP91/HwEBugF4DhkoYqX7w8PRC2PS5qFGzNu5mZSJ2YzSmhryJNV9uQyNPL5iYmGJ99FIEjRsPCMCG6GVQlZTgXlam1KGTDpE82SgoKEBCQgLs7Ozg6empsa+wsBDffPMNRo4c+cTjlUollEqlZptgBIVCIUq8lenypSRs3bwJX239DrIqnNESVUTcsYP45cg+vBf+EWrWroc/ryVj45pPYWvviM49+uCP3y/jxx1b8cnqr/j3ohK0btNe/ec69RvCw7MpRr3aGz8fOQD/PgMwfW4UVi6ejx+2bYFMLkenbj1Rv2FjyGQsnP+
bvv+vKmmy8fvvv6NHjx5ISUmBTCZD+/btsXXrVri6ugIAcnJyMHr06KcmG5GRkZgzZ45G2wcfzkL4jNmixl4Zzp09g3t3s9C3Z1d1W0lJCZYtjsLWzV/ih72HJYyu+rO1sYWBgUGpyaBZWVlwcHCQKKrq76vPl6P/a0Fo18UfAOBepz7uZKRi59YN6NyjDy5fPIfc7Lt4d3gf9TEqVQm+XLsUP27fglVf7ZIqdL1gYWmFl2q64/atmwCAl19pi/Vf70ZO9j0YGBjAwtIKw/t3g4vbSxJHqlv0PfWSNNmYNm0amjZtijNnziA7OxsTJ05Eu3btcPToUbi7u5frHOHh4QgLC9NoUwpGYoRb6Xr36YdXfHw12t4LHoteffqhb+BAiaLSH0bGxmjs2QSnTsahazc/AI/mCpw6FYehr78hcXTVl7KwEHK55j/NcrkBBNWjpa8d/XrDq+UrGvs/Dh+Pjn690cW/b6XFqa8KHjxA6t+30NVfM+F+PGk0MeE0su/dRZv2nSWIjnSVpMnGiRMncOjQITg4OMDBwQG7du3Cu+++iw4dOuCnn36Cubn5M8+hUChKDZnkFladJVcPHuTjZkqK+vPtv28h+cplWFtbw8XVDTb//AV+zNDIEPYODqhdu05lh6qXRgSNxszp09CkSVM09WqGrzbFoKCgAIEDmOyJxbtNB2yPXQ8HJxfUqFUXf15Lxu7vNqOLfz8AgKWVDSytbDSOMTQ0hI2dPdxq1q78gKu5L1Z9Cp+2HeHk4oqszDv4av0ayOUG6NytJwDgwJ6dcK9dF9Y2trh88TzWLo9C4JA3UMO9trSB6xiphvyOHz+OhQsXIiEhAampqdixYwcCAwPV+0eNGoWYGM1nN/n7+2Pfvn3qz3fv3sX48eOxa9cuyOVyDBo0CMuWLYOFhUW545A02SgoKNBYsiaTybBmzRqEhoaiU6dOiI2t/g/puZyUhHfeClJ/XrLoEwBAQL9ARMyLlCos+kfPXr1x7+5drF65HJmZd+DRqDFWr/0C9hxGEc2boVPx9cZofLF8AXKy78HO3gHdAwZi8BtjpQ5NL2VmpOOTOeHIzc2GtY0tmni1xJK1X8La1g4A8PfNvxDz2Qrk5ebAycUNr414CwNeY+Xvv6SaspGfn4/mzZvjzTffxMCBZf+S1LNnT2zYsEH9+b+/wA8fPhypqak4ePAgiouLMXr0aIwbN65CP6NlwpMey1cJXnnlFYwfPx4jRpReaREaGorNmzcjNzcXJSUlFTpvVapsVHfGhvo+Uqlbkm/nSR0C/cPCRPL5+fSPek6mol/jq4RbWjnPG941nvtYmUxWZmUjOzsbO3fuLPOYy5cvw9PTE/Hx8WjVqhUAYN++fejduzdu3boFNze3cl1b0p8EAwYMwJYtW8rct3LlSrz++utPfEQxERERvbijR4/CyckJHh4eCA4O1pgUHxcXBxsbG3WiAQB+fn6Qy+U4depUua8habIRHh6OH3/88Yn7V69ezUfeEhFRlSfT0qZUKpGbm6ux/ffxDxXRs2dPfPnllzh8+DA++eQTHDt2DL169VKPKKSlpcHJSfM1GYaGhrCzs0NaWlq5r8MaNxERkci09bjyyMhIWFtba2yRkc8/v2/o0KHo168fvLy8EBgYiN27dyM+Ph5Hjx7V3s2DyQYREVGVER4ejpycHI0tPDxca+evW7cuHBwccO3aNQCAi4sLMjIyNPo8fPgQd+/ehYuLS7nPyxlKREREItPW0teyHvegTbdu3UJWVpb64Zq+vr7Izs5GQkICvL29AQBHjhyBSqWCj49Puc/LZIOIiEhkUg0j3L9/X12lAIAbN24gMTERdnZ2sLOzw5w5czBo0CC4uLjg+vXreP/991G/fn34+z96gm/jxo3Rs2dPjB07FtHR0SguLkZoaCiGDh1a7pUoAIdRiIiIqq0zZ86gZcuWaNmyJQAgLCwMLVu2xKxZs2BgYIDz58+jX79+aNiwIcaMGQNvb2/8/PP
PGtWTzZs3o1GjRujWrRt69+6N9u3b47PPPqtQHJI+Z0MsfM6G7uBzNnQLn7OhO/icDd1RGc/Z+CbxtlbOM6RF+asJuoT/txMREYlMz1/6ymEUIiIiEhcrG0RERCKT6kVsuoLJBhERkcj0fRiByQYREZHI9L2yoe/JFhEREYmMlQ0iIiKR6Xddg8kGERGR6PR8FIXDKERERCQuVjaIiIhEJtfzgRQmG0RERCLjMAoRERGRiFjZICIiEpmMwyhEREQkJg6jEBEREYmIlQ0iIiKRcTUKERERiUrfh1GYbBAREYlM35MNztkgIiIiUbGyQUREJDIufSUiIiJRyfU71+AwChEREYmLlQ0iIiKRcRiFiIiIRMXVKEREREQiYmWDiIhIZBxGISIiIlFxNQoRERGRiFjZICIiEhmHUYiIiEhU+r4ahckGERGRyPQ81+CcDSIiIhIXKxtEREQik+v5OEq1TDb0+1uqW0pUgtQh0L+YGhtIHQL9o6n/VKlDoH8UnFsp+jWk+rl0/PhxLFy4EAkJCUhNTcWOHTsQGBio3i8IAmbPno3PP/8c2dnZaNeuHdasWYMGDRqo+9y9exfjx4/Hrl27IJfLMWjQICxbtgwWFhbljoPDKERERNVUfn4+mjdvjlWrVpW5PyoqCsuXL0d0dDROnToFc3Nz+Pv7o7CwUN1n+PDhSEpKwsGDB7F7924cP34c48aNq1AcMkEQqt2vnnmFKqlDoH/I9f1JNjrm77sFUodA/2je632pQ6B/VEZl4+T1bK2cp009m+c+ViaTaVQ2BEGAm5sbJk+ejClTpgAAcnJy4OzsjI0bN2Lo0KG4fPkyPD09ER8fj1atWgEA9u3bh969e+PWrVtwc3Mr17VZ2SAiIhKZTEv/adONGzeQlpYGPz8/dZu1tTV8fHwQFxcHAIiLi4ONjY060QAAPz8/yOVynDp1qtzXqpZzNoiIiKojpVIJpVKp0aZQKKBQKCp8rrS0NACAs7OzRruzs7N6X1paGpycnDT2Gxoaws7OTt2nPFjZICIiEplMpp0tMjIS1tbWGltkZKTUt/dMrGwQERGJTFsDIOHh4QgLC9Noe56qBgC4uLgAANLT0+Hq6qpuT09PR4sWLdR9MjIyNI57+PAh7t69qz6+PFjZICIiqiIUCgWsrKw0tudNNurUqQMXFxccPnxY3Zabm4tTp07B19cXAODr64vs7GwkJCSo+xw5cgQqlQo+Pj7lvhYrG0RERGKTaGHe/fv3ce3aNfXnGzduIDExEXZ2dnB3d8fEiRPx0UcfoUGDBqhTpw5mzpwJNzc39YqVxo0bo2fPnhg7diyio6NRXFyM0NBQDB06tNwrUQAmG0RERKKT6q2vZ86cQZcuXdSfHw/BBAUFYePGjXj//feRn5+PcePGITs7G+3bt8e+fftgYmKiPmbz5s0IDQ1Ft27d1A/1Wr58eYXi4HM2SFR8zoZu4XM2dAefs6E7KuM5Gwl/5mrlPN61rbRynsrGORtEREQkKg6jEBERiUzfa7xMNoiIiMSm59kGh1GIiIhIVKxsEBERiUyq1Si6gskGERGRyGT6nWtwGIWIiIjExcoGERGRyPS8sMFkg4iISHR6nm1wGIWIiIhExcoGERGRyLgahYiIiESl76tRmGwQERGJTM9zDc7ZICIiInGxskFERCQ2PS9tMNkgIiISmb5PEOUwChEREYmKlQ0iIiKRcTUKERERiUrPcw0OoxAREZG4WNkgIiISm56XNphsEBERiYyrUYiIiIhExMoGERGRyLgahYiIiESl57kGkw0iIiLR6Xm2wTkbREREJCpWNoiIiESm76tRmGwQERGJTN8niHIYhYiIiETFyobEzibEY9PG9bh8OQmZd+5g0ZIV6NzVDwDwsLgYq1cuw6+/HMfft27BwtICr/j4YvyEyXB0cpI48uovwL8rUm/fLtX+6mvDED5jlgQR6Y+sOxnYuHYZEk79CmVhIVxfqokJH0SgQaMmAIAlkbN
wZN8ujWNefqUt5ixcJUW41daU0d0x773+WLn5J0xd9B1srcwwMzgA3do0Qk0XW2Teu49dR89jzurdyL1fqD6upostlk1/DZ1aNcT9AiU27zqFmSt+QEmJSsK7kZaeFzaYbEitoKAADTw80C9wIKaGvaexr7CwEFeuXMJb44LRwKMR8nJzsOiTSIRNeBebtmyTKGL98dWWbShRlag/X796FcHj3kR3f38Jo6r+7ufl4v3QUfBq0RoRUSthZWOL27dSYGFppdHv5VfaYuIHc9SfjYyNKzvUas3b0x1jBrXD+d9vqdtcHa3h6miN8CU7cPmPNLi72mHFh0Ph6miNYVPXAQDkchm2Lw9GelYuuoxaDBdHa3wxbwSKH5Zg9spdT7pc9afn2QaTDYm1a98R7dp3LHOfhaUlVq9dr9H2fvgMBA0fgrTU23BxdauMEPWWrZ2dxucN6z5HjZru8G71ikQR6YdtsRvg4OiCieH/TyRcXF8q1c/I2Bi29g6VGZreMDc1xob5o/DuvC344K2e6vZL11Px+pQv1J9v3MpExMpdWP/xSBgYyFFSooKfb2M0ruuCgHdWIONuHs7//jfmrt6Dj97rj4+if0Txw5KyLknVHOdsVDH37+dBJpOV+i2PxFVcXIS9u39A/wEDIdP3mV4iO/3rMdRv5IkFs6bijf5dMWHMUOzftb1Uv4uJZ/BG/654541ArF78MXJzsis/2Gpqafhr2PfzRfx0KvmZfa0sTZCbX6geIvFpVgcXr91Gxt08dZ+DJy7D2tIUnvVcRYtZ18m09F9FREREQCaTaWyNGjVS7y8sLERISAjs7e1hYWGBQYMGIT09Xdu3DkAHKhuXL1/GyZMn4evri0aNGuHKlStYtmwZlEol3njjDXTt2lXqEHWGUqnEiqWL4d8rABYWFlKHo1d+OnwYeXl56Nd/gNShVHtpqX9j7/ffIvDVN/DqG2Nw9UoSPlseBUMjQ3Tr2Q8A4P1KW7Tt2BXOLi8h9fYtbPp8BSLeD8XC1TEwMDCQ+A6qtlf9vdGiUU20fyPqmX3tbcwRPrYX1n93Qt3mbG+FjKw8jX4Zd3Mf7XOwAp6dv1RLUv2O0qRJExw6dEj92dDw/z/2J02ahD179uDbb7+FtbU1QkNDMXDgQPz6669aj0PSZGPfvn3o378/LCws8ODBA+zYsQMjR45E8+bNoVKp0KNHDxw4cOCpCYdSqYRSqdRoKxKMoFAoxA6/Uj0sLsYHUydBEAR88OFsqcPROzt3bEPb9h3g6OQsdSjVnqBSob6HJ0aOGw8AqNewEf66cQ17v9+mTjY6dvt/ab92vQaoU68Bxr7eFxcTz6C5t48kcVcHNZxtsHDqIPQJXgll0cOn9rU0N8GO5cG4/EcqPlq7p5IipIoyNDSEi4tLqfacnBysW7cOsbGx6p+xGzZsQOPGjXHy5Em0adNGq3FIOowyd+5cTJ06FVlZWdiwYQOGDRuGsWPH4uDBgzh8+DCmTp2KBQsWPPUckZGRsLa21tgWL3z6MVXN40QjLfU2Vq1dx6pGJbt9+2+cPhmHAQNflToUvWBr74CatetqtNWsVQd3MtKeeIyLWw1YWdvg9t83xQ6vWmvZ2B3O9laIi52GvPhlyItfho6tGuDd1zshL34Z5PJHv55bmCnww6p3kfegEK+FfY6HD/+/yiQ9KxdO9pYa53WyezTsm56ZW3k3o2NkWtoq6urVq3Bzc0PdunUxfPhwpKSkAAASEhJQXFwMPz8/dd9GjRrB3d0dcXFxz3eTTyFpZSMpKQlffvklAGDIkCEYMWIEBg8erN4/fPhwbNiw4annCA8PR1hYmEZbkWCk/WAl8jjRSEn5C2u/iIGNja3UIemdH3Zuh52dPdp37CR1KHqhcdMW+DvlL422v2+lwMn5yeP9mRnpyMvNgR0njL6Qn04nw3vwxxptn815A8k30rF440GoVAIszU2wa3UIlEUPMXji2lIVkFPnb2DaGH842lrgzr37AIB
ubRohJ68Al/94csJY7WlpGKWsar5CoSizmu/j44ONGzfCw8MDqampmDNnDjp06ICLFy8iLS0NxsbGsLGx0TjG2dkZaWna/z5JPmfj8WQ7uVwOExMTWFtbq/dZWloiJyfnqceX9UXOK6w6a7kfPMjHzX8yTQD4++9bSL5yGdbW1nBwcMT7UyYi+fIlLFmxBiWqEmRm3gEAWFtbw8iIS/3EplKp8MPOHejTL1BjrJPE0//VN/B+yCh8s2kd2nfpjt8vJ2H/ru8QOmUmAKDgwQNsiVmLth27wdbOAWm3b2JD9DK4vlQTL7duK3H0Vdv9B0pcup6q0ZZfUIS7Ofm4dD0VluYm2L06BKYmxhj9YQyszE1gZW4CALhz7z5UKgGH4i7j8h9pWPdRED5cthPO9laYHdIHa785jqLipw/NVGfaelx5ZGQk5syZo9E2e/ZsRERElOrbq1cv9Z+bNWsGHx8f1KpVC9988w1MTU21Ek95SfqvZ+3atXH16lXUq1cPABAXFwd3d3f1/pSUFLi6Vu/Zy5eSkvDOW0Hqz0sWfQIA6NMvEOPeCcXxo0cAAMOGaE5MjP4iBq1acwmm2E6dPIG01NvoP2Cg1KHojYaNm2D6R4vx5WcrsPXLz+Ds8hLGhk5F5+69AQByAzn+vH4VR/btQv79PNg5OKJlK18MH/Mun7UhshaNauKVZnUAAJd2RWjs8+g9Cympd6FSCRg0YQ2WTR+KoxsnI79Qic27TmPuGs7r0IayqvnlnaNoY2ODhg0b4tq1a+jevTuKioqQnZ2tUd1IT08vc47Hi5IJgiBo/azlFB0djZo1ayIgIKDM/dOnT0dGRga++OKLMvc/SVWqbFR3j8d4STf8fbdA6hDoH817vS91CPSPgnMrRb9Gyl3lszuVg7vd8y9+uH//Ptzd3REREYGgoCA4Ojpiy5YtGDRoEAAgOTkZjRo1QlxcnNYniEqabIiFyYbuYLKhW5hs6A4mG7qjMpKNm1pKNmpWINmYMmUK+vbti1q1auH27duYPXs2EhMTcenSJTg6OiI4OBg//vgjNm7cCCsrK4wf/2gF2IkTJ55x5orjIDQREVE1dOvWLbz++uvIysqCo6Mj2rdvj5MnT8LR0REAsGTJEsjlcgwaNAhKpRL+/v5YvXq1KLGwskGiYmVDt7CyoTtY2dAdlVHZuHVPO5WNGrZV8xlSrGwQERGJTr9/8eK7UYiIiEhUrGwQERGJTN/f38hkg4iISGR6nmtwGIWIiIjExcoGERGRyDiMQkRERKLS1rtRqiomG0RERGLT71yDczaIiIhIXKxsEBERiUzPCxtMNoiIiMSm7xNEOYxCREREomJlg4iISGRcjUJERETi0u9cg8MoREREJC5WNoiIiESm54UNJhtERERi42oUIiIiIhGxskFERCQyrkYhIiIiUXEYhYiIiEhETDaIiIhIVBxGISIiEpm+D6Mw2SAiIhKZvk8Q5TAKERERiYqVDSIiIpFxGIWIiIhEpee5BodRiIiISFysbBAREYlNz0sbTDaIiIhExtUoRERERCJiZYOIiEhkXI1CREREotLzXIPDKERERKKTaWl7DqtWrULt2rVhYmICHx8fnD59+oVu5Xkw2SAiIqqmvv76a4SFhWH27Nk4e/YsmjdvDn9/f2RkZFRqHEw2iIiIRCbT0n8V9emnn2Ls2LEYPXo0PD09ER0dDTMzM6xfv16Eu3wyJhtEREQik8m0s1VEUVEREhIS4Ofnp26Ty+Xw8/NDXFyclu/w6ThBlIiIqIpQKpVQKpUabQqFAgqFolTfzMxMlJSUwNnZWaPd2dkZV65cETXO/6qWyYalSdUv2CiVSkRGRiI8PLzM/4mo8lSn70VDFzOpQ3gh1el7UXBupdQhvJDq9L2oDCZa+mkb8VEk5syZo9E2e/ZsREREaOcCIpEJgiBIHQSVlpubC2tra+Tk5MDKykrqcPQavxe6g98L3cHvhTQqUtkoKiqCmZkZtm3bhsDAQHV
7UFAQsrOz8f3334sdrlrVLwEQERHpCYVCASsrK43tSZUlY2NjeHt74/Dhw+o2lUqFw4cPw9fXt7JCBlBNh1GIiIgICAsLQ1BQEFq1aoVXXnkFS5cuRX5+PkaPHl2pcTDZICIiqqZee+013LlzB7NmzUJaWhpatGiBffv2lZo0KjYmGzpKoVBg9uzZnHilA/i90B38XugOfi+qjtDQUISGhkoaAyeIEhERkag4QZSIiIhExWSDiIiIRMVkg4iIiETFZIOIiIhExWRDB61atQq1a9eGiYkJfHx8cPr0aalD0kvHjx9H37594ebmBplMhp07d0odkt6KjIxE69atYWlpCScnJwQGBiI5OVnqsPTSmjVr0KxZM/UDpXx9fbF3716pwyIdx2RDx3z99dcICwvD7NmzcfbsWTRv3hz+/v7IyMiQOjS9k5+fj+bNm2PVqlVSh6L3jh07hpCQEJw8eRIHDx5EcXExevTogfz8fKlD0zs1atTAggULkJCQgDNnzqBr167o378/kpKSpA6NdBiXvuoYHx8ftG7dGitXPnpJk0qlQs2aNTF+/Hh88MEHEkenv2QyGXbs2KHxfgGSzp07d+Dk5IRjx46hY8eOUoej9+zs7LBw4UKMGTNG6lBIR7GyoUOKioqQkJAAPz8/dZtcLoefnx/i4uIkjIxIt+Tk5AB49EOOpFNSUoKtW7ciPz+/0t+1QVULnyCqQzIzM1FSUlLqMbLOzs64cuWKRFER6RaVSoWJEyeiXbt2aNq0qdTh6KULFy7A19cXhYWFsLCwwI4dO+Dp6Sl1WKTDmGwQUZUSEhKCixcv4pdffpE6FL3l4eGBxMRE5OTkYNu2bQgKCsKxY8eYcNATMdnQIQ4ODjAwMEB6erpGe3p6OlxcXCSKikh3hIaGYvfu3Th+/Dhq1KghdTh6y9jYGPXr1wcAeHt7Iz4+HsuWLcPatWsljox0Feds6BBjY2N4e3vj8OHD6jaVSoXDhw9zPJT0miAICA0NxY4dO3DkyBHUqVNH6pDoX1QqFZRKpdRhkA5jZUPHhIWFISgoCK1atcIrr7yCpUuXIj8/H6NHj5Y6NL1z//59XLt2Tf35xo0bSExMhJ2dHdzd3SWMTP+EhIQgNjYW33//PSwtLZGWlgYAsLa2hqmpqcTR6Zfw8HD06tUL7u7uyMvLQ2xsLI4ePYr9+/dLHRrpMC591UErV67EwoULkZaWhhYtWmD58uXw8fGROiy9c/ToUXTp0qVUe1BQEDZu3Fj5AekxmUxWZvuGDRswatSoyg1Gz40ZMwaHDx9GamoqrK2t0axZM0ybNg3du3eXOjTSYUw2iIiISFScs0FERESiYrJBREREomKyQURERKJiskFERESiYrJBREREomKyQURERKJiskFERESiYrJBVA2NGjUKgYGB6s+dO3fGxIkTKz2Oo0ePQiaTITs7u9KvTUS6g8kGUSUaNWoUZDIZZDKZ+mVWc+fOxcOHD0W97vbt2zFv3rxy9WWCQETaxnejEFWynj17YsOGDVAqlfjxxx8REhICIyMjhIeHa/QrKiqCsbGxVq5pZ2enlfMQET0PVjaIKplCoYCLiwtq1aqF4OBg+Pn54YcfflAPfXz88cdwc3ODh4cHAODmzZsYMmQIbGxsYGdnh/79++PPP/9Un6+kpARhYWGwsbGBvb093n//ffz3LQT/HUZRKpWYNm0aatasCYVCgfr162PdunX4888/1e+DsbW1hUwmU797RKVSITIyEnXq1IGpqSmaN2+Obdu2aVznxx9/RMOGDWFqaoouXbpoxElE+ovJBpHETE1NUVRUBAA4fPgwkpOTcfDgQezevRvFxcXw9/eHpaUlfv75Z/z666+wsLBAz5491ccsXrwYGzduxPr16/HLL7/g7t272LFjx1OvOXLkSGzZsgXLly/H5cuXsXbtWlhYWKBmzZr47rvvAADJyclITU3FsmXLAACRkZH48ssvER0djaSkJEyaNAlvvPEGjh0
7BuBRUjRw4ED07dsXiYmJeOutt/DBBx+I9WUjoqpEIKJKExQUJPTv318QBEFQqVTCwYMHBYVCIUyZMkUICgoSnJ2dBaVSqe6/adMmwcPDQ1CpVOo2pVIpmJqaCvv37xcEQRBcXV2FqKgo9f7i4mKhRo0a6usIgiB06tRJmDBhgiAIgpCcnCwAEA4ePFhmjD/99JMAQLh37566rbCwUDAzMxNOnDih0XfMmDHC66+/LgiCIISHhwuenp4a+6dNm1bqXESkfzhng6iS7d69GxYWFiguLoZKpcKwYcMQERGBkJAQeHl5aczT+O2333Dt2jVYWlpqnKOwsBDXr19HTk4OUlNT4ePjo95naGiIVq1alRpKeSwxMREGBgbo1KlTuWO+du0aHjx4UOo14kVFRWjZsiUA4PLlyxpxAICvr2+5r0FE1ReTDaJK1qVLF6xZswbGxsZwc3ODoeH//xqam5tr9L1//z68vb2xefPmUudxdHR8ruubmppW+Jj79+8DAPbs2YOXXnpJY59CoXiuOIhIfzDZIKpk5ubmqF+/frn6vvzyy/j666/h5OQEKyurMvu4urri1KlT6NixIwDg4cOHSEhIwMsvv1xmfy8vL6hUKhw7dgx+fn6l9j+urJSUlKjbPD09oVAokJKS8sSKSOPGjfHDDz9otJ08efLZN0lE1R4niBLpsOHDh8PBwQH9+/fHzz//jBs3buDo0aN47733cOvWLQDAhAkTsGDBAuzcuRNXrlzBu++++9RnZNSuXRtBQUF48803sXPnTvU5v/nmGwBArVq1IJPJsHv3bty5cwf379+HpaUlpkyZgkmTJiEmJgbXr1/H2bNnsWLFCsTExAAA3nnnHVy9ehVTp05FcnIyYmNjsXHjRrG/RERUBTDZINJhZmZmOH78ONzd3TFw4EA0btwYY8aMQWFhobrSMXnyZIwYMQJBQUHw9fWFpaUlBgwY8NTzrlmzBoMHD8a7776LRo0aYezYscjPzwcAvPTSS5gzZw4++OADODs7IzQ0FAAwb948zJw5E5GRkWjcuDF69uyJPXv2oE6dOgAAd3d3fPfdd9i5cyeaN2+O6OhozJ8/X8SvDhFVFTLhSbPIiIiIiLSAlQ0iIiISFZMNIiIiEhWTDSIiIhIVkw0iIiISFZMNIiIiEhWTDSIiIhIVkw0iIiISFZMNIiIiEhWTDSIiIhIVkw0iIiISFZMNIiIiEhWTDSIiIhLV/wC3fYdtJ9SRLAAAAABJRU5ErkJggg==", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n", + "from sklearn.compose import ColumnTransformer\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.ensemble import RandomForestClassifier\n", + "from sklearn.metrics import classification_report, confusion_matrix, accuracy_score\n", + "import seaborn as sns\n", + "import matplotlib.pyplot as plt\n", + "\n", + "\n", + "# Загружаем набор данных\n", + "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n", + "\n", + "# Устанавливаем случайное состояние\n", + "random_state = 42\n", + "\n", + "\n", + "# Предобработка данных\n", + "# Определяем категориальные и числовые столбцы\n", + "categorical_features = ['employment_type', 'job_title', 'employee_residence', 'company_location', 'company_size']\n", + "numeric_features = ['work_year', 'salary_in_usd', 'remote_ratio']\n", + "\n", + "# Создаем пайплайн для обработки данных\n", + "preprocessor = ColumnTransformer(\n", + " transformers=[\n", + " ('num', StandardScaler(), numeric_features),\n", + " ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)])\n", + "\n", + "# Определяем целевую переменную и признаки\n", + "X = df.drop('experience_level', axis=1)\n", + "y = df['experience_level']\n", + "\n", + "# Разделяем данные на обучающую и тестовую выборки\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=random_state)\n", + "\n", + "# Создаем и обучаем модель\n", + "model = Pipeline(steps=[\n", + " ('preprocessor', preprocessor),\n", + " ('classifier', RandomForestClassifier(random_state=random_state))])\n", + "\n", + "model.fit(X_train, y_train)\n", + "\n", + "# Делаем предсказания на тестовой выборке\n", + "y_pred = model.predict(X_test)\n", + "\n", + "# Оцениваем качество модели\n", 
+ "print(\"Classification Report:\")\n", + "print(classification_report(y_test, y_pred))\n", + "\n", + "print(\"Confusion Matrix:\")\n", + "print(confusion_matrix(y_test, y_pred))\n", + "\n", + "print(f\"Accuracy Score: {accuracy_score(y_test, y_pred)}\")\n", + "\n", + "# Visualize the results\n", + "conf_matrix = confusion_matrix(y_test, y_pred)\n", + "sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')\n", + "plt.xlabel('Predicted')\n", + "plt.ylabel('Actual')\n", + "plt.title('Confusion Matrix')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Benchmark**\n" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "MAE: 37795.639591701794\n", + "MSE: 2482079980.9527493\n", + "RMSE: 49820.47752634201\n", + "R²: 0.37127352660208646\n", + "Ориентиры для предсказания заработной платы не достигнуты.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\metrics\\_regression.py:492: FutureWarning: 'squared' is deprecated in version 1.4 and will be removed in 1.6.
To calculate the root mean squared error, use the function'root_mean_squared_error'.\n", + " warnings.warn(\n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n", + "from sklearn.compose import ColumnTransformer\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score\n", + "\n", + "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n", + "\n", + "# Data preprocessing\n", + "categorical_features = ['experience_level', 'employment_type', 'job_title', 'employee_residence', 'company_location', 'company_size']\n", + "numeric_features = ['work_year', 'remote_ratio']\n", + "\n", + "preprocessor = ColumnTransformer(\n", + " transformers=[\n", + " ('num', StandardScaler(), numeric_features),\n", + " ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)])\n", + "\n", + "X = df.drop('salary_in_usd', axis=1)\n", + "y = df['salary_in_usd']\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", + "\n", + "model = Pipeline(steps=[\n", + " ('preprocessor', preprocessor),\n", + " ('regressor', LinearRegression())])\n", + "\n", + "model.fit(X_train, y_train)\n", + "\n", + "y_pred = model.predict(X_test)\n", + "\n", + "mae = mean_absolute_error(y_test, y_pred)\n", + "mse = mean_squared_error(y_test, y_pred)\n", + "rmse = mse ** 0.5 # RMSE as the square root of MSE; squared=False is deprecated since scikit-learn 1.4\n", + "r2 = r2_score(y_test, y_pred)\n", + "\n", + "print(f\"MAE: {mae}\")\n", + "print(f\"MSE: {mse}\")\n", + "print(f\"RMSE: {rmse}\")\n", + "print(f\"R²: {r2}\")\n", + "\n", + "# Check whether the benchmarks are met\n", + "if r2 >= 0.75 and mae <= 15000 and rmse <= 20000:\n", + " print(\"Ориентиры для предсказания заработной платы 
достигнуты!\")\n", + "else:\n", + " print(\"Ориентиры
для предсказания заработной платы не достигнуты.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Accuracy: 0.7217043941411452\n", + "Classification Report:\n", + "              precision    recall  f1-score   support\n", + "\n", + "          EN       0.55      0.48      0.51        67\n", + "          EX       0.46      0.26      0.33        23\n", + "          MI       0.48      0.54      0.51       157\n", + "          SE       0.83      0.83      0.83       504\n", + "\n", + "    accuracy                           0.72       751\n", + "   macro avg       0.58      0.53      0.55       751\n", + "weighted avg       0.72      0.72      0.72       751\n", + "\n", + "Confusion Matrix:\n", + "[[ 32   0  20  15]\n", + " [  0   6   5  12]\n", + " [ 14   0  84  59]\n", + " [ 12   7  65 420]]\n", + "Ориентиры для классификации уровня опыта не достигнуты.\n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n", + "from sklearn.compose import ColumnTransformer\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.ensemble import RandomForestClassifier\n", + "from sklearn.metrics import accuracy_score, classification_report, confusion_matrix\n", + "\n", + "# Load the dataset\n", + "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n", + "\n", + "# Data preprocessing\n", + "categorical_features = ['employment_type', 'job_title', 'employee_residence', 'company_location', 'company_size']\n", + "numeric_features = ['work_year', 'salary_in_usd', 'remote_ratio']\n", + "\n", + "preprocessor = ColumnTransformer(\n", + " transformers=[\n", + " ('num', StandardScaler(), numeric_features),\n", + " ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)])\n", + "\n", + "X = df.drop('experience_level', axis=1)\n", + "y = 
df['experience_level']\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", + "\n", + "model = Pipeline(steps=[\n", + " ('preprocessor', preprocessor),\n", + "
('classifier', RandomForestClassifier(random_state=42))])\n", + "\n", + "model.fit(X_train, y_train)\n", + "\n", + "y_pred = model.predict(X_test)\n", + "\n", + "accuracy = accuracy_score(y_test, y_pred)\n", + "print(f\"Accuracy: {accuracy}\")\n", + "\n", + "print(\"Classification Report:\")\n", + "print(classification_report(y_test, y_pred))\n", + "\n", + "print(\"Confusion Matrix:\")\n", + "print(confusion_matrix(y_test, y_pred))\n", + "\n", + "# Check whether the benchmarks are met\n", + "if accuracy >= 0.80:\n", + " print(\"Ориентиры для классификации уровня опыта достигнуты!\")\n", + "else:\n", + " print(\"Ориентиры для классификации уровня опыта не достигнуты.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "from sklearn.base import BaseEstimator, TransformerMixin\n", + "from sklearn.compose import ColumnTransformer\n", + "from sklearn.impute import SimpleImputer\n", + "from sklearn.preprocessing import OneHotEncoder, StandardScaler\n", + "from sklearn.pipeline import Pipeline\n", + "\n", + "# Column definitions\n", + "numeric_columns = [\"work_year\", \"salary\", \"salary_in_usd\", \"remote_ratio\"]\n", + "cat_columns = [\"experience_level\", \"employment_type\", \"job_title\", \"salary_currency\", \"employee_residence\", \"company_location\", \"company_size\"]\n", + "\n", + "# Numeric data: fill missing values with the median, then standardize\n", + "preprocessing_num_class = Pipeline(steps=[\n", + " ('imputer', SimpleImputer(strategy='median')),\n", + " ('scaler', StandardScaler())\n", + "])\n", + "\n", + "# Categorical data: fill missing values with the most frequent value, then one-hot encode\n", + "preprocessing_cat_class = Pipeline(steps=[\n", + " ('imputer', SimpleImputer(strategy='most_frequent')),\n", + "
('onehot', OneHotEncoder(handle_unknown='ignore'))\n", + "])\n", + "\n", + "# Combine all transformations into a single ColumnTransformer\n", + "features_preprocessing = ColumnTransformer(\n", + " verbose_feature_names_out=False,\n", + " transformers=[\n", + " (\"prepocessing_num\", preprocessing_num_class, numeric_columns),\n", + " (\"prepocessing_cat\", preprocessing_cat_class, cat_columns),\n", + " ],\n", + " remainder=\"passthrough\"\n", + ")\n", + "\n", + "# Define the pipeline\n", + "pipeline_end = Pipeline(\n", + " [\n", + " (\"features_preprocessing\", features_preprocessing),\n", + " ]\n", + ")\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "ename": "NameError", + "evalue": "name 'train_test_split' is not defined", + "output_type": "error", + "traceback": [ + "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[1;31mNameError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[1;32mIn[5], line 2\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[38;5;66;03m# Разделение данных на тренировочный и тестовый наборы\u001b[39;00m\n\u001b[1;32m----> 2\u001b[0m X_train, X_test \u001b[38;5;241m=\u001b[39m \u001b[43mtrain_test_split\u001b[49m(df, test_size\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m0.2\u001b[39m, random_state\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m42\u001b[39m)\n\u001b[0;32m 4\u001b[0m \u001b[38;5;66;03m# Применение конвейера для предобработки данных\u001b[39;00m\n\u001b[0;32m 5\u001b[0m preprocessing_result \u001b[38;5;241m=\u001b[39m pipeline_end\u001b[38;5;241m.\u001b[39mfit_transform(X_train)\n", + "\u001b[1;31mNameError\u001b[0m: name 'train_test_split' is not defined" + ] + } + ], + "source": [ + "# Split the data into training and test sets\n", + "X_train, X_test = train_test_split(df, test_size=0.2, random_state=42)\n", + "\n", + "# Применение
конвейера для предобработки данных\n", + "preprocessing_result = pipeline_end.fit_transform(X_train)\n", + "\n", + "# Get the column names after the transformation\n", + "feature_names = pipeline_end.named_steps['features_preprocessing'].get_feature_names_out()\n", + "\n", + "# Build a DataFrame from the transformed data\n", + "preprocessed_df = pd.DataFrame(\n", + " preprocessing_result,\n", + " columns=feature_names,\n", + ")\n", + "\n", + "# Print the transformed DataFrame\n", + "print(preprocessed_df)" + ] } ], "metadata": { -- 2.25.1 From 5bfab95a94ad510bc810bd58ad89a8b211f06448 Mon Sep 17 00:00:00 2001 From: kaznacheeva Date: Sat, 23 Nov 2024 12:21:27 +0400 Subject: [PATCH 3/3] =?UTF-8?q?=D0=BA=D0=BE=D0=BC=D0=B8=D1=82?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- lab_4/Lab4.ipynb | 1085 +++++++++++++++++++++++----------------------- 1 file changed, 546 insertions(+), 539 deletions(-) diff --git a/lab_4/Lab4.ipynb b/lab_4/Lab4.ipynb index 814ef70..c43967a 100644 --- a/lab_4/Lab4.ipynb +++ b/lab_4/Lab4.ipynb @@ -337,9 +337,16 @@ "print(df.isnull().any())" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Classification" + ] + }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -1248,45 +1255,104 @@ "plt.show()" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, split the data into training and test sets and build a baseline" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Размер обучающей выборки: (3004, 10)\n", + "Размер тестовой выборки: (751, 10)\n", + "Baseline Accuracy: 0.5126498002663116\n", + "Baseline F1 Score: 0.3474826991241725\n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.metrics import 
accuracy_score,
f1_score\n", + "\n", + "# Load the data\n", + "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n", + "\n", + "# Create the target feature: 1 if the salary is above the median, else 0\n", + "median_salary = df['salary_in_usd'].median()\n", + "df['above_median_salary'] = (df['salary_in_usd'] > median_salary).astype(int)\n", + "\n", + "# Separate the features and the target variable\n", + "features = ['work_year', 'experience_level', 'employment_type', 'job_title', 'salary', 'salary_currency', 'remote_ratio', 'employee_residence', 'company_location', 'company_size']\n", + "target = 'above_median_salary'\n", + "\n", + "# Split the data into training and test sets\n", + "X_train, X_test, y_train, y_test = train_test_split(df[features], df[target], test_size=0.2, random_state=42, stratify=df[target])\n", + "\n", + "print(\"Размер обучающей выборки:\", X_train.shape)\n", + "print(\"Размер тестовой выборки:\", X_test.shape)\n", + "\n", + "# Build the baseline: always predict the majority class of the training set\n", + "baseline_threshold = y_train.mean()\n", + "baseline_predictions = [int(baseline_threshold > 0.5)] * len(y_test)\n", + "\n", + "# Compute the baseline metrics\n", + "baseline_accuracy = accuracy_score(y_test, baseline_predictions)\n", + "baseline_f1 = f1_score(y_test, baseline_predictions, average='weighted')\n", + "\n", + "print('Baseline Accuracy:', baseline_accuracy)\n", + "print('Baseline F1 Score:', baseline_f1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Building the pipeline and training the models" + ] + }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { - "ename": "IndexError", - "evalue": "Index dimension must be 1 or 2", - "output_type": "error", - "traceback": [ - "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[1;31mIndexError\u001b[0m Traceback (most recent call last)", - "Cell \u001b[1;32mIn[14], line 
71\u001b[0m\n\u001b[0;32m 62\u001b[0m pipeline_end
\u001b[38;5;241m=\u001b[39m Pipeline(\n\u001b[0;32m 63\u001b[0m [\n\u001b[0;32m 64\u001b[0m (\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfeatures_preprocessing\u001b[39m\u001b[38;5;124m\"\u001b[39m, features_preprocessing),\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 67\u001b[0m ]\n\u001b[0;32m 68\u001b[0m )\n\u001b[0;32m 70\u001b[0m \u001b[38;5;66;03m# Демонстрация работы конвейера для предобработки данных при классификации\u001b[39;00m\n\u001b[1;32m---> 71\u001b[0m preprocessing_result \u001b[38;5;241m=\u001b[39m \u001b[43mpipeline_end\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfit_transform\u001b[49m\u001b[43m(\u001b[49m\u001b[43mX_train\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 72\u001b[0m preprocessed_df \u001b[38;5;241m=\u001b[39m pd\u001b[38;5;241m.\u001b[39mDataFrame(\n\u001b[0;32m 73\u001b[0m preprocessing_result,\n\u001b[0;32m 74\u001b[0m columns\u001b[38;5;241m=\u001b[39mpipeline_end\u001b[38;5;241m.\u001b[39mget_feature_names_out(),\n\u001b[0;32m 75\u001b[0m )\n\u001b[0;32m 77\u001b[0m preprocessed_df\n", - "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\base.py:1473\u001b[0m, in \u001b[0;36m_fit_context..decorator..wrapper\u001b[1;34m(estimator, *args, **kwargs)\u001b[0m\n\u001b[0;32m 1466\u001b[0m estimator\u001b[38;5;241m.\u001b[39m_validate_params()\n\u001b[0;32m 1468\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m config_context(\n\u001b[0;32m 1469\u001b[0m skip_parameter_validation\u001b[38;5;241m=\u001b[39m(\n\u001b[0;32m 1470\u001b[0m prefer_skip_nested_validation \u001b[38;5;129;01mor\u001b[39;00m global_skip_validation\n\u001b[0;32m 1471\u001b[0m )\n\u001b[0;32m 1472\u001b[0m ):\n\u001b[1;32m-> 1473\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfit_method\u001b[49m\u001b[43m(\u001b[49m\u001b[43mestimator\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n", - "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\pipeline.py:533\u001b[0m, in \u001b[0;36mPipeline.fit_transform\u001b[1;34m(self, X, y, **params)\u001b[0m\n\u001b[0;32m 490\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"Fit the model and transform with the final estimator.\u001b[39;00m\n\u001b[0;32m 491\u001b[0m \n\u001b[0;32m 492\u001b[0m \u001b[38;5;124;03mFit all the transformers one after the other and sequentially transform\u001b[39;00m\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 530\u001b[0m \u001b[38;5;124;03m Transformed samples.\u001b[39;00m\n\u001b[0;32m 531\u001b[0m \u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[0;32m 532\u001b[0m routed_params \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_check_method_params(method\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfit_transform\u001b[39m\u001b[38;5;124m\"\u001b[39m, props\u001b[38;5;241m=\u001b[39mparams)\n\u001b[1;32m--> 533\u001b[0m Xt \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_fit\u001b[49m\u001b[43m(\u001b[49m\u001b[43mX\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mrouted_params\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 535\u001b[0m last_step \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_final_estimator\n\u001b[0;32m 536\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m _print_elapsed_time(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mPipeline\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_log_message(\u001b[38;5;28mlen\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39msteps) \u001b[38;5;241m-\u001b[39m 
\u001b[38;5;241m1\u001b[39m)):\n", - "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\pipeline.py:406\u001b[0m, in \u001b[0;36mPipeline._fit\u001b[1;34m(self, X, y, routed_params)\u001b[0m\n\u001b[0;32m 404\u001b[0m cloned_transformer \u001b[38;5;241m=\u001b[39m clone(transformer)\n\u001b[0;32m 405\u001b[0m \u001b[38;5;66;03m# Fit or load from cache the current transformer\u001b[39;00m\n\u001b[1;32m--> 406\u001b[0m X, fitted_transformer \u001b[38;5;241m=\u001b[39m \u001b[43mfit_transform_one_cached\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m 407\u001b[0m \u001b[43m \u001b[49m\u001b[43mcloned_transformer\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 408\u001b[0m \u001b[43m \u001b[49m\u001b[43mX\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 409\u001b[0m \u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 410\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43;01mNone\u001b[39;49;00m\u001b[43m,\u001b[49m\n\u001b[0;32m 411\u001b[0m \u001b[43m \u001b[49m\u001b[43mmessage_clsname\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mPipeline\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[0;32m 412\u001b[0m \u001b[43m \u001b[49m\u001b[43mmessage\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_log_message\u001b[49m\u001b[43m(\u001b[49m\u001b[43mstep_idx\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 413\u001b[0m \u001b[43m \u001b[49m\u001b[43mparams\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrouted_params\u001b[49m\u001b[43m[\u001b[49m\u001b[43mname\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 414\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 415\u001b[0m \u001b[38;5;66;03m# Replace the transformer of the step with the fitted\u001b[39;00m\n\u001b[0;32m 416\u001b[0m \u001b[38;5;66;03m# transformer. 
This is necessary when loading the transformer\u001b[39;00m\n\u001b[0;32m 417\u001b[0m \u001b[38;5;66;03m# from the cache.\u001b[39;00m\n\u001b[0;32m 418\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39msteps[step_idx] \u001b[38;5;241m=\u001b[39m (name, fitted_transformer)\n", - "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\joblib\\memory.py:312\u001b[0m, in \u001b[0;36mNotMemorizedFunc.__call__\u001b[1;34m(self, *args, **kwargs)\u001b[0m\n\u001b[0;32m 311\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m__call__\u001b[39m(\u001b[38;5;28mself\u001b[39m, \u001b[38;5;241m*\u001b[39margs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs):\n\u001b[1;32m--> 312\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfunc\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n", - "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\pipeline.py:1310\u001b[0m, in \u001b[0;36m_fit_transform_one\u001b[1;34m(transformer, X, y, weight, message_clsname, message, params)\u001b[0m\n\u001b[0;32m 1308\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m _print_elapsed_time(message_clsname, message):\n\u001b[0;32m 1309\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mhasattr\u001b[39m(transformer, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfit_transform\u001b[39m\u001b[38;5;124m\"\u001b[39m):\n\u001b[1;32m-> 1310\u001b[0m res \u001b[38;5;241m=\u001b[39m \u001b[43mtransformer\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfit_transform\u001b[49m\u001b[43m(\u001b[49m\u001b[43mX\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mparams\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mfit_transform\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m{\u001b[49m\u001b[43m}\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 1311\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m 1312\u001b[0m res \u001b[38;5;241m=\u001b[39m transformer\u001b[38;5;241m.\u001b[39mfit(X, y, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mparams\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfit\u001b[39m\u001b[38;5;124m\"\u001b[39m, {}))\u001b[38;5;241m.\u001b[39mtransform(\n\u001b[0;32m 1313\u001b[0m X, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mparams\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mtransform\u001b[39m\u001b[38;5;124m\"\u001b[39m, {})\n\u001b[0;32m 1314\u001b[0m )\n", - "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\utils\\_set_output.py:316\u001b[0m, in \u001b[0;36m_wrap_method_output..wrapped\u001b[1;34m(self, X, *args, **kwargs)\u001b[0m\n\u001b[0;32m 314\u001b[0m \u001b[38;5;129m@wraps\u001b[39m(f)\n\u001b[0;32m 315\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mwrapped\u001b[39m(\u001b[38;5;28mself\u001b[39m, X, \u001b[38;5;241m*\u001b[39margs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs):\n\u001b[1;32m--> 316\u001b[0m data_to_wrap \u001b[38;5;241m=\u001b[39m \u001b[43mf\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mX\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 317\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(data_to_wrap, \u001b[38;5;28mtuple\u001b[39m):\n\u001b[0;32m 318\u001b[0m \u001b[38;5;66;03m# only wrap the first output for cross decomposition\u001b[39;00m\n\u001b[0;32m 319\u001b[0m return_tuple \u001b[38;5;241m=\u001b[39m (\n\u001b[0;32m 320\u001b[0m _wrap_data_with_container(method, data_to_wrap[\u001b[38;5;241m0\u001b[39m], X, \u001b[38;5;28mself\u001b[39m),\n\u001b[0;32m 321\u001b[0m \u001b[38;5;241m*\u001b[39mdata_to_wrap[\u001b[38;5;241m1\u001b[39m:],\n\u001b[0;32m 322\u001b[0m )\n", - "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\base.py:1098\u001b[0m, in \u001b[0;36mTransformerMixin.fit_transform\u001b[1;34m(self, X, y, **fit_params)\u001b[0m\n\u001b[0;32m 1083\u001b[0m warnings\u001b[38;5;241m.\u001b[39mwarn(\n\u001b[0;32m 1084\u001b[0m (\n\u001b[0;32m 1085\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mThis object (\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__class__\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__name__\u001b[39m\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m) has a `transform`\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 1093\u001b[0m \u001b[38;5;167;01mUserWarning\u001b[39;00m,\n\u001b[0;32m 1094\u001b[0m )\n\u001b[0;32m 1096\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m y \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m 1097\u001b[0m \u001b[38;5;66;03m# fit method of arity 1 (unsupervised transformation)\u001b[39;00m\n\u001b[1;32m-> 1098\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m 
\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfit\u001b[49m\u001b[43m(\u001b[49m\u001b[43mX\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mfit_params\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mtransform\u001b[49m\u001b[43m(\u001b[49m\u001b[43mX\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 1099\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m 1100\u001b[0m \u001b[38;5;66;03m# fit method of arity 2 (supervised transformation)\u001b[39;00m\n\u001b[0;32m 1101\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mfit(X, y, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mfit_params)\u001b[38;5;241m.\u001b[39mtransform(X)\n", - "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\utils\\_set_output.py:316\u001b[0m, in \u001b[0;36m_wrap_method_output..wrapped\u001b[1;34m(self, X, *args, **kwargs)\u001b[0m\n\u001b[0;32m 314\u001b[0m \u001b[38;5;129m@wraps\u001b[39m(f)\n\u001b[0;32m 315\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mwrapped\u001b[39m(\u001b[38;5;28mself\u001b[39m, X, \u001b[38;5;241m*\u001b[39margs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs):\n\u001b[1;32m--> 316\u001b[0m data_to_wrap \u001b[38;5;241m=\u001b[39m \u001b[43mf\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mX\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 317\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(data_to_wrap, \u001b[38;5;28mtuple\u001b[39m):\n\u001b[0;32m 318\u001b[0m 
\u001b[38;5;66;03m# only wrap the first output for cross decomposition\u001b[39;00m\n\u001b[0;32m 319\u001b[0m return_tuple \u001b[38;5;241m=\u001b[39m (\n\u001b[0;32m 320\u001b[0m _wrap_data_with_container(method, data_to_wrap[\u001b[38;5;241m0\u001b[39m], X, \u001b[38;5;28mself\u001b[39m),\n\u001b[0;32m 321\u001b[0m \u001b[38;5;241m*\u001b[39mdata_to_wrap[\u001b[38;5;241m1\u001b[39m:],\n\u001b[0;32m 322\u001b[0m )\n", - "Cell \u001b[1;32mIn[14], line 18\u001b[0m, in \u001b[0;36mSalaryFeatures.transform\u001b[1;34m(self, X, y)\u001b[0m\n\u001b[0;32m 15\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mtransform\u001b[39m(\u001b[38;5;28mself\u001b[39m, X, y\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mNone\u001b[39;00m):\n\u001b[0;32m 16\u001b[0m \u001b[38;5;66;03m# Создание новых признаков\u001b[39;00m\n\u001b[0;32m 17\u001b[0m X \u001b[38;5;241m=\u001b[39m X\u001b[38;5;241m.\u001b[39mcopy()\n\u001b[1;32m---> 18\u001b[0m X[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mwork_year_to_remote_ratio\u001b[39m\u001b[38;5;124m\"\u001b[39m] \u001b[38;5;241m=\u001b[39m \u001b[43mX\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mwork_year\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m]\u001b[49m \u001b[38;5;241m/\u001b[39m X[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mremote_ratio\u001b[39m\u001b[38;5;124m\"\u001b[39m]\n\u001b[0;32m 19\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m X\n", - "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\scipy\\sparse\\_csr.py:24\u001b[0m, in \u001b[0;36m_csr_base.__getitem__\u001b[1;34m(self, key)\u001b[0m\n\u001b[0;32m 22\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m__getitem__\u001b[39m(\u001b[38;5;28mself\u001b[39m, key):\n\u001b[0;32m 23\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mndim \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m2\u001b[39m:\n\u001b[1;32m---> 24\u001b[0m 
\u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[38;5;21;43m__getitem__\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mkey\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 26\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(key, \u001b[38;5;28mtuple\u001b[39m) \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(key) \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m1\u001b[39m:\n\u001b[0;32m 27\u001b[0m key \u001b[38;5;241m=\u001b[39m key[\u001b[38;5;241m0\u001b[39m]\n", - "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\scipy\\sparse\\_index.py:52\u001b[0m, in \u001b[0;36mIndexMixin.__getitem__\u001b[1;34m(self, key)\u001b[0m\n\u001b[0;32m 51\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m__getitem__\u001b[39m(\u001b[38;5;28mself\u001b[39m, key):\n\u001b[1;32m---> 52\u001b[0m row, col \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_validate_indices\u001b[49m\u001b[43m(\u001b[49m\u001b[43mkey\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 54\u001b[0m \u001b[38;5;66;03m# Dispatch to specialized methods.\u001b[39;00m\n\u001b[0;32m 55\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(row, INT_TYPES):\n", - "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\scipy\\sparse\\_index.py:186\u001b[0m, in \u001b[0;36mIndexMixin._validate_indices\u001b[1;34m(self, key)\u001b[0m\n\u001b[0;32m 184\u001b[0m row \u001b[38;5;241m=\u001b[39m _validate_bool_idx(bool_row, M, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrow\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m 185\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(row, \u001b[38;5;28mslice\u001b[39m):\n\u001b[1;32m--> 186\u001b[0m row 
\u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_asindices\u001b[49m\u001b[43m(\u001b[49m\u001b[43mrow\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mM\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 188\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m isintlike(col):\n\u001b[0;32m 189\u001b[0m col \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mint\u001b[39m(col)\n", - "File \u001b[1;32md:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\scipy\\sparse\\_index.py:212\u001b[0m, in \u001b[0;36mIndexMixin._asindices\u001b[1;34m(self, idx, length)\u001b[0m\n\u001b[0;32m 209\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mIndexError\u001b[39;00m(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124minvalid index\u001b[39m\u001b[38;5;124m'\u001b[39m) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01me\u001b[39;00m\n\u001b[0;32m 211\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m x\u001b[38;5;241m.\u001b[39mndim \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m (\u001b[38;5;241m1\u001b[39m, \u001b[38;5;241m2\u001b[39m):\n\u001b[1;32m--> 212\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mIndexError\u001b[39;00m(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mIndex dimension must be 1 or 2\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[0;32m 214\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m x\u001b[38;5;241m.\u001b[39msize \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m0\u001b[39m:\n\u001b[0;32m 215\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m x\n", - "\u001b[1;31mIndexError\u001b[0m: Index dimension must be 1 or 2" + "name": "stdout", + "output_type": "stream", + "text": [ + "Model: Logistic Regression\n", + "Accuracy: 0.7523\n", + "F1 Score: 0.7609\n", + "----------------------------------------\n", + "Model: Decision Tree\n", + "Accuracy: 0.9960\n", + "F1 Score: 0.9959\n", + "----------------------------------------\n", + "Model: Gradient Boosting\n", + "Accuracy: 
0.9947\n", + "F1 Score: 0.9945\n", + "----------------------------------------\n" ] } ], "source": [ - "import numpy as np\n", "import pandas as pd\n", - "from sklearn.base import BaseEstimator, TransformerMixin\n", - "from sklearn.compose import ColumnTransformer\n", - "from sklearn.impute import SimpleImputer\n", - "from sklearn.preprocessing import OneHotEncoder, StandardScaler\n", - "from sklearn.pipeline import Pipeline\n", "from sklearn.model_selection import train_test_split\n", + "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n", + "from sklearn.compose import ColumnTransformer\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.tree import DecisionTreeClassifier\n", + "from sklearn.ensemble import GradientBoostingClassifier\n", + "from sklearn.metrics import accuracy_score, f1_score\n", "\n", "# Загрузка данных\n", "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n", @@ -1302,168 +1368,350 @@ "# Разделение данных на тренировочный и тестовый наборы\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)\n", "\n", - "# Построение конвейеров предобработки\n", - "\n", - "class SalaryFeatures(BaseEstimator, TransformerMixin):\n", - " def __init__(self):\n", - " pass\n", - " def fit(self, X, y=None):\n", - " return self\n", - " def transform(self, X, y=None):\n", - " # Создание новых признаков\n", - " X = X.copy()\n", - " X[\"work_year_to_remote_ratio\"] = X[\"work_year\"] / X[\"remote_ratio\"]\n", - " return X\n", - " def get_feature_names_out(self, features_in):\n", - " # Добавление имен новых признаков\n", - " new_features = [\"work_year_to_remote_ratio\"]\n", - " return np.append(features_in, new_features, axis=0)\n", - "\n", - "# Обработка числовых данных. 
Числовой конвейер: заполнение пропущенных значений медианой и стандартизация\n", - "preprocessing_num_class = Pipeline(steps=[\n", - " ('imputer', SimpleImputer(strategy='median')),\n", - " ('scaler', StandardScaler())\n", - "])\n", - "\n", - "# Обработка категориальных данных: заполнение пропущенных значений наиболее частым значением и one-hot encoding\n", - "preprocessing_cat_class = Pipeline(steps=[\n", - " ('imputer', SimpleImputer(strategy='most_frequent')),\n", - " ('onehot', OneHotEncoder(handle_unknown='ignore'))\n", - "])\n", - "\n", "# Определение столбцов\n", - "numeric_columns = [\"work_year\", \"salary\", \"salary_in_usd\", \"remote_ratio\"]\n", + "numeric_columns = [\"work_year\", \"salary\", \"remote_ratio\"]\n", "cat_columns = [\"experience_level\", \"employment_type\", \"job_title\", \"salary_currency\", \"employee_residence\", \"company_location\", \"company_size\"]\n", "\n", - "# Предобработка признаков\n", - "features_preprocessing = ColumnTransformer(\n", - " verbose_feature_names_out=False,\n", + "# Предобработка данных\n", + "preprocessor = ColumnTransformer(\n", " transformers=[\n", - " (\"prepocessing_num\", preprocessing_num_class, numeric_columns),\n", - " (\"prepocessing_cat\", preprocessing_cat_class, cat_columns),\n", - " ],\n", - " remainder=\"passthrough\"\n", - ")\n", + " ('num', StandardScaler(), numeric_columns),\n", + " ('cat', OneHotEncoder(handle_unknown='ignore'), cat_columns)])\n", "\n", - "# Удаление колонок\n", - "columns_to_drop = [] # Укажите столбцы, которые нужно удалить, если они есть\n", - "drop_columns = ColumnTransformer(\n", - " verbose_feature_names_out=False,\n", - " transformers=[\n", - " (\"drop_columns\", \"drop\", columns_to_drop),\n", - " ],\n", - " remainder=\"passthrough\",\n", - ")\n", + "# Создание конвейеров для моделей\n", + "pipeline_logistic_regression = Pipeline(steps=[\n", + " ('preprocessor', preprocessor),\n", + " ('classifier', LogisticRegression(random_state=42))])\n", "\n", - "# Основной 
конвейер предобработки данных и конструирования признаков\n", - "pipeline_end = Pipeline(\n", - " [\n", - " (\"features_preprocessing\", features_preprocessing),\n", - " (\"custom_features\", SalaryFeatures()),\n", - " (\"drop_columns\", drop_columns),\n", - " ]\n", - ")\n", + "pipeline_decision_tree = Pipeline(steps=[\n", + " ('preprocessor', preprocessor),\n", + " ('classifier', DecisionTreeClassifier(random_state=42))])\n", "\n", - "# Демонстрация работы конвейера для предобработки данных при классификации\n", - "preprocessing_result = pipeline_end.fit_transform(X_train)\n", + "pipeline_gradient_boosting = Pipeline(steps=[\n", + " ('preprocessor', preprocessor),\n", + " ('classifier', GradientBoostingClassifier(random_state=42))])\n", "\n", - "# Получение имен столбцов после преобразования\n", - "feature_names = pipeline_end.named_steps['features_preprocessing'].get_feature_names_out(numeric_columns + cat_columns)\n", - "feature_names = np.append(feature_names, [\"work_year_to_remote_ratio\"])\n", + "# Список конвейеров \n", + "pipelines = [\n", + " ('Logistic Regression', pipeline_logistic_regression),\n", + " ('Decision Tree', pipeline_decision_tree),\n", + " ('Gradient Boosting', pipeline_gradient_boosting)\n", + "]\n", "\n", - "# Создание DataFrame с преобразованными данными\n", - "preprocessed_df = pd.DataFrame(\n", - " preprocessing_result,\n", - " columns=feature_names,\n", - ")\n", - "\n", - "preprocessed_df" + "# Обучение моделей и вывод результатов\n", + "for name, pipeline in pipelines:\n", + " pipeline.fit(X_train, y_train)\n", + " y_pred = pipeline.predict(X_test)\n", + " accuracy = accuracy_score(y_test, y_pred)\n", + " f1 = f1_score(y_test, y_pred)\n", + " print(f\"Model: {name}\")\n", + " print(f\"Accuracy: {accuracy:.4f}\")\n", + " print(f\"F1 Score: {f1:.4f}\")\n", + " print(\"-\" * 40)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "**Бизнес-цели**\n", - "\n", - "1. 
Предсказание заработной платы (Регрессия)\n", - "\n", - " Цель: Предсказать заработную плату (salary_in_usd) на основе других характеристик, таких как уровень опыта (experience_level), тип занятости (employment_type), должность (job_title), место проживания сотрудника (employee_residence), размер компании (company_size) и другие факторы.\n", - "\n", - " Применение: Это может быть полезно для HR-отделов, которые хотят оценить справедливую зарплату для новых сотрудников или для анализа рынка труда.\n", - "\n", - "2. Классификация уровня опыта по зарплате (Классификация)\n", - "\n", - " Цель: Классифицировать уровень опыта (experience_level) на основе заработной платы (salary_in_usd) и других факторов.\n", - "\n", - " Применение: Это может помочь в оценке, на каком уровне опыта находится сотрудник, основываясь на его зарплате, что может быть полезно для оценки карьерного роста." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "1. Прогнозирование зарплаты" + "Оценка качества моделей" ] }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - " work_year experience_level employment_type job_title \\\n", - "0 2023 SE FT Principal Data Scientist \n", - "1 2023 MI CT ML Engineer \n", - "2 2023 MI CT ML Engineer \n", - "3 2023 SE FT Data Scientist \n", - "4 2023 SE FT Data Scientist \n", + "Model: Logistic Regression\n", + "Accuracy: 0.7523302263648469\n", + "F1 Score: 0.7517841210039291\n", "\n", - " salary salary_currency salary_in_usd employee_residence remote_ratio \\\n", - "0 80000 EUR 85847 ES 100 \n", - "1 30000 USD 30000 US 100 \n", - "2 25500 USD 25500 US 100 \n", - "3 175000 USD 175000 CA 100 \n", - "4 120000 USD 120000 CA 100 \n", + "Model: Decision Tree\n", + "Accuracy: 0.996005326231691\n", + "F1 Score: 0.9960048583691977\n", "\n", - " company_location company_size \n", - "0 ES L \n", - "1 US S \n", - "2 US S \n", - "3 CA M \n", - 
"4 CA M \n", - "\n", - "RangeIndex: 3755 entries, 0 to 3754\n", - "Data columns (total 11 columns):\n", - " # Column Non-Null Count Dtype \n", - "--- ------ -------------- ----- \n", - " 0 work_year 3755 non-null int64 \n", - " 1 experience_level 3755 non-null object\n", - " 2 employment_type 3755 non-null object\n", - " 3 job_title 3755 non-null object\n", - " 4 salary 3755 non-null int64 \n", - " 5 salary_currency 3755 non-null object\n", - " 6 salary_in_usd 3755 non-null int64 \n", - " 7 employee_residence 3755 non-null object\n", - " 8 remote_ratio 3755 non-null int64 \n", - " 9 company_location 3755 non-null object\n", - " 10 company_size 3755 non-null object\n", - "dtypes: int64(4), object(7)\n", - "memory usage: 322.8+ KB\n", - "None\n", - " work_year salary salary_in_usd remote_ratio\n", - "count 3755.000000 3.755000e+03 3755.000000 3755.000000\n", - "mean 2022.373635 1.906956e+05 137570.389880 46.271638\n", - "std 0.691448 6.716765e+05 63055.625278 48.589050\n", - "min 2020.000000 6.000000e+03 5132.000000 0.000000\n", - "25% 2022.000000 1.000000e+05 95000.000000 0.000000\n", - "50% 2022.000000 1.380000e+05 135000.000000 0.000000\n", - "75% 2023.000000 1.800000e+05 175000.000000 100.000000\n", - "max 2023.000000 3.040000e+07 450000.000000 100.000000\n", - "work_year 0\n", + "Model: Gradient Boosting\n", + "Accuracy: 0.9946737683089214\n", + "F1 Score: 0.9946728986768623\n", + "\n" + ] + } + ], + "source": [ + "from sklearn.metrics import accuracy_score, f1_score\n", + "\n", + "for name, pipeline in pipelines:\n", + " y_pred = pipeline.predict(X_test)\n", + " print(f\"Model: {name}\")\n", + " print('Accuracy:', accuracy_score(y_test, y_pred))\n", + " print('F1 Score:', f1_score(y_test, y_pred, average='weighted'))\n", + " print()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Регрессия\n", + "Цель: Разработать модель регрессии, которая будет предсказывать зарплату (salary_in_usd) на основе демографических данных, типа работы и 
других факторов." + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Размер данных до удаления выбросов: (3755, 11)\n", + "Размер данных после удаления выбросов: (3708, 11)\n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "from scipy import stats\n", + "\n", + "# Load the data\n", + "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n", + "\n", + "# Numeric features used for outlier detection\n", + "numeric_features = ['work_year', 'salary', 'salary_in_usd', 'remote_ratio']\n", + "\n", + "# Compute z-scores for the numeric features\n", + "z_scores = stats.zscore(df[numeric_features])\n", + "\n", + "# Z-score threshold for outlier removal\n", + "threshold = 3\n", + "\n", + "# Remove outliers (only rows with z >= 3 are dropped; no absolute value is taken, so low-side outliers stay)\n", + "df_cleaned = df[(z_scores < threshold).all(axis=1)]\n", + "\n", + "print(\"Размер данных до удаления выбросов:\", df.shape)\n", + "print(\"Размер данных после удаления выбросов:\", df_cleaned.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Размер обучающей выборки: (2966, 9)\n", + "Размер тестовой выборки: (742, 9)\n", + "Baseline MAE: 48988.97819674187\n", + "Baseline MSE: 3791583837.2779293\n", + "Baseline R²: -0.005051587587466155\n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score\n", + "\n", + "# Define the features and the target variable\n", + "features = ['work_year', 'experience_level', 'employment_type', 'job_title', 'salary_currency', 'remote_ratio', 'employee_residence', 'company_location', 'company_size']\n", + "target = 'salary_in_usd'\n", + "\n", + "# Split the data into training and test sets\n", + "X_train, X_test, y_train, y_test =
train_test_split(df_cleaned[features], df_cleaned[target], test_size=0.2, random_state=42)\n", + "\n", + "print(\"Размер обучающей выборки:\", X_train.shape)\n", + "print(\"Размер тестовой выборки:\", X_test.shape)\n", + "\n", + "# Baseline: predict the training-set mean for every test row\n", + "baseline_predictions = [y_train.mean()] * len(y_test)\n", + "\n", + "# Compute baseline metrics\n", + "print('Baseline MAE:', mean_absolute_error(y_test, baseline_predictions))\n", + "print('Baseline MSE:', mean_squared_error(y_test, baseline_predictions))\n", + "print('Baseline R²:', r2_score(y_test, baseline_predictions))" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Размер данных до удаления выбросов: (3755, 11)\n", + "Размер данных после удаления выбросов: (3733, 11)\n", + "Размер обучающей выборки: (2986, 9)\n", + "Размер тестовой выборки: (747, 9)\n", + "Baseline MAE: 47593.92288600708\n", + "Baseline MSE: 3680965527.9964128\n", + "Baseline R²: -0.0016576422593919116\n", + "Model: Linear Regression trained.\n", + "Model: Decision Tree trained.\n", + "Model: Gradient Boosting trained.\n", + "Model: Linear Regression\n", + "MAE: 36617.65439873256\n", + "MSE: 2194684192.4416404\n", + "R²: 0.4027865306031213\n", + "\n", + "Model: Decision Tree\n", + "MAE: 36516.71804922624\n", + "MSE: 2246643776.062331\n", + "R²: 0.38864738324451775\n", + "\n", + "Model: Gradient Boosting\n", + "MAE: 35842.80843437428\n", + "MSE: 2125285552.2470944\n", + "R²: 0.42167116230764956\n", + "\n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "from scipy import stats\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n", + "from sklearn.compose import ColumnTransformer\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.tree import
DecisionTreeRegressor\n", + "from sklearn.ensemble import GradientBoostingRegressor\n", + "from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score\n", + "\n", + "# Load the data\n", + "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n", + "\n", + "# Numeric features used for outlier detection\n", + "numeric_features = ['work_year', 'salary_in_usd', 'remote_ratio']\n", + "\n", + "# Compute z-scores for the numeric features\n", + "z_scores = stats.zscore(df[numeric_features])\n", + "\n", + "# Z-score threshold for outlier removal\n", + "threshold = 3\n", + "\n", + "# Remove outliers (only rows with z >= 3 are dropped; no absolute value is taken, so low-side outliers stay)\n", + "df_cleaned = df[(z_scores < threshold).all(axis=1)]\n", + "\n", + "print(\"Размер данных до удаления выбросов:\", df.shape)\n", + "print(\"Размер данных после удаления выбросов:\", df_cleaned.shape)\n", + "\n", + "# Split into training and test sets and build the baseline\n", + "features = ['work_year', 'experience_level', 'employment_type', 'job_title', 'salary_currency', 'remote_ratio', 'employee_residence', 'company_location', 'company_size']\n", + "target = 'salary_in_usd'\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(df_cleaned[features], df_cleaned[target], test_size=0.2, random_state=42)\n", + "\n", + "print(\"Размер обучающей выборки:\", X_train.shape)\n", + "print(\"Размер тестовой выборки:\", X_test.shape)\n", + "\n", + "# Baseline: predict the training-set mean for every test row\n", + "baseline_predictions = [y_train.mean()] * len(y_test)\n", + "\n", + "print('Baseline MAE:', mean_absolute_error(y_test, baseline_predictions))\n", + "print('Baseline MSE:', mean_squared_error(y_test, baseline_predictions))\n", + "print('Baseline R²:', r2_score(y_test, baseline_predictions))\n", + "\n", + "# Build the pipelines and train the models\n", + "categorical_features = ['experience_level', 'employment_type', 'job_title', 'salary_currency', 'employee_residence', 'company_location', 'company_size']\n", + "numeric_features = ['work_year', 'remote_ratio']\n", + "\n", + "preprocessor =
ColumnTransformer(\n", + " transformers=[\n", + " ('num', StandardScaler(), numeric_features),\n", + " ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)])\n", + "\n", + "pipeline_linear_regression = Pipeline(steps=[\n", + " ('preprocessor', preprocessor),\n", + " ('regressor', LinearRegression())])\n", + "\n", + "pipeline_decision_tree = Pipeline(steps=[\n", + " ('preprocessor', preprocessor),\n", + " ('regressor', DecisionTreeRegressor(random_state=42))])\n", + "\n", + "pipeline_gradient_boosting = Pipeline(steps=[\n", + " ('preprocessor', preprocessor),\n", + " ('regressor', GradientBoostingRegressor(random_state=42))])\n", + "\n", + "pipelines = [\n", + " ('Linear Regression', pipeline_linear_regression),\n", + " ('Decision Tree', pipeline_decision_tree),\n", + " ('Gradient Boosting', pipeline_gradient_boosting)\n", + "]\n", + "\n", + "for name, pipeline in pipelines:\n", + " pipeline.fit(X_train, y_train)\n", + " print(f\"Model: {name} trained.\")\n", + "\n", + "# Evaluate the models on the test set\n", + "for name, pipeline in pipelines:\n", + " y_pred = pipeline.predict(X_test)\n", + " print(f\"Model: {name}\")\n", + " print('MAE:', mean_absolute_error(y_test, y_pred))\n", + " print('MSE:', mean_squared_error(y_test, y_pred))\n", + " print('R²:', r2_score(y_test, y_pred))\n", + " print()" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Model: Linear Regression\n", + "MAE: 36617.65439873256\n", + "MSE: 2194684192.4416404\n", + "R²: 0.4027865306031213\n", + "\n", + "Model: Decision Tree\n", + "MAE: 36516.71804922624\n", + "MSE: 2246643776.062331\n", + "R²: 0.38864738324451775\n", + "\n", + "Model: Gradient Boosting\n", + "MAE: 35842.80843437428\n", + "MSE: 2125285552.2470944\n", + "R²: 0.42167116230764956\n", + "\n" + ] + } + ], + "source": [ + "from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score\n", + "\n", + "for 
name, pipeline in pipelines:\n", + " y_pred = pipeline.predict(X_test)\n", + " print(f\"Model: {name}\")\n", + " print('MAE:', mean_absolute_error(y_test, y_pred))\n", + " print('MSE:', mean_squared_error(y_test, y_pred))\n", + " print('R²:', r2_score(y_test, y_pred))\n", + " print()" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Пропущенные значения:\n", + " work_year 0\n", "experience_level 0\n", "employment_type 0\n", "job_title 0\n", @@ -1474,115 +1722,124 @@ "remote_ratio 0\n", "company_location 0\n", "company_size 0\n", - "dtype: int64\n", - "Mean Squared Error: 2482079980.9527493\n", - "R^2 Score: 0.37127352660208646\n" + "dtype: int64\n" ] } ], "source": [ "import pandas as pd\n", - "import numpy as np\n", "from sklearn.model_selection import train_test_split\n", - "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n", + "from sklearn.preprocessing import OneHotEncoder, StandardScaler\n", "from sklearn.compose import ColumnTransformer\n", "from sklearn.pipeline import Pipeline\n", "from sklearn.linear_model import LinearRegression\n", + "from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor\n", "from sklearn.metrics import mean_squared_error, r2_score\n", - "import seaborn as sns\n", - "import matplotlib.pyplot as plt\n", + "from scipy.stats import uniform, randint\n", + "from sklearn.model_selection import RandomizedSearchCV\n", "\n", - "# Загружаем набор данных\n", + "# Load the data\n", "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n", "\n", - "# Устанавливаем случайное состояние\n", - "random_state = 42\n", - "\n", - "# Предварительный анализ данных\n", - "print(df.head())\n", - "print(df.info())\n", - "print(df.describe())\n", - "\n", "# Проверка на пропущенные значения\n", - "print(df.isnull().sum())\n", + "print(\"Пропущенные значения:\\n\", df.isnull().sum())\n", "\n", - "# Предобработка 
данных\n", - "# Определяем категориальные и числовые столбцы\n", + "# Drop rows with missing values\n", + "df = df.dropna()\n", + "\n", + "# Select the features and the target variable\n", + "features = ['work_year', 'experience_level', 'employment_type', 'job_title', 'employee_residence', 'remote_ratio', 'company_location', 'company_size']\n", + "target = 'salary_in_usd'\n", + "\n", + "# Define the categorical and numeric features\n", "categorical_features = ['experience_level', 'employment_type', 'job_title', 'employee_residence', 'company_location', 'company_size']\n", "numeric_features = ['work_year', 'remote_ratio']\n", "\n", - "# Создаем пайплайн для обработки данных\n", + "# Build the preprocessing pipeline\n", + "categorical_transformer = Pipeline(steps=[\n", + " ('onehot', OneHotEncoder(handle_unknown='ignore'))\n", + "])\n", + "\n", + "numeric_transformer = Pipeline(steps=[\n", + " ('scaler', StandardScaler())\n", + "])\n", + "\n", "preprocessor = ColumnTransformer(\n", " transformers=[\n", - " ('num', StandardScaler(), numeric_features),\n", - " ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)])\n", + " ('num', numeric_transformer, numeric_features),\n", + " ('cat', categorical_transformer, categorical_features)\n", + " ])\n", "\n", - "# Определяем целевую переменную и признаки\n", - "X = df.drop('salary_in_usd', axis=1)\n", - "y = df['salary_in_usd']\n", + "# Transform the data (note: the preprocessor is fit on the full dataset, before the split)\n", + "X = preprocessor.fit_transform(df[features])\n", + "y = df[target]\n", "\n", - "# Разделяем данные на обучающую и тестовую выборки\n", - "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=random_state)\n", - "\n", - "# Создаем и обучаем модель\n", - "model = Pipeline(steps=[\n", - " ('preprocessor', preprocessor),\n", - " ('regressor', LinearRegression())])\n", - "\n", - "model.fit(X_train, y_train)\n", - "\n", - "# Делаем предсказания на тестовой выборке\n", - "y_pred = model.predict(X_test)\n", - 
"\n", - "# Оцениваем качество модели\n", - "mse = mean_squared_error(y_test, y_pred)\n", - "r2 = r2_score(y_test, y_pred)\n", - "\n", - "print(f\"Mean Squared Error: {mse}\")\n", - "print(f\"R^2 Score: {r2}\")\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "2. Классифицировать уровень опыта" + "# Разделение данных на обучающую и тестовую выборки\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)" ] }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 47, "metadata": {}, "outputs": [ { - "name": "stdout", + "name": "stderr", "output_type": "stream", "text": [ - "Classification Report:\n", - " precision recall f1-score support\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\model_selection\\_search.py:320: UserWarning: The total space of parameters 4 is smaller than n_iter=10. Running 4 iterations. For exhaustive searches, use GridSearchCV.\n", + " warnings.warn(\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\model_selection\\_validation.py:540: FitFailedWarning: \n", + "6 fits failed out of a total of 12.\n", + "The score on these train-test partitions for these parameters will be set to nan.\n", + "If these failures are not expected, you can try to debug them by setting error_score='raise'.\n", "\n", - " EN 0.55 0.48 0.51 67\n", - " EX 0.46 0.26 0.33 23\n", - " MI 0.48 0.54 0.51 157\n", - " SE 0.83 0.83 0.83 504\n", + "Below are more details about the failures:\n", + "--------------------------------------------------------------------------------\n", + "6 fits failed with the following error:\n", + "Traceback (most recent call last):\n", + " File \"d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\model_selection\\_validation.py\", line 888, in _fit_and_score\n", + " estimator.fit(X_train, y_train, **fit_params)\n", + " File 
\"d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\base.py\", line 1473, in wrapper\n", + " return fit_method(estimator, *args, **kwargs)\n", + " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", + " File \"d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\pipeline.py\", line 473, in fit\n", + " self._final_estimator.fit(Xt, y, **last_step_params[\"fit\"])\n", + " File \"d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\base.py\", line 1473, in wrapper\n", + " return fit_method(estimator, *args, **kwargs)\n", + " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", + " File \"d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\linear_model\\_base.py\", line 609, in fit\n", + " X, y = self._validate_data(\n", + " ^^^^^^^^^^^^^^^^^^^^\n", + " File \"d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\base.py\", line 650, in _validate_data\n", + " X, y = check_X_y(X, y, **check_params)\n", + " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", + " File \"d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\utils\\validation.py\", line 1301, in check_X_y\n", + " X = check_array(\n", + " ^^^^^^^^^^^^\n", + " File \"d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\utils\\validation.py\", line 971, in check_array\n", + " array = _ensure_sparse_format(\n", + " ^^^^^^^^^^^^^^^^^^^^^^\n", + " File \"d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\utils\\validation.py\", line 595, in _ensure_sparse_format\n", + " raise TypeError(\n", + "TypeError: Sparse data was passed for X, but dense data is required. 
Use '.toarray()' to convert to a dense numpy array.\n", "\n", - " accuracy 0.72 751\n", - " macro avg 0.58 0.53 0.55 751\n", - "weighted avg 0.72 0.72 0.72 751\n", - "\n", - "Confusion Matrix:\n", - "[[ 32 0 20 15]\n", - " [ 0 6 5 12]\n", - " [ 14 0 84 59]\n", - " [ 12 7 65 420]]\n", - "Accuracy Score: 0.7217043941411452\n" + " warnings.warn(some_fits_failed_message, FitFailedWarning)\n", + "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\model_selection\\_search.py:1103: UserWarning: One or more of the test scores are non-finite: [ nan 0.37308723 nan 0.37316524]\n", + " warnings.warn(\n", + "C:\\Users\\user\\AppData\\Local\\Temp\\ipykernel_14908\\2948510432.py:70: UserWarning: set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.\n", + " axes[i].set_xticklabels(params.keys(), rotation=45, ha=\"right\") #Поворачиваем подписи на оси х\n", + "C:\\Users\\user\\AppData\\Local\\Temp\\ipykernel_14908\\2948510432.py:70: UserWarning: set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.\n", + " axes[i].set_xticklabels(params.keys(), rotation=45, ha=\"right\") #Поворачиваем подписи на оси х\n", + "C:\\Users\\user\\AppData\\Local\\Temp\\ipykernel_14908\\2948510432.py:70: UserWarning: set_ticklabels() should only be used with a fixed number of ticks, i.e. 
after set_ticks() or using a FixedLocator.\n", + " axes[i].set_xticklabels(params.keys(), rotation=45, ha=\"right\") #Поворачиваем подписи на оси х\n" ] }, { "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAhsAAAHHCAYAAAAWM5p0AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABVJElEQVR4nO3deXwM5x8H8M9ujs193yVuIcTR0Ij7CEEccVSVEqq0aaIIqlFH0IoGdRNtHakKbRUt6la0FURIEaSoNlQOCTlEsons/P5Q++s2QcJOZpP9vPua18s+88zMd5KSb77P88zIBEEQQERERCQSudQBEBERUfXGZIOIiIhExWSDiIiIRMVkg4iIiETFZIOIiIhExWSDiIiIRMVkg4iIiETFZIOIiIhExWSDiIiIRMVkg0hEV69eRY8ePWBtbQ2ZTIadO3dq9fx//vknZDIZNm7cqNXzVmWdO3dG586dpQ6DiP6FyQZVe9evX8fbb7+NunXrwsTEBFZWVmjXrh2WLVuGgoICUa8dFBSECxcu4OOPP8amTZvQqlUrUa9XmUaNGgWZTAYrK6syv45Xr16FTCaDTCbDokWLKnz+27dvIyIiAomJiVqIloikZCh1AERi2rNnD1599VUoFAqMHDkSTZs2RVFREX755RdMnToVSUlJ+Oyzz0S5dkFBAeLi4vDhhx8iNDRUlGvUqlULBQUFMDIyEuX8z2JoaIgHDx5g165dGDJkiMa+zZs3w8TEBIWFhc917tu3b2POnDmoXbs2WrRoUe7jDhw48FzXIyLxMNmgauvGjRsYOnQoatWqhSNHjsDV1VW9LyQkBNeuXcOePXtEu/6dO3cAADY2NqJdQyaTwcTERLTzP4tCoUC7du2wZcuWUslGbGwsAgIC8N1331VKLA8ePICZmRmMjY0r5XpEVH4cRqFqKyoqCvfv38e6des0Eo3H6tevjwkTJqg/P3z4EPPmzUO9evWgUChQu3ZtTJ8+HUqlUuO42rVro0+fPvjll1/wyiuvwMTEBHXr1sWXX36p7hMREYFatWoBAKZOnQqZTIbatWsDeDT88PjP/xYREQGZTKbRdvDgQbRv3x42NjawsLCAh4cHpk+frt7/pDkbR44cQYcOHWBubg4bGxv0798fly9fLvN6165dw6hRo2BjYwNra2uMHj0aDx48ePIX9j+GDRuGvXv3Ijs7W90WHx+Pq1evYtiwYaX63717F1OmTIGXlxcsLCxgZWWFXr164bffflP3OXr0KFq3bg0AGD16tHo45vF9du7cGU2bNkVCQgI6duwIMzMz9dflv3M2goKCYGJiUur+/f39YWtri9u3b5f7Xono+TDZoGpr165dqFu3Ltq2bVuu/m+99RZmzZqFl19+GUuWLEGnTp0QGRmJoUOHlup77do1DB48GN27d8fixYtha2uLUaNGISkpCQAwcOBALFmyBADw+uuvY9OmTVi6dGmF4k9KSkKfPn2gVCoxd+5cLF68GP369cOvv/761OMOHToEf39/ZGRkICIiAmFhYThx4gTatWuHP//8s1T/IUOGIC8vD5GRkRgyZAg2btyIOXPmlDvOgQMHQiaTYfv27eq22NhYNGrUCC+//HKp/n/88Qd27tyJPn364NNPP8XUqVNx4cIFdOrUSf2Dv3Hjxpg7dy4AYNy4cdi0aRM2bdqEjh07qs+TlZWFXr16oUWLFli6dCm6dOlSZnzLli2Do6MjgoKCUFJSAgBYu3YtDhw4gBUrVsDNza3c90pEz0kgqoZycnIEAEL//v3L1T8xMVEAILz11lsa7VOmTBEACEeOHFG31apV
SwAgHD9+XN2WkZEhKBQKYfLkyeq2GzduCACEhQsXapwzKChIqFWrVqkYZs+eLfz7r+SSJUsEAMKdO3eeGPfja2zYsEHd1qJFC8HJyUnIyspSt/3222+CXC4XRo4cWep6b775psY5BwwYINjb2z/xmv++D3Nzc0EQBGHw4MFCt27dBEEQhJKSEsHFxUWYM2dOmV+DwsJCoaSkpNR9KBQKYe7cueq2+Pj4Uvf2WKdOnQQAQnR0dJn7OnXqpNG2f/9+AYDw0UcfCX/88YdgYWEhBAYGPvMeiUg7WNmgaik3NxcAYGlpWa7+P/74IwAgLCxMo33y5MkAUGpuh6enJzp06KD+7OjoCA8PD/zxxx/PHfN/PZ7r8f3330OlUpXrmNTUVCQmJmLUqFGws7NTtzdr1gzdu3dX3+e/vfPOOxqfO3TogKysLPXXsDyGDRuGo0ePIi0tDUeOHEFaWlqZQyjAo3kecvmjf3pKSkqQlZWlHiI6e/Zsua+pUCgwevTocvXt0aMH3n77bcydOxcDBw6EiYkJ1q5dW+5rEdGLYbJB1ZKVlRUAIC8vr1z9//rrL8jlctSvX1+j3cXFBTY2Nvjrr7802t3d3Uudw9bWFvfu3XvOiEt77bXX0K5dO7z11ltwdnbG0KFD8c033zw18Xgcp4eHR6l9jRs3RmZmJvLz8zXa/3svtra2AFChe+nduzcsLS3x9ddfY/PmzWjdunWpr+VjKpUKS5YsQYMGDaBQKODg4ABHR0ecP38eOTk55b7mSy+9VKHJoIsWLYKdnR0SExOxfPlyODk5lftYInoxTDaoWrKysoKbmxsuXrxYoeP+O0HzSQwMDMpsFwThua/xeD7BY6ampjh+/DgOHTqEESNG4Pz583jttdfQvXv3Un1fxIvcy2MKhQIDBw5ETEwMduzY8cSqBgDMnz8fYWFh6NixI7766ivs378fBw8eRJMmTcpdwQEefX0q4ty5c8jIyAAAXLhwoULHEtGLYbJB1VafPn1w/fp1xMXFPbNvrVq1oFKpcPXqVY329PR0ZGdnq1eWaIOtra3Gyo3H/ls9AQC5XI5u3brh008/xaVLl/Dxxx/jyJEj+Omnn8o89+M4k5OTS+27cuUKHBwcYG5u/mI38ATDhg3DuXPnkJeXV+ak2se2bduGLl26YN26dRg6dCh69OgBPz+/Ul+T8iZ+5ZGfn4/Ro0fD09MT48aNQ1RUFOLj47V2fiJ6OiYbVG29//77MDc3x1tvvYX09PRS+69fv45ly5YBeDQMAKDUipFPP/0UABAQEKC1uOrVq4ecnBycP39e3ZaamoodO3Zo9Lt7926pYx8/3Oq/y3Efc3V1RYsWLRATE6Pxw/vixYs4cOCA+j7F0KVLF8ybNw8rV66Ei4vLE/sZGBiUqpp8++23+PvvvzXaHidFZSVmFTVt2jSkpKQgJiYGn376KWrXro2goKAnfh2JSLv4UC+qturVq4fY2Fi89tpraNy4scYTRE+cOIFvv/0Wo0aNAgA0b94cQUFB+Oyzz5CdnY1OnTrh9OnTiImJQWBg4BOXVT6PoUOHYtq0aRgwYADee+89PHjwAGvWrEHDhg01JkjOnTsXx48fR0BAAGrVqoWMjAysXr0aNWrUQPv27Z94/oULF6JXr17w9fXFmDFjUFBQgBUrVsDa2hoRERFau4//ksvlmDFjxjP79enTB3PnzsXo0aPRtm1bXLhwAZs3b0bdunU1+tWrVw82NjaIjo6GpaUlzM3N4ePjgzp16lQoriNHjmD16tWYPXu2einuhg0b0LlzZ8ycORNRUVEVOh8RPQeJV8MQie73338Xxo4dK9SuXVswNjYWLC0thXbt2gkrVqwQCgsL1f2Ki4uFOXPmCHXq1BGMjIyEmjVrCuHh4Rp9BOHR0teAgIBS1/nvkssnLX0VBEE4cOCA0LRpU8HY2Fjw8PAQvvrqq1JLXw8fPiz0799fcHNzE4yNjQU3Nzfh9ddfF37//fdS1/jv8tBD
hw4J7dq1E0xNTQUrKyuhb9++wqVLlzT6PL7ef5fWbtiwQQAg3Lhx44lfU0HQXPr6JE9a+jp58mTB1dVVMDU1Fdq1ayfExcWVuWT1+++/Fzw9PQVDQ0ON++zUqZPQpEmTMq/57/Pk5uYKtWrVEl5++WWhuLhYo9+kSZMEuVwuxMXFPfUeiOjFyQShArPAiIiIiCqIczaIiIhIVEw2iIiISFRMNoiIiEhUTDaIiIhIVEw2iIiISFRMNoiIiEhUTDaIiIhIVNXyCaLpucVSh0D/sDYzkjoE+peCIu29wI1ejFyL736hF2NtKv7v3aYtQ7VynoJzK7VynsrGygYRERGJqlpWNoiIiHSKTL9/t2eyQUREJDY9HzZjskFERCQ2Pa9s6PfdExERkehY2SAiIhIbh1GIiIhIVBxGISIiIhIPKxtERERi4zAKERERiYrDKERERETiYWWDiIhIbHo+jMLKBhERkdhkcu1sL2DBggWQyWSYOHGiuq2wsBAhISGwt7eHhYUFBg0ahPT0dI3jUlJSEBAQADMzMzg5OWHq1Kl4+PBhha7NZIOIiKiai4+Px9q1a9GsWTON9kmTJmHXrl349ttvcezYMdy+fRsDBw5U7y8pKUFAQACKiopw4sQJxMTEYOPGjZg1a1aFrs9kg4iISGwymXa253D//n0MHz4cn3/+OWxtbdXtOTk5WLduHT799FN07doV3t7e2LBhA06cOIGTJ08CAA4cOIBLly7hq6++QosWLdCrVy/MmzcPq1atQlFRUbljYLJBREQkNgmHUUJCQhAQEAA/Pz+N9oSEBBQXF2u0N2rUCO7u7oiLiwMAxMXFwcvLC87Ozuo+/v7+yM3NRVJSUrlj4ARRIiIisWlpgqhSqYRSqdRoUygUUCgUZfbfunUrzp49i/j4+FL70tLSYGxsDBsbG412Z2dnpKWlqfv8O9F4vP/xvvJiZYOIiKiKiIyMhLW1tcYWGRlZZt+bN29iwoQJ2Lx5M0xMTCo5Uk1MNoiIiMSmpWGU8PBw5OTkaGzh4eFlXjIhIQEZGRl4+eWXYWhoCENDQxw7dgzLly+HoaEhnJ2dUVRUhOzsbI3j0tPT4eLiAgBwcXEptTrl8efHfcqDyQYREZHYtJRsKBQKWFlZaWxPGkLp1q0bLly4gMTERPXWqlUrDB8+XP1nIyMjHD58WH1McnIyUlJS4OvrCwDw9fXFhQsXkJGRoe5z8OBBWFlZwdPTs9y3zzkbRERE1ZClpSWaNm2q0WZubg57e3t1+5gxYxAWFgY7OztYWVlh/Pjx8PX1RZs2bQAAPXr0gKenJ0aMGIGoqCikpaVhxowZCAkJeWKSUxYmG0RERGKT6+YTRJcsWQK5XI5BgwZBqVTC398fq1evVu83MDDA7t27ERwcDF9fX5ibmyMoKAhz586t0HVkgiAI2g5eaum5xVKHQP+wNjOSOgT6l4KiEqlDoH/I9fzx1brE2lT8GQWmXT/WynkKjnyolfNUNs7ZICIiIlFxGIWIiEhsel7JYrJBREQkthd8iVpVp993T0RERKJjZYOIiEhsHEYhIiIiUen5MAqTDSIiIrHpeWVDv1MtIiIiEh0rG0RERGLjMAoRERGJisMoREREROJhZYOIiEhsHEYhIiIiUXEYhYiIiEg8rGwQERGJjcMoREREJCo9Tzb0++6JiIhIdKxsSGjntq3Y+d3XSEu9DQCoU7c+gsa8gzbtOiA3JwfrP1uF+JMnkJ6eChsbW3To3BVj3hkPCwtLiSPXL1tjNyNmwzpkZt5BQ49G+GD6THg1ayZ1WNVazLrPcPTIIfz15x9QKEzg1bwFQiZMRq3addR9lEolln8ahYP7f0RxURF8fNtj6vSZsLd3kDDy6udsQjy+ilmPK5eTkHnnDqI+XYHOXf3U++fMDMeeXTs1jmnTtj2Wr/68kiPVcXo+QZTJhoQcnVzwdugk1KhZCxAE7NvzPaZPGY91X22DIAjIvJOBdydMQe26dZGWmorF
C+Yi884dzPtkidSh6419e3/EoqhIzJg9B15ezbF5UwyC3x6D73fvg729vdThVVvnzp7BoNdeh2eTpih5WII1K5diQvBb2LJ9F0xNzQAASxctwIlfjmF+1BJYWFhi0YKP8MHkCfh842aJo69eCgsK0KChB/oGDsS0sPfK7OPbrgNmzvlY/dnY2Liywqs69HwYRSYIgiB1ENqWnlssdQjPLaBbWwS/Nxl9+g8qte+nQ/vx0awPsP94PAwNq0aeaG1mJHUIL2T40FfRpKkXps+YBQBQqVTo0a0TXh82AmPGjpM4uoorKCqROoTncu/uXfTq1h5rvvgSLb1b4X5eHnp2bYe58xeia3d/AMCfN/7A0IF98EXMFjRt1lziiJ9NXgV/032lReMyKxt5eXlYtHSlhJG9GGtT8RMB08DPtHKegp1V798dQOLKRmZmJtavX4+4uDikpaUBAFxcXNC2bVuMGjUKjo6OUoZXqUpKSnD08H4UFhSgqVeLMvvk38+DmblFlUk0qrrioiJcvpSEMWPfVrfJ5XK0adMW5387J2Fk+uf+/TwAgJW1NQDgyuUkPHz4EK3b+Kr71K5TFy4urrhwPrFKJBvVydkzp+HfpR0srazQ6hUfvBMyATY2tlKHRTpEsp9a8fHx8Pf3h5mZGfz8/NCwYUMAQHp6OpYvX44FCxZg//79aNWq1VPPo1QqoVQq/9Mmh0KhEC12bbp+7Xe8++ZwFBUVwdTUDB8tXIbadeuV6pedfQ8x69ai34DBEkSpn+5l30NJSUmp4RJ7e3vcuPGHRFHpH5VKhaWLFqBZi5dRr34DAEBWViaMjIxgaWml0dfO3gFZWZlShKm3fNu1R5du3eH2Ug3cupmCNSuXYmLI21j35RYYGBhIHZ7u0PNhFMmSjfHjx+PVV19FdHQ0ZP8pJwqCgHfeeQfjx49HXFzcU88TGRmJOXPmaLRN/mAGpobP0nrMYnCvVQfrNn+H/Pt5OHr4AOZHfIgVazdqJBz59+9j2sR3UbtOPYwe966E0RJVvoWR83D92lV8tuErqUOhMvToGaD+c/0GDdGgoQcG9OmBhDOn8YqP71OO1DNVcNhMmyRLtX777TdMmjSpVKIBADKZDJMmTUJiYuIzzxMeHo6cnByN7b2waSJELA4jIyPUqOkOj8ZN8HboJNRv4IFvt/7/H9UH+fmY8t7bMDMzx0cLl8HQsGrPgahKbG1sYWBggKysLI32rKwsODhwxUNlWLTgI/z68zGs/nwjnJxd1O329g4oLi5GXl6uRv+7WZlcjSKxl2rUhI2tLW7dTJE6FNIhkiUbLi4uOH369BP3nz59Gs7Ozs88j0KhgJWVlcZWVYZQyqISVCguKgLwqKIxefw4GBkZIfLTFVX6vqoiI2NjNPZsglMn/19dU6lUOHUqDs2at5QwsupPEAQsWvARjh05hJVr18PtpRoa+xs1bgJDQ0PEnzqpbvvrzxtIS0uFV7MWlRwt/Vt6ehpysrPh4KA/c+7KQyaTaWWrqiQbRpkyZQrGjRuHhIQEdOvWTZ1YpKen4/Dhw/j888+xaNEiqcKrFGtXLoFP2w5wdnHFgwf5OLRvDxIT4rFoxVp1olFYWIAZc5ch/34+8u/nAwBsbG05FlpJRgSNxszp09CkSVM09WqGrzbFoKCgAIEDBkodWrW2MHIeDuzdg6glK2Fubo6szDsAAHMLS5iYmMDC0hJ9Awdh+eJPYG1tDXNzCyz+5GN4NWvByaFa9uBBPm6l/L9KcfvvW/j9ymVYWVvDytoaX0SvRhe/7rC3d8StWylYuXQRatR0R5u27SWMWvdU5URBGyRd+vr1119jyZIlSEhIQEnJoyV5BgYG8Pb2RlhYGIYMGfJc560qS18XzJuJs/GnkJV5B+YWlqhXvyGGBb2J1j5tcS7hNCa882aZx339/X64ur1UydE+n6q+9BUAtmz+Sv1QL49GjTFt+gw0q6I/0KrK0tc2LT3LbJ8x52P06TcAwL8e6rVv
D4qKiuHTth3eD58J+yryG3VVWfqaEH8awWODSrUH9A3EtA9nY+qkUPx+5TLy8vLg6OgIH992eDvkvSo1nFUZS1/NB2/Qynnyt43Wynkqm048Z6O4uBiZmY9mkDs4OMDI6MV+QFWVZEMfVIdkozqpKsmGPqgqyYY+qJRk41UtJRvfVs1kQyce2GBkZARXV1epwyAiIhKFvg+j6PfCXyIiIhKdTlQ2iIiIqjN9r2ww2SAiIhIZkw0iIiISlb4nG5yzQUREVA2tWbMGzZo1Uz/w0tfXF3v37lXv79y5c6mHhr3zzjsa50hJSUFAQADMzMzg5OSEqVOn4uHDhxWOhZUNIiIisUlQ2KhRowYWLFiABg0aQBAExMTEoH///jh37hyaNGkCABg7dizmzp2rPsbMzEz955KSEgQEBMDFxQUnTpxAamoqRo4cCSMjI8yfP79CsTDZICIiEpkUwyh9+/bV+Pzxxx9jzZo1OHnypDrZMDMzg4uLS1mH48CBA7h06RIOHToEZ2dntGjRAvPmzcO0adMQEREBY2PjcsfCYRQiIqJqrqSkBFu3bkV+fj58ff//Nt7NmzfDwcEBTZs2RXh4OB48eKDeFxcXBy8vL433lPn7+yM3NxdJSUkVuj4rG0RERCLTVmVDqVRCqVRqtCkUiie+qPPChQvw9fVFYWEhLCwssGPHDnh6PnodwLBhw1CrVi24ubnh/PnzmDZtGpKTk7F9+3YAQFpaWqkXoj7+nJaWVqG4mWwQERGJTFvJRmRkJObMmaPRNnv2bERERJTZ38PDA4mJicjJycG2bdsQFBSEY8eOwdPTE+PGjVP38/LygqurK7p164br16+jXr16Won3MSYbREREVUR4eDjCwsI02p5U1QAAY2Nj1K9fHwDg7e2N+Ph4LFu2DGvXri3V18fHBwBw7do11KtXDy4uLjh9+rRGn/T0dAB44jyPJ+GcDSIiIpH9d4np824KhUK9lPXx9rRk479UKlWpYZjHEhMTAUD9rjJfX19cuHABGRkZ6j4HDx6ElZWVeiimvFjZICIiEpsES1/Dw8PRq1cvuLu7Iy8vD7GxsTh69Cj279+P69evIzY2Fr1794a9vT3Onz+PSZMmoWPHjmjWrBkAoEePHvD09MSIESMQFRWFtLQ0zJgxAyEhIRVKcAAmG0RERNVSRkYGRo4cidTUVFhbW6NZs2bYv38/unfvjps3b+LQoUNYunQp8vPzUbNmTQwaNAgzZsxQH29gYIDdu3cjODgYvr6+MDc3R1BQkMZzOcpLJgiCoM2b0wXpucVSh0D/sDYzkjoE+peCohKpQ6B/yPX88dW6xNpU/BkFDqO2auU8mRuHauU8lY2VDSIiIpHp+7tRmGwQERGJTN+TDa5GISIiIlGxskFERCQ2/S5sMNkgIiISG4dRiIiIiETEygYREZHI9L2ywWSDiIhIZPqebHAYhYiIiETFygYREZHI9L2ywWSDiIhIbPqda3AYhYiIiMTFygYREZHIOIxCREREomKyQURERKLS92SDczaIiIhIVKxsEBERiU2/CxtMNoiIiMTGYRQiIiIiEbGyQUREJDJ9r2ww2SAiIhKZvicbHEYhIiIiUbGyQUREJDJ9r2ww2SAiIhKbfucaHEYhIiIicVXLyoa1mZHUIdA/VCpB6hDoXxSG/P1CV5Tw74Ze4TAKERERiYrJBhEREYlKz3MNztkgIiIicbGyQUREJDIOoxAREZGo9DzX4DAKERERiYuVDSIiIpHp+zAKKxtEREQik8m0s1XEmjVr0KxZM1hZWcHKygq+vr7Yu3even9hYSFCQkJgb28PCwsLDBo0COnp6RrnSElJQUBAAMzMzODk5ISpU6fi4cOHFb5/JhtERETVUI0aNbBgwQIkJCTgzJkz6Nq1K/r374+kpCQAwKRJk7Br1y58++23OHbsGG7fvo2BAweqjy8pKUFAQACKiopw4sQJxMTEYOPGjZg1a1aFY5EJglDtHmNXWPGki0TC
J4gSlY1PENUdlibi/97tOf2AVs5zaX6PFzrezs4OCxcuxODBg+Ho6IjY2FgMHjwYAHDlyhU0btwYcXFxaNOmDfbu3Ys+ffrg9u3bcHZ2BgBER0dj2rRpuHPnDoyNjct9XVY2iIiIRCbFMMq/lZSUYOvWrcjPz4evry8SEhJQXFwMPz8/dZ9GjRrB3d0dcXFxAIC4uDh4eXmpEw0A8Pf3R25urro6Ul6cIEpERFRFKJVKKJVKjTaFQgGFQlFm/wsXLsDX1xeFhYWwsLDAjh074OnpicTERBgbG8PGxkajv7OzM9LS0gAAaWlpGonG4/2P91UEKxtEREQik8lkWtkiIyNhbW2tsUVGRj7xuh4eHkhMTMSpU6cQHByMoKAgXLp0qRLv/BFWNoiIiESmrZWv4eHhCAsL02h7UlUDAIyNjVG/fn0AgLe3N+Lj47Fs2TK89tprKCoqQnZ2tkZ1Iz09HS4uLgAAFxcXnD59WuN8j1erPO5TXqxsEBERiUxblQ2FQqFeyvp4e1qy8V8qlQpKpRLe3t4wMjLC4cOH1fuSk5ORkpICX19fAICvry8uXLiAjIwMdZ+DBw/CysoKnp6eFbp/VjaIiIiqofDwcPTq1Qvu7u7Iy8tDbGwsjh49iv3798Pa2hpjxoxBWFgY7OzsYGVlhfHjx8PX1xdt2rQBAPTo0QOenp4YMWIEoqKikJaWhhkzZiAkJKRCCQ7AZIOIiEh0UjxBNCMjAyNHjkRqaiqsra3RrFkz7N+/H927dwcALFmyBHK5HIMGDYJSqYS/vz9Wr16tPt7AwAC7d+9GcHAwfH19YW5ujqCgIMydO7fCsfA5GyQqPmeDqGx8zobuqIznbLSIOPzsTuWQGNFNK+epbJyzQURERKLiMAoREZHI9P1FbEw2iIiIRKbnuQaHUYiIiEhcrGwQERGJjMMoREREJCo9zzU4jEJERETiYmWDiIhIZBxGISIiIlHpea7BZIOIiEhs+l7Z4JwNIiIiEhUrG0RERCLT88IGkw0iIiKxcRiFiIiISESsbBAREYlMzwsbTDaIiIjExmEUIiIiIhGxskFERCQyPS9sMNkgIiISG4dRiIiIiETEygYREZHIWNkgnbM1djN6de+K1i29MHzoq7hw/rzUIemtjPR0fPjBVHRu74M2rZrj1QF9kZR0Qeqw9E706hVo6dVIYxvQt5fUYemFswnxmDQ+GD39OqJV88Y4euSQet/D4mIsX7IIrw3qh/Y+L6OnX0fM+nAa7mRkSBixbpLJtLNVVaxs6Jh9e3/EoqhIzJg9B15ezbF5UwyC3x6D73fvg729vdTh6ZXcnByMGvk6Wrf2wco1n8PW1g4pKX/Cyspa6tD0Ur36DRD9+Xr1ZwMD/vNVGQoKCtDAwwP9Agdiath7GvsKCwtx5colvDUuGA08GiEvNweLPolE2IR3sWnLNoki1k36Xtng31YdsylmAwYOHoLAAYMAADNmz8Hx40exc/t3GDN2nMTR6ZcN67+Ai4sr5nwUqW57qUYNCSPSbwYGBnBwcJQ6DL3Trn1HtGvfscx9FpaWWL12vUbb++EzEDR8CNJSb8PF1a0yQqQqgMMoOqS4qAiXLyWhjW9bdZtcLkebNm1x/rdzEkamn44dPQJPz6aYGjYBXTu1xdBXB2D7tm+kDktvpaT8he5dO6BPTz9MnzYFqam3pQ6JynD/fh5kMhksLK2kDkWn6Pswik4nGzdv3sSbb74pdRiV5l72PZSUlJQaLrG3t0dmZqZEUemvv2/dxLffbIF7rVpYHf0FXh0yFFELPsYP3++QOjS909SrOebOi8SqNV9g+szZ+PvvW3gz6A3k59+XOjT6F6VSiRVLF8O/VwAsLCykDkenyGQyrWxVlU4Po9y9excxMTFYv379E/solUoolUqNNsFAAYVCIXZ4VM2pVAI8mzTB+AlhAIBGjT1x7dpVbPtmK/r1HyBxdPqlfYf/l/EbenjAy6s5evt3xYH9+zBg4GAJI6PH
HhYX44OpkyAIAj74cLbU4ZCOkTTZ+OGHH566/48//njmOSIjIzFnzhyNtg9nzsaMWREvEpokbG1sYWBggKysLI32rKwsODg4SBSV/nJwdETdevU12urUrYfDhw5IFBE9ZmllBfdatXEz5S+pQyH8P9FIS72NNZ9vYFWjDFW4KKEVkiYbgYGBkMlkEAThiX2eVTYKDw9HWFiYRptgUDWrGkbGxmjs2QSnTsahazc/AIBKpcKpU3EY+vobEkenf1q0aIm//ryh0Zby559w5aQ3yT14kI9bN28ioG8/qUPRe48TjZSUv7D2ixjY2NhKHZJOkut5tiHpnA1XV1ds374dKpWqzO3s2bPPPIdCoYCVlZXGVpWHUEYEjcb2bd/gh5078Mf16/hobgQKCgoQOGCg1KHpnTdGjsKF879h3efRSEn5C3v37MJ3332D14YOlzo0vfPpok9wJv40bv99C4mJZxE2YTzkBnL07NVH6tCqvQcP8pF85TKSr1wGAPz99y0kX7mMtNTbeFhcjPenTMTlS0n4KHIhSlQlyMy8g8zMOyguLpI4ctIlklY2vL29kZCQgP79+5e5/1lVj+qoZ6/euHf3LlavXI7MzDvwaNQYq9d+AXsOo1S6Jk29sHjpCqxY+ik+i16Nl16qganvh6N3n75Sh6Z30tPTET5tMnKys2Fra4cWL3vjy81fw87OTurQqr1LSUl4560g9ecliz4BAPTpF4hx74Ti+NEjAIBhQzTnMUV/EYNWrV+pvEB1nJ4XNiATJPxp/vPPPyM/Px89e/Ysc39+fj7OnDmDTp06Vei8hQ+1ER1pg0qlX8kiUXmV8O+GzrA0Eb/I77/6lFbOs/9dH62cp7JJWtno0KHDU/ebm5tXONEgIiLSNXI9r2zo9HM2iIiI6PlERkaidevWsLS0hJOTEwIDA5GcnKzRp3PnzqWe5fHOO+9o9ElJSUFAQADMzMzg5OSEqVOn4uHDig0h6PRzNoiIiKoDKR7IdezYMYSEhKB169Z4+PAhpk+fjh49euDSpUswNzdX9xs7dizmzp2r/mxmZqb+c0lJCQICAuDi4oITJ04gNTUVI0eOhJGREebPn1/uWJhsEBERiUyKCaL79u3T+Lxx40Y4OTkhISEBHTv+/0F5ZmZmcHFxKfMcBw4cwKVLl3Do0CE4OzujRYsWmDdvHqZNm4aIiAgYGxuXKxYOoxAREemBnJwcACi1imvz5s1wcHBA06ZNER4ejgcPHqj3xcXFwcvLC87Ozuo2f39/5ObmIikpqdzXZmWDiIhIZDJop7RR1is6FIpnv6JDpVJh4sSJaNeuHZo2bapuHzZsGGrVqgU3NzecP38e06ZNQ3JyMrZv3w4ASEtL00g0AKg/p6WllTtuJhtEREQi09ZqlLJe0TF79mxEREQ89biQkBBcvHgRv/zyi0b7uHHj1H/28vKCq6srunXrhuvXr6NevXraCRocRiEiIqoywsPDkZOTo7GFh4c/9ZjQ0FDs3r0bP/30E2rUqPHUvj4+j57jce3aNQCAi4sL0tPTNfo8/vykeR5lYbJBREQkMm29Yr4ir+gQBAGhoaHYsWMHjhw5gjp16jwzzsTERACPXicCAL6+vrhw4QIyMjLUfQ4ePAgrKyt4enqW+/45jEJERCQyKVajhISEIDY2Ft9//z0sLS3Vcyysra1hamqK69evIzY2Fr1794a9vT3Onz+PSZMmoWPHjmjWrBkAoEePHvD09MSIESMQFRWFtLQ0zJgxAyEhIRV6D5mkjysXCx9Xrjv4uHKisvFx5bqjMh5XHvjFGa2cZ+dbrcrd90nP9tiwYQNGjRqFmzdv4o033sDFixeRn5+PmjVrYsCAAZgxYwasrKzU/f/66y8EBwfj6NGjMDc3R1BQEBYsWABDw/LXK5hskKiYbBCVjcmG7qiMZGPgugStnGf7GG+tnKeycRiFiIhIZPr+1lcmG0RERCKT4nHluoSrUYiIiEhUrGwQERGJTM8LG0w2
iIiIxCbX82yDwyhEREQkKlY2iIiIRKbfdQ0mG0RERKLjahQiIiIiEbGyQUREJDJtvWK+qipXsvHDDz+U+4T9+vV77mCIiIiqI30fRilXshEYGFiuk8lkMpSUlLxIPERERFTNlCvZUKlUYsdBRERUbel5YYNzNoiIiMTGYZTnkJ+fj2PHjiElJQVFRUUa+9577z2tBEZERFRdcIJoBZ07dw69e/fGgwcPkJ+fDzs7O2RmZsLMzAxOTk5MNoiIiEhDhZ+zMWnSJPTt2xf37t2DqakpTp48ib/++gve3t5YtGiRGDESERFVaTKZTCtbVVXhZCMxMRGTJ0+GXC6HgYEBlEolatasiaioKEyfPl2MGImIiKo0mZa2qqrCyYaRkRHk8keHOTk5ISUlBQBgbW2Nmzdvajc6IiIiqvIqPGejZcuWiI+PR4MGDdCpUyfMmjULmZmZ2LRpE5o2bSpGjERERFUaXzFfQfPnz4erqysA4OOPP4atrS2Cg4Nx584dfPbZZ1oPkIiIqKqTybSzVVUVrmy0atVK/WcnJyfs27dPqwERERFR9cKHehEREYmsKq8k0YYKJxt16tR56hftjz/+eKGAiIiIqhs9zzUqnmxMnDhR43NxcTHOnTuHffv2YerUqdqKi4iIiKqJCicbEyZMKLN91apVOHPmzAsHREREVN1wNYqW9OrVC9999522TkdERFRtcDWKlmzbtg12dnbaOh0REVG1wQmiFdSyZUuNL5ogCEhLS8OdO3ewevVqrQZHREREVV+Fk43+/ftrJBtyuRyOjo7o3LkzGjVqpNXgnlfRQ5XUIdA/jA21NlJHWpB8O0/qEOgfFiZ88oCusDQxFf0a+v4vYYX/b4+IiBAhDCIioupL34dRKpxsGRgYICMjo1R7VlYWDAwMtBIUERERVR8VrmwIglBmu1KphLGx8QsHREREVN3I9buwUf5kY/ny5QAelYK++OILWFhYqPeVlJTg+PHjOjNng4iISJdIkWxERkZi+/btuHLlCkxNTdG2bVt88skn8PDwUPcpLCzE5MmTsXXrViiVSvj7+2P16tVwdnZW90lJSUFwcDB++uknWFhYICgoCJGRkTA0LH+9otw9lyxZAuBRZSM6OlpjyMTY2Bi1a9dGdHR0uS9MRERE4jl27BhCQkLQunVrPHz4ENOnT0ePHj1w6dIlmJubAwAmTZqEPXv24Ntvv4W1tTVCQ0MxcOBA/PrrrwAeFRMCAgLg4uKCEydOIDU1FSNHjoSRkRHmz59f7lhkwpPGRZ6gS5cu2L59O2xtbStyWKXKLeRqFF3B1Si6hatRdAdXo+iOek7ir0aZvCtZK+dZ3Nfj2Z2e4M6dO3BycsKxY8fQsWNH5OTkwNHREbGxsRg8eDAA4MqVK2jcuDHi4uLQpk0b7N27F3369MHt27fV1Y7o6GhMmzYNd+7cKff0iQr/JPjpp590OtEgIiLSNXKZdrYXkZOTAwDqB3AmJCSguLgYfn5+6j6NGjWCu7s74uLiAABxcXHw8vLSGFbx9/dHbm4ukpKSyn//FQ120KBB+OSTT0q1R0VF4dVXX63o6YiIiKiclEolcnNzNTalUvnM41QqFSZOnIh27dqhadOmAIC0tDQYGxvDxsZGo6+zszPS0tLUff6daDze/3hfeVU42Th+/Dh69+5dqr1Xr144fvx4RU9HRERU7Wnr3SiRkZGwtrbW2CIjI595/ZCQEFy8eBFbt26thLstrcKDhvfv3y9zjMbIyAi5ublaCYqIiKg60dZbX8PDwxEWFqbRplAonnpMaGgodu/ejePHj6NGjRrqdhcXFxQVFSE7O1ujupGeng4XFxd1n9OnT2ucLz09Xb2vvCpc2fDy8sLXX39dqn3r1q3w9PSs6OmIiIiqPbmWNoVCASsrK43tScmGIAgIDQ3Fjh07cOTIEdSpU0djv7e3N4yMjHD48GF1W3JyMlJSUuDr6wsA8PX1xYULFzQe5nnw4EFYWVlV6Gd+hSsb
M2fOxMCBA3H9+nV07doVAHD48GHExsZi27ZtFT0dERERiSAkJASxsbH4/vvvYWlpqZ5jYW1tDVNTU1hbW2PMmDEICwuDnZ0drKysMH78ePj6+qJNmzYAgB49esDT0xMjRoxAVFQU0tLSMGPGDISEhDyzovJvFU42+vbti507d2L+/PnYtm0bTE1N0bx5cxw5coSvmCciIiqDFK9GWbNmDQCgc+fOGu0bNmzAqFGjADx6hpZcLsegQYM0Hur1mIGBAXbv3o3g4GD4+vrC3NwcQUFBmDt3boViqfBzNv4rNzcXW7Zswbp165CQkICSkpIXOZ1W8DkbuoPP2dAtfM6G7uBzNnRHZTxnY+a+q1o5z7yeDbRynsr23D8Jjh8/jqCgILi5uWHx4sXo2rUrTp48qc3YiIiIqBqoUGqdlpaGjRs3Yt26dcjNzcWQIUOgVCqxc+dOTg4lIiJ6Aj1/w3z5Kxt9+/aFh4cHzp8/j6VLl+L27dtYsWKFmLERERFVC7rwBFEplbuysXfvXrz33nsIDg5GgwZVc8yIiIiIKl+5Kxu//PIL8vLy4O3tDR8fH6xcuRKZmZlixkZERFQtyGUyrWxVVbmTjTZt2uDzzz9Hamoq3n77bWzduhVubm5QqVQ4ePAg8vI4y52IiKgs2npceVVV4dUo5ubmePPNN/HLL7/gwoULmDx5MhYsWAAnJyf069dPjBiJiIioCnuhhyB4eHggKioKt27dwpYtW7QVExERUbXCCaJaYGBggMDAQAQGBmrjdERERNWKDFU4U9ACPsKOiIhIZFW5KqENfJY0ERERiYqVDSIiIpHpe2WDyQYREZHIZFV53aoWcBiFiIiIRMXKBhERkcg4jEJERESi0vNRFA6jEBERkbhY2SAiIhJZVX6JmjYw2SAiIhKZvs/Z4DAKERERiYqVDSIiIpHp+SgKkw0iIiKxyfkiNiIiIhKTvlc2OGeDiIiIRMXKBhERkci4GoUkdTYhHpPGB6OXX0e0bt4YR48cemLfyHkRaN28MWK/iqnECGlr7Gb06t4VrVt6YfjQV3Hh/HmpQ6rWVCUl2LpxDUJG9MPwgHYYP7I/tn31BQRBKLP/Z0vnY0j3VtizPbaSI9UPX61fg94dWmhs44YHqven/n0T86ZPwtA+XTDIvx3mz5qKe3ezpAtYR8llMq1sVRWTDYkVFBSgoYcH3g+f+dR+Px0+iAsXfoOjo1MlRUYAsG/vj1gUFYm33w3B1m93wMOjEYLfHoOsLP5jKpadX8fg4K5tGBP6Ppas+xbD3xqPH775Ent3fl2q7+lffsLVyxdha+8oQaT6o1adevhq5yH1tnDVBgBAYUEBPgwLhkwmQ+Syz7Bo9UY8LC7GnA/eg0qlkjhq0iVMNiTWrn1HBIdORJdu3Z/YJyM9HYsWfIx586NgaMSRr8q0KWYDBg4egsABg1Cvfn3MmD0HJiYm2Ln9O6lDq7Z+v3Qerdp2wss+7eHk4oY2Hf3QzNsH15KTNPrdzczA+lUL8V74PBga8u+FmAwMDGBn76DerG1sAQCXLpxDRtpthE2fizr1GqBOvQaY/OE8XL1yCb+dPS1x1LpFJtPOVlUx2dBxKpUKsz+chjdGvYl69RtIHY5eKS4qwuVLSWjj21bdJpfL0aZNW5z/7ZyEkVVvDT2b4eK5eNy+9RcA4M/rvyP54m9o2fr/3weVSoUVn8xCv1dHoGbtelKFqjf+vpWCNwK7480hAYiaG46M9FQAQHFxMSCTwcjIWN3X2FgBmVyOpPP8O/Jv+j6Mwl8HdFzMhi9gYGCAocNGSB2K3rmXfQ8lJSWwt7fXaLe3t8eNG39IFFX1Fzh0FAoe5GPSm4Mhl8uhUqkwdPS76NCtl7rP91/HwEBugF4DhkoYqX7w8PRC2PS5qFGzNu5mZSJ2YzSmhryJNV9uQyNPL5iYmGJ99FIEjRsPCMCG6GVQlZTgXlam1KGTDpE82SgoKEBCQgLs7Ozg6empsa+wsBDffPMN
Ro4c+cTjlUollEqlZptgBIVCIUq8lenypSRs3bwJX239DrIqnNESVUTcsYP45cg+vBf+EWrWroc/ryVj45pPYWvviM49+uCP3y/jxx1b8cnqr/j3ohK0btNe/ec69RvCw7MpRr3aGz8fOQD/PgMwfW4UVi6ejx+2bYFMLkenbj1Rv2FjyGQsnP+bvv+vKmmy8fvvv6NHjx5ISUmBTCZD+/btsXXrVri6ugIAcnJyMHr06KcmG5GRkZgzZ45G2wcfzkL4jNmixl4Zzp09g3t3s9C3Z1d1W0lJCZYtjsLWzV/ih72HJYyu+rO1sYWBgUGpyaBZWVlwcHCQKKrq76vPl6P/a0Fo18UfAOBepz7uZKRi59YN6NyjDy5fPIfc7Lt4d3gf9TEqVQm+XLsUP27fglVf7ZIqdL1gYWmFl2q64/atmwCAl19pi/Vf70ZO9j0YGBjAwtIKw/t3g4vbSxJHqlv0PfWSNNmYNm0amjZtijNnziA7OxsTJ05Eu3btcPToUbi7u5frHOHh4QgLC9NoUwpGYoRb6Xr36YdXfHw12t4LHoteffqhb+BAiaLSH0bGxmjs2QSnTsahazc/AI/mCpw6FYehr78hcXTVl7KwEHK55j/NcrkBBNWjpa8d/XrDq+UrGvs/Dh+Pjn690cW/b6XFqa8KHjxA6t+30NVfM+F+PGk0MeE0su/dRZv2nSWIjnSVpMnGiRMncOjQITg4OMDBwQG7du3Cu+++iw4dOuCnn36Cubn5M8+hUChKDZnkFladJVcPHuTjZkqK+vPtv28h+cplWFtbw8XVDTb//AV+zNDIEPYODqhdu05lh6qXRgSNxszp09CkSVM09WqGrzbFoKCgAIEDmOyJxbtNB2yPXQ8HJxfUqFUXf15Lxu7vNqOLfz8AgKWVDSytbDSOMTQ0hI2dPdxq1q78gKu5L1Z9Cp+2HeHk4oqszDv4av0ayOUG6NytJwDgwJ6dcK9dF9Y2trh88TzWLo9C4JA3UMO9trSB6xiphvyOHz+OhQsXIiEhAampqdixYwcCAwPV+0eNGoWYGM1nN/n7+2Pfvn3qz3fv3sX48eOxa9cuyOVyDBo0CMuWLYOFhUW545A02SgoKNBYsiaTybBmzRqEhoaiU6dOiI2t/g/puZyUhHfeClJ/XrLoEwBAQL9ARMyLlCos+kfPXr1x7+5drF65HJmZd+DRqDFWr/0C9hxGEc2boVPx9cZofLF8AXKy78HO3gHdAwZi8BtjpQ5NL2VmpOOTOeHIzc2GtY0tmni1xJK1X8La1g4A8PfNvxDz2Qrk5ebAycUNr414CwNeY+Xvv6SaspGfn4/mzZvjzTffxMCBZf+S1LNnT2zYsEH9+b+/wA8fPhypqak4ePAgiouLMXr0aIwbN65CP6NlwpMey1cJXnnlFYwfPx4jRpReaREaGorNmzcjNzcXJSUlFTpvVapsVHfGhvo+Uqlbkm/nSR0C/cPCRPL5+fSPek6mol/jq4RbWjnPG941nvtYmUxWZmUjOzsbO3fuLPOYy5cvw9PTE/Hx8WjVqhUAYN++fejduzdu3boFNze3cl1b0p8EAwYMwJYtW8rct3LlSrz++utPfEQxERERvbijR4/CyckJHh4eCA4O1pgUHxcXBxsbG3WiAQB+fn6Qy+U4depUua8habIRHh6OH3/88Yn7V69ezUfeEhFRlSfT0qZUKpGbm6ux/ffxDxXRs2dPfPnllzh8+DA++eQTHDt2DL169VKPKKSlpcHJSfM1GYaGhrCzs0NaWlq5r8MaNxERkci09bjyyMhIWFtba2yRkc8/v2/o0KHo168fvLy8EBgYiN27dyM+Ph5Hjx7V3s2DyQYREVGVER4ejpycHI0tPDxca+evW7cuHBwccO3aNQCAi4sLMjIyNPo8fPgQd+/ehYuLS7nPyxlKREREItPW0teyHvegTbdu3UJWVpb64Zq+vr7Izs5GQkICvL29AQBHjhyBSqWCj49Puc/LZIOIiEhk
Ug0j3L9/X12lAIAbN24gMTERdnZ2sLOzw5w5czBo0CC4uLjg+vXreP/991G/fn34+z96gm/jxo3Rs2dPjB07FtHR0SguLkZoaCiGDh1a7pUoAIdRiIiIqq0zZ86gZcuWaNmyJQAgLCwMLVu2xKxZs2BgYIDz58+jX79+aNiwIcaMGQNvb2/8/PPPGtWTzZs3o1GjRujWrRt69+6N9u3b47PPPqtQHJI+Z0MsfM6G7uBzNnQLn7OhO/icDd1RGc/Z+CbxtlbOM6RF+asJuoT/txMREYlMz1/6ymEUIiIiEhcrG0RERCKT6kVsuoLJBhERkcj0fRiByQYREZHI9L2yoe/JFhEREYmMlQ0iIiKR6Xddg8kGERGR6PR8FIXDKERERCQuVjaIiIhEJtfzgRQmG0RERCLjMAoRERGRiFjZICIiEpmMwyhEREQkJg6jEBEREYmIlQ0iIiKRcTUKERERiUrfh1GYbBAREYlM35MNztkgIiIiUbGyQUREJDIufSUiIiJRyfU71+AwChEREYmLlQ0iIiKRcRiFiIiIRMXVKEREREQiYmWDiIhIZBxGISIiIlFxNQoRERGRiFjZICIiEhmHUYiIiEhU+r4ahckGERGRyPQ81+CcDSIiIhIXKxtEREQik+v5OEq1TDb0+1uqW0pUgtQh0L+YGhtIHQL9o6n/VKlDoH8UnFsp+jWk+rl0/PhxLFy4EAkJCUhNTcWOHTsQGBio3i8IAmbPno3PP/8c2dnZaNeuHdasWYMGDRqo+9y9exfjx4/Hrl27IJfLMWjQICxbtgwWFhbljoPDKERERNVUfn4+mjdvjlWrVpW5PyoqCsuXL0d0dDROnToFc3Nz+Pv7o7CwUN1n+PDhSEpKwsGDB7F7924cP34c48aNq1AcMkEQqt2vnnmFKqlDoH/I9f1JNjrm77sFUodA/2je632pQ6B/VEZl4+T1bK2cp009m+c+ViaTaVQ2BEGAm5sbJk+ejClTpgAAcnJy4OzsjI0bN2Lo0KG4fPkyPD09ER8fj1atWgEA9u3bh969e+PWrVtwc3Mr17VZ2SAiIhKZTEv/adONGzeQlpYGPz8/dZu1tTV8fHwQFxcHAIiLi4ONjY060QAAPz8/yOVynDp1qtzXqpZzNoiIiKojpVIJpVKp0aZQKKBQKCp8rrS0NACAs7OzRruzs7N6X1paGpycnDT2Gxoaws7OTt2nPFjZICIiEplMpp0tMjIS1tbWGltkZKTUt/dMrGwQERGJTFsDIOHh4QgLC9Noe56qBgC4uLgAANLT0+Hq6qpuT09PR4sWLdR9MjIyNI57+PAh7t69qz6+PFjZICIiqiIUCgWsrKw0tudNNurUqQMXFxccPnxY3Zabm4tTp07B19cXAODr64vs7GwkJCSo+xw5cgQqlQo+Pj7lvhYrG0RERGKTaGHe/fv3ce3aNfXnGzduIDExEXZ2dnB3d8fEiRPx0UcfoUGDBqhTpw5mzpwJNzc39YqVxo0bo2fPnhg7diyio6NRXFyM0NBQDB06tNwrUQAmG0RERKKT6q2vZ86cQZcuXdSfHw/BBAUFYePGjXj//feRn5+PcePGITs7G+3bt8e+fftgYmKiPmbz5s0IDQ1Ft27d1A/1Wr58eYXi4HM2SFR8zoZu4XM2dAefs6E7KuM5Gwl/5mrlPN61rbRynsrGORtEREQkKg6jEBERiUzfa7xMNoiIiMSm59kGh1GIiIhIVKxsEBERiUyq1Si6gskGERGRyGT6nWtwGIWIiIjExcoGERGRyPS8sMFkg4iISHR6nm1wGIWIiIhExcoGERGRyLgahYiIiESl76tRmGwQERGJTM9zDc7ZICIiInGxskFERCQ2PS9tMNkgIiISmb5PEOUwChEREYmKlQ0iIiKRcTUKERERiUrPcw0OoxAREZG4WNkgIiISm56XNphsEBERiYyrUYiIiIhExMoGERGRyLgahYiIiESl57kGkw0iIiLR6Xm2wTkbREREJCpW
NoiIiESm76tRmGwQERGJTN8niHIYhYiIiETFyobEzibEY9PG9bh8OQmZd+5g0ZIV6NzVDwDwsLgYq1cuw6+/HMfft27BwtICr/j4YvyEyXB0cpI48uovwL8rUm/fLtX+6mvDED5jlgQR6Y+sOxnYuHYZEk79CmVhIVxfqokJH0SgQaMmAIAlkbNwZN8ujWNefqUt5ixcJUW41daU0d0x773+WLn5J0xd9B1srcwwMzgA3do0Qk0XW2Teu49dR89jzurdyL1fqD6upostlk1/DZ1aNcT9AiU27zqFmSt+QEmJSsK7kZaeFzaYbEitoKAADTw80C9wIKaGvaexr7CwEFeuXMJb44LRwKMR8nJzsOiTSIRNeBebtmyTKGL98dWWbShRlag/X796FcHj3kR3f38Jo6r+7ufl4v3QUfBq0RoRUSthZWOL27dSYGFppdHv5VfaYuIHc9SfjYyNKzvUas3b0x1jBrXD+d9vqdtcHa3h6miN8CU7cPmPNLi72mHFh0Ph6miNYVPXAQDkchm2Lw9GelYuuoxaDBdHa3wxbwSKH5Zg9spdT7pc9afn2QaTDYm1a98R7dp3LHOfhaUlVq9dr9H2fvgMBA0fgrTU23BxdauMEPWWrZ2dxucN6z5HjZru8G71ikQR6YdtsRvg4OiCieH/TyRcXF8q1c/I2Bi29g6VGZreMDc1xob5o/DuvC344K2e6vZL11Px+pQv1J9v3MpExMpdWP/xSBgYyFFSooKfb2M0ruuCgHdWIONuHs7//jfmrt6Dj97rj4+if0Txw5KyLknVHOdsVDH37+dBJpOV+i2PxFVcXIS9u39A/wEDIdP3mV4iO/3rMdRv5IkFs6bijf5dMWHMUOzftb1Uv4uJZ/BG/654541ArF78MXJzsis/2Gpqafhr2PfzRfx0KvmZfa0sTZCbX6geIvFpVgcXr91Gxt08dZ+DJy7D2tIUnvVcRYtZ18m09F9FREREQCaTaWyNGjVS7y8sLERISAjs7e1hYWGBQYMGIT09Xdu3DkAHKhuXL1/GyZMn4evri0aNGuHKlStYtmwZlEol3njjDXTt2lXqEHWGUqnEiqWL4d8rABYWFlKHo1d+OnwYeXl56Nd/gNShVHtpqX9j7/ffIvDVN/DqG2Nw9UoSPlseBUMjQ3Tr2Q8A4P1KW7Tt2BXOLi8h9fYtbPp8BSLeD8XC1TEwMDCQ+A6qtlf9vdGiUU20fyPqmX3tbcwRPrYX1n93Qt3mbG+FjKw8jX4Zd3Mf7XOwAp6dv1RLUv2O0qRJExw6dEj92dDw/z/2J02ahD179uDbb7+FtbU1QkNDMXDgQPz6669aj0PSZGPfvn3o378/LCws8ODBA+zYsQMjR45E8+bNoVKp0KNHDxw4cOCpCYdSqYRSqdRoKxKMoFAoxA6/Uj0sLsYHUydBEAR88OFsqcPROzt3bEPb9h3g6OQsdSjVnqBSob6HJ0aOGw8AqNewEf66cQ17v9+mTjY6dvt/ab92vQaoU68Bxr7eFxcTz6C5t48kcVcHNZxtsHDqIPQJXgll0cOn9rU0N8GO5cG4/EcqPlq7p5IipIoyNDSEi4tLqfacnBysW7cOsbGx6p+xGzZsQOPGjXHy5Em0adNGq3FIOowyd+5cTJ06FVlZWdiwYQOGDRuGsWPH4uDBgzh8+DCmTp2KBQsWPPUckZGRsLa21tgWL3z6MVXN40QjLfU2Vq1dx6pGJbt9+2+cPhmHAQNflToUvWBr74CatetqtNWsVQd3MtKeeIyLWw1YWdvg9t83xQ6vWmvZ2B3O9laIi52GvPhlyItfho6tGuDd1zshL34Z5PJHv55bmCnww6p3kfegEK+FfY6HD/+/yiQ9KxdO9pYa53WyezTsm56ZW3k3o2NkWtoq6urVq3Bzc0PdunUxfPhwpKSkAAASEhJQXFwMPz8/dd9GjRrB3d0dcXFxz3eTTyFpZSMpKQlffvklAGDIkCEYMWIEBg8erN4/fPhwbNiw
4annCA8PR1hYmEZbkWCk/WAl8jjRSEn5C2u/iIGNja3UIemdH3Zuh52dPdp37CR1KHqhcdMW+DvlL422v2+lwMn5yeP9mRnpyMvNgR0njL6Qn04nw3vwxxptn815A8k30rF440GoVAIszU2wa3UIlEUPMXji2lIVkFPnb2DaGH842lrgzr37AIBubRohJ68Al/94csJY7WlpGKWsar5CoSizmu/j44ONGzfCw8MDqampmDNnDjp06ICLFy8iLS0NxsbGsLGx0TjG2dkZaWna/z5JPmfj8WQ7uVwOExMTWFtbq/dZWloiJyfnqceX9UXOK6w6a7kfPMjHzX8yTQD4++9bSL5yGdbW1nBwcMT7UyYi+fIlLFmxBiWqEmRm3gEAWFtbw8iIS/3EplKp8MPOHejTL1BjrJPE0//VN/B+yCh8s2kd2nfpjt8vJ2H/ru8QOmUmAKDgwQNsiVmLth27wdbOAWm3b2JD9DK4vlQTL7duK3H0Vdv9B0pcup6q0ZZfUIS7Ofm4dD0VluYm2L06BKYmxhj9YQyszE1gZW4CALhz7z5UKgGH4i7j8h9pWPdRED5cthPO9laYHdIHa785jqLipw/NVGfaelx5ZGQk5syZo9E2e/ZsRERElOrbq1cv9Z+bNWsGHx8f1KpVC9988w1MTU21Ek95SfqvZ+3atXH16lXUq1cPABAXFwd3d3f1/pSUFLi6Vu/Zy5eSkvDOW0Hqz0sWfQIA6NMvEOPeCcXxo0cAAMOGaE5MjP4iBq1acwmm2E6dPIG01NvoP2Cg1KHojYaNm2D6R4vx5WcrsPXLz+Ds8hLGhk5F5+69AQByAzn+vH4VR/btQv79PNg5OKJlK18MH/Mun7UhshaNauKVZnUAAJd2RWjs8+g9Cympd6FSCRg0YQ2WTR+KoxsnI79Qic27TmPuGs7r0IayqvnlnaNoY2ODhg0b4tq1a+jevTuKioqQnZ2tUd1IT08vc47Hi5IJgiBo/azlFB0djZo1ayIgIKDM/dOnT0dGRga++OKLMvc/SVWqbFR3j8d4STf8fbdA6hDoH817vS91CPSPgnMrRb9Gyl3lszuVg7vd8y9+uH//Ptzd3REREYGgoCA4Ojpiy5YtGDRoEAAgOTkZjRo1QlxcnNYniEqabIiFyYbuYLKhW5hs6A4mG7qjMpKNm1pKNmpWINmYMmUK+vbti1q1auH27duYPXs2EhMTcenSJTg6OiI4OBg//vgjNm7cCCsrK4wf/2gF2IkTJ55x5orjIDQREVE1dOvWLbz++uvIysqCo6Mj2rdvj5MnT8LR0REAsGTJEsjlcgwaNAhKpRL+/v5YvXq1KLGwskGiYmVDt7CyoTtY2dAdlVHZuHVPO5WNGrZV8xlSrGwQERGJTr9/8eK7UYiIiEhUrGwQERGJTN/f38hkg4iISGR6nmtwGIWIiIjExcoGERGRyDiMQkRERKLS1rtRqiomG0RERGLT71yDczaIiIhIXKxsEBERiUzPCxtMNoiIiMSm7xNEOYxCREREomJlg4iISGRcjUJERETi0u9cg8MoREREJC5WNoiIiESm54UNJhtERERi42oUIiIiIhGxskFERCQyrkYhIiIiUXEYhYiIiEhETDaIiIhIVBxGISIiEpm+D6Mw2SAiIhKZvk8Q5TAKERERiYqVDSIiIpFxGIWIiIhEpee5BodRiIiISFysbBAREYlNz0sbTDaIiIhExtUoRERERCJiZYOIiEhkXI1CREREotLzXIPDKERERKKTaWl7DqtWrULt2rVhYmICHx8fnD59+oVu5Xkw2SAiIqqmvv76a4SFhWH27Nk4e/YsmjdvDn9/f2RkZFRqHEw2iIiIRCbT0n8V9emnn2Ls2LEYPXo0PD09ER0dDTMzM6xfv16Eu3wyJhtEREQik8m0s1VEUVEREhIS4Ofnp26Ty+Xw8/NDXFyclu/w6ThBlIiIqIpQKpVQKpUabQqFAgqFolTfzMxMlJSUwNnZWaPd2dkZ
V65cETXO/6qWyYalSdUv2CiVSkRGRiI8PLzM/4mo8lSn70VDFzOpQ3gh1el7UXBupdQhvJDq9L2oDCZa+mkb8VEk5syZo9E2e/ZsREREaOcCIpEJgiBIHQSVlpubC2tra+Tk5MDKykrqcPQavxe6g98L3cHvhTQqUtkoKiqCmZkZtm3bhsDAQHV7UFAQsrOz8f3334sdrlrVLwEQERHpCYVCASsrK43tSZUlY2NjeHt74/Dhw+o2lUqFw4cPw9fXt7JCBlBNh1GIiIgICAsLQ1BQEFq1aoVXXnkFS5cuRX5+PkaPHl2pcTDZICIiqqZee+013LlzB7NmzUJaWhpatGiBffv2lZo0KjYmGzpKoVBg9uzZnHilA/i90B38XugOfi+qjtDQUISGhkoaAyeIEhERkag4QZSIiIhExWSDiIiIRMVkg4iIiETFZIOIiIhExWRDB61atQq1a9eGiYkJfHx8cPr0aalD0kvHjx9H37594ebmBplMhp07d0odkt6KjIxE69atYWlpCScnJwQGBiI5OVnqsPTSmjVr0KxZM/UDpXx9fbF3716pwyIdx2RDx3z99dcICwvD7NmzcfbsWTRv3hz+/v7IyMiQOjS9k5+fj+bNm2PVqlVSh6L3jh07hpCQEJw8eRIHDx5EcXExevTogfz8fKlD0zs1atTAggULkJCQgDNnzqBr167o378/kpKSpA6NdBiXvuoYHx8ftG7dGitXPnpJk0qlQs2aNTF+/Hh88MEHEkenv2QyGXbs2KHxfgGSzp07d+Dk5IRjx46hY8eOUoej9+zs7LBw4UKMGTNG6lBIR7GyoUOKioqQkJAAPz8/dZtcLoefnx/i4uIkjIxIt+Tk5AB49EOOpFNSUoKtW7ciPz+/0t+1QVULnyCqQzIzM1FSUlLqMbLOzs64cuWKRFER6RaVSoWJEyeiXbt2aNq0qdTh6KULFy7A19cXhYWFsLCwwI4dO+Dp6Sl1WKTDmGwQUZUSEhKCixcv4pdffpE6FL3l4eGBxMRE5OTkYNu2bQgKCsKxY8eYcNATMdnQIQ4ODjAwMEB6erpGe3p6OlxcXCSKikh3hIaGYvfu3Th+/Dhq1KghdTh6y9jYGPXr1wcAeHt7Iz4+HsuWLcPatWsljox0Feds6BBjY2N4e3vj8OHD6jaVSoXDhw9zPJT0miAICA0NxY4dO3DkyBHUqVNH6pDoX1QqFZRKpdRhkA5jZUPHhIWFISgoCK1atcIrr7yCpUuXIj8/H6NHj5Y6NL1z//59XLt2Tf35xo0bSExMhJ2dHdzd3SWMTP+EhIQgNjYW33//PSwtLZGWlgYAsLa2hqmpqcTR6Zfw8HD06tUL7u7uyMvLQ2xsLI4ePYr9+/dLHRrpMC591UErV67EwoULkZaWhhYtWmD58uXw8fGROiy9c/ToUXTp0qVUe1BQEDZu3Fj5AekxmUxWZvuGDRswatSoyg1Gz40ZMwaHDx9GamoqrK2t0axZM0ybNg3du3eXOjTSYUw2iIiISFScs0FERESiYrJBREREomKyQURERKJiskFERESiYrJBREREomKyQURERKJiskFERESiYrJBVA2NGjUKgYGB6s+dO3fGxIkTKz2Oo0ePQiaTITs7u9KvTUS6g8kGUSUaNWoUZDIZZDKZ+mVWc+fOxcOHD0W97vbt2zFv3rxy9WWCQETaxnejEFWynj17YsOGDVAqlfjxxx8REhICIyMjhIeHa/QrKiqCsbGxVq5pZ2enlfMQET0PVjaIKplCoYCLiwtq1aqF4OBg+Pn54YcfflAPfXz88cdwc3ODh4cHAODmzZsYMmQIbGxsYGdnh/79++PPP/9Un6+kpARhYWGwsbGBvb093n//ffz3LQT/HUZRKpWYNm0aatasCYVCgfr162PdunX4888/1e+DsbW1hUwmU797RKVSITIyEnXq1IGpqSmaN2+Obdu2aVznxx9/RMOGDWFqaoouXbpoxElE+ovJBpHETE1NUVRU
BAA4fPgwkpOTcfDgQezevRvFxcXw9/eHpaUlfv75Z/z666+wsLBAz5491ccsXrwYGzduxPr16/HLL7/g7t272LFjx1OvOXLkSGzZsgXLly/H5cuXsXbtWlhYWKBmzZr47rvvAADJyclITU3FsmXLAACRkZH48ssvER0djaSkJEyaNAlvvPEGjh07BuBRUjRw4ED07dsXiYmJeOutt/DBBx+I9WUjoqpEIKJKExQUJPTv318QBEFQqVTCwYMHBYVCIUyZMkUICgoSnJ2dBaVSqe6/adMmwcPDQ1CpVOo2pVIpmJqaCvv37xcEQRBcXV2FqKgo9f7i4mKhRo0a6usIgiB06tRJmDBhgiAIgpCcnCwAEA4ePFhmjD/99JMAQLh37566rbCwUDAzMxNOnDih0XfMmDHC66+/LgiCIISHhwuenp4a+6dNm1bqXESkfzhng6iS7d69GxYWFiguLoZKpcKwYcMQERGBkJAQeHl5aczT+O2333Dt2jVYWlpqnKOwsBDXr19HTk4OUlNT4ePjo95naGiIVq1alRpKeSwxMREGBgbo1KlTuWO+du0aHjx4UOo14kVFRWjZsiUA4PLlyxpxAICvr2+5r0FE1ReTDaJK1qVLF6xZswbGxsZwc3ODoeH//xqam5tr9L1//z68vb2xefPmUudxdHR8ruubmppW+Jj79+8DAPbs2YOXXnpJY59CoXiuOIhIfzDZIKpk5ubmqF+/frn6vvzyy/j666/h5OQEKyurMvu4urri1KlT6NixIwDg4cOHSEhIwMsvv1xmfy8vL6hUKhw7dgx+fn6l9j+urJSUlKjbPD09oVAokJKS8sSKSOPGjfHDDz9otJ08efLZN0lE1R4niBLpsOHDh8PBwQH9+/fHzz//jBs3buDo0aN47733cOvWLQDAhAkTsGDBAuzcuRNXrlzBu++++9RnZNSuXRtBQUF48803sXPnTvU5v/nmGwBArVq1IJPJsHv3bty5cwf379+HpaUlpkyZgkmTJiEmJgbXr1/H2bNnsWLFCsTExAAA3nnnHVy9ehVTp05FcnIyYmNjsXHjRrG/RERUBTDZINJhZmZmOH78ONzd3TFw4EA0btwYY8aMQWFhobrSMXnyZIwYMQJBQUHw9fWFpaUlBgwY8NTzrlmzBoMHD8a7776LRo0aYezYscjPzwcAvPTSS5gzZw4++OADODs7IzQ0FAAwb948zJw5E5GRkWjcuDF69uyJPXv2oE6dOgAAd3d3fPfdd9i5cyeaN2+O6OhozJ8/X8SvDhFVFTLhSbPIiIiIiLSAlQ0iIiISFZMNIiIiEhWTDSIiIhIVkw0iIiISFZMNIiIiEhWTDSIiIhIVkw0iIiISFZMNIiIiEhWTDSIiIhIVkw0iIiISFZMNIiIiEhWTDSIiIhLV/wC3fYdtJ9SRLAAAAABJRU5ErkJggg==", + "image/png": "", "text/plain": [ - "
" + "
" ] }, "metadata": {}, @@ -1592,328 +1849,78 @@ "source": [ "import pandas as pd\n", "import numpy as np\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n", - "from sklearn.compose import ColumnTransformer\n", - "from sklearn.pipeline import Pipeline\n", - "from sklearn.ensemble import RandomForestClassifier\n", - "from sklearn.metrics import classification_report, confusion_matrix, accuracy_score\n", - "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", - "\n", - "\n", - "# Загружаем набор данных\n", - "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n", - "\n", - "# Устанавливаем случайное состояние\n", - "random_state = 42\n", - "\n", - "\n", - "# Предобработка данных\n", - "# Определяем категориальные и числовые столбцы\n", - "categorical_features = ['employment_type', 'job_title', 'employee_residence', 'company_location', 'company_size']\n", - "numeric_features = ['work_year', 'salary_in_usd', 'remote_ratio']\n", - "\n", - "# Создаем пайплайн для обработки данных\n", - "preprocessor = ColumnTransformer(\n", - " transformers=[\n", - " ('num', StandardScaler(), numeric_features),\n", - " ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)])\n", - "\n", - "# Определяем целевую переменную и признаки\n", - "X = df.drop('experience_level', axis=1)\n", - "y = df['experience_level']\n", - "\n", - "# Разделяем данные на обучающую и тестовую выборки\n", - "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=random_state)\n", - "\n", - "# Создаем и обучаем модель\n", - "model = Pipeline(steps=[\n", - " ('preprocessor', preprocessor),\n", - " ('classifier', RandomForestClassifier(random_state=random_state))])\n", - "\n", - "model.fit(X_train, y_train)\n", - "\n", - "# Делаем предсказания на тестовой выборке\n", - "y_pred = model.predict(X_test)\n", - "\n", - "# Оцениваем качество модели\n", - "print(\"Classification 
Report:\")\n", - "print(classification_report(y_test, y_pred))\n", - "\n", - "print(\"Confusion Matrix:\")\n", - "print(confusion_matrix(y_test, y_pred))\n", - "\n", - "print(f\"Accuracy Score: {accuracy_score(y_test, y_pred)}\")\n", - "\n", - "# Visualize the results\n", - "conf_matrix = confusion_matrix(y_test, y_pred)\n", - "sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')\n", - "plt.xlabel('Predicted')\n", - "plt.ylabel('Actual')\n", - "plt.title('Confusion Matrix')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Benchmark**\n" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "MAE: 37795.639591701794\n", - "MSE: 2482079980.9527493\n", - "RMSE: 49820.47752634201\n", - "R²: 0.37127352660208646\n", - "Salary prediction benchmarks were not met.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "d:\\MII\\AIM-PIbd-32-Kaznacheeva-E-K\\aimenv\\Lib\\site-packages\\sklearn\\metrics\\_regression.py:492: FutureWarning: 'squared' is deprecated in version 1.4 and will be removed in 1.6. 
To calculate the root mean squared error, use the function'root_mean_squared_error'.\n", - " warnings.warn(\n" - ] - } - ], - "source": [ - "import pandas as pd\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n", + "from sklearn.model_selection import train_test_split, RandomizedSearchCV\n", + "from sklearn.preprocessing import OneHotEncoder, StandardScaler\n", "from sklearn.compose import ColumnTransformer\n", "from sklearn.pipeline import Pipeline\n", "from sklearn.linear_model import LinearRegression\n", - "from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score\n", + "from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor\n", + "from scipy.stats import uniform, randint\n", "\n", + "# Load the data\n", "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n", "\n", - "# Data preprocessing\n", - "categorical_features = ['experience_level', 'employment_type', 'job_title', 'employee_residence', 'company_location', 'company_size']\n", - "numeric_features = ['work_year', 'remote_ratio']\n", + "# ... 
(your data preprocessing code, as in the previous example) ...\n", "\n", - "preprocessor = ColumnTransformer(\n", - " transformers=[\n", - " ('num', StandardScaler(), numeric_features),\n", - " ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)])\n", + "# Define the hyperparameter distributions\n", + "param_distributions = {\n", + " 'Linear Regression': {\n", + " 'regressor__fit_intercept': [True, False],\n", + " 'regressor__positive': [True, False]\n", + " },\n", + " 'Random Forest': {\n", + " 'regressor__n_estimators': randint(50, 200),\n", + " 'regressor__max_depth': [None, 10, 20],\n", + " 'regressor__min_samples_split': randint(2, 11),\n", + " 'regressor__min_samples_leaf': randint(1, 5),\n", + " 'regressor__bootstrap': [True, False]\n", + " },\n", + " 'Gradient Boosting': {\n", + " 'regressor__n_estimators': randint(50, 200),\n", + " 'regressor__learning_rate': uniform(0.01, 0.49), # uniform on [0.01, 0.50]\n", + " 'regressor__max_depth': [3, 5, 7],\n", + " 'regressor__min_samples_split': randint(2, 11),\n", + " 'regressor__min_samples_leaf': randint(1, 5),\n", + " 'regressor__subsample': uniform(0.5, 0.5) # uniform on [0.5, 1.0]\n", "\n", - "X = df.drop('salary_in_usd', axis=1)\n", - "y = df['salary_in_usd']\n", + " }\n", + "}\n", "\n", - "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", + "# Dictionary storing the best hyperparameters found for each model\n", + "best_models = {}\n", "\n", - "model = Pipeline(steps=[\n", - " ('preprocessor', preprocessor),\n", - " ('regressor', LinearRegression())])\n", + "# Train each model and tune its hyperparameters\n", + "for model_name, model_params in param_distributions.items():\n", + " if model_name == 'Linear Regression':\n", + " model = LinearRegression()\n", + " elif model_name == 'Random Forest':\n", + " model = RandomForestRegressor(random_state=42)\n", + " elif model_name == 'Gradient Boosting':\n", + " model 
= GradientBoostingRegressor(random_state=42)\n", + " else:\n", + " continue # Skip unknown models\n", "\n", - "model.fit(X_train, y_train)\n", + " pipeline = Pipeline([('regressor', model)])\n", + " random_search = RandomizedSearchCV(pipeline, param_distributions=model_params, n_iter=10, cv=3, n_jobs=-1, random_state=42)\n", + " random_search.fit(X_train, y_train)\n", + " best_models[model_name] = random_search.best_params_\n", "\n", - "y_pred = model.predict(X_test)\n", "\n", - "mae = mean_absolute_error(y_test, y_pred)\n", - "mse = mean_squared_error(y_test, y_pred)\n", - "rmse = mean_squared_error(y_test, y_pred, squared=False)\n", - "r2 = r2_score(y_test, y_pred)\n", + "# Visualize the best hyperparameters\n", "\n", - "print(f\"MAE: {mae}\")\n", - "print(f\"MSE: {mse}\")\n", - "print(f\"RMSE: {rmse}\")\n", - "print(f\"R²: {r2}\")\n", + "fig, axes = plt.subplots(len(best_models), 1, figsize=(10, 5 * len(best_models)))\n", + "if len(best_models) == 1:\n", + " axes = [axes] # handle the single-model case\n", "\n", - "# Check whether the benchmarks are met\n", - "if r2 >= 0.75 and mae <= 15000 and rmse <= 20000:\n", - " print(\"Salary prediction benchmarks were met!\")\n", - "else:\n", - " print(\"Salary prediction benchmarks were not met.\")" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Accuracy: 0.7217043941411452\n", - "Classification Report:\n", - " precision recall f1-score support\n", - "\n", - " EN 0.55 0.48 0.51 67\n", - " EX 0.46 0.26 0.33 23\n", - " MI 0.48 0.54 0.51 157\n", - " SE 0.83 0.83 0.83 504\n", - "\n", - " accuracy 0.72 751\n", - " macro avg 0.58 0.53 0.55 751\n", - "weighted avg 0.72 0.72 0.72 751\n", - "\n", - "Confusion Matrix:\n", - "[[ 32 0 20 15]\n", - " [ 0 6 5 12]\n", - " [ 14 0 84 59]\n", - " [ 12 7 65 420]]\n", - "Experience-level classification benchmarks were not met.\n" - ] - 
} - ], - "source": [ - "import pandas as pd\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n", - "from sklearn.compose import ColumnTransformer\n", - "from sklearn.pipeline import Pipeline\n", - "from sklearn.ensemble import RandomForestClassifier\n", - "from sklearn.metrics import accuracy_score, classification_report, confusion_matrix\n", + "for i, (model_name, params) in enumerate(best_models.items()):\n", + " axes[i].bar(params.keys(), params.values())\n", + " axes[i].set_title(f\"Best hyperparameters for {model_name}\")\n", + " axes[i].set_xticklabels(params.keys(), rotation=45, ha=\"right\") # Rotate the x-axis labels\n", + " axes[i].tick_params(axis='x', which='major', labelsize=8) # Font size of the x-axis labels\n", "\n", - "# Load the dataset\n", - "df = pd.read_csv(\"..//static//csv//ds_salaries.csv\")\n", - "\n", - "# Data preprocessing\n", - "categorical_features = ['employment_type', 'job_title', 'employee_residence', 'company_location', 'company_size']\n", - "numeric_features = ['work_year', 'salary_in_usd', 'remote_ratio']\n", - "\n", - "preprocessor = ColumnTransformer(\n", - " transformers=[\n", - " ('num', StandardScaler(), numeric_features),\n", - " ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)])\n", - "\n", - "X = df.drop('experience_level', axis=1)\n", - "y = df['experience_level']\n", - "\n", - "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", - "\n", - "model = Pipeline(steps=[\n", - " ('preprocessor', preprocessor),\n", - " ('classifier', RandomForestClassifier(random_state=42))])\n", - "\n", - "model.fit(X_train, y_train)\n", - "\n", - "y_pred = model.predict(X_test)\n", - "\n", - "accuracy = accuracy_score(y_test, y_pred)\n", - "print(f\"Accuracy: {accuracy}\")\n", - "\n", - "print(\"Classification Report:\")\n", - "print(classification_report(y_test, y_pred))\n", - "\n", - 
"print(\"Confusion Matrix:\")\n", - "print(confusion_matrix(y_test, y_pred))\n", - "\n", - "# Проверяем, достигнуты ли ориентиры\n", - "if accuracy >= 0.80:\n", - " print(\"Ориентиры для классификации уровня опыта достигнуты!\")\n", - "else:\n", - " print(\"Ориентиры для классификации уровня опыта не достигнуты.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Конвейер" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import pandas as pd\n", - "from sklearn.base import BaseEstimator, TransformerMixin\n", - "from sklearn.compose import ColumnTransformer\n", - "from sklearn.impute import SimpleImputer\n", - "from sklearn.preprocessing import OneHotEncoder, StandardScaler\n", - "from sklearn.pipeline import Pipeline\n", - "\n", - "# Определение столбцов\n", - "numeric_columns = [\"work_year\", \"salary\", \"salary_in_usd\", \"remote_ratio\"]\n", - "cat_columns = [\"experience_level\", \"employment_type\", \"job_title\", \"salary_currency\", \"employee_residence\", \"company_location\", \"company_size\"]\n", - "\n", - "# Обработка числовых данных: заполнение пропущенных значений медианой и стандартизация\n", - "preprocessing_num_class = Pipeline(steps=[\n", - " ('imputer', SimpleImputer(strategy='median')),\n", - " ('scaler', StandardScaler())\n", - "])\n", - "\n", - "# Обработка категориальных данных: заполнение пропущенных значений наиболее частым значением и one-hot encoding\n", - "preprocessing_cat_class = Pipeline(steps=[\n", - " ('imputer', SimpleImputer(strategy='most_frequent')),\n", - " ('onehot', OneHotEncoder(handle_unknown='ignore'))\n", - "])\n", - "\n", - "# Объединение всех преобразований в один ColumnTransformer\n", - "features_preprocessing = ColumnTransformer(\n", - " verbose_feature_names_out=False,\n", - " transformers=[\n", - " (\"prepocessing_num\", preprocessing_num_class, numeric_columns),\n", - " (\"prepocessing_cat\", 
preprocessing_cat_class, cat_columns),\n", - " ],\n", - " remainder=\"passthrough\"\n", - ")\n", - "\n", - "# Define the pipeline\n", - "pipeline_end = Pipeline(\n", - " [\n", - " (\"features_preprocessing\", features_preprocessing),\n", - " ]\n", - ")\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "ename": "NameError", - "evalue": "name 'train_test_split' is not defined", - "output_type": "error", - "traceback": [ - "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[1;31mNameError\u001b[0m Traceback (most recent call last)", - "Cell \u001b[1;32mIn[5], line 2\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[38;5;66;03m# Split the data into training and test sets\u001b[39;00m\n\u001b[1;32m----> 2\u001b[0m X_train, X_test \u001b[38;5;241m=\u001b[39m \u001b[43mtrain_test_split\u001b[49m(df, test_size\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m0.2\u001b[39m, random_state\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m42\u001b[39m)\n\u001b[0;32m 4\u001b[0m \u001b[38;5;66;03m# Apply the pipeline to preprocess the data\u001b[39;00m\n\u001b[0;32m 5\u001b[0m preprocessing_result \u001b[38;5;241m=\u001b[39m pipeline_end\u001b[38;5;241m.\u001b[39mfit_transform(X_train)\n", - "\u001b[1;31mNameError\u001b[0m: name 'train_test_split' is not defined" - ] - } - ], - "source": [ - "# Split the data into training and test sets\n", - "X_train, X_test = train_test_split(df, test_size=0.2, random_state=42)\n", - "\n", - "# Apply the pipeline to preprocess the data\n", - "preprocessing_result = pipeline_end.fit_transform(X_train)\n", - "\n", - "# Get the column names after the transformation\n", - "feature_names = pipeline_end.named_steps['features_preprocessing'].get_feature_names_out()\n", - "\n", - "# Build a DataFrame from the transformed data\n", - "preprocessed_df = 
pd.DataFrame(\n", - " preprocessing_result,\n", - " columns=feature_names,\n", - ")\n", - "\n", - "# Print the transformed DataFrame\n", - "print(preprocessed_df)" + "plt.tight_layout()\n", + "plt.show()\n" ] } ], -- 2.25.1