{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lab Work No. 3\n",
    "\n",
    "## Dataset: Students Performance in Exams\n",
    "\n",
    "Loading the data from a CSV file into a DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 674,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn import set_config\n",
    "\n",
    "set_config(transform_output=\"pandas\")\n",
    "\n",
    "random_state = 9\n",
    "# Load the data\n",
    "df = pd.read_csv(\"..//..//static//csv//StudentsPerformance.csv\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Dataset description\n",
    "\n",
    "Context\n",
    "Marks obtained by the students\n",
    "\n",
    "Content\n",
    "This dataset consists of the marks obtained by students in various subjects.\n",
    "\n",
    "Inspiration\n",
    "To understand how parental background, test preparation, and similar factors affect student performance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Content analysis\n",
    "\n",
    "*Observed objects:* students taking the exams.\n",
    "\n",
    "*Object attributes:*\n",
    "\n",
    "1. gender: the student's gender (male, female).\n",
    "2. race/ethnicity: the group the student belongs to (for example, various racial/ethnic categories).\n",
    "3. parental level of education: the parents' level of education (for example, secondary education, higher education, etc.).\n",
    "4. lunch: lunch type, i.e. whether the student receives a free or paid lunch.\n",
    "5. test preparation course: whether the student completed a test preparation course.\n",
    "6. math score: math exam score.\n",
    "7. reading score: reading exam score.\n",
    "8. writing score: writing exam score.\n",
    "\n",
    "\n",
    "### Business goal\n",
    "\n",
    "**Goal**: Build a model that classifies students into one of three categories (High, Medium, Low) based on their predicted scores.\n",
    "\n",
    "**Effect**: This lets educational institutions not only identify low-performing students but also sort students into distinct groups more precisely. For example, students in the \"High\" group can be given more challenging assignments, while students in the \"Low\" group may need additional help.\n",
    "\n",
    "\n",
    "### Technical goal\n",
    "\n",
    "**Goal**: Build a classification model for the target feature \"total_score_discrete\" that assigns each student to one of three categories (High, Medium, Low).\n",
    "\n",
    "**Approach**: Classification algorithms such as logistic regression, decision trees, or random forests are suitable for this task. The models must account for the categorical variables and their effect on the categorical target. The classification methods estimate which category a student falls into based on the student's characteristics.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Feature construction for the task\n",
    "\n",
    "A new feature can be created to represent a student's overall performance. For example, the scores in all subjects can be summed into a total score.\n",
    "\n",
    "Next, the numeric feature is discretized (scores are converted into categories) so that the model is trained on discrete rather than continuous data.\n",
    "\n",
    "Categories:\n",
    "\n",
    "\"Low\" (total below 200), \"Medium\" (200 to 249), and \"High\" (250 and above), encoded as 0, 1, and 2\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 675,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "# Create new features\n",
    "# - Total score across the three exams\n",
    "df['total_score'] = df['math score'] + df['reading score'] + df['writing score']\n",
    "\n",
    "# - Discretize the total score: 0 = Low, 1 = Medium, 2 = High\n",
    "def discretize_score(score):\n",
    "    if score < 200:\n",
    "        return 0\n",
    "    elif score < 250:\n",
    "        return 1\n",
    "    else:\n",
    "        return 2\n",
    "\n",
    "df['total_score_discrete'] = df['total_score'].apply(discretize_score)\n",
    "\n",
    "# Drop the raw scores so that the target cannot be reconstructed from the features\n",
    "df = df.drop(columns=['math score', 'reading score', 'writing score', 'total_score'])"
   ]
  },
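The thresholds used by `discretize_score` can also be written as a sorted list of bin edges. The sketch below is a minimal standard-library illustration, not the notebook's own code: the names `BIN_EDGES`, `LABELS`, and `discretize` are assumptions introduced here, and the edges 200 and 250 are copied from the cell above.

```python
from bisect import bisect_right

# Bin edges copied from discretize_score: below 200, 200 to 249, 250 and above.
BIN_EDGES = [200, 250]
# Hypothetical human-readable names for the encoded classes 0, 1, 2.
LABELS = ["Low", "Medium", "High"]


def discretize(total_score):
    """Return the class index (0, 1, or 2) for a total score."""
    # bisect_right counts how many edges are <= total_score,
    # so a score of exactly 200 falls into class 1 and 250 into class 2.
    return bisect_right(BIN_EDGES, total_score)


if __name__ == "__main__":
    for score in (150, 200, 249, 250, 300):
        print(score, discretize(score), LABELS[discretize(score)])
```

Keeping the edges in one list makes it easier to change the number of categories later without rewriting a chain of `if`/`elif` branches.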
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Choosing three models for the classification task\n",
    "\n",
    "1. Logistic Regression: a baseline model for classification.\n",
    "\n",
    "2. Decision Tree: a model that handles complex patterns well.\n",
    "\n",
    "3. Gradient Boosting: a powerful ensemble model that provides high prediction quality.\n",
    "\n",
    "These models were chosen because they take different approaches to the problem, which makes it possible to compare how effective the different methods are."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Splitting the dataset into training and test sets (80/20) for the classification task and setting a baseline\n",
    "\n",
    "Target feature: total_score_discrete"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 676,
   "metadata": {},
   "outputs": [],
   "source": [
    "from utils import split_stratified_into_train_val_test\n",
    "\n",
    "X_train, X_val, X_test, y_train, y_val, y_test = split_stratified_into_train_val_test(\n",
    "    df, stratify_colname=\"total_score_discrete\", frac_train=0.80, frac_val=0, frac_test=0.20, random_state=random_state\n",
    ")"
   ]
  },
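The helper `split_stratified_into_train_val_test` comes from a local `utils` module that is not shown in this notebook. As a rough illustration of what a stratified 80/20 split does, here is a minimal standard-library sketch; the function `stratified_split` and its parameters are hypothetical stand-ins and do not reproduce the helper's actual implementation.

```python
import random
from collections import defaultdict


def stratified_split(labels, frac_test=0.2, seed=9):
    """Return (train_idx, test_idx) so that each class keeps its share in both parts."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for the test part.
        n_test = round(len(indices) * frac_test)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


if __name__ == "__main__":
    toy_labels = [0] * 50 + [1] * 30 + [2] * 20
    train, test = stratified_split(toy_labels)
    print(len(train), len(test))
```

Splitting class by class keeps the Low/Medium/High proportions of the test set close to those of the full dataset, which matters when the classes are imbalanced.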
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Augmenting the training set to balance the target feature"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 677,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of samples in y_train before RandomOverSampling: 800\n",
      "Number of samples in X_train before RandomOverSampling: 800\n",
      "Number of samples in y_train after RandomOverSampling: 1065\n",
      "Number of samples in X_train after RandomOverSampling: 1065\n"
     ]
    }
   ],
   "source": [
    "from imblearn.over_sampling import RandomOverSampler\n",
    "\n",
    "\n",
    "print(\"Number of samples in y_train before RandomOverSampling:\", len(y_train))\n",
    "print(\"Number of samples in X_train before RandomOverSampling:\", len(X_train))\n",
    "\n",
    "# Stack two copies of the training data (no noise is added; these arrays are not used below)\n",
    "X_train_combined = np.vstack([X_train, X_train])\n",
    "y_train_combined = np.hstack([y_train, y_train])  # y_train is repeated to match the stacked rows\n",
    "\n",
    "# Apply random oversampling to the training set only\n",
    "ros = RandomOverSampler(random_state=42)\n",
    "X_train, y_train = ros.fit_resample(X_train, y_train)\n",
    "\n",
    "\n",
    "print(\"Number of samples in y_train after RandomOverSampling:\", len(y_train))\n",
    "print(\"Number of samples in X_train after RandomOverSampling:\", len(X_train))\n"
   ]
  },
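To make the effect of random oversampling concrete, the following standard-library sketch duplicates randomly chosen rows of the smaller classes until every class is as large as the biggest one. It is an illustration of the idea only; `random_oversample` is a hypothetical name and the code is not the `imblearn` implementation.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=42):
    """Duplicate random minority-class rows until all classes match the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        # Existing rows of this class form the pool to draw from.
        pool = [row for row, lab in zip(rows, labels) if lab == label]
        # Draw with replacement until the class reaches the target size.
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


if __name__ == "__main__":
    toy_rows = list(range(10))
    toy_labels = [0] * 6 + [1] * 3 + [2] * 1
    _, balanced = random_oversample(toy_rows, toy_labels)
    print(Counter(balanced))
```

Under this scheme the resampled size is the number of classes times the size of the largest class. If all three classes are present in the training set, the 1065 rows reported above would correspond to 3 × 355, although the notebook does not print the per-class counts.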
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Building the pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 678,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.compose import ColumnTransformer\n",
    "from sklearn.impute import SimpleImputer\n",
    "from sklearn.pipeline import Pipeline\n",
    "from sklearn.preprocessing import OneHotEncoder, StandardScaler\n",
    "\n",
    "\n",
    "# The target column is excluded from both feature lists\n",
    "columns_to_drop = [\"total_score_discrete\"]\n",
    "num_columns = [\n",
    "    column\n",
    "    for column in df.columns\n",
    "    if column not in columns_to_drop and df[column].dtype != \"object\"\n",
    "]\n",
    "cat_columns = [\n",
    "    column\n",
    "    for column in df.columns\n",
    "    if column not in columns_to_drop and df[column].dtype == \"object\"\n",
    "]\n",
    "\n",
    "# Numeric features: median imputation followed by standardization\n",
    "num_imputer = SimpleImputer(strategy=\"median\")\n",
    "num_scaler = StandardScaler()\n",
    "preprocessing_num = Pipeline(\n",
    "    [\n",
    "        (\"imputer\", num_imputer),\n",
    "        (\"scaler\", num_scaler),\n",
    "    ]\n",
    ")\n",
    "\n",
    "# Categorical features: fill missing values with \"unknown\", then one-hot encode\n",
    "cat_imputer = SimpleImputer(strategy=\"constant\", fill_value=\"unknown\")\n",
    "cat_encoder = OneHotEncoder(handle_unknown=\"ignore\", sparse_output=False, drop=\"first\")\n",
    "preprocessing_cat = Pipeline(\n",
    "    [\n",
    "        (\"imputer\", cat_imputer),\n",
    "        (\"encoder\", cat_encoder),\n",
    "    ]\n",
    ")\n",
    "\n",
    "features_preprocessing = ColumnTransformer(\n",
    "    verbose_feature_names_out=False,\n",
    "    transformers=[\n",
    "        (\"prepocessing_num\", preprocessing_num, num_columns),\n",
    "        (\"prepocessing_cat\", preprocessing_cat, cat_columns),\n",
    "    ],\n",
    "    remainder=\"passthrough\",\n",
    "    force_int_remainder_cols=False\n",
    ")\n",
    "\n",
    "# Drops the target column if it is still present among the features\n",
    "drop_columns = ColumnTransformer(\n",
    "    verbose_feature_names_out=False,\n",
    "    transformers=[\n",
    "        (\"drop_columns\", \"drop\", columns_to_drop),\n",
    "    ],\n",
    "    remainder=\"passthrough\",\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Hyperparameter tuning and training for each model\n",
    "\n",
    "Each model's hyperparameters need to be tuned to get the best results. We will use GridSearchCV to run cross-validation and select the best hyperparameters for each model.\n",
    "\n",
    "##### 1. Logistic regression\n",
    "For logistic regression we tune the regularization hyperparameter C and the solver."
   ]
  },
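A grid search tries every combination of the listed hyperparameter values and cross-validates each one. The sketch below enumerates such a grid with the standard library only. The values for `model__C`, `model__solver`, and the five folds are taken from the `GridSearchCV` representation recorded in this notebook's output; the helper `expand_grid` is a hypothetical name and this is not how scikit-learn implements the search internally.

```python
from itertools import product

# Grid values as they appear in the recorded GridSearchCV repr.
param_grid = {
    "model__C": [0.1, 1, 10],
    "model__solver": ["liblinear", "saga"],
}
cv_folds = 5


def expand_grid(grid):
    """Return one dict per hyperparameter combination, in a stable order."""
    names = sorted(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


candidates = expand_grid(param_grid)
# Each candidate is fitted once per fold during cross-validation.
total_fits = len(candidates) * cv_folds

if __name__ == "__main__":
    for candidate in candidates:
        print(candidate)
    print("candidates:", len(candidates), "cross-validation fits:", total_fits)
```

Counting the fits up front is a cheap way to judge how expensive a grid will be before adding more values to it.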
  {
   "cell_type": "code",
   "execution_count": 679,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "d:\\5semestr\\AIM\\aimvenv\\Lib\\site-packages\\sklearn\\compose\\_column_transformer.py:1623: FutureWarning: \n",
      "The format of the columns of the 'remainder' transformer in ColumnTransformer.transformers_ will change in version 1.7 to match the format of the other transformers.\n",
      "At the moment the remainder columns are stored as indices (of type int). With the same ColumnTransformer configuration, in the future they will be stored as column names (of type str).\n",
      "To use the new behavior now and suppress this warning, use ColumnTransformer(force_int_remainder_cols=False).\n",
      "\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "data": {
      "text/html": [
"<style>#sk-container-id-6 {\n",
|
|||
|
" /* Definition of color scheme common for light and dark mode */\n",
|
|||
|
" --sklearn-color-text: black;\n",
|
|||
|
" --sklearn-color-line: gray;\n",
|
|||
|
" /* Definition of color scheme for unfitted estimators */\n",
|
|||
|
" --sklearn-color-unfitted-level-0: #fff5e6;\n",
|
|||
|
" --sklearn-color-unfitted-level-1: #f6e4d2;\n",
|
|||
|
" --sklearn-color-unfitted-level-2: #ffe0b3;\n",
|
|||
|
" --sklearn-color-unfitted-level-3: chocolate;\n",
|
|||
|
" /* Definition of color scheme for fitted estimators */\n",
|
|||
|
" --sklearn-color-fitted-level-0: #f0f8ff;\n",
|
|||
|
" --sklearn-color-fitted-level-1: #d4ebff;\n",
|
|||
|
" --sklearn-color-fitted-level-2: #b3dbfd;\n",
|
|||
|
" --sklearn-color-fitted-level-3: cornflowerblue;\n",
|
|||
|
"\n",
|
|||
|
" /* Specific color for light theme */\n",
|
|||
|
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
|
|||
|
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white)));\n",
|
|||
|
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
|
|||
|
" --sklearn-color-icon: #696969;\n",
|
|||
|
"\n",
|
|||
|
" @media (prefers-color-scheme: dark) {\n",
|
|||
|
" /* Redefinition of color scheme for dark theme */\n",
|
|||
|
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
|
|||
|
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111)));\n",
|
|||
|
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
|
|||
|
" --sklearn-color-icon: #878787;\n",
|
|||
|
" }\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 {\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 pre {\n",
|
|||
|
" padding: 0;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 input.sk-hidden--visually {\n",
|
|||
|
" border: 0;\n",
|
|||
|
" clip: rect(1px 1px 1px 1px);\n",
|
|||
|
" clip: rect(1px, 1px, 1px, 1px);\n",
|
|||
|
" height: 1px;\n",
|
|||
|
" margin: -1px;\n",
|
|||
|
" overflow: hidden;\n",
|
|||
|
" padding: 0;\n",
|
|||
|
" position: absolute;\n",
|
|||
|
" width: 1px;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-dashed-wrapped {\n",
|
|||
|
" border: 1px dashed var(--sklearn-color-line);\n",
|
|||
|
" margin: 0 0.4em 0.5em 0.4em;\n",
|
|||
|
" box-sizing: border-box;\n",
|
|||
|
" padding-bottom: 0.4em;\n",
|
|||
|
" background-color: var(--sklearn-color-background);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-container {\n",
|
|||
|
" /* jupyter's `normalize.less` sets `[hidden] { display: none; }`\n",
|
|||
|
" but bootstrap.min.css set `[hidden] { display: none !important; }`\n",
|
|||
|
" so we also need the `!important` here to be able to override the\n",
|
|||
|
" default hidden behavior on the sphinx rendered scikit-learn.org.\n",
|
|||
|
" See: https://github.com/scikit-learn/scikit-learn/issues/21755 */\n",
|
|||
|
" display: inline-block !important;\n",
|
|||
|
" position: relative;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-text-repr-fallback {\n",
|
|||
|
" display: none;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"div.sk-parallel-item,\n",
|
|||
|
"div.sk-serial,\n",
|
|||
|
"div.sk-item {\n",
|
|||
|
" /* draw centered vertical line to link estimators */\n",
|
|||
|
" background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));\n",
|
|||
|
" background-size: 2px 100%;\n",
|
|||
|
" background-repeat: no-repeat;\n",
|
|||
|
" background-position: center center;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Parallel-specific style estimator block */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-parallel-item::after {\n",
|
|||
|
" content: \"\";\n",
|
|||
|
" width: 100%;\n",
|
|||
|
" border-bottom: 2px solid var(--sklearn-color-text-on-default-background);\n",
|
|||
|
" flex-grow: 1;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-parallel {\n",
|
|||
|
" display: flex;\n",
|
|||
|
" align-items: stretch;\n",
|
|||
|
" justify-content: center;\n",
|
|||
|
" background-color: var(--sklearn-color-background);\n",
|
|||
|
" position: relative;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-parallel-item {\n",
|
|||
|
" display: flex;\n",
|
|||
|
" flex-direction: column;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-parallel-item:first-child::after {\n",
|
|||
|
" align-self: flex-end;\n",
|
|||
|
" width: 50%;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-parallel-item:last-child::after {\n",
|
|||
|
" align-self: flex-start;\n",
|
|||
|
" width: 50%;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-parallel-item:only-child::after {\n",
|
|||
|
" width: 0;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Serial-specific style estimator block */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-serial {\n",
|
|||
|
" display: flex;\n",
|
|||
|
" flex-direction: column;\n",
|
|||
|
" align-items: center;\n",
|
|||
|
" background-color: var(--sklearn-color-background);\n",
|
|||
|
" padding-right: 1em;\n",
|
|||
|
" padding-left: 1em;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"/* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is\n",
|
|||
|
"clickable and can be expanded/collapsed.\n",
|
|||
|
"- Pipeline and ColumnTransformer use this feature and define the default style\n",
|
|||
|
"- Estimators will overwrite some part of the style using the `sk-estimator` class\n",
|
|||
|
"*/\n",
|
|||
|
"\n",
|
|||
|
"/* Pipeline and ColumnTransformer style (default) */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-toggleable {\n",
|
|||
|
" /* Default theme specific background. It is overwritten whether we have a\n",
|
|||
|
" specific estimator or a Pipeline/ColumnTransformer */\n",
|
|||
|
" background-color: var(--sklearn-color-background);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Toggleable label */\n",
|
|||
|
"#sk-container-id-6 label.sk-toggleable__label {\n",
|
|||
|
" cursor: pointer;\n",
|
|||
|
" display: block;\n",
|
|||
|
" width: 100%;\n",
|
|||
|
" margin-bottom: 0;\n",
|
|||
|
" padding: 0.5em;\n",
|
|||
|
" box-sizing: border-box;\n",
|
|||
|
" text-align: center;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 label.sk-toggleable__label-arrow:before {\n",
|
|||
|
" /* Arrow on the left of the label */\n",
|
|||
|
" content: \"▸\";\n",
|
|||
|
" float: left;\n",
|
|||
|
" margin-right: 0.25em;\n",
|
|||
|
" color: var(--sklearn-color-icon);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 label.sk-toggleable__label-arrow:hover:before {\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Toggleable content - dropdown */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-toggleable__content {\n",
|
|||
|
" max-height: 0;\n",
|
|||
|
" max-width: 0;\n",
|
|||
|
" overflow: hidden;\n",
|
|||
|
" text-align: left;\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-toggleable__content.fitted {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-toggleable__content pre {\n",
|
|||
|
" margin: 0.2em;\n",
|
|||
|
" border-radius: 0.25em;\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-toggleable__content.fitted pre {\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 input.sk-toggleable__control:checked~div.sk-toggleable__content {\n",
|
|||
|
" /* Expand drop-down */\n",
|
|||
|
" max-height: 200px;\n",
|
|||
|
" max-width: 100%;\n",
|
|||
|
" overflow: auto;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {\n",
|
|||
|
" content: \"▾\";\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Pipeline/ColumnTransformer-specific style */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Estimator-specific style */\n",
|
|||
|
"\n",
|
|||
|
"/* Colorize estimator box */\n",
|
|||
|
"#sk-container-id-6 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-label label.sk-toggleable__label,\n",
|
|||
|
"#sk-container-id-6 div.sk-label label {\n",
|
|||
|
" /* The background is the default theme color */\n",
|
|||
|
" color: var(--sklearn-color-text-on-default-background);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* On hover, darken the color of the background */\n",
|
|||
|
"#sk-container-id-6 div.sk-label:hover label.sk-toggleable__label {\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Label box, darken color on hover, fitted */\n",
|
|||
|
"#sk-container-id-6 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Estimator label */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-label label {\n",
|
|||
|
" font-family: monospace;\n",
|
|||
|
" font-weight: bold;\n",
|
|||
|
" display: inline-block;\n",
|
|||
|
" line-height: 1.2em;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-label-container {\n",
|
|||
|
" text-align: center;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Estimator-specific */\n",
|
|||
|
"#sk-container-id-6 div.sk-estimator {\n",
|
|||
|
" font-family: monospace;\n",
|
|||
|
" border: 1px dotted var(--sklearn-color-border-box);\n",
|
|||
|
" border-radius: 0.25em;\n",
|
|||
|
" box-sizing: border-box;\n",
|
|||
|
" margin-bottom: 0.5em;\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-estimator.fitted {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* on hover */\n",
|
|||
|
"#sk-container-id-6 div.sk-estimator:hover {\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 div.sk-estimator.fitted:hover {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Specification for estimator info (e.g. \"i\" and \"?\") */\n",
|
|||
|
"\n",
|
|||
|
"/* Common style for \"i\" and \"?\" */\n",
|
|||
|
"\n",
|
|||
|
".sk-estimator-doc-link,\n",
|
|||
|
"a:link.sk-estimator-doc-link,\n",
|
|||
|
"a:visited.sk-estimator-doc-link {\n",
|
|||
|
" float: right;\n",
|
|||
|
" font-size: smaller;\n",
|
|||
|
" line-height: 1em;\n",
|
|||
|
" font-family: monospace;\n",
|
|||
|
" background-color: var(--sklearn-color-background);\n",
|
|||
|
" border-radius: 1em;\n",
|
|||
|
" height: 1em;\n",
|
|||
|
" width: 1em;\n",
|
|||
|
" text-decoration: none !important;\n",
|
|||
|
" margin-left: 1ex;\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
|
|||
|
" color: var(--sklearn-color-unfitted-level-1);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
".sk-estimator-doc-link.fitted,\n",
|
|||
|
"a:link.sk-estimator-doc-link.fitted,\n",
|
|||
|
"a:visited.sk-estimator-doc-link.fitted {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
|
|||
|
" color: var(--sklearn-color-fitted-level-1);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* On hover */\n",
|
|||
|
"div.sk-estimator:hover .sk-estimator-doc-link:hover,\n",
|
|||
|
".sk-estimator-doc-link:hover,\n",
|
|||
|
"div.sk-label-container:hover .sk-estimator-doc-link:hover,\n",
|
|||
|
".sk-estimator-doc-link:hover {\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-3);\n",
|
|||
|
" color: var(--sklearn-color-background);\n",
|
|||
|
" text-decoration: none;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,\n",
|
|||
|
".sk-estimator-doc-link.fitted:hover,\n",
|
|||
|
"div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,\n",
|
|||
|
".sk-estimator-doc-link.fitted:hover {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-3);\n",
|
|||
|
" color: var(--sklearn-color-background);\n",
|
|||
|
" text-decoration: none;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Span, style for the box shown on hovering the info icon */\n",
|
|||
|
".sk-estimator-doc-link span {\n",
|
|||
|
" display: none;\n",
|
|||
|
" z-index: 9999;\n",
|
|||
|
" position: relative;\n",
|
|||
|
" font-weight: normal;\n",
|
|||
|
" right: .2ex;\n",
|
|||
|
" padding: .5ex;\n",
|
|||
|
" margin: .5ex;\n",
|
|||
|
" width: min-content;\n",
|
|||
|
" min-width: 20ex;\n",
|
|||
|
" max-width: 50ex;\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
" box-shadow: 2pt 2pt 4pt #999;\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background: var(--sklearn-color-unfitted-level-0);\n",
|
|||
|
" border: .5pt solid var(--sklearn-color-unfitted-level-3);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
".sk-estimator-doc-link.fitted span {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background: var(--sklearn-color-fitted-level-0);\n",
|
|||
|
" border: var(--sklearn-color-fitted-level-3);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
".sk-estimator-doc-link:hover span {\n",
|
|||
|
" display: block;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* \"?\"-specific style due to the `<a>` HTML tag */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 a.estimator_doc_link {\n",
|
|||
|
" float: right;\n",
|
|||
|
" font-size: 1rem;\n",
|
|||
|
" line-height: 1em;\n",
|
|||
|
" font-family: monospace;\n",
|
|||
|
" background-color: var(--sklearn-color-background);\n",
|
|||
|
" border-radius: 1rem;\n",
|
|||
|
" height: 1rem;\n",
|
|||
|
" width: 1rem;\n",
|
|||
|
" text-decoration: none;\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" color: var(--sklearn-color-unfitted-level-1);\n",
|
|||
|
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 a.estimator_doc_link.fitted {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
|
|||
|
" color: var(--sklearn-color-fitted-level-1);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* On hover */\n",
|
|||
|
"#sk-container-id-6 a.estimator_doc_link:hover {\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-3);\n",
|
|||
|
" color: var(--sklearn-color-background);\n",
|
|||
|
" text-decoration: none;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-6 a.estimator_doc_link.fitted:hover {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-3);\n",
|
|||
|
"}\n",
|
|||
|
"</style><div id=\"sk-container-id-6\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>GridSearchCV(cv=5,\n",
|
|||
|
" estimator=Pipeline(steps=[('features_preprocessing',\n",
|
|||
|
" ColumnTransformer(force_int_remainder_cols=False,\n",
|
|||
|
" remainder='passthrough',\n",
|
|||
|
" transformers=[('prepocessing_num',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(strategy='median')),\n",
|
|||
|
" ('scaler',\n",
|
|||
|
" StandardScaler())]),\n",
|
|||
|
" []),\n",
|
|||
|
" ('prepocessing_cat',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(fill_value='unkno...\n",
|
|||
|
" 'preparation '\n",
|
|||
|
" 'course'])],\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('drop_columns',\n",
|
|||
|
" ColumnTransformer(remainder='passthrough',\n",
|
|||
|
" transformers=[('drop_columns',\n",
|
|||
|
" 'drop',\n",
|
|||
|
" ['total_score_discrete'])],\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('model',\n",
|
|||
|
" LogisticRegression(max_iter=1000,\n",
|
|||
|
" random_state=9))]),\n",
|
|||
|
" n_jobs=-1,\n",
|
|||
|
" param_grid={'model__C': [0.1, 1, 10],\n",
|
|||
|
" 'model__solver': ['liblinear', 'saga']})</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-86\" type=\"checkbox\" ><label for=\"sk-estimator-id-86\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> GridSearchCV<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.model_selection.GridSearchCV.html\">?<span>Documentation for GridSearchCV</span></a><span class=\"sk-estimator-doc-link fitted\">i<span>Fitted</span></span></label><div class=\"sk-toggleable__content fitted\"><pre>GridSearchCV(cv=5,\n",
|
|||
|
" estimator=Pipeline(steps=[('features_preprocessing',\n",
|
|||
|
" ColumnTransformer(force_int_remainder_cols=False,\n",
|
|||
|
" remainder='passthrough',\n",
|
|||
|
" transformers=[('prepocessing_num',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(strategy='median')),\n",
|
|||
|
" ('scaler',\n",
|
|||
|
" StandardScaler())]),\n",
|
|||
|
" []),\n",
|
|||
|
" ('prepocessing_cat',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(fill_value='unkno...\n",
|
|||
|
" 'preparation '\n",
|
|||
|
" 'course'])],\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('drop_columns',\n",
|
|||
|
" ColumnTransformer(remainder='passthrough',\n",
|
|||
|
" transformers=[('drop_columns',\n",
|
|||
|
" 'drop',\n",
|
|||
|
" ['total_score_discrete'])],\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('model',\n",
|
|||
|
" LogisticRegression(max_iter=1000,\n",
|
|||
|
" random_state=9))]),\n",
|
|||
|
" n_jobs=-1,\n",
|
|||
|
" param_grid={'model__C': [0.1, 1, 10],\n",
|
|||
|
" 'model__solver': ['liblinear', 'saga']})</pre></div> </div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-87\" type=\"checkbox\" ><label for=\"sk-estimator-id-87\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">best_estimator_: Pipeline</label><div class=\"sk-toggleable__content fitted\"><pre>Pipeline(steps=[('features_preprocessing',\n",
|
|||
|
" ColumnTransformer(force_int_remainder_cols=False,\n",
|
|||
|
" remainder='passthrough',\n",
|
|||
|
" transformers=[('prepocessing_num',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(strategy='median')),\n",
|
|||
|
" ('scaler',\n",
|
|||
|
" StandardScaler())]),\n",
|
|||
|
" []),\n",
|
|||
|
" ('prepocessing_cat',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(fill_value='unknown',\n",
|
|||
|
" strategy='constant')),\n",
|
|||
|
" ('...\n",
|
|||
|
" ['gender', 'race/ethnicity',\n",
|
|||
|
" 'parental level of '\n",
|
|||
|
" 'education',\n",
|
|||
|
" 'lunch',\n",
|
|||
|
" 'test preparation course'])],\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('drop_columns',\n",
|
|||
|
" ColumnTransformer(remainder='passthrough',\n",
|
|||
|
" transformers=[('drop_columns', 'drop',\n",
|
|||
|
" ['total_score_discrete'])],\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('model',\n",
|
|||
|
" LogisticRegression(C=1, max_iter=1000, random_state=9,\n",
|
|||
|
" solver='liblinear'))])</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-serial\"><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-88\" type=\"checkbox\" ><label for=\"sk-estimator-id-88\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> features_preprocessing: ColumnTransformer<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.compose.ColumnTransformer.html\">?<span>Documentation for features_preprocessing: ColumnTransformer</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>ColumnTransformer(force_int_remainder_cols=False, remainder='passthrough',\n",
|
|||
|
" transformers=[('prepocessing_num',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(strategy='median')),\n",
|
|||
|
" ('scaler', StandardScaler())]),\n",
|
|||
|
" []),\n",
|
|||
|
" ('prepocessing_cat',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(fill_value='unknown',\n",
|
|||
|
" strategy='constant')),\n",
|
|||
|
" ('encoder',\n",
|
|||
|
" OneHotEncoder(drop='first',\n",
|
|||
|
" handle_unknown='ignore',\n",
|
|||
|
" sparse_output=False))]),\n",
|
|||
|
" ['gender', 'race/ethnicity',\n",
|
|||
|
" 'parental level of education', 'lunch',\n",
|
|||
|
" 'test preparation course'])],\n",
|
|||
|
" verbose_feature_names_out=False)</pre></div> </div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-89\" type=\"checkbox\" ><label for=\"sk-estimator-id-89\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">prepocessing_num</label><div class=\"sk-toggleable__content fitted\"><pre>[]</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-90\" type=\"checkbox\" ><label for=\"sk-estimator-id-90\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> SimpleImputer<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.impute.SimpleImputer.html\">?<span>Documentation for SimpleImputer</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>SimpleImputer(strategy='median')</pre></div> </div></div><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-91\" type=\"checkbox\" ><label for=\"sk-estimator-id-91\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> StandardScaler<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.preprocessing.StandardScaler.html\">?<span>Documentation for StandardScaler</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>StandardScaler()</pre></div> </div></div></div></div></div></div></div><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label 
fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-92\" type=\"checkbox\" ><label for=\"sk-estimator-id-92\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">prepocessing_cat</label><div class=\"sk-toggleable__content fitted\"><pre>['gender', 'race/ethnicity', 'parental level of education', 'lunch', 'test preparation course']</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-93\" type=\"checkbox\" ><label for=\"sk-estimator-id-93\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> SimpleImputer<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.impute.SimpleImputer.html\">?<span>Documentation for SimpleImputer</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>SimpleImputer(fill_value='unknown', strategy='constant')</pre></div> </div></div><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-94\" type=\"checkbox\" ><label for=\"sk-estimator-id-94\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> OneHotEncoder<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.preprocessing.OneHotEncoder.html\">?<span>Documentation for OneHotEncoder</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>OneHotEncoder(drop='first', handle_unknown='ignore', sparse_output=False)</pre></div> </div></div></div></div></div></div></div><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"
|
|||
|
" transformers=[('drop_columns', 'drop',\n",
|
|||
|
" ['total_score_discrete'])],\n",
|
|||
|
" verbose_feature_names_out=False)</pre></div> </div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-98\" type=\"checkbox\" ><label for=\"sk-estimator-id-98\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">drop_columns</label><div class=\"sk-toggleable__content fitted\"><pre>['total_score_discrete']</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-99\" type=\"checkbox\" ><label for=\"sk-estimator-id-99\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">drop</label><div class=\"sk-toggleable__content fitted\"><pre>drop</pre></div> </div></div></div></div></div><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-100\" type=\"checkbox\" ><label for=\"sk-estimator-id-100\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">remainder</label><div class=\"sk-toggleable__content fitted\"><pre>['gender_male', 'race/ethnicity_group B', 'race/ethnicity_group C', 'race/ethnicity_group D', 'race/ethnicity_group E', "parental level of education_bachelor's degree", 'parental level of education_high school', "parental level of education_master's degree", 'parental level of education_some college', 'parental level of education_some high school', 'lunch_standard', 'test preparation course_none']</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-101\" 
type=\"checkbox\" ><label for=\"sk-estimator-id-101\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">passthrough</label><div class=\"sk-toggleable__content fitted\"><pre>passthrough</pre></div> </div></div></div></div></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-102\" type=\"checkbox\" ><label for=\"sk-estimator-id-102\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> LogisticRegression<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.linear_model.LogisticRegression.html\">?<span>Documentation for LogisticRegression</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>LogisticRegression(C=1, max_iter=1000, random_state=9, solver='liblinear')</pre></div> </div></div></div></div></div></div></div></div></div></div></div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
"GridSearchCV(cv=5,\n",
|
|||
|
" estimator=Pipeline(steps=[('features_preprocessing',\n",
|
|||
|
" ColumnTransformer(force_int_remainder_cols=False,\n",
|
|||
|
" remainder='passthrough',\n",
|
|||
|
" transformers=[('prepocessing_num',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(strategy='median')),\n",
|
|||
|
" ('scaler',\n",
|
|||
|
" StandardScaler())]),\n",
|
|||
|
" []),\n",
|
|||
|
" ('prepocessing_cat',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(fill_value='unkno...\n",
|
|||
|
" 'preparation '\n",
|
|||
|
" 'course'])],\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('drop_columns',\n",
|
|||
|
" ColumnTransformer(remainder='passthrough',\n",
|
|||
|
" transformers=[('drop_columns',\n",
|
|||
|
" 'drop',\n",
|
|||
|
" ['total_score_discrete'])],\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('model',\n",
|
|||
|
" LogisticRegression(max_iter=1000,\n",
|
|||
|
" random_state=9))]),\n",
|
|||
|
" n_jobs=-1,\n",
|
|||
|
" param_grid={'model__C': [0.1, 1, 10],\n",
|
|||
|
" 'model__solver': ['liblinear', 'saga']})"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 679,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"from sklearn.linear_model import LogisticRegression\n",
|
|||
|
"from sklearn.model_selection import GridSearchCV\n",
|
|||
|
"from sklearn.pipeline import Pipeline\n",
|
|||
|
"\n",
|
|||
|
"# Модель логистической регрессии\n",
|
|||
|
"logistic_model = LogisticRegression(max_iter=1000, random_state=random_state)\n",
|
|||
|
"\n",
|
|||
|
"# Создаём пайплайн, который сначала применяет preprocessing, а потом обучает модель\n",
|
|||
|
"logistic_pipeline = Pipeline([\n",
|
|||
|
" (\"features_preprocessing\", features_preprocessing),\n",
|
|||
|
" (\"drop_columns\", drop_columns),\n",
|
|||
|
" (\"model\", logistic_model) # Здесь добавляем модель в пайплайн\n",
|
|||
|
"])\n",
|
|||
|
"\n",
|
|||
|
"# Параметры для настройки\n",
|
|||
|
"logistic_param_grid = {\n",
|
|||
|
" 'model__C': [0.1, 1, 10], # Регуляризация\n",
|
|||
|
" 'model__solver': ['liblinear', 'saga'] # Алгоритм оптимизации\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"# Настройка гиперпараметров с использованием GridSearchCV\n",
|
|||
|
"logistic_search = GridSearchCV(logistic_pipeline, logistic_param_grid, cv=5, n_jobs=-1)\n",
|
|||
|
"\n",
|
|||
|
"# Обучаем модель\n",
|
|||
|
"logistic_search.fit(X_train, y_train.values.ravel())\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"##### 2. Дерево решений\n",
|
|||
|
"\n",
|
|||
|
"Для дерева решений мы будем настраивать гиперпараметры, такие как максимальная глубина и минимальное количество объектов для разделения.\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 680,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"d:\\5semestr\\AIM\\aimvenv\\Lib\\site-packages\\sklearn\\compose\\_column_transformer.py:1623: FutureWarning: \n",
|
|||
|
"The format of the columns of the 'remainder' transformer in ColumnTransformer.transformers_ will change in version 1.7 to match the format of the other transformers.\n",
|
|||
|
"At the moment the remainder columns are stored as indices (of type int). With the same ColumnTransformer configuration, in the future they will be stored as column names (of type str).\n",
|
|||
|
"To use the new behavior now and suppress this warning, use ColumnTransformer(force_int_remainder_cols=False).\n",
|
|||
|
"\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<style>#sk-container-id-7 {\n",
|
|||
|
" /* Definition of color scheme common for light and dark mode */\n",
|
|||
|
" --sklearn-color-text: black;\n",
|
|||
|
" --sklearn-color-line: gray;\n",
|
|||
|
" /* Definition of color scheme for unfitted estimators */\n",
|
|||
|
" --sklearn-color-unfitted-level-0: #fff5e6;\n",
|
|||
|
" --sklearn-color-unfitted-level-1: #f6e4d2;\n",
|
|||
|
" --sklearn-color-unfitted-level-2: #ffe0b3;\n",
|
|||
|
" --sklearn-color-unfitted-level-3: chocolate;\n",
|
|||
|
" /* Definition of color scheme for fitted estimators */\n",
|
|||
|
" --sklearn-color-fitted-level-0: #f0f8ff;\n",
|
|||
|
" --sklearn-color-fitted-level-1: #d4ebff;\n",
|
|||
|
" --sklearn-color-fitted-level-2: #b3dbfd;\n",
|
|||
|
" --sklearn-color-fitted-level-3: cornflowerblue;\n",
|
|||
|
"\n",
|
|||
|
" /* Specific color for light theme */\n",
|
|||
|
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
|
|||
|
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white)));\n",
|
|||
|
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
|
|||
|
" --sklearn-color-icon: #696969;\n",
|
|||
|
"\n",
|
|||
|
" @media (prefers-color-scheme: dark) {\n",
|
|||
|
" /* Redefinition of color scheme for dark theme */\n",
|
|||
|
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
|
|||
|
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111)));\n",
|
|||
|
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
|
|||
|
" --sklearn-color-icon: #878787;\n",
|
|||
|
" }\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 {\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 pre {\n",
|
|||
|
" padding: 0;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 input.sk-hidden--visually {\n",
|
|||
|
" border: 0;\n",
|
|||
|
" clip: rect(1px 1px 1px 1px);\n",
|
|||
|
" clip: rect(1px, 1px, 1px, 1px);\n",
|
|||
|
" height: 1px;\n",
|
|||
|
" margin: -1px;\n",
|
|||
|
" overflow: hidden;\n",
|
|||
|
" padding: 0;\n",
|
|||
|
" position: absolute;\n",
|
|||
|
" width: 1px;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-dashed-wrapped {\n",
|
|||
|
" border: 1px dashed var(--sklearn-color-line);\n",
|
|||
|
" margin: 0 0.4em 0.5em 0.4em;\n",
|
|||
|
" box-sizing: border-box;\n",
|
|||
|
" padding-bottom: 0.4em;\n",
|
|||
|
" background-color: var(--sklearn-color-background);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-container {\n",
|
|||
|
" /* jupyter's `normalize.less` sets `[hidden] { display: none; }`\n",
|
|||
|
" but bootstrap.min.css set `[hidden] { display: none !important; }`\n",
|
|||
|
" so we also need the `!important` here to be able to override the\n",
|
|||
|
" default hidden behavior on the sphinx rendered scikit-learn.org.\n",
|
|||
|
" See: https://github.com/scikit-learn/scikit-learn/issues/21755 */\n",
|
|||
|
" display: inline-block !important;\n",
|
|||
|
" position: relative;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-text-repr-fallback {\n",
|
|||
|
" display: none;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"div.sk-parallel-item,\n",
|
|||
|
"div.sk-serial,\n",
|
|||
|
"div.sk-item {\n",
|
|||
|
" /* draw centered vertical line to link estimators */\n",
|
|||
|
" background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));\n",
|
|||
|
" background-size: 2px 100%;\n",
|
|||
|
" background-repeat: no-repeat;\n",
|
|||
|
" background-position: center center;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Parallel-specific style estimator block */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-parallel-item::after {\n",
|
|||
|
" content: \"\";\n",
|
|||
|
" width: 100%;\n",
|
|||
|
" border-bottom: 2px solid var(--sklearn-color-text-on-default-background);\n",
|
|||
|
" flex-grow: 1;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-parallel {\n",
|
|||
|
" display: flex;\n",
|
|||
|
" align-items: stretch;\n",
|
|||
|
" justify-content: center;\n",
|
|||
|
" background-color: var(--sklearn-color-background);\n",
|
|||
|
" position: relative;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-parallel-item {\n",
|
|||
|
" display: flex;\n",
|
|||
|
" flex-direction: column;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-parallel-item:first-child::after {\n",
|
|||
|
" align-self: flex-end;\n",
|
|||
|
" width: 50%;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-parallel-item:last-child::after {\n",
|
|||
|
" align-self: flex-start;\n",
|
|||
|
" width: 50%;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-parallel-item:only-child::after {\n",
|
|||
|
" width: 0;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Serial-specific style estimator block */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-serial {\n",
|
|||
|
" display: flex;\n",
|
|||
|
" flex-direction: column;\n",
|
|||
|
" align-items: center;\n",
|
|||
|
" background-color: var(--sklearn-color-background);\n",
|
|||
|
" padding-right: 1em;\n",
|
|||
|
" padding-left: 1em;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"/* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is\n",
|
|||
|
"clickable and can be expanded/collapsed.\n",
|
|||
|
"- Pipeline and ColumnTransformer use this feature and define the default style\n",
|
|||
|
"- Estimators will overwrite some part of the style using the `sk-estimator` class\n",
|
|||
|
"*/\n",
|
|||
|
"\n",
|
|||
|
"/* Pipeline and ColumnTransformer style (default) */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-toggleable {\n",
|
|||
|
" /* Default theme specific background. It is overwritten whether we have a\n",
|
|||
|
" specific estimator or a Pipeline/ColumnTransformer */\n",
|
|||
|
" background-color: var(--sklearn-color-background);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Toggleable label */\n",
|
|||
|
"#sk-container-id-7 label.sk-toggleable__label {\n",
|
|||
|
" cursor: pointer;\n",
|
|||
|
" display: block;\n",
|
|||
|
" width: 100%;\n",
|
|||
|
" margin-bottom: 0;\n",
|
|||
|
" padding: 0.5em;\n",
|
|||
|
" box-sizing: border-box;\n",
|
|||
|
" text-align: center;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 label.sk-toggleable__label-arrow:before {\n",
|
|||
|
" /* Arrow on the left of the label */\n",
|
|||
|
" content: \"▸\";\n",
|
|||
|
" float: left;\n",
|
|||
|
" margin-right: 0.25em;\n",
|
|||
|
" color: var(--sklearn-color-icon);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 label.sk-toggleable__label-arrow:hover:before {\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Toggleable content - dropdown */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-toggleable__content {\n",
|
|||
|
" max-height: 0;\n",
|
|||
|
" max-width: 0;\n",
|
|||
|
" overflow: hidden;\n",
|
|||
|
" text-align: left;\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-toggleable__content.fitted {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-toggleable__content pre {\n",
|
|||
|
" margin: 0.2em;\n",
|
|||
|
" border-radius: 0.25em;\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-toggleable__content.fitted pre {\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 input.sk-toggleable__control:checked~div.sk-toggleable__content {\n",
|
|||
|
" /* Expand drop-down */\n",
|
|||
|
" max-height: 200px;\n",
|
|||
|
" max-width: 100%;\n",
|
|||
|
" overflow: auto;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {\n",
|
|||
|
" content: \"▾\";\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Pipeline/ColumnTransformer-specific style */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Estimator-specific style */\n",
|
|||
|
"\n",
|
|||
|
"/* Colorize estimator box */\n",
|
|||
|
"#sk-container-id-7 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-label label.sk-toggleable__label,\n",
|
|||
|
"#sk-container-id-7 div.sk-label label {\n",
|
|||
|
" /* The background is the default theme color */\n",
|
|||
|
" color: var(--sklearn-color-text-on-default-background);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* On hover, darken the color of the background */\n",
|
|||
|
"#sk-container-id-7 div.sk-label:hover label.sk-toggleable__label {\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Label box, darken color on hover, fitted */\n",
|
|||
|
"#sk-container-id-7 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Estimator label */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-label label {\n",
|
|||
|
" font-family: monospace;\n",
|
|||
|
" font-weight: bold;\n",
|
|||
|
" display: inline-block;\n",
|
|||
|
" line-height: 1.2em;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-label-container {\n",
|
|||
|
" text-align: center;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Estimator-specific */\n",
|
|||
|
"#sk-container-id-7 div.sk-estimator {\n",
|
|||
|
" font-family: monospace;\n",
|
|||
|
" border: 1px dotted var(--sklearn-color-border-box);\n",
|
|||
|
" border-radius: 0.25em;\n",
|
|||
|
" box-sizing: border-box;\n",
|
|||
|
" margin-bottom: 0.5em;\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-estimator.fitted {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* on hover */\n",
|
|||
|
"#sk-container-id-7 div.sk-estimator:hover {\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 div.sk-estimator.fitted:hover {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Specification for estimator info (e.g. \"i\" and \"?\") */\n",
|
|||
|
"\n",
|
|||
|
"/* Common style for \"i\" and \"?\" */\n",
|
|||
|
"\n",
|
|||
|
".sk-estimator-doc-link,\n",
|
|||
|
"a:link.sk-estimator-doc-link,\n",
|
|||
|
"a:visited.sk-estimator-doc-link {\n",
|
|||
|
" float: right;\n",
|
|||
|
" font-size: smaller;\n",
|
|||
|
" line-height: 1em;\n",
|
|||
|
" font-family: monospace;\n",
|
|||
|
" background-color: var(--sklearn-color-background);\n",
|
|||
|
" border-radius: 1em;\n",
|
|||
|
" height: 1em;\n",
|
|||
|
" width: 1em;\n",
|
|||
|
" text-decoration: none !important;\n",
|
|||
|
" margin-left: 1ex;\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
|
|||
|
" color: var(--sklearn-color-unfitted-level-1);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
".sk-estimator-doc-link.fitted,\n",
|
|||
|
"a:link.sk-estimator-doc-link.fitted,\n",
|
|||
|
"a:visited.sk-estimator-doc-link.fitted {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
|
|||
|
" color: var(--sklearn-color-fitted-level-1);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* On hover */\n",
|
|||
|
"div.sk-estimator:hover .sk-estimator-doc-link:hover,\n",
|
|||
|
".sk-estimator-doc-link:hover,\n",
|
|||
|
"div.sk-label-container:hover .sk-estimator-doc-link:hover,\n",
|
|||
|
".sk-estimator-doc-link:hover {\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-3);\n",
|
|||
|
" color: var(--sklearn-color-background);\n",
|
|||
|
" text-decoration: none;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,\n",
|
|||
|
".sk-estimator-doc-link.fitted:hover,\n",
|
|||
|
"div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,\n",
|
|||
|
".sk-estimator-doc-link.fitted:hover {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-3);\n",
|
|||
|
" color: var(--sklearn-color-background);\n",
|
|||
|
" text-decoration: none;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Span, style for the box shown on hovering the info icon */\n",
|
|||
|
".sk-estimator-doc-link span {\n",
|
|||
|
" display: none;\n",
|
|||
|
" z-index: 9999;\n",
|
|||
|
" position: relative;\n",
|
|||
|
" font-weight: normal;\n",
|
|||
|
" right: .2ex;\n",
|
|||
|
" padding: .5ex;\n",
|
|||
|
" margin: .5ex;\n",
|
|||
|
" width: min-content;\n",
|
|||
|
" min-width: 20ex;\n",
|
|||
|
" max-width: 50ex;\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
" box-shadow: 2pt 2pt 4pt #999;\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background: var(--sklearn-color-unfitted-level-0);\n",
|
|||
|
" border: .5pt solid var(--sklearn-color-unfitted-level-3);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
".sk-estimator-doc-link.fitted span {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background: var(--sklearn-color-fitted-level-0);\n",
|
|||
|
" border: var(--sklearn-color-fitted-level-3);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
".sk-estimator-doc-link:hover span {\n",
|
|||
|
" display: block;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* \"?\"-specific style due to the `<a>` HTML tag */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 a.estimator_doc_link {\n",
|
|||
|
" float: right;\n",
|
|||
|
" font-size: 1rem;\n",
|
|||
|
" line-height: 1em;\n",
|
|||
|
" font-family: monospace;\n",
|
|||
|
" background-color: var(--sklearn-color-background);\n",
|
|||
|
" border-radius: 1rem;\n",
|
|||
|
" height: 1rem;\n",
|
|||
|
" width: 1rem;\n",
|
|||
|
" text-decoration: none;\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" color: var(--sklearn-color-unfitted-level-1);\n",
|
|||
|
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 a.estimator_doc_link.fitted {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
|
|||
|
" color: var(--sklearn-color-fitted-level-1);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* On hover */\n",
|
|||
|
"#sk-container-id-7 a.estimator_doc_link:hover {\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-3);\n",
|
|||
|
" color: var(--sklearn-color-background);\n",
|
|||
|
" text-decoration: none;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-7 a.estimator_doc_link.fitted:hover {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-3);\n",
|
|||
|
"}\n",
|
|||
|
"</style><div id=\"sk-container-id-7\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>GridSearchCV(cv=5,\n",
|
|||
|
" estimator=Pipeline(steps=[('features_preprocessing',\n",
|
|||
|
" ColumnTransformer(force_int_remainder_cols=False,\n",
|
|||
|
" remainder='passthrough',\n",
|
|||
|
" transformers=[('prepocessing_num',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(strategy='median')),\n",
|
|||
|
" ('scaler',\n",
|
|||
|
" StandardScaler())]),\n",
|
|||
|
" []),\n",
|
|||
|
" ('prepocessing_cat',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(fill_value='unkno...\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('drop_columns',\n",
|
|||
|
" ColumnTransformer(remainder='passthrough',\n",
|
|||
|
" transformers=[('drop_columns',\n",
|
|||
|
" 'drop',\n",
|
|||
|
" ['total_score_discrete'])],\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('model',\n",
|
|||
|
" DecisionTreeClassifier(random_state=9))]),\n",
|
|||
|
" n_jobs=-1,\n",
|
|||
|
" param_grid={'model__max_depth': [5, 10, None],\n",
|
|||
|
" 'model__min_samples_leaf': [1, 2, 4],\n",
|
|||
|
" 'model__min_samples_split': [2, 5, 10]})</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-103\" type=\"checkbox\" ><label for=\"sk-estimator-id-103\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> GridSearchCV<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.model_selection.GridSearchCV.html\">?<span>Documentation for GridSearchCV</span></a><span class=\"sk-estimator-doc-link fitted\">i<span>Fitted</span></span></label><div class=\"sk-toggleable__content fitted\"><pre>GridSearchCV(cv=5,\n",
|
|||
|
" estimator=Pipeline(steps=[('features_preprocessing',\n",
|
|||
|
" ColumnTransformer(force_int_remainder_cols=False,\n",
|
|||
|
" remainder='passthrough',\n",
|
|||
|
" transformers=[('prepocessing_num',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(strategy='median')),\n",
|
|||
|
" ('scaler',\n",
|
|||
|
" StandardScaler())]),\n",
|
|||
|
" []),\n",
|
|||
|
" ('prepocessing_cat',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(fill_value='unkno...\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('drop_columns',\n",
|
|||
|
" ColumnTransformer(remainder='passthrough',\n",
|
|||
|
" transformers=[('drop_columns',\n",
|
|||
|
" 'drop',\n",
|
|||
|
" ['total_score_discrete'])],\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('model',\n",
|
|||
|
" DecisionTreeClassifier(random_state=9))]),\n",
|
|||
|
" n_jobs=-1,\n",
|
|||
|
" param_grid={'model__max_depth': [5, 10, None],\n",
|
|||
|
" 'model__min_samples_leaf': [1, 2, 4],\n",
|
|||
|
" 'model__min_samples_split': [2, 5, 10]})</pre></div> </div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-104\" type=\"checkbox\" ><label for=\"sk-estimator-id-104\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">best_estimator_: Pipeline</label><div class=\"sk-toggleable__content fitted\"><pre>Pipeline(steps=[('features_preprocessing',\n",
|
|||
|
" ColumnTransformer(force_int_remainder_cols=False,\n",
|
|||
|
" remainder='passthrough',\n",
|
|||
|
" transformers=[('prepocessing_num',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(strategy='median')),\n",
|
|||
|
" ('scaler',\n",
|
|||
|
" StandardScaler())]),\n",
|
|||
|
" []),\n",
|
|||
|
" ('prepocessing_cat',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(fill_value='unknown',\n",
|
|||
|
" strategy='constant')),\n",
|
|||
|
" ('...\n",
|
|||
|
" ['gender', 'race/ethnicity',\n",
|
|||
|
" 'parental level of '\n",
|
|||
|
" 'education',\n",
|
|||
|
" 'lunch',\n",
|
|||
|
" 'test preparation course'])],\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('drop_columns',\n",
|
|||
|
" ColumnTransformer(remainder='passthrough',\n",
|
|||
|
" transformers=[('drop_columns', 'drop',\n",
|
|||
|
" ['total_score_discrete'])],\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('model',\n",
|
|||
|
" DecisionTreeClassifier(max_depth=10, min_samples_split=10,\n",
|
|||
|
" random_state=9))])</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-serial\"><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-105\" type=\"checkbox\" ><label for=\"sk-estimator-id-105\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> features_preprocessing: ColumnTransformer<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.compose.ColumnTransformer.html\">?<span>Documentation for features_preprocessing: ColumnTransformer</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>ColumnTransformer(force_int_remainder_cols=False, remainder='passthrough',\n",
|
|||
|
" transformers=[('prepocessing_num',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(strategy='median')),\n",
|
|||
|
" ('scaler', StandardScaler())]),\n",
|
|||
|
" []),\n",
|
|||
|
" ('prepocessing_cat',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(fill_value='unknown',\n",
|
|||
|
" strategy='constant')),\n",
|
|||
|
" ('encoder',\n",
|
|||
|
" OneHotEncoder(drop='first',\n",
|
|||
|
" handle_unknown='ignore',\n",
|
|||
|
" sparse_output=False))]),\n",
|
|||
|
" ['gender', 'race/ethnicity',\n",
|
|||
|
" 'parental level of education', 'lunch',\n",
|
|||
|
" 'test preparation course'])],\n",
|
|||
|
" verbose_feature_names_out=False)</pre></div> </div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-106\" type=\"checkbox\" ><label for=\"sk-estimator-id-106\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">prepocessing_num</label><div class=\"sk-toggleable__content fitted\"><pre>[]</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-107\" type=\"checkbox\" ><label for=\"sk-estimator-id-107\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> SimpleImputer<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.impute.SimpleImputer.html\">?<span>Documentation for SimpleImputer</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>SimpleImputer(strategy='median')</pre></div> </div></div><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-108\" type=\"checkbox\" ><label for=\"sk-estimator-id-108\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> StandardScaler<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.preprocessing.StandardScaler.html\">?<span>Documentation for StandardScaler</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>StandardScaler()</pre></div> </div></div></div></div></div></div></div><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div 
class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-109\" type=\"checkbox\" ><label for=\"sk-estimator-id-109\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">prepocessing_cat</label><div class=\"sk-toggleable__content fitted\"><pre>['gender', 'race/ethnicity', 'parental level of education', 'lunch', 'test preparation course']</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-110\" type=\"checkbox\" ><label for=\"sk-estimator-id-110\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> SimpleImputer<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.impute.SimpleImputer.html\">?<span>Documentation for SimpleImputer</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>SimpleImputer(fill_value='unknown', strategy='constant')</pre></div> </div></div><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-111\" type=\"checkbox\" ><label for=\"sk-estimator-id-111\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> OneHotEncoder<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.preprocessing.OneHotEncoder.html\">?<span>Documentation for OneHotEncoder</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>OneHotEncoder(drop='first', handle_unknown='ignore', sparse_output=False)</pre></div> </div></div></div></div></div></div></div><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label
|
|||
|
" transformers=[('drop_columns', 'drop',\n",
|
|||
|
" ['total_score_discrete'])],\n",
|
|||
|
" verbose_feature_names_out=False)</pre></div> </div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-115\" type=\"checkbox\" ><label for=\"sk-estimator-id-115\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">drop_columns</label><div class=\"sk-toggleable__content fitted\"><pre>['total_score_discrete']</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-116\" type=\"checkbox\" ><label for=\"sk-estimator-id-116\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">drop</label><div class=\"sk-toggleable__content fitted\"><pre>drop</pre></div> </div></div></div></div></div><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-117\" type=\"checkbox\" ><label for=\"sk-estimator-id-117\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">remainder</label><div class=\"sk-toggleable__content fitted\"><pre>['gender_male', 'race/ethnicity_group B', 'race/ethnicity_group C', 'race/ethnicity_group D', 'race/ethnicity_group E', "parental level of education_bachelor's degree", 'parental level of education_high school', "parental level of education_master's degree", 'parental level of education_some college', 'parental level of education_some high school', 'lunch_standard', 'test preparation course_none']</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-118\" 
type=\"checkbox\" ><label for=\"sk-estimator-id-118\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">passthrough</label><div class=\"sk-toggleable__content fitted\"><pre>passthrough</pre></div> </div></div></div></div></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-119\" type=\"checkbox\" ><label for=\"sk-estimator-id-119\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> DecisionTreeClassifier<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.tree.DecisionTreeClassifier.html\">?<span>Documentation for DecisionTreeClassifier</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>DecisionTreeClassifier(max_depth=10, min_samples_split=10, random_state=9)</pre></div> </div></div></div></div></div></div></div></div></div></div></div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
"GridSearchCV(cv=5,\n",
|
|||
|
" estimator=Pipeline(steps=[('features_preprocessing',\n",
|
|||
|
" ColumnTransformer(force_int_remainder_cols=False,\n",
|
|||
|
" remainder='passthrough',\n",
|
|||
|
" transformers=[('prepocessing_num',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(strategy='median')),\n",
|
|||
|
" ('scaler',\n",
|
|||
|
" StandardScaler())]),\n",
|
|||
|
" []),\n",
|
|||
|
" ('prepocessing_cat',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(fill_value='unkno...\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('drop_columns',\n",
|
|||
|
" ColumnTransformer(remainder='passthrough',\n",
|
|||
|
" transformers=[('drop_columns',\n",
|
|||
|
" 'drop',\n",
|
|||
|
" ['total_score_discrete'])],\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('model',\n",
|
|||
|
" DecisionTreeClassifier(random_state=9))]),\n",
|
|||
|
" n_jobs=-1,\n",
|
|||
|
" param_grid={'model__max_depth': [5, 10, None],\n",
|
|||
|
" 'model__min_samples_leaf': [1, 2, 4],\n",
|
|||
|
" 'model__min_samples_split': [2, 5, 10]})"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 680,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"from sklearn.tree import DecisionTreeClassifier\n",
|
|||
|
"\n",
|
|||
|
"from sklearn.linear_model import LogisticRegression\n",
|
|||
|
"from sklearn.model_selection import GridSearchCV\n",
|
|||
|
"from sklearn.pipeline import Pipeline\n",
|
|||
|
"\n",
|
|||
|
"# Decision tree model\n",
|
|||
|
"decision_tree_model = DecisionTreeClassifier(random_state=random_state)\n",
|
|||
|
"\n",
|
|||
|
"# Build a pipeline that first applies preprocessing and then trains the model\n",
|
|||
|
"decision_tree_pipeline = Pipeline([\n",
|
|||
|
" (\"features_preprocessing\", features_preprocessing),\n",
|
|||
|
" (\"drop_columns\", drop_columns),\n",
|
|||
|
" (\"model\", decision_tree_model) # The model is the final step of the pipeline\n",
|
|||
|
"])\n",
|
|||
|
"\n",
|
|||
|
"# Hyperparameter grid to search\n",
|
|||
|
"tree_param_grid = {\n",
|
|||
|
" 'model__max_depth': [5, 10, None], # Maximum tree depth\n",
|
|||
|
" 'model__min_samples_split': [2, 5, 10], # Minimum number of samples required to split a node\n",
|
|||
|
" 'model__min_samples_leaf': [1, 2, 4], # Minimum number of samples required in a leaf\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"tree_search = GridSearchCV(decision_tree_pipeline, tree_param_grid, cv=5, n_jobs=-1)\n",
|
|||
|
"\n",
|
|||
|
"# Run the grid search to train the model\n",
|
|||
|
"tree_search.fit(X_train, y_train.values.ravel())\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"##### 3. Gradient boosting\n",
|
|||
|
"\n",
|
|||
|
"For gradient boosting, we will tune hyperparameters such as the number of trees and the learning rate."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 681,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"d:\\5semestr\\AIM\\aimvenv\\Lib\\site-packages\\sklearn\\compose\\_column_transformer.py:1623: FutureWarning: \n",
|
|||
|
"The format of the columns of the 'remainder' transformer in ColumnTransformer.transformers_ will change in version 1.7 to match the format of the other transformers.\n",
|
|||
|
"At the moment the remainder columns are stored as indices (of type int). With the same ColumnTransformer configuration, in the future they will be stored as column names (of type str).\n",
|
|||
|
"To use the new behavior now and suppress this warning, use ColumnTransformer(force_int_remainder_cols=False).\n",
|
|||
|
"\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<style>#sk-container-id-8 {\n",
|
|||
|
" /* Definition of color scheme common for light and dark mode */\n",
|
|||
|
" --sklearn-color-text: black;\n",
|
|||
|
" --sklearn-color-line: gray;\n",
|
|||
|
" /* Definition of color scheme for unfitted estimators */\n",
|
|||
|
" --sklearn-color-unfitted-level-0: #fff5e6;\n",
|
|||
|
" --sklearn-color-unfitted-level-1: #f6e4d2;\n",
|
|||
|
" --sklearn-color-unfitted-level-2: #ffe0b3;\n",
|
|||
|
" --sklearn-color-unfitted-level-3: chocolate;\n",
|
|||
|
" /* Definition of color scheme for fitted estimators */\n",
|
|||
|
" --sklearn-color-fitted-level-0: #f0f8ff;\n",
|
|||
|
" --sklearn-color-fitted-level-1: #d4ebff;\n",
|
|||
|
" --sklearn-color-fitted-level-2: #b3dbfd;\n",
|
|||
|
" --sklearn-color-fitted-level-3: cornflowerblue;\n",
|
|||
|
"\n",
|
|||
|
" /* Specific color for light theme */\n",
|
|||
|
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
|
|||
|
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white)));\n",
|
|||
|
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
|
|||
|
" --sklearn-color-icon: #696969;\n",
|
|||
|
"\n",
|
|||
|
" @media (prefers-color-scheme: dark) {\n",
|
|||
|
" /* Redefinition of color scheme for dark theme */\n",
|
|||
|
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
|
|||
|
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111)));\n",
|
|||
|
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
|
|||
|
" --sklearn-color-icon: #878787;\n",
|
|||
|
" }\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 {\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 pre {\n",
|
|||
|
" padding: 0;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 input.sk-hidden--visually {\n",
|
|||
|
" border: 0;\n",
|
|||
|
" clip: rect(1px 1px 1px 1px);\n",
|
|||
|
" clip: rect(1px, 1px, 1px, 1px);\n",
|
|||
|
" height: 1px;\n",
|
|||
|
" margin: -1px;\n",
|
|||
|
" overflow: hidden;\n",
|
|||
|
" padding: 0;\n",
|
|||
|
" position: absolute;\n",
|
|||
|
" width: 1px;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-dashed-wrapped {\n",
|
|||
|
" border: 1px dashed var(--sklearn-color-line);\n",
|
|||
|
" margin: 0 0.4em 0.5em 0.4em;\n",
|
|||
|
" box-sizing: border-box;\n",
|
|||
|
" padding-bottom: 0.4em;\n",
|
|||
|
" background-color: var(--sklearn-color-background);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-container {\n",
|
|||
|
" /* jupyter's `normalize.less` sets `[hidden] { display: none; }`\n",
|
|||
|
" but bootstrap.min.css set `[hidden] { display: none !important; }`\n",
|
|||
|
" so we also need the `!important` here to be able to override the\n",
|
|||
|
" default hidden behavior on the sphinx rendered scikit-learn.org.\n",
|
|||
|
" See: https://github.com/scikit-learn/scikit-learn/issues/21755 */\n",
|
|||
|
" display: inline-block !important;\n",
|
|||
|
" position: relative;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-text-repr-fallback {\n",
|
|||
|
" display: none;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"div.sk-parallel-item,\n",
|
|||
|
"div.sk-serial,\n",
|
|||
|
"div.sk-item {\n",
|
|||
|
" /* draw centered vertical line to link estimators */\n",
|
|||
|
" background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));\n",
|
|||
|
" background-size: 2px 100%;\n",
|
|||
|
" background-repeat: no-repeat;\n",
|
|||
|
" background-position: center center;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Parallel-specific style estimator block */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-parallel-item::after {\n",
|
|||
|
" content: \"\";\n",
|
|||
|
" width: 100%;\n",
|
|||
|
" border-bottom: 2px solid var(--sklearn-color-text-on-default-background);\n",
|
|||
|
" flex-grow: 1;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-parallel {\n",
|
|||
|
" display: flex;\n",
|
|||
|
" align-items: stretch;\n",
|
|||
|
" justify-content: center;\n",
|
|||
|
" background-color: var(--sklearn-color-background);\n",
|
|||
|
" position: relative;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-parallel-item {\n",
|
|||
|
" display: flex;\n",
|
|||
|
" flex-direction: column;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-parallel-item:first-child::after {\n",
|
|||
|
" align-self: flex-end;\n",
|
|||
|
" width: 50%;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-parallel-item:last-child::after {\n",
|
|||
|
" align-self: flex-start;\n",
|
|||
|
" width: 50%;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-parallel-item:only-child::after {\n",
|
|||
|
" width: 0;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Serial-specific style estimator block */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-serial {\n",
|
|||
|
" display: flex;\n",
|
|||
|
" flex-direction: column;\n",
|
|||
|
" align-items: center;\n",
|
|||
|
" background-color: var(--sklearn-color-background);\n",
|
|||
|
" padding-right: 1em;\n",
|
|||
|
" padding-left: 1em;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"/* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is\n",
|
|||
|
"clickable and can be expanded/collapsed.\n",
|
|||
|
"- Pipeline and ColumnTransformer use this feature and define the default style\n",
|
|||
|
"- Estimators will overwrite some part of the style using the `sk-estimator` class\n",
|
|||
|
"*/\n",
|
|||
|
"\n",
|
|||
|
"/* Pipeline and ColumnTransformer style (default) */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-toggleable {\n",
|
|||
|
" /* Default theme specific background. It is overwritten whether we have a\n",
|
|||
|
" specific estimator or a Pipeline/ColumnTransformer */\n",
|
|||
|
" background-color: var(--sklearn-color-background);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Toggleable label */\n",
|
|||
|
"#sk-container-id-8 label.sk-toggleable__label {\n",
|
|||
|
" cursor: pointer;\n",
|
|||
|
" display: block;\n",
|
|||
|
" width: 100%;\n",
|
|||
|
" margin-bottom: 0;\n",
|
|||
|
" padding: 0.5em;\n",
|
|||
|
" box-sizing: border-box;\n",
|
|||
|
" text-align: center;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 label.sk-toggleable__label-arrow:before {\n",
|
|||
|
" /* Arrow on the left of the label */\n",
|
|||
|
" content: \"▸\";\n",
|
|||
|
" float: left;\n",
|
|||
|
" margin-right: 0.25em;\n",
|
|||
|
" color: var(--sklearn-color-icon);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 label.sk-toggleable__label-arrow:hover:before {\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Toggleable content - dropdown */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-toggleable__content {\n",
|
|||
|
" max-height: 0;\n",
|
|||
|
" max-width: 0;\n",
|
|||
|
" overflow: hidden;\n",
|
|||
|
" text-align: left;\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-toggleable__content.fitted {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-toggleable__content pre {\n",
|
|||
|
" margin: 0.2em;\n",
|
|||
|
" border-radius: 0.25em;\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-toggleable__content.fitted pre {\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 input.sk-toggleable__control:checked~div.sk-toggleable__content {\n",
|
|||
|
" /* Expand drop-down */\n",
|
|||
|
" max-height: 200px;\n",
|
|||
|
" max-width: 100%;\n",
|
|||
|
" overflow: auto;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {\n",
|
|||
|
" content: \"▾\";\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Pipeline/ColumnTransformer-specific style */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Estimator-specific style */\n",
|
|||
|
"\n",
|
|||
|
"/* Colorize estimator box */\n",
|
|||
|
"#sk-container-id-8 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-label label.sk-toggleable__label,\n",
|
|||
|
"#sk-container-id-8 div.sk-label label {\n",
|
|||
|
" /* The background is the default theme color */\n",
|
|||
|
" color: var(--sklearn-color-text-on-default-background);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* On hover, darken the color of the background */\n",
|
|||
|
"#sk-container-id-8 div.sk-label:hover label.sk-toggleable__label {\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Label box, darken color on hover, fitted */\n",
|
|||
|
"#sk-container-id-8 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Estimator label */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-label label {\n",
|
|||
|
" font-family: monospace;\n",
|
|||
|
" font-weight: bold;\n",
|
|||
|
" display: inline-block;\n",
|
|||
|
" line-height: 1.2em;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-label-container {\n",
|
|||
|
" text-align: center;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Estimator-specific */\n",
|
|||
|
"#sk-container-id-8 div.sk-estimator {\n",
|
|||
|
" font-family: monospace;\n",
|
|||
|
" border: 1px dotted var(--sklearn-color-border-box);\n",
|
|||
|
" border-radius: 0.25em;\n",
|
|||
|
" box-sizing: border-box;\n",
|
|||
|
" margin-bottom: 0.5em;\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-estimator.fitted {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* on hover */\n",
|
|||
|
"#sk-container-id-8 div.sk-estimator:hover {\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 div.sk-estimator.fitted:hover {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Specification for estimator info (e.g. \"i\" and \"?\") */\n",
|
|||
|
"\n",
|
|||
|
"/* Common style for \"i\" and \"?\" */\n",
|
|||
|
"\n",
|
|||
|
".sk-estimator-doc-link,\n",
|
|||
|
"a:link.sk-estimator-doc-link,\n",
|
|||
|
"a:visited.sk-estimator-doc-link {\n",
|
|||
|
" float: right;\n",
|
|||
|
" font-size: smaller;\n",
|
|||
|
" line-height: 1em;\n",
|
|||
|
" font-family: monospace;\n",
|
|||
|
" background-color: var(--sklearn-color-background);\n",
|
|||
|
" border-radius: 1em;\n",
|
|||
|
" height: 1em;\n",
|
|||
|
" width: 1em;\n",
|
|||
|
" text-decoration: none !important;\n",
|
|||
|
" margin-left: 1ex;\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
|
|||
|
" color: var(--sklearn-color-unfitted-level-1);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
".sk-estimator-doc-link.fitted,\n",
|
|||
|
"a:link.sk-estimator-doc-link.fitted,\n",
|
|||
|
"a:visited.sk-estimator-doc-link.fitted {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
|
|||
|
" color: var(--sklearn-color-fitted-level-1);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* On hover */\n",
|
|||
|
"div.sk-estimator:hover .sk-estimator-doc-link:hover,\n",
|
|||
|
".sk-estimator-doc-link:hover,\n",
|
|||
|
"div.sk-label-container:hover .sk-estimator-doc-link:hover,\n",
|
|||
|
".sk-estimator-doc-link:hover {\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-3);\n",
|
|||
|
" color: var(--sklearn-color-background);\n",
|
|||
|
" text-decoration: none;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,\n",
|
|||
|
".sk-estimator-doc-link.fitted:hover,\n",
|
|||
|
"div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,\n",
|
|||
|
".sk-estimator-doc-link.fitted:hover {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-3);\n",
|
|||
|
" color: var(--sklearn-color-background);\n",
|
|||
|
" text-decoration: none;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* Span, style for the box shown on hovering the info icon */\n",
|
|||
|
".sk-estimator-doc-link span {\n",
|
|||
|
" display: none;\n",
|
|||
|
" z-index: 9999;\n",
|
|||
|
" position: relative;\n",
|
|||
|
" font-weight: normal;\n",
|
|||
|
" right: .2ex;\n",
|
|||
|
" padding: .5ex;\n",
|
|||
|
" margin: .5ex;\n",
|
|||
|
" width: min-content;\n",
|
|||
|
" min-width: 20ex;\n",
|
|||
|
" max-width: 50ex;\n",
|
|||
|
" color: var(--sklearn-color-text);\n",
|
|||
|
" box-shadow: 2pt 2pt 4pt #999;\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background: var(--sklearn-color-unfitted-level-0);\n",
|
|||
|
" border: .5pt solid var(--sklearn-color-unfitted-level-3);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
".sk-estimator-doc-link.fitted span {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background: var(--sklearn-color-fitted-level-0);\n",
|
|||
|
" border: var(--sklearn-color-fitted-level-3);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
".sk-estimator-doc-link:hover span {\n",
|
|||
|
" display: block;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* \"?\"-specific style due to the `<a>` HTML tag */\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 a.estimator_doc_link {\n",
|
|||
|
" float: right;\n",
|
|||
|
" font-size: 1rem;\n",
|
|||
|
" line-height: 1em;\n",
|
|||
|
" font-family: monospace;\n",
|
|||
|
" background-color: var(--sklearn-color-background);\n",
|
|||
|
" border-radius: 1rem;\n",
|
|||
|
" height: 1rem;\n",
|
|||
|
" width: 1rem;\n",
|
|||
|
" text-decoration: none;\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" color: var(--sklearn-color-unfitted-level-1);\n",
|
|||
|
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 a.estimator_doc_link.fitted {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
|
|||
|
" color: var(--sklearn-color-fitted-level-1);\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"/* On hover */\n",
|
|||
|
"#sk-container-id-8 a.estimator_doc_link:hover {\n",
|
|||
|
" /* unfitted */\n",
|
|||
|
" background-color: var(--sklearn-color-unfitted-level-3);\n",
|
|||
|
" color: var(--sklearn-color-background);\n",
|
|||
|
" text-decoration: none;\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"#sk-container-id-8 a.estimator_doc_link.fitted:hover {\n",
|
|||
|
" /* fitted */\n",
|
|||
|
" background-color: var(--sklearn-color-fitted-level-3);\n",
|
|||
|
"}\n",
|
|||
|
"</style><div id=\"sk-container-id-8\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>GridSearchCV(cv=5,\n",
|
|||
|
" estimator=Pipeline(steps=[('features_preprocessing',\n",
|
|||
|
" ColumnTransformer(force_int_remainder_cols=False,\n",
|
|||
|
" remainder='passthrough',\n",
|
|||
|
" transformers=[('prepocessing_num',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(strategy='median')),\n",
|
|||
|
" ('scaler',\n",
|
|||
|
" StandardScaler())]),\n",
|
|||
|
" []),\n",
|
|||
|
" ('prepocessing_cat',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(fill_value='unkno...\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('drop_columns',\n",
|
|||
|
" ColumnTransformer(remainder='passthrough',\n",
|
|||
|
" transformers=[('drop_columns',\n",
|
|||
|
" 'drop',\n",
|
|||
|
" ['total_score_discrete'])],\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('model',\n",
|
|||
|
" GradientBoostingClassifier(random_state=9))]),\n",
|
|||
|
" n_jobs=-1,\n",
|
|||
|
" param_grid={'model__learning_rate': [0.05, 0.1, 0.2],\n",
|
|||
|
" 'model__max_depth': [3, 5],\n",
|
|||
|
" 'model__n_estimators': [100, 200]})</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-120\" type=\"checkbox\" ><label for=\"sk-estimator-id-120\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> GridSearchCV<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.model_selection.GridSearchCV.html\">?<span>Documentation for GridSearchCV</span></a><span class=\"sk-estimator-doc-link fitted\">i<span>Fitted</span></span></label><div class=\"sk-toggleable__content fitted\"><pre>GridSearchCV(cv=5,\n",
|
|||
|
" estimator=Pipeline(steps=[('features_preprocessing',\n",
|
|||
|
" ColumnTransformer(force_int_remainder_cols=False,\n",
|
|||
|
" remainder='passthrough',\n",
|
|||
|
" transformers=[('prepocessing_num',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(strategy='median')),\n",
|
|||
|
" ('scaler',\n",
|
|||
|
" StandardScaler())]),\n",
|
|||
|
" []),\n",
|
|||
|
" ('prepocessing_cat',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(fill_value='unkno...\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('drop_columns',\n",
|
|||
|
" ColumnTransformer(remainder='passthrough',\n",
|
|||
|
" transformers=[('drop_columns',\n",
|
|||
|
" 'drop',\n",
|
|||
|
" ['total_score_discrete'])],\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('model',\n",
|
|||
|
" GradientBoostingClassifier(random_state=9))]),\n",
|
|||
|
" n_jobs=-1,\n",
|
|||
|
" param_grid={'model__learning_rate': [0.05, 0.1, 0.2],\n",
|
|||
|
" 'model__max_depth': [3, 5],\n",
|
|||
|
" 'model__n_estimators': [100, 200]})</pre></div> </div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-121\" type=\"checkbox\" ><label for=\"sk-estimator-id-121\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">best_estimator_: Pipeline</label><div class=\"sk-toggleable__content fitted\"><pre>Pipeline(steps=[('features_preprocessing',\n",
|
|||
|
" ColumnTransformer(force_int_remainder_cols=False,\n",
|
|||
|
" remainder='passthrough',\n",
|
|||
|
" transformers=[('prepocessing_num',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(strategy='median')),\n",
|
|||
|
" ('scaler',\n",
|
|||
|
" StandardScaler())]),\n",
|
|||
|
" []),\n",
|
|||
|
" ('prepocessing_cat',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(fill_value='unknown',\n",
|
|||
|
" strategy='constant')),\n",
|
|||
|
" ('...\n",
|
|||
|
" sparse_output=False))]),\n",
|
|||
|
" ['gender', 'race/ethnicity',\n",
|
|||
|
" 'parental level of '\n",
|
|||
|
" 'education',\n",
|
|||
|
" 'lunch',\n",
|
|||
|
" 'test preparation course'])],\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('drop_columns',\n",
|
|||
|
" ColumnTransformer(remainder='passthrough',\n",
|
|||
|
" transformers=[('drop_columns', 'drop',\n",
|
|||
|
" ['total_score_discrete'])],\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('model',\n",
|
|||
|
" GradientBoostingClassifier(max_depth=5, random_state=9))])</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-serial\"><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-122\" type=\"checkbox\" ><label for=\"sk-estimator-id-122\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> features_preprocessing: ColumnTransformer<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.compose.ColumnTransformer.html\">?<span>Documentation for features_preprocessing: ColumnTransformer</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>ColumnTransformer(force_int_remainder_cols=False, remainder='passthrough',\n",
|
|||
|
" transformers=[('prepocessing_num',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(strategy='median')),\n",
|
|||
|
" ('scaler', StandardScaler())]),\n",
|
|||
|
" []),\n",
|
|||
|
" ('prepocessing_cat',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(fill_value='unknown',\n",
|
|||
|
" strategy='constant')),\n",
|
|||
|
" ('encoder',\n",
|
|||
|
" OneHotEncoder(drop='first',\n",
|
|||
|
" handle_unknown='ignore',\n",
|
|||
|
" sparse_output=False))]),\n",
|
|||
|
" ['gender', 'race/ethnicity',\n",
|
|||
|
" 'parental level of education', 'lunch',\n",
|
|||
|
" 'test preparation course'])],\n",
|
|||
|
" verbose_feature_names_out=False)</pre></div> </div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-123\" type=\"checkbox\" ><label for=\"sk-estimator-id-123\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">prepocessing_num</label><div class=\"sk-toggleable__content fitted\"><pre>[]</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-124\" type=\"checkbox\" ><label for=\"sk-estimator-id-124\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> SimpleImputer<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.impute.SimpleImputer.html\">?<span>Documentation for SimpleImputer</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>SimpleImputer(strategy='median')</pre></div> </div></div><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-125\" type=\"checkbox\" ><label for=\"sk-estimator-id-125\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> StandardScaler<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.preprocessing.StandardScaler.html\">?<span>Documentation for StandardScaler</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>StandardScaler()</pre></div> </div></div></div></div></div></div></div><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div 
class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-126\" type=\"checkbox\" ><label for=\"sk-estimator-id-126\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">prepocessing_cat</label><div class=\"sk-toggleable__content fitted\"><pre>['gender', 'race/ethnicity', 'parental level of education', 'lunch', 'test preparation course']</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-127\" type=\"checkbox\" ><label for=\"sk-estimator-id-127\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> SimpleImputer<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.impute.SimpleImputer.html\">?<span>Documentation for SimpleImputer</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>SimpleImputer(fill_value='unknown', strategy='constant')</pre></div> </div></div><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-128\" type=\"checkbox\" ><label for=\"sk-estimator-id-128\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> OneHotEncoder<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.preprocessing.OneHotEncoder.html\">?<span>Documentation for OneHotEncoder</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>OneHotEncoder(drop='first', handle_unknown='ignore', sparse_output=False)</pre></div> </div></div></div></div></div></div></div><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label
|
|||
|
" transformers=[('drop_columns', 'drop',\n",
|
|||
|
" ['total_score_discrete'])],\n",
|
|||
|
" verbose_feature_names_out=False)</pre></div> </div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-132\" type=\"checkbox\" ><label for=\"sk-estimator-id-132\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">drop_columns</label><div class=\"sk-toggleable__content fitted\"><pre>['total_score_discrete']</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-133\" type=\"checkbox\" ><label for=\"sk-estimator-id-133\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">drop</label><div class=\"sk-toggleable__content fitted\"><pre>drop</pre></div> </div></div></div></div></div><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-134\" type=\"checkbox\" ><label for=\"sk-estimator-id-134\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">remainder</label><div class=\"sk-toggleable__content fitted\"><pre>['gender_male', 'race/ethnicity_group B', 'race/ethnicity_group C', 'race/ethnicity_group D', 'race/ethnicity_group E', "parental level of education_bachelor's degree", 'parental level of education_high school', "parental level of education_master's degree", 'parental level of education_some college', 'parental level of education_some high school', 'lunch_standard', 'test preparation course_none']</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-135\" 
type=\"checkbox\" ><label for=\"sk-estimator-id-135\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">passthrough</label><div class=\"sk-toggleable__content fitted\"><pre>passthrough</pre></div> </div></div></div></div></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-136\" type=\"checkbox\" ><label for=\"sk-estimator-id-136\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> GradientBoostingClassifier<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html\">?<span>Documentation for GradientBoostingClassifier</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>GradientBoostingClassifier(max_depth=5, random_state=9)</pre></div> </div></div></div></div></div></div></div></div></div></div></div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
"GridSearchCV(cv=5,\n",
|
|||
|
" estimator=Pipeline(steps=[('features_preprocessing',\n",
|
|||
|
" ColumnTransformer(force_int_remainder_cols=False,\n",
|
|||
|
" remainder='passthrough',\n",
|
|||
|
" transformers=[('prepocessing_num',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(strategy='median')),\n",
|
|||
|
" ('scaler',\n",
|
|||
|
" StandardScaler())]),\n",
|
|||
|
" []),\n",
|
|||
|
" ('prepocessing_cat',\n",
|
|||
|
" Pipeline(steps=[('imputer',\n",
|
|||
|
" SimpleImputer(fill_value='unkno...\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('drop_columns',\n",
|
|||
|
" ColumnTransformer(remainder='passthrough',\n",
|
|||
|
" transformers=[('drop_columns',\n",
|
|||
|
" 'drop',\n",
|
|||
|
" ['total_score_discrete'])],\n",
|
|||
|
" verbose_feature_names_out=False)),\n",
|
|||
|
" ('model',\n",
|
|||
|
" GradientBoostingClassifier(random_state=9))]),\n",
|
|||
|
" n_jobs=-1,\n",
|
|||
|
" param_grid={'model__learning_rate': [0.05, 0.1, 0.2],\n",
|
|||
|
" 'model__max_depth': [3, 5],\n",
|
|||
|
" 'model__n_estimators': [100, 200]})"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 681,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"from sklearn.ensemble import GradientBoostingClassifier\n",
|
|||
|
"\n",
|
|||
|
"# Модель градиентного бустинга\n",
|
|||
|
"gradient_boosting_model = GradientBoostingClassifier(random_state=random_state)\n",
|
|||
|
"\n",
|
|||
|
"# Создаём пайплайн, который сначала применяет preprocessing, а потом обучает модель\n",
|
|||
|
"gradient_boosting_pipeline = Pipeline([\n",
|
|||
|
" (\"features_preprocessing\", features_preprocessing),\n",
|
|||
|
" (\"drop_columns\", drop_columns),\n",
|
|||
|
" (\"model\", gradient_boosting_model) # Здесь добавляем модель в пайплайн\n",
|
|||
|
"])\n",
|
|||
|
"\n",
|
|||
|
"# Параметры для настройки\n",
|
|||
|
"gb_param_grid = {\n",
|
|||
|
" 'model__n_estimators': [100, 200], # Количество деревьев\n",
|
|||
|
" 'model__learning_rate': [0.05, 0.1, 0.2], # Темп обучения\n",
|
|||
|
" 'model__max_depth': [3, 5], # Глубина деревьев\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"gb_search = GridSearchCV(gradient_boosting_pipeline, gb_param_grid, cv=5, n_jobs=-1)\n",
|
|||
|
"\n",
|
|||
|
"# Обучаем модель\n",
|
|||
|
"gb_search.fit(X_train, y_train.values.ravel())\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"#### Оценка качества моделей\n",
|
|||
|
"\n",
|
|||
|
"Для оценки качества моделей будем использовать следующие метрики:\n",
|
|||
|
"\n",
|
|||
|
"Accuracy — это базовая метрика, которая подходит для сбалансированных данных. Она поможет понять, какой процент всех предсказаний был верным.\n",
|
|||
|
"\n",
|
|||
|
"Precision и Recall — эти метрики важны, когда данные могут быть несбалансированными. Для многоклассовых задач важно использовать macro-average, что означает вычисление этих метрик для каждого класса и усреднение.\n",
|
|||
|
"\n",
|
|||
|
"F1-Score — хорошая метрика для задач с несбалансированными классами, так как она учитывает и точность, и полноту. Это важно, если ложные положительные и ложные отрицательные ошибки одинаково важны.\n",
|
|||
|
"\n",
|
|||
|
"ROC AUC — используется для оценки качества модели в контексте разделения классов, особенно если у нас есть вероятности для каждого класса. Это даст дополнительную информацию о том, насколько хорошо модель различает классы.\n",
|
|||
|
"\n",
|
|||
|
"MCC — это особенно полезно для оценки качества модели в случае несбалансированных данных. Это метрика, которая дает более сбалансированное представление о том, как модель предсказывает все классы."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 682,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Accuracy: 0.4550\n",
|
|||
|
"Precision (macro): 0.4669\n",
|
|||
|
"Recall (macro): 0.5094\n",
|
|||
|
"F1-Score (macro): 0.4323\n",
|
|||
|
"ROC AUC (macro): 0.6811\n",
|
|||
|
"MCC: 0.2188\n",
|
|||
|
"Confusion Matrix:\n",
|
|||
|
"[[51 12 26]\n",
|
|||
|
" [30 20 33]\n",
|
|||
|
" [ 3 5 20]]\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix, matthews_corrcoef\n",
|
|||
|
"\n",
|
|||
|
"# Получаем предсказания\n",
|
|||
|
"y_pred = logistic_search.predict(X_test)\n",
|
|||
|
"\n",
|
|||
|
"# Оценка качества модели\n",
|
|||
|
"accuracy = accuracy_score(y_test, y_pred)\n",
|
|||
|
"precision = precision_score(y_test, y_pred, average='macro') # Для многоклассовой задачи используем macro\n",
|
|||
|
"recall = recall_score(y_test, y_pred, average='macro')\n",
|
|||
|
"f1 = f1_score(y_test, y_pred, average='macro')\n",
|
|||
|
"roc_auc = roc_auc_score(y_test, logistic_search.predict_proba(X_test), multi_class='ovr', average='macro')\n",
|
|||
|
"mcc = matthews_corrcoef(y_test, y_pred)\n",
|
|||
|
"\n",
|
|||
|
"# Матрица ошибок\n",
|
|||
|
"conf_matrix = confusion_matrix(y_test, y_pred)\n",
|
|||
|
"\n",
|
|||
|
"# Печать метрик\n",
|
|||
|
"print(f\"Accuracy: {accuracy:.4f}\")\n",
|
|||
|
"print(f\"Precision (macro): {precision:.4f}\")\n",
|
|||
|
"print(f\"Recall (macro): {recall:.4f}\")\n",
|
|||
|
"print(f\"F1-Score (macro): {f1:.4f}\")\n",
|
|||
|
"print(f\"ROC AUC (macro): {roc_auc:.4f}\")\n",
|
|||
|
"print(f\"MCC: {mcc:.4f}\")\n",
|
|||
|
"print(f\"Confusion Matrix:\\n{conf_matrix}\")\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 683,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Accuracy: 0.4050\n",
|
|||
|
"Precision (macro): 0.3830\n",
|
|||
|
"Recall (macro): 0.4010\n",
|
|||
|
"F1-Score (macro): 0.3819\n",
|
|||
|
"ROC AUC (macro): 0.5808\n",
|
|||
|
"MCC: 0.0763\n",
|
|||
|
"Confusion Matrix:\n",
|
|||
|
"[[41 32 16]\n",
|
|||
|
" [30 29 24]\n",
|
|||
|
" [ 8 9 11]]\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix, matthews_corrcoef\n",
|
|||
|
"\n",
|
|||
|
"# Получаем предсказания\n",
|
|||
|
"y_pred = tree_search.predict(X_test)\n",
|
|||
|
"\n",
|
|||
|
"# Оценка качества модели\n",
|
|||
|
"accuracy = accuracy_score(y_test, y_pred)\n",
|
|||
|
"precision = precision_score(y_test, y_pred, average='macro') # Для многоклассовой задачи используем macro\n",
|
|||
|
"recall = recall_score(y_test, y_pred, average='macro')\n",
|
|||
|
"f1 = f1_score(y_test, y_pred, average='macro')\n",
|
|||
|
"roc_auc = roc_auc_score(y_test, tree_search.predict_proba(X_test), multi_class='ovr', average='macro')\n",
|
|||
|
"mcc = matthews_corrcoef(y_test, y_pred)\n",
|
|||
|
"\n",
|
|||
|
"# Матрица ошибок\n",
|
|||
|
"conf_matrix = confusion_matrix(y_test, y_pred)\n",
|
|||
|
"\n",
|
|||
|
"# Печать метрик\n",
|
|||
|
"print(f\"Accuracy: {accuracy:.4f}\")\n",
|
|||
|
"print(f\"Precision (macro): {precision:.4f}\")\n",
|
|||
|
"print(f\"Recall (macro): {recall:.4f}\")\n",
|
|||
|
"print(f\"F1-Score (macro): {f1:.4f}\")\n",
|
|||
|
"print(f\"ROC AUC (macro): {roc_auc:.4f}\")\n",
|
|||
|
"print(f\"MCC: {mcc:.4f}\")\n",
|
|||
|
"print(f\"Confusion Matrix:\\n{conf_matrix}\")\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 684,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Accuracy: 0.4100\n",
|
|||
|
"Precision (macro): 0.3873\n",
|
|||
|
"Recall (macro): 0.3889\n",
|
|||
|
"F1-Score (macro): 0.3783\n",
|
|||
|
"ROC AUC (macro): 0.5806\n",
|
|||
|
"MCC: 0.0895\n",
|
|||
|
"Confusion Matrix:\n",
|
|||
|
"[[42 30 17]\n",
|
|||
|
" [25 31 27]\n",
|
|||
|
" [ 7 12 9]]\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix, matthews_corrcoef\n",
|
|||
|
"\n",
|
|||
|
"# Получаем предсказания\n",
|
|||
|
"y_pred = gb_search.predict(X_test)\n",
|
|||
|
"\n",
|
|||
|
"# Оценка качества модели\n",
|
|||
|
"accuracy = accuracy_score(y_test, y_pred)\n",
|
|||
|
"precision = precision_score(y_test, y_pred, average='macro') # Для многоклассовой задачи используем macro\n",
|
|||
|
"recall = recall_score(y_test, y_pred, average='macro')\n",
|
|||
|
"f1 = f1_score(y_test, y_pred, average='macro')\n",
|
|||
|
"roc_auc = roc_auc_score(y_test, gb_search.predict_proba(X_test), multi_class='ovr', average='macro')\n",
|
|||
|
"mcc = matthews_corrcoef(y_test, y_pred)\n",
|
|||
|
"\n",
|
|||
|
"# Матрица ошибок\n",
|
|||
|
"conf_matrix = confusion_matrix(y_test, y_pred)\n",
|
|||
|
"\n",
|
|||
|
"# Печать метрик\n",
|
|||
|
"print(f\"Accuracy: {accuracy:.4f}\")\n",
|
|||
|
"print(f\"Precision (macro): {precision:.4f}\")\n",
|
|||
|
"print(f\"Recall (macro): {recall:.4f}\")\n",
|
|||
|
"print(f\"F1-Score (macro): {f1:.4f}\")\n",
|
|||
|
"print(f\"ROC AUC (macro): {roc_auc:.4f}\")\n",
|
|||
|
"print(f\"MCC: {mcc:.4f}\")\n",
|
|||
|
"print(f\"Confusion Matrix:\\n{conf_matrix}\")\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Логистическая регрессия показывает наилучшие результаты по большинству метрик, включая точность, полноту, F1-Score и ROC AUC. Она также имеет лучший MCC, что говорит о лучшем качестве предсказаний с учетом всех классов. Хотя все модели показывают относительно низкие значения, логистическая регрессия явно выделяется среди других.\n",
|
|||
|
"\n",
|
|||
|
"Лучшей моделью является логистическая регрессия на основе анализа метрик."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"#### Оценка смещения и дисперсии лучшей модели (логистическая регрессия)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 685,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Средняя точность на обучающих данных (кросс-валидация): 0.5023474178403756\n",
|
|||
|
"Точность на обучающей выборке: 0.536150234741784\n",
|
|||
|
"Точность на тестовой выборке: 0.47\n",
|
|||
|
"Смещение (Bias): 0.53\n",
|
|||
|
"Дисперсия (Variance): 0.000581895126628314\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"from sklearn.model_selection import cross_val_score\n",
|
|||
|
"from sklearn.metrics import accuracy_score\n",
|
|||
|
"from sklearn.model_selection import KFold\n",
|
|||
|
"\n",
|
|||
|
"# Кросс-валидация на обучающих данных\n",
|
|||
|
"kf = KFold(n_splits=5, shuffle=True, random_state=random_state)\n",
|
|||
|
"train_accuracies = cross_val_score(logistic_pipeline, X_train, y_train.values.ravel(), cv=kf, scoring=\"accuracy\")\n",
|
|||
|
"\n",
|
|||
|
"# Прогнозирование на обучающих и тестовых данных\n",
|
|||
|
"logistic_pipeline.fit(X_train, y_train.values.ravel())\n",
|
|||
|
"train_accuracy = accuracy_score(y_train, logistic_pipeline.predict(X_train))\n",
|
|||
|
"test_accuracy = accuracy_score(y_test, logistic_pipeline.predict(X_test))\n",
|
|||
|
"\n",
|
|||
|
"# Смещение (Bias)\n",
|
|||
|
"bias = 1 - test_accuracy # Ошибка на тестовой выборке\n",
|
|||
|
"\n",
|
|||
|
"# Дисперсия (Variance)\n",
|
|||
|
"variance = np.var(train_accuracies) # Дисперсия на обучающей выборке\n",
|
|||
|
"\n",
|
|||
|
"# Выводим результаты\n",
|
|||
|
"print(f\"Средняя точность на обучающих данных (кросс-валидация): {np.mean(train_accuracies)}\")\n",
|
|||
|
"print(f\"Точность на обучающей выборке: {train_accuracy}\")\n",
|
|||
|
"print(f\"Точность на тестовой выборке: {test_accuracy}\")\n",
|
|||
|
"print(f\"Смещение (Bias): {bias}\")\n",
|
|||
|
"print(f\"Дисперсия (Variance): {variance}\")\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Оценка модели: \n",
|
|||
|
"\n",
|
|||
|
"Смещение высокое (Bias = 53%) — это указывает на недообучение модели. Модель не может хорошо предсказать целевой признак на тестовых данных, что означает, что она слишком простая для данного набора данных или её регуляризация слишком сильна.\n",
|
|||
|
"\n",
|
|||
|
"Дисперсия низкая (Variance = 0.00058) — это также подтверждает, что модель не переобучена. Она не слишком чувствительна к изменениям в обучающих данных."
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"kernelspec": {
|
|||
|
"display_name": "aimvenv",
|
|||
|
"language": "python",
|
|||
|
"name": "python3"
|
|||
|
},
|
|||
|
"language_info": {
|
|||
|
"codemirror_mode": {
|
|||
|
"name": "ipython",
|
|||
|
"version": 3
|
|||
|
},
|
|||
|
"file_extension": ".py",
|
|||
|
"mimetype": "text/x-python",
|
|||
|
"name": "python",
|
|||
|
"nbconvert_exporter": "python",
|
|||
|
"pygments_lexer": "ipython3",
|
|||
|
"version": "3.12.6"
|
|||
|
}
|
|||
|
},
|
|||
|
"nbformat": 4,
|
|||
|
"nbformat_minor": 2
|
|||
|
}
|