{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Regression.\n", "\n", " - Predicting the probability of an IT track:\n", " Goal: using features such as education level, institution type, financial condition, age, and flexibility level, predict whether a student is on an IT track.\n", "\n", "Classification.\n", "\n", " - Assigning students to types of educational institutions\n", " Goal: assign students to different institution types (for example, public/private universities) using data on their education, age, place of residence, and financial means.\n", "\n" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Education Level | Institution Type | Gender | Age | Device | IT Student | Location | Financial Condition | Internet Type | Network Type | Flexibility Level |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | University | 1 | Male | 23 | Tab | No | Town | Mid | Wifi | 4G | Moderate |
| 1 | University | 1 | Female | 23 | Mobile | No | Town | Mid | Mobile Data | 4G | Moderate |
| 2 | College | 0 | Female | 18 | Mobile | No | Town | Mid | Wifi | 4G | Moderate |
| 3 | School | 1 | Female | 11 | Mobile | No | Town | Mid | Mobile Data | 4G | Moderate |
| 4 | School | 1 | Female | 18 | Mobile | No | Town | Poor | Mobile Data | 3G | Low |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1200 | College | 1 | Female | 18 | Mobile | No | Town | Mid | Wifi | 4G | Low |
| 1201 | College | 1 | Female | 18 | Mobile | No | Rural | Mid | Wifi | 4G | Moderate |
| 1202 | School | 1 | Male | 11 | Mobile | No | Town | Mid | Mobile Data | 3G | Moderate |
| 1203 | College | 1 | Female | 18 | Mobile | No | Rural | Mid | Wifi | 4G | Low |
| 1204 | School | 1 | Female | 11 | Mobile | No | Town | Poor | Mobile Data | 3G | Moderate |

1205 rows × 11 columns

| | Education Level | Gender | Age | Device | IT Student | Location | Financial Condition | Internet Type | Network Type | Flexibility Level |
|---|---|---|---|---|---|---|---|---|---|---|
| 294 | 1 | 0 | 9 | 1 | 0 | 1 | 2 | 0 | 2 | 1 |
| 876 | 1 | 1 | 11 | 1 | 0 | 1 | 0 | 0 | 1 | 2 |
| 382 | 1 | 1 | 11 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| 634 | 2 | 0 | 23 | 1 | 0 | 1 | 0 | 1 | 1 | 1 |
| 906 | 1 | 0 | 11 | 1 | 0 | 1 | 0 | 1 | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1044 | 0 | 0 | 18 | 1 | 0 | 1 | 0 | 1 | 2 | 2 |
| 1095 | 2 | 0 | 23 | 0 | 1 | 1 | 2 | 1 | 2 | 0 |
| 1130 | 1 | 1 | 11 | 1 | 0 | 1 | 1 | 1 | 2 | 1 |
| 860 | 2 | 1 | 23 | 1 | 0 | 1 | 0 | 0 | 2 | 1 |
| 1126 | 2 | 1 | 23 | 0 | 1 | 0 | 0 | 0 | 1 | 1 |

964 rows × 10 columns

| | Education Level | Gender | Age | Device | IT Student | Location | Financial Condition | Internet Type | Network Type | Flexibility Level |
|---|---|---|---|---|---|---|---|---|---|---|
| 101 | 1 | 0 | 11 | 0 | 0 | 1 | 0 | 1 | 2 | 2 |
| 946 | 0 | 1 | 18 | 1 | 0 | 1 | 0 | 1 | 2 | 2 |
| 306 | 0 | 1 | 18 | 2 | 1 | 1 | 0 | 1 | 2 | 2 |
| 109 | 2 | 0 | 23 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 1061 | 2 | 1 | 23 | 0 | 1 | 0 | 0 | 0 | 1 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 908 | 1 | 1 | 10 | 1 | 0 | 1 | 2 | 1 | 2 | 2 |
| 1135 | 2 | 0 | 18 | 0 | 1 | 1 | 0 | 1 | 2 | 2 |
| 894 | 1 | 0 | 10 | 1 | 0 | 1 | 1 | 0 | 1 | 1 |
| 866 | 1 | 1 | 11 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| 1006 | 2 | 0 | 23 | 0 | 0 | 1 | 2 | 1 | 2 | 2 |

241 rows × 10 columns
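
The row counts in the tables above (1205 rows in total, then 964 and 241 rows with the `Institution Type` column missing) are consistent with an 80/20 train/test split after dropping the target column. The sketch below reproduces that split on synthetic data. The choice of `Institution Type` as the target, `test_size=0.2`, and `random_state=42` are assumptions inferred from the output, not confirmed by the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real table (hypothetical values, same row count).
rng = np.random.default_rng(42)
n_rows = 1205
df = pd.DataFrame({
    "Education Level": rng.integers(0, 3, n_rows),
    "Institution Type": rng.integers(0, 2, n_rows),  # assumed target column
    "Age": rng.choice([9, 10, 11, 18, 23], n_rows),
    "Financial Condition": rng.integers(0, 3, n_rows),
})

# Separate the features from the assumed target.
X = df.drop(columns=["Institution Type"])
y = df["Institution Type"]

# Hold out 20% of the rows for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```

With 1205 rows this yields 964 training rows and 241 test rows, matching the counts shown in the output.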

GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('preprocessor',
                                        ColumnTransformer(transformers=[('cat', OneHotEncoder(sparse_output=False),
                                                                         ['Education Level', 'Gender', 'Device',
                                                                          'IT Student', 'Location',
                                                                          'Financial Condition', 'Internet Type',
                                                                          'Network Type', 'Flexibility Level']),
                                                                        ('num', 'passthrough', ['Age'])])),
                                       ('classifier',
                                        SGDClassifier(loss='log_loss', max_iter=2000, random_state=42))]),
             param_grid={'classifier__alpha': [0.0001, 0.001, 0.01],
                         'classifier__eta0': [0.01, 0.1],
                         'classifier__learning_rate': ['constant', 'adaptive']},
             scoring='accuracy')
Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('cat', OneHotEncoder(sparse_output=False),
                                                  ['Education Level', 'Gender', 'Device', 'IT Student',
                                                   'Location', 'Financial Condition', 'Internet Type',
                                                   'Network Type', 'Flexibility Level']),
                                                 ('num', 'passthrough', ['Age'])])),
                ('classifier',
                 SGDClassifier(eta0=0.1, learning_rate='adaptive', loss='log_loss',
                               max_iter=2000, random_state=42))])
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('preprocessor',
                                        ColumnTransformer(transformers=[('cat', OneHotEncoder(sparse_output=False),
                                                                         ['Education Level', 'Gender', 'Device',
                                                                          'IT Student', 'Location',
                                                                          'Financial Condition', 'Internet Type',
                                                                          'Network Type', 'Flexibility Level']),
                                                                        ('num', 'passthrough', ['Age'])])),
                                       ('regressor',
                                        SGDRegressor(max_iter=2000, random_state=42))]),
             param_grid={'regressor__alpha': [0.0001, 0.001, 0.01],
                         'regressor__eta0': [0.01, 0.1],
                         'regressor__learning_rate': ['constant', 'adaptive']},
             scoring='r2')
Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('cat', OneHotEncoder(sparse_output=False),
                                                  ['Education Level', 'Gender', 'Device', 'IT Student',
                                                   'Location', 'Financial Condition', 'Internet Type',
                                                   'Network Type', 'Flexibility Level']),
                                                 ('num', 'passthrough', ['Age'])])),
                ('regressor',
                 SGDRegressor(alpha=0.001, learning_rate='adaptive', max_iter=2000,
                              random_state=42))])
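
The regressor search above is scored with `r2`, the coefficient of determination: one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below works through that formula by hand on a tiny made-up example and compares it with `sklearn.metrics.r2_score`. The target and prediction values are illustrative only and are not taken from the notebook.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical 0/1 targets and continuous predictions from a regressor.
y_true = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
y_pred = np.array([0.1, 0.8, 0.6, 0.3, 0.9])

# Residual sum of squares: squared errors of the predictions.
ss_res = np.sum((y_true - y_pred) ** 2)
# Total sum of squares: squared deviations from the mean of the targets.
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1.0 - ss_res / ss_tot
r2_sklearn = r2_score(y_true, y_pred)

print(round(r2_manual, 4), round(r2_sklearn, 4))  # both about 0.7417
```

A score of 1 means perfect predictions, 0 matches always predicting the mean, and negative values are possible when the model does worse than that baseline.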