lab_3 #3

Merged
Arutunyan-Dmitry merged 6 commits from lab_3 into main 2024-12-06 15:56:15 +04:00
Showing only changes of commit 962fc3c2ed
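
The `df_norm` output in this commit carries `*Norm` columns (for example `SleepHoursNorm` and `BMINorm`) whose values lie between 0 and 1. Below is a minimal sketch of how such columns could be produced, assuming plain min-max scaling; the diff does not show the actual normalization code, so this is a guess at the technique. The helper `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale numbers linearly to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: there is no spread to scale, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical raw values for illustration only, not taken from the dataset.
sleep_hours = [4, 6, 7, 8, 12]
sleep_hours_norm = min_max_normalize(sleep_hours)
print(sleep_hours_norm)  # [0.0, 0.25, 0.375, 0.5, 1.0]
```

In a pandas notebook the same formula would usually be applied to a whole column at once, for example `(col - col.min()) / (col.max() - col.min())`.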


@ -7,6 +7,26 @@
"# Variant 2. Heart Disease Indicators"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This dataset contains data collected in the CDC's annual health survey of more than 400 thousand adults in the US. It includes information on various heart disease risk factors, such as hypertension, high cholesterol, smoking, diabetes, obesity, insufficient physical activity, and alcohol abuse. It also covers respondents' health status, chronic conditions (for example, diabetes, arthritis, asthma), level of physical activity, mental health, and social and demographic characteristics such as sex, age, ethnicity, and place of residence. The data can be used to analyze and predict heart disease risk and to develop prevention and public health programs."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Business goals:\n",
"- Predicting heart disease risk: building a model that estimates the probability of heart disease from risk factors.\n",
"- Identifying key health factors: finding the factors that most strongly affect heart disease risk in order to design prevention programs.\n",
"\n",
"#### Technical project goals:\n",
"- Predicting heart disease risk: developing a machine learning model (for example, logistic regression or a random forest) to classify respondents by heart disease risk (using the \"HadHeartAttack\" column as the target).\n",
"- Identifying key factors: analyzing the factors that influence the development of heart disease to find the most significant features for prediction."
]
},
{
"cell_type": "code",
"execution_count": 248,
@ -1831,6 +1851,436 @@
"plt.ylabel('True class')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Manual feature engineering"
]
},
{
"cell_type": "code",
"execution_count": 361,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>State</th>\n",
" <th>Sex</th>\n",
" <th>GeneralHealth</th>\n",
" <th>LastCheckupTime</th>\n",
" <th>PhysicalActivities</th>\n",
" <th>RemovedTeeth</th>\n",
" <th>HadHeartAttack</th>\n",
" <th>HadAngina</th>\n",
" <th>HadStroke</th>\n",
" <th>...</th>\n",
" <th>PneumoVaxEver</th>\n",
" <th>TetanusLast10Tdap</th>\n",
" <th>HighRiskLastYear</th>\n",
" <th>CovidPos</th>\n",
" <th>PhysicalHealthDaysNorm</th>\n",
" <th>MentalHealthDaysNorm</th>\n",
" <th>SleepHoursNorm</th>\n",
" <th>HeightInMetersNorm</th>\n",
" <th>WeightInKilogramsNorm</th>\n",
" <th>BMINorm</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>Alabama</td>\n",
" <td>Female</td>\n",
" <td>Very good</td>\n",
" <td>Within past year (anytime less than 12 months ...</td>\n",
" <td>True</td>\n",
" <td>None of them</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Yes, received Tdap</td>\n",
" <td>False</td>\n",
" <td>No</td>\n",
" <td>0.533333</td>\n",
" <td>0.0</td>\n",
" <td>0.750</td>\n",
" <td>0.325000</td>\n",
" <td>0.403446</td>\n",
" <td>0.497047</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>Alabama</td>\n",
" <td>Male</td>\n",
" <td>Very good</td>\n",
" <td>Within past year (anytime less than 12 months ...</td>\n",
" <td>True</td>\n",
" <td>None of them</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Yes, received tetanus shot but not sure what type</td>\n",
" <td>False</td>\n",
" <td>No</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.375</td>\n",
" <td>0.625000</td>\n",
" <td>0.621891</td>\n",
" <td>0.567257</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2</td>\n",
" <td>Alabama</td>\n",
" <td>Male</td>\n",
" <td>Very good</td>\n",
" <td>Within past year (anytime less than 12 months ...</td>\n",
" <td>False</td>\n",
" <td>6 or more, but not all</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>No, did not receive any tetanus shot in the pa...</td>\n",
" <td>False</td>\n",
" <td>Yes</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.625</td>\n",
" <td>0.741667</td>\n",
" <td>0.747974</td>\n",
" <td>0.617454</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" <td>Alabama</td>\n",
" <td>Female</td>\n",
" <td>Fair</td>\n",
" <td>Within past year (anytime less than 12 months ...</td>\n",
" <td>True</td>\n",
" <td>None of them</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>No, did not receive any tetanus shot in the pa...</td>\n",
" <td>False</td>\n",
" <td>Yes</td>\n",
" <td>0.666667</td>\n",
" <td>0.0</td>\n",
" <td>0.750</td>\n",
" <td>0.491667</td>\n",
" <td>0.579925</td>\n",
" <td>0.606299</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4</td>\n",
" <td>Alabama</td>\n",
" <td>Female</td>\n",
" <td>Good</td>\n",
" <td>Within past year (anytime less than 12 months ...</td>\n",
" <td>True</td>\n",
" <td>1 to 5</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>No, did not receive any tetanus shot in the pa...</td>\n",
" <td>False</td>\n",
" <td>No</td>\n",
" <td>0.400000</td>\n",
" <td>1.0</td>\n",
" <td>0.250</td>\n",
" <td>0.241667</td>\n",
" <td>0.474871</td>\n",
" <td>0.663714</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>246017</th>\n",
" <td>246017</td>\n",
" <td>Virgin Islands</td>\n",
" <td>Male</td>\n",
" <td>Very good</td>\n",
" <td>Within past 2 years (1 year but less than 2 ye...</td>\n",
" <td>True</td>\n",
" <td>None of them</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>False</td>\n",
" <td>Yes, received tetanus shot but not sure what type</td>\n",
" <td>False</td>\n",
" <td>No</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.375</td>\n",
" <td>0.625000</td>\n",
" <td>0.684978</td>\n",
" <td>0.637795</td>\n",
" </tr>\n",
" <tr>\n",
" <th>246018</th>\n",
" <td>246018</td>\n",
" <td>Virgin Islands</td>\n",
" <td>Female</td>\n",
" <td>Fair</td>\n",
" <td>Within past year (anytime less than 12 months ...</td>\n",
" <td>True</td>\n",
" <td>None of them</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>False</td>\n",
" <td>No, did not receive any tetanus shot in the pa...</td>\n",
" <td>False</td>\n",
" <td>Yes</td>\n",
" <td>0.000000</td>\n",
" <td>0.7</td>\n",
" <td>0.500</td>\n",
" <td>0.875000</td>\n",
" <td>0.579925</td>\n",
" <td>0.377297</td>\n",
" </tr>\n",
" <tr>\n",
" <th>246019</th>\n",
" <td>246019</td>\n",
" <td>Virgin Islands</td>\n",
" <td>Male</td>\n",
" <td>Good</td>\n",
" <td>Within past year (anytime less than 12 months ...</td>\n",
" <td>True</td>\n",
" <td>1 to 5</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Yes, received tetanus shot but not sure what type</td>\n",
" <td>False</td>\n",
" <td>Yes</td>\n",
" <td>0.000000</td>\n",
" <td>1.0</td>\n",
" <td>0.500</td>\n",
" <td>0.458333</td>\n",
" <td>0.516837</td>\n",
" <td>0.558399</td>\n",
" </tr>\n",
" <tr>\n",
" <th>246020</th>\n",
" <td>246020</td>\n",
" <td>Virgin Islands</td>\n",
" <td>Female</td>\n",
" <td>Excellent</td>\n",
" <td>Within past year (anytime less than 12 months ...</td>\n",
" <td>True</td>\n",
" <td>None of them</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>False</td>\n",
" <td>Yes, received tetanus shot but not sure what type</td>\n",
" <td>False</td>\n",
" <td>No</td>\n",
" <td>0.266667</td>\n",
" <td>0.2</td>\n",
" <td>0.500</td>\n",
" <td>0.491667</td>\n",
" <td>0.508500</td>\n",
" <td>0.519029</td>\n",
" </tr>\n",
" <tr>\n",
" <th>246021</th>\n",
" <td>246021</td>\n",
" <td>Virgin Islands</td>\n",
" <td>Male</td>\n",
" <td>Very good</td>\n",
" <td>Within past year (anytime less than 12 months ...</td>\n",
" <td>False</td>\n",
" <td>None of them</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>No, did not receive any tetanus shot in the pa...</td>\n",
" <td>False</td>\n",
" <td>Yes</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.250</td>\n",
" <td>0.708333</td>\n",
" <td>0.747974</td>\n",
" <td>0.646654</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>246022 rows × 41 columns</p>\n",
"</div>"
],
"text/plain": [
" id State Sex GeneralHealth \\\n",
"0 0 Alabama Female Very good \n",
"1 1 Alabama Male Very good \n",
"2 2 Alabama Male Very good \n",
"3 3 Alabama Female Fair \n",
"4 4 Alabama Female Good \n",
"... ... ... ... ... \n",
"246017 246017 Virgin Islands Male Very good \n",
"246018 246018 Virgin Islands Female Fair \n",
"246019 246019 Virgin Islands Male Good \n",
"246020 246020 Virgin Islands Female Excellent \n",
"246021 246021 Virgin Islands Male Very good \n",
"\n",
" LastCheckupTime PhysicalActivities \\\n",
"0 Within past year (anytime less than 12 months ... True \n",
"1 Within past year (anytime less than 12 months ... True \n",
"2 Within past year (anytime less than 12 months ... False \n",
"3 Within past year (anytime less than 12 months ... True \n",
"4 Within past year (anytime less than 12 months ... True \n",
"... ... ... \n",
"246017 Within past 2 years (1 year but less than 2 ye... True \n",
"246018 Within past year (anytime less than 12 months ... True \n",
"246019 Within past year (anytime less than 12 months ... True \n",
"246020 Within past year (anytime less than 12 months ... True \n",
"246021 Within past year (anytime less than 12 months ... False \n",
"\n",
" RemovedTeeth HadHeartAttack HadAngina HadStroke ... \\\n",
"0 None of them False False False ... \n",
"1 None of them False False False ... \n",
"2 6 or more, but not all False False False ... \n",
"3 None of them False False False ... \n",
"4 1 to 5 False False False ... \n",
"... ... ... ... ... ... \n",
"246017 None of them False False False ... \n",
"246018 None of them False False False ... \n",
"246019 1 to 5 False False True ... \n",
"246020 None of them False False False ... \n",
"246021 None of them True False False ... \n",
"\n",
" PneumoVaxEver TetanusLast10Tdap \\\n",
"0 True Yes, received Tdap \n",
"1 True Yes, received tetanus shot but not sure what type \n",
"2 True No, did not receive any tetanus shot in the pa... \n",
"3 True No, did not receive any tetanus shot in the pa... \n",
"4 True No, did not receive any tetanus shot in the pa... \n",
"... ... ... \n",
"246017 False Yes, received tetanus shot but not sure what type \n",
"246018 False No, did not receive any tetanus shot in the pa... \n",
"246019 True Yes, received tetanus shot but not sure what type \n",
"246020 False Yes, received tetanus shot but not sure what type \n",
"246021 True No, did not receive any tetanus shot in the pa... \n",
"\n",
" HighRiskLastYear CovidPos PhysicalHealthDaysNorm \\\n",
"0 False No 0.533333 \n",
"1 False No 0.000000 \n",
"2 False Yes 0.000000 \n",
"3 False Yes 0.666667 \n",
"4 False No 0.400000 \n",
"... ... ... ... \n",
"246017 False No 0.000000 \n",
"246018 False Yes 0.000000 \n",
"246019 False Yes 0.000000 \n",
"246020 False No 0.266667 \n",
"246021 False Yes 0.000000 \n",
"\n",
" MentalHealthDaysNorm SleepHoursNorm HeightInMetersNorm \\\n",
"0 0.0 0.750 0.325000 \n",
"1 0.0 0.375 0.625000 \n",
"2 0.0 0.625 0.741667 \n",
"3 0.0 0.750 0.491667 \n",
"4 1.0 0.250 0.241667 \n",
"... ... ... ... \n",
"246017 0.0 0.375 0.625000 \n",
"246018 0.7 0.500 0.875000 \n",
"246019 1.0 0.500 0.458333 \n",
"246020 0.2 0.500 0.491667 \n",
"246021 0.0 0.250 0.708333 \n",
"\n",
" WeightInKilogramsNorm BMINorm \n",
"0 0.403446 0.497047 \n",
"1 0.621891 0.567257 \n",
"2 0.747974 0.617454 \n",
"3 0.579925 0.606299 \n",
"4 0.474871 0.663714 \n",
"... ... ... \n",
"246017 0.684978 0.637795 \n",
"246018 0.579925 0.377297 \n",
"246019 0.516837 0.558399 \n",
"246020 0.508500 0.519029 \n",
"246021 0.747974 0.646654 \n",
"\n",
"[246022 rows x 41 columns]"
]
},
"execution_count": 361,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_norm"
]
}
],
"metadata": {