lab_3 #3
450
lab_3/lab3.ipynb
@ -7,6 +7,26 @@
|
||||
"# Variant 2. Heart disease indicators"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This dataset contains data collected in the CDC's annual health survey of more than 400 thousand adults in the USA. It includes information on heart disease risk factors such as hypertension, high cholesterol, smoking, diabetes, obesity, lack of physical activity, and alcohol abuse. It also contains data on respondents' health status, chronic conditions (for example, diabetes, arthritis, asthma), physical activity level, mental health, and social and demographic characteristics such as sex, age, ethnicity, and place of residence. The dataset can be used to analyze and predict heart disease risk and to develop prevention and public health programs."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Business goals:\n",
|
||||
"- Heart disease risk prediction: building a model that estimates the probability of heart disease from risk factors.\n",
|
||||
"- Identifying key health factors: finding the factors that most strongly affect heart disease risk in order to design prevention programs.\n",
|
||||
"\n",
|
||||
"#### Technical project goals:\n",
|
||||
"- Heart disease risk prediction: developing a machine learning model (for example, logistic regression or a random forest) that classifies respondents by heart disease risk, using the \"HadHeartAttack\" column as the target.\n",
|
||||
"- Identifying key factors: analyzing the factors that affect the development of heart disease to find the most significant features for prediction."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 248,
|
||||
@ -1831,6 +1851,436 @@
|
||||
"plt.ylabel('True class')\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Manual feature engineering"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 361,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>id</th>\n",
|
||||
" <th>State</th>\n",
|
||||
" <th>Sex</th>\n",
|
||||
" <th>GeneralHealth</th>\n",
|
||||
" <th>LastCheckupTime</th>\n",
|
||||
" <th>PhysicalActivities</th>\n",
|
||||
" <th>RemovedTeeth</th>\n",
|
||||
" <th>HadHeartAttack</th>\n",
|
||||
" <th>HadAngina</th>\n",
|
||||
" <th>HadStroke</th>\n",
|
||||
" <th>...</th>\n",
|
||||
" <th>PneumoVaxEver</th>\n",
|
||||
" <th>TetanusLast10Tdap</th>\n",
|
||||
" <th>HighRiskLastYear</th>\n",
|
||||
" <th>CovidPos</th>\n",
|
||||
" <th>PhysicalHealthDaysNorm</th>\n",
|
||||
" <th>MentalHealthDaysNorm</th>\n",
|
||||
" <th>SleepHoursNorm</th>\n",
|
||||
" <th>HeightInMetersNorm</th>\n",
|
||||
" <th>WeightInKilogramsNorm</th>\n",
|
||||
" <th>BMINorm</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>Alabama</td>\n",
|
||||
" <td>Female</td>\n",
|
||||
" <td>Very good</td>\n",
|
||||
" <td>Within past year (anytime less than 12 months ...</td>\n",
|
||||
" <td>True</td>\n",
|
||||
" <td>None of them</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>True</td>\n",
|
||||
" <td>Yes, received Tdap</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>No</td>\n",
|
||||
" <td>0.533333</td>\n",
|
||||
" <td>0.0</td>\n",
|
||||
" <td>0.750</td>\n",
|
||||
" <td>0.325000</td>\n",
|
||||
" <td>0.403446</td>\n",
|
||||
" <td>0.497047</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>Alabama</td>\n",
|
||||
" <td>Male</td>\n",
|
||||
" <td>Very good</td>\n",
|
||||
" <td>Within past year (anytime less than 12 months ...</td>\n",
|
||||
" <td>True</td>\n",
|
||||
" <td>None of them</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>True</td>\n",
|
||||
" <td>Yes, received tetanus shot but not sure what type</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>No</td>\n",
|
||||
" <td>0.000000</td>\n",
|
||||
" <td>0.0</td>\n",
|
||||
" <td>0.375</td>\n",
|
||||
" <td>0.625000</td>\n",
|
||||
" <td>0.621891</td>\n",
|
||||
" <td>0.567257</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>2</th>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>Alabama</td>\n",
|
||||
" <td>Male</td>\n",
|
||||
" <td>Very good</td>\n",
|
||||
" <td>Within past year (anytime less than 12 months ...</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>6 or more, but not all</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>True</td>\n",
|
||||
" <td>No, did not receive any tetanus shot in the pa...</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>Yes</td>\n",
|
||||
" <td>0.000000</td>\n",
|
||||
" <td>0.0</td>\n",
|
||||
" <td>0.625</td>\n",
|
||||
" <td>0.741667</td>\n",
|
||||
" <td>0.747974</td>\n",
|
||||
" <td>0.617454</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>3</th>\n",
|
||||
" <td>3</td>\n",
|
||||
" <td>Alabama</td>\n",
|
||||
" <td>Female</td>\n",
|
||||
" <td>Fair</td>\n",
|
||||
" <td>Within past year (anytime less than 12 months ...</td>\n",
|
||||
" <td>True</td>\n",
|
||||
" <td>None of them</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>True</td>\n",
|
||||
" <td>No, did not receive any tetanus shot in the pa...</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>Yes</td>\n",
|
||||
" <td>0.666667</td>\n",
|
||||
" <td>0.0</td>\n",
|
||||
" <td>0.750</td>\n",
|
||||
" <td>0.491667</td>\n",
|
||||
" <td>0.579925</td>\n",
|
||||
" <td>0.606299</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>4</th>\n",
|
||||
" <td>4</td>\n",
|
||||
" <td>Alabama</td>\n",
|
||||
" <td>Female</td>\n",
|
||||
" <td>Good</td>\n",
|
||||
" <td>Within past year (anytime less than 12 months ...</td>\n",
|
||||
" <td>True</td>\n",
|
||||
" <td>1 to 5</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>True</td>\n",
|
||||
" <td>No, did not receive any tetanus shot in the pa...</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>No</td>\n",
|
||||
" <td>0.400000</td>\n",
|
||||
" <td>1.0</td>\n",
|
||||
" <td>0.250</td>\n",
|
||||
" <td>0.241667</td>\n",
|
||||
" <td>0.474871</td>\n",
|
||||
" <td>0.663714</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>...</th>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>246017</th>\n",
|
||||
" <td>246017</td>\n",
|
||||
" <td>Virgin Islands</td>\n",
|
||||
" <td>Male</td>\n",
|
||||
" <td>Very good</td>\n",
|
||||
" <td>Within past 2 years (1 year but less than 2 ye...</td>\n",
|
||||
" <td>True</td>\n",
|
||||
" <td>None of them</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>Yes, received tetanus shot but not sure what type</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>No</td>\n",
|
||||
" <td>0.000000</td>\n",
|
||||
" <td>0.0</td>\n",
|
||||
" <td>0.375</td>\n",
|
||||
" <td>0.625000</td>\n",
|
||||
" <td>0.684978</td>\n",
|
||||
" <td>0.637795</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>246018</th>\n",
|
||||
" <td>246018</td>\n",
|
||||
" <td>Virgin Islands</td>\n",
|
||||
" <td>Female</td>\n",
|
||||
" <td>Fair</td>\n",
|
||||
" <td>Within past year (anytime less than 12 months ...</td>\n",
|
||||
" <td>True</td>\n",
|
||||
" <td>None of them</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>No, did not receive any tetanus shot in the pa...</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>Yes</td>\n",
|
||||
" <td>0.000000</td>\n",
|
||||
" <td>0.7</td>\n",
|
||||
" <td>0.500</td>\n",
|
||||
" <td>0.875000</td>\n",
|
||||
" <td>0.579925</td>\n",
|
||||
" <td>0.377297</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>246019</th>\n",
|
||||
" <td>246019</td>\n",
|
||||
" <td>Virgin Islands</td>\n",
|
||||
" <td>Male</td>\n",
|
||||
" <td>Good</td>\n",
|
||||
" <td>Within past year (anytime less than 12 months ...</td>\n",
|
||||
" <td>True</td>\n",
|
||||
" <td>1 to 5</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>True</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>True</td>\n",
|
||||
" <td>Yes, received tetanus shot but not sure what type</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>Yes</td>\n",
|
||||
" <td>0.000000</td>\n",
|
||||
" <td>1.0</td>\n",
|
||||
" <td>0.500</td>\n",
|
||||
" <td>0.458333</td>\n",
|
||||
" <td>0.516837</td>\n",
|
||||
" <td>0.558399</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>246020</th>\n",
|
||||
" <td>246020</td>\n",
|
||||
" <td>Virgin Islands</td>\n",
|
||||
" <td>Female</td>\n",
|
||||
" <td>Excellent</td>\n",
|
||||
" <td>Within past year (anytime less than 12 months ...</td>\n",
|
||||
" <td>True</td>\n",
|
||||
" <td>None of them</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>Yes, received tetanus shot but not sure what type</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>No</td>\n",
|
||||
" <td>0.266667</td>\n",
|
||||
" <td>0.2</td>\n",
|
||||
" <td>0.500</td>\n",
|
||||
" <td>0.491667</td>\n",
|
||||
" <td>0.508500</td>\n",
|
||||
" <td>0.519029</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>246021</th>\n",
|
||||
" <td>246021</td>\n",
|
||||
" <td>Virgin Islands</td>\n",
|
||||
" <td>Male</td>\n",
|
||||
" <td>Very good</td>\n",
|
||||
" <td>Within past year (anytime less than 12 months ...</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>None of them</td>\n",
|
||||
" <td>True</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>True</td>\n",
|
||||
" <td>No, did not receive any tetanus shot in the pa...</td>\n",
|
||||
" <td>False</td>\n",
|
||||
" <td>Yes</td>\n",
|
||||
" <td>0.000000</td>\n",
|
||||
" <td>0.0</td>\n",
|
||||
" <td>0.250</td>\n",
|
||||
" <td>0.708333</td>\n",
|
||||
" <td>0.747974</td>\n",
|
||||
" <td>0.646654</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"<p>246022 rows × 41 columns</p>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" id State Sex GeneralHealth \\\n",
|
||||
"0 0 Alabama Female Very good \n",
|
||||
"1 1 Alabama Male Very good \n",
|
||||
"2 2 Alabama Male Very good \n",
|
||||
"3 3 Alabama Female Fair \n",
|
||||
"4 4 Alabama Female Good \n",
|
||||
"... ... ... ... ... \n",
|
||||
"246017 246017 Virgin Islands Male Very good \n",
|
||||
"246018 246018 Virgin Islands Female Fair \n",
|
||||
"246019 246019 Virgin Islands Male Good \n",
|
||||
"246020 246020 Virgin Islands Female Excellent \n",
|
||||
"246021 246021 Virgin Islands Male Very good \n",
|
||||
"\n",
|
||||
" LastCheckupTime PhysicalActivities \\\n",
|
||||
"0 Within past year (anytime less than 12 months ... True \n",
|
||||
"1 Within past year (anytime less than 12 months ... True \n",
|
||||
"2 Within past year (anytime less than 12 months ... False \n",
|
||||
"3 Within past year (anytime less than 12 months ... True \n",
|
||||
"4 Within past year (anytime less than 12 months ... True \n",
|
||||
"... ... ... \n",
|
||||
"246017 Within past 2 years (1 year but less than 2 ye... True \n",
|
||||
"246018 Within past year (anytime less than 12 months ... True \n",
|
||||
"246019 Within past year (anytime less than 12 months ... True \n",
|
||||
"246020 Within past year (anytime less than 12 months ... True \n",
|
||||
"246021 Within past year (anytime less than 12 months ... False \n",
|
||||
"\n",
|
||||
" RemovedTeeth HadHeartAttack HadAngina HadStroke ... \\\n",
|
||||
"0 None of them False False False ... \n",
|
||||
"1 None of them False False False ... \n",
|
||||
"2 6 or more, but not all False False False ... \n",
|
||||
"3 None of them False False False ... \n",
|
||||
"4 1 to 5 False False False ... \n",
|
||||
"... ... ... ... ... ... \n",
|
||||
"246017 None of them False False False ... \n",
|
||||
"246018 None of them False False False ... \n",
|
||||
"246019 1 to 5 False False True ... \n",
|
||||
"246020 None of them False False False ... \n",
|
||||
"246021 None of them True False False ... \n",
|
||||
"\n",
|
||||
" PneumoVaxEver TetanusLast10Tdap \\\n",
|
||||
"0 True Yes, received Tdap \n",
|
||||
"1 True Yes, received tetanus shot but not sure what type \n",
|
||||
"2 True No, did not receive any tetanus shot in the pa... \n",
|
||||
"3 True No, did not receive any tetanus shot in the pa... \n",
|
||||
"4 True No, did not receive any tetanus shot in the pa... \n",
|
||||
"... ... ... \n",
|
||||
"246017 False Yes, received tetanus shot but not sure what type \n",
|
||||
"246018 False No, did not receive any tetanus shot in the pa... \n",
|
||||
"246019 True Yes, received tetanus shot but not sure what type \n",
|
||||
"246020 False Yes, received tetanus shot but not sure what type \n",
|
||||
"246021 True No, did not receive any tetanus shot in the pa... \n",
|
||||
"\n",
|
||||
" HighRiskLastYear CovidPos PhysicalHealthDaysNorm \\\n",
|
||||
"0 False No 0.533333 \n",
|
||||
"1 False No 0.000000 \n",
|
||||
"2 False Yes 0.000000 \n",
|
||||
"3 False Yes 0.666667 \n",
|
||||
"4 False No 0.400000 \n",
|
||||
"... ... ... ... \n",
|
||||
"246017 False No 0.000000 \n",
|
||||
"246018 False Yes 0.000000 \n",
|
||||
"246019 False Yes 0.000000 \n",
|
||||
"246020 False No 0.266667 \n",
|
||||
"246021 False Yes 0.000000 \n",
|
||||
"\n",
|
||||
" MentalHealthDaysNorm SleepHoursNorm HeightInMetersNorm \\\n",
|
||||
"0 0.0 0.750 0.325000 \n",
|
||||
"1 0.0 0.375 0.625000 \n",
|
||||
"2 0.0 0.625 0.741667 \n",
|
||||
"3 0.0 0.750 0.491667 \n",
|
||||
"4 1.0 0.250 0.241667 \n",
|
||||
"... ... ... ... \n",
|
||||
"246017 0.0 0.375 0.625000 \n",
|
||||
"246018 0.7 0.500 0.875000 \n",
|
||||
"246019 1.0 0.500 0.458333 \n",
|
||||
"246020 0.2 0.500 0.491667 \n",
|
||||
"246021 0.0 0.250 0.708333 \n",
|
||||
"\n",
|
||||
" WeightInKilogramsNorm BMINorm \n",
|
||||
"0 0.403446 0.497047 \n",
|
||||
"1 0.621891 0.567257 \n",
|
||||
"2 0.747974 0.617454 \n",
|
||||
"3 0.579925 0.606299 \n",
|
||||
"4 0.474871 0.663714 \n",
|
||||
"... ... ... \n",
|
||||
"246017 0.684978 0.637795 \n",
|
||||
"246018 0.579925 0.377297 \n",
|
||||
"246019 0.516837 0.558399 \n",
|
||||
"246020 0.508500 0.519029 \n",
|
||||
"246021 0.747974 0.646654 \n",
|
||||
"\n",
|
||||
"[246022 rows x 41 columns]"
|
||||
]
|
||||
},
|
||||
"execution_count": 361,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"df_norm"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
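Note on the `df_norm` output above: the `*Norm` columns all lie in the range 0 to 1, which is consistent with min-max scaling. The code that produced them is not part of this diff, so the following is only a minimal sketch of that technique under that assumption. The helper name `min_max_normalize` and the toy values are hypothetical, and unlike the notebook output the sketch keeps the original columns next to the normalized ones.

```python
import pandas as pd


def min_max_normalize(df, columns):
    """Return a copy of df with a '<name>Norm' column added for each listed column."""
    result = df.copy()
    for name in columns:
        low, high = result[name].min(), result[name].max()
        span = high - low
        # A constant column has zero span; map it to 0.0 instead of dividing by zero.
        result[f"{name}Norm"] = 0.0 if span == 0 else (result[name] - low) / span
    return result


# Hypothetical toy values, not taken from the dataset.
toy = pd.DataFrame({"PhysicalHealthDays": [0, 16, 30], "SleepHours": [4, 7, 10]})
normalized = min_max_normalize(toy, ["PhysicalHealthDays", "SleepHours"])
print(normalized)
```

If the scaling is later fitted on a training split only, the minimum and maximum would need to be stored and reused for the test split so that both splits share one scale.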