{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab Work No. 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analysis of Several Datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Choose three datasets that do not correspond to your assignment variant\n", "### 2. Analyze the information about each dataset from its Kaggle download page. What is the problem domain?\n", "\n", "Stores, car prices, strokes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Strokes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This dataset is used to predict the likelihood of a patient having a stroke from parameters such as gender, age, existing diseases, and smoking status. According to the World Health Organization (WHO), stroke is the second leading cause of death worldwide and is responsible for about 11% of all deaths.\n", "\n", "Column information\n", "\n", "- id: unique patient identifier (int)\n", "- gender: patient's gender; possible values are \"Male\", \"Female\", or \"Other\" (object, string)\n", "- age: patient's age (float)\n", "- hypertension: whether the patient has hypertension; 0 if not, 1 if so (int)\n", "- heart_disease: whether the patient has heart disease; 0 if not, 1 if so (int)\n", "- ever_married: whether the patient has ever been married; \"No\" or \"Yes\" (object, string)\n", "- work_type: type of work; possible values are \"children\" (the patient is a child), \"Govt_job\" (government job), \"Never_worked\" (never worked), \"Private\" (private sector), or \"Self-employed\" (object, string)\n", "- Residence_type: type of residence; \"Rural\" or \"Urban\" (object, string)\n", "- avg_glucose_level: average blood glucose level (float)\n", "- bmi: body mass index (BMI) 
(float)\n", "- smoking_status: smoking status; possible values are \"formerly smoked\", \"never smoked\", \"smokes\", or \"Unknown\". \"Unknown\" means that smoking information is unavailable for the patient (object, string)\n", "- stroke: whether the patient had a stroke; 1 if so, 0 if not (int)\n", "\n", "Each row of the dataset describes one patient, which makes it possible to analyze the data and build models that predict stroke risk." ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | id | gender | age | hypertension | heart_disease | ever_married | work_type | Residence_type | avg_glucose_level | bmi | smoking_status | stroke |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9046 | Male | 67.0 | 0 | 1 | Yes | Private | Urban | 228.69 | 36.6 | formerly smoked | 1 |
| 1 | 51676 | Female | 61.0 | 0 | 0 | Yes | Self-employed | Rural | 202.21 | NaN | never smoked | 1 |
| 2 | 31112 | Male | 80.0 | 0 | 1 | Yes | Private | Rural | 105.92 | 32.5 | never smoked | 1 |
| 3 | 60182 | Female | 49.0 | 0 | 0 | Yes | Private | Urban | 171.23 | 34.4 | smokes | 1 |
| 4 | 1665 | Female | 79.0 | 1 | 0 | Yes | Self-employed | Rural | 174.12 | 24.0 | never smoked | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5105 | 18234 | Female | 80.0 | 1 | 0 | Yes | Private | Urban | 83.75 | NaN | never smoked | 0 |
| 5106 | 44873 | Female | 81.0 | 0 | 0 | Yes | Self-employed | Urban | 125.20 | 40.0 | never smoked | 0 |
| 5107 | 19723 | Female | 35.0 | 0 | 0 | Yes | Self-employed | Rural | 82.99 | 30.6 | never smoked | 0 |
| 5108 | 37544 | Male | 51.0 | 0 | 0 | Yes | Private | Rural | 166.29 | 25.6 | formerly smoked | 0 |
| 5109 | 44679 | Female | 44.0 | 0 | 0 | Yes | Govt_job | Urban | 85.28 | 26.2 | Unknown | 0 |

5110 rows × 12 columns
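
The column descriptions for the stroke dataset can be turned into a quick consistency check. The following is only a minimal sketch: it uses the first five rows shown in the table instead of the real file, relies on the standard library so that it runs without the CSV, and treats the `NaN` marker as it is displayed in the table (the marker in the raw file may differ). The names `SAMPLE_CSV`, `ALLOWED`, and `check_rows` are hypothetical and not part of the original notebook.

```python
import csv
import io

# Header and first five rows copied from the table above.
# "NaN" is the missing-value marker as displayed there (an assumption
# about how it would look after export; the raw file may use another one).
SAMPLE_CSV = """id,gender,age,hypertension,heart_disease,ever_married,work_type,Residence_type,avg_glucose_level,bmi,smoking_status,stroke
9046,Male,67.0,0,1,Yes,Private,Urban,228.69,36.6,formerly smoked,1
51676,Female,61.0,0,0,Yes,Self-employed,Rural,202.21,NaN,never smoked,1
31112,Male,80.0,0,1,Yes,Private,Rural,105.92,32.5,never smoked,1
60182,Female,49.0,0,0,Yes,Private,Urban,171.23,34.4,smokes,1
1665,Female,79.0,1,0,Yes,Self-employed,Rural,174.12,24.0,never smoked,1
"""

# Allowed values taken from the column descriptions.
ALLOWED = {
    "gender": {"Male", "Female", "Other"},
    "hypertension": {"0", "1"},
    "heart_disease": {"0", "1"},
    "ever_married": {"No", "Yes"},
    "work_type": {"children", "Govt_job", "Never_worked", "Private", "Self-employed"},
    "Residence_type": {"Rural", "Urban"},
    "smoking_status": {"formerly smoked", "never smoked", "smokes", "Unknown"},
    "stroke": {"0", "1"},
}


def check_rows(rows):
    """Count missing bmi values and collect values outside the documented sets."""
    missing_bmi = 0
    problems = []
    for row in rows:
        if row["bmi"] in ("", "NaN"):
            missing_bmi += 1
        for column, allowed in ALLOWED.items():
            if row[column] not in allowed:
                problems.append((row["id"], column, row[column]))
    return missing_bmi, problems


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing_bmi, problems = check_rows(rows)
print(f"rows: {len(rows)}, missing bmi: {missing_bmi}, unexpected values: {problems}")
```

On the full dataset the same idea would normally be expressed with pandas (for example, counting missing values per column), but the check itself stays the same: every categorical column should contain only the values listed in its description, and `bmi` is the column where gaps are visible in the preview.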
| | ID | Price | Levy | Manufacturer | Model | Prod. year | Category | Leather interior | Fuel type | Engine volume | Mileage | Cylinders | Gear box type | Drive wheels | Doors | Wheel | Color | Airbags |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 45654403 | 13328 | 1399 | LEXUS | RX 450 | 2010 | Jeep | Yes | Hybrid | 3.5 | 186005 km | 6.0 | Automatic | 4x4 | 04-May | Left wheel | Silver | 12 |
| 1 | 44731507 | 16621 | 1018 | CHEVROLET | Equinox | 2011 | Jeep | No | Petrol | 3 | 192000 km | 6.0 | Tiptronic | 4x4 | 04-May | Left wheel | Black | 8 |
| 2 | 45774419 | 8467 | - | HONDA | FIT | 2006 | Hatchback | No | Petrol | 1.3 | 200000 km | 4.0 | Variator | Front | 04-May | Right-hand drive | Black | 2 |
| 3 | 45769185 | 3607 | 862 | FORD | Escape | 2011 | Jeep | Yes | Hybrid | 2.5 | 168966 km | 4.0 | Automatic | 4x4 | 04-May | Left wheel | White | 0 |
| 4 | 45809263 | 11726 | 446 | HONDA | FIT | 2014 | Hatchback | Yes | Petrol | 1.3 | 91901 km | 4.0 | Automatic | Front | 04-May | Left wheel | Silver | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19232 | 45798355 | 8467 | - | MERCEDES-BENZ | CLK 200 | 1999 | Coupe | Yes | CNG | 2.0 Turbo | 300000 km | 4.0 | Manual | Rear | 02-Mar | Left wheel | Silver | 5 |
| 19233 | 45778856 | 15681 | 831 | HYUNDAI | Sonata | 2011 | Sedan | Yes | Petrol | 2.4 | 161600 km | 4.0 | Tiptronic | Front | 04-May | Left wheel | Red | 8 |
| 19234 | 45804997 | 26108 | 836 | HYUNDAI | Tucson | 2010 | Jeep | Yes | Diesel | 2 | 116365 km | 4.0 | Automatic | Front | 04-May | Left wheel | Grey | 4 |
| 19235 | 45793526 | 5331 | 1288 | CHEVROLET | Captiva | 2007 | Jeep | Yes | Diesel | 2 | 51258 km | 4.0 | Automatic | Front | 04-May | Left wheel | Black | 4 |
| 19236 | 45813273 | 470 | 753 | HYUNDAI | Sonata | 2012 | Sedan | Yes | Hybrid | 2.4 | 186923 km | 4.0 | Automatic | Front | 04-May | Left wheel | White | 12 |

19237 rows × 18 columns
| | Store ID | Store_Area | Items_Available | Daily_Customer_Count | Store_Sales |
|---|---|---|---|---|---|
| 0 | 1 | 1659 | 1961 | 530 | 66490 |
| 1 | 2 | 1461 | 1752 | 210 | 39820 |
| 2 | 3 | 1340 | 1609 | 720 | 54010 |
| 3 | 4 | 1451 | 1748 | 620 | 53730 |
| 4 | 5 | 1770 | 2111 | 450 | 46620 |
| ... | ... | ... | ... | ... | ... |
| 891 | 892 | 1582 | 1910 | 1080 | 66390 |
| 892 | 893 | 1387 | 1663 | 850 | 82080 |
| 893 | 894 | 1200 | 1436 | 1060 | 76440 |
| 894 | 895 | 1299 | 1560 | 770 | 96610 |
| 895 | 896 | 1174 | 1429 | 1110 | 54340 |

896 rows × 5 columns
\n", "