MAI_ISE-31_Andrikhov-A-S/lab2.ipynb
2024-10-19 13:14:28 +04:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lab 2. Analysis of Several Datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Select three datasets that do not correspond to your assigned variant\n",
"Selected variants: Stroke data (Variant 4), House sales (Variant 6), Mobile device prices (Variant 18)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Analyze the information about each dataset from its Kaggle download page. What is the problem domain?\n",
"\n",
"#### Stroke data:\n",
"- **Problem domain:** Analysis of patient data related to stroke\n",
"- **Goals:** Analyze patient data and identify factors associated with stroke\n",
"- **Dataset:** 5110 records, 12 variables:\n",
"  - id\n",
"  - gender\n",
"  - age\n",
"  - hypertension\n",
"  - heart_disease\n",
"  - ever_married\n",
"  - work_type\n",
"  - Residence_type\n",
"  - avg_glucose_level\n",
"  - bmi\n",
"  - smoking_status\n",
"  - stroke\n",
"- **Data description:** Patient demographics and health indicators, with a flag showing whether the patient had a stroke\n",
"\n",
"#### House sales:\n",
"- **Problem domain:** Analysis of house sales and how prices depend on various factors\n",
"- **Goals:** Analyze house sales and identify the factors that affect price\n",
"- **Dataset:** 21613 records, 21 variables:\n",
"  - id\n",
"  - date\n",
"  - price\n",
"  - bedrooms\n",
"  - bathrooms\n",
"  - sqft_living\n",
"  - sqft_lot\n",
"  - floors\n",
"  - waterfront\n",
"  - view\n",
"  - condition\n",
"  - grade\n",
"  - sqft_above\n",
"  - sqft_basement\n",
"  - yr_built\n",
"  - yr_renovated\n",
"  - zipcode\n",
"  - lat\n",
"  - long\n",
"  - sqft_living15\n",
"  - sqft_lot15\n",
"- **Data description:** Records of houses sold in King County, USA\n",
"\n",
"#### Mobile device prices:\n",
"- **Problem domain:** Analysis of mobile device prices\n",
"- **Goals:** Analyze mobile device prices and identify the factors that affect price\n",
"- **Dataset:** 1370 records, 18 variables:\n",
"  - Unnamed: 0 (row index)\n",
"  - Name\n",
"  - Rating\n",
"  - Spec_score\n",
"  - No_of_sim\n",
"  - Ram\n",
"  - Battery\n",
"  - Display\n",
"  - Camera\n",
"  - External_Memory\n",
"  - Android_version\n",
"  - Price\n",
"  - company\n",
"  - Inbuilt_memory\n",
"  - fast_charging\n",
"  - Screen_resolution\n",
"  - Processor\n",
"  - Processor_name\n",
"- **Data description:** Specifications and prices of mobile devices"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Stroke data:\n",
"Each row of the dataset describes one patient, which makes it possible to analyze the data and build models that predict stroke risk."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>gender</th>\n",
" <th>age</th>\n",
" <th>hypertension</th>\n",
" <th>heart_disease</th>\n",
" <th>ever_married</th>\n",
" <th>work_type</th>\n",
" <th>Residence_type</th>\n",
" <th>avg_glucose_level</th>\n",
" <th>bmi</th>\n",
" <th>smoking_status</th>\n",
" <th>stroke</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>9046</td>\n",
" <td>Male</td>\n",
" <td>67.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>Yes</td>\n",
" <td>Private</td>\n",
" <td>Urban</td>\n",
" <td>228.69</td>\n",
" <td>36.6</td>\n",
" <td>formerly smoked</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>51676</td>\n",
" <td>Female</td>\n",
" <td>61.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>Self-employed</td>\n",
" <td>Rural</td>\n",
" <td>202.21</td>\n",
" <td>NaN</td>\n",
" <td>never smoked</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>31112</td>\n",
" <td>Male</td>\n",
" <td>80.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>Yes</td>\n",
" <td>Private</td>\n",
" <td>Rural</td>\n",
" <td>105.92</td>\n",
" <td>32.5</td>\n",
" <td>never smoked</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>60182</td>\n",
" <td>Female</td>\n",
" <td>49.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>Private</td>\n",
" <td>Urban</td>\n",
" <td>171.23</td>\n",
" <td>34.4</td>\n",
" <td>smokes</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1665</td>\n",
" <td>Female</td>\n",
" <td>79.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>Self-employed</td>\n",
" <td>Rural</td>\n",
" <td>174.12</td>\n",
" <td>24.0</td>\n",
" <td>never smoked</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5105</th>\n",
" <td>18234</td>\n",
" <td>Female</td>\n",
" <td>80.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>Private</td>\n",
" <td>Urban</td>\n",
" <td>83.75</td>\n",
" <td>NaN</td>\n",
" <td>never smoked</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5106</th>\n",
" <td>44873</td>\n",
" <td>Female</td>\n",
" <td>81.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>Self-employed</td>\n",
" <td>Urban</td>\n",
" <td>125.20</td>\n",
" <td>40.0</td>\n",
" <td>never smoked</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5107</th>\n",
" <td>19723</td>\n",
" <td>Female</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>Self-employed</td>\n",
" <td>Rural</td>\n",
" <td>82.99</td>\n",
" <td>30.6</td>\n",
" <td>never smoked</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5108</th>\n",
" <td>37544</td>\n",
" <td>Male</td>\n",
" <td>51.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>Private</td>\n",
" <td>Rural</td>\n",
" <td>166.29</td>\n",
" <td>25.6</td>\n",
" <td>formerly smoked</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5109</th>\n",
" <td>44679</td>\n",
" <td>Female</td>\n",
" <td>44.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>Govt_job</td>\n",
" <td>Urban</td>\n",
" <td>85.28</td>\n",
" <td>26.2</td>\n",
" <td>Unknown</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5110 rows × 12 columns</p>\n",
"</div>"
],
"text/plain": [
" id gender age hypertension heart_disease ever_married \\\n",
"0 9046 Male 67.0 0 1 Yes \n",
"1 51676 Female 61.0 0 0 Yes \n",
"2 31112 Male 80.0 0 1 Yes \n",
"3 60182 Female 49.0 0 0 Yes \n",
"4 1665 Female 79.0 1 0 Yes \n",
"... ... ... ... ... ... ... \n",
"5105 18234 Female 80.0 1 0 Yes \n",
"5106 44873 Female 81.0 0 0 Yes \n",
"5107 19723 Female 35.0 0 0 Yes \n",
"5108 37544 Male 51.0 0 0 Yes \n",
"5109 44679 Female 44.0 0 0 Yes \n",
"\n",
" work_type Residence_type avg_glucose_level bmi smoking_status \\\n",
"0 Private Urban 228.69 36.6 formerly smoked \n",
"1 Self-employed Rural 202.21 NaN never smoked \n",
"2 Private Rural 105.92 32.5 never smoked \n",
"3 Private Urban 171.23 34.4 smokes \n",
"4 Self-employed Rural 174.12 24.0 never smoked \n",
"... ... ... ... ... ... \n",
"5105 Private Urban 83.75 NaN never smoked \n",
"5106 Self-employed Urban 125.20 40.0 never smoked \n",
"5107 Self-employed Rural 82.99 30.6 never smoked \n",
"5108 Private Rural 166.29 25.6 formerly smoked \n",
"5109 Govt_job Urban 85.28 26.2 Unknown \n",
"\n",
" stroke \n",
"0 1 \n",
"1 1 \n",
"2 1 \n",
"3 1 \n",
"4 1 \n",
"... ... \n",
"5105 0 \n",
"5106 0 \n",
"5107 0 \n",
"5108 0 \n",
"5109 0 \n",
"\n",
"[5110 rows x 12 columns]"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"var4 = pd.read_csv(\"./datasets/var4/healthcare-dataset-stroke-data.csv\")\n",
"\n",
"var4"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"id int64\n",
"gender object\n",
"age float64\n",
"hypertension int64\n",
"heart_disease int64\n",
"ever_married object\n",
"work_type object\n",
"Residence_type object\n",
"avg_glucose_level float64\n",
"bmi float64\n",
"smoking_status object\n",
"stroke int64\n",
"dtype: object"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var4.dtypes"
]
},
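{
"cell_type": "markdown",
"metadata": {},
"source": [
"The preview above shows `NaN` in `bmi` and the value `Unknown` in `smoking_status`. A minimal sketch of how such gaps could be counted, assuming `var4` is the DataFrame loaded earlier; the names `missing_counts` and `unknown_smoking` are illustrative and not part of the original work."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Number of missing (NaN) values in each column of the stroke dataset\n",
"missing_counts = var4.isna().sum()\n",
"\n",
"# \"Unknown\" is an ordinary string, so isna() does not count it; tally it separately\n",
"unknown_smoking = (var4[\"smoking_status\"] == \"Unknown\").sum()\n",
"\n",
"# Show only the columns that actually contain missing values\n",
"print(missing_counts[missing_counts > 0])\n",
"print(\"smoking_status == 'Unknown':\", unknown_smoking)"
]
},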
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### House sales\n",
"Each row of the dataset describes one house, which makes it possible to analyze the data and build models that predict its price."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>date</th>\n",
" <th>price</th>\n",
" <th>bedrooms</th>\n",
" <th>bathrooms</th>\n",
" <th>sqft_living</th>\n",
" <th>sqft_lot</th>\n",
" <th>floors</th>\n",
" <th>waterfront</th>\n",
" <th>view</th>\n",
" <th>...</th>\n",
" <th>grade</th>\n",
" <th>sqft_above</th>\n",
" <th>sqft_basement</th>\n",
" <th>yr_built</th>\n",
" <th>yr_renovated</th>\n",
" <th>zipcode</th>\n",
" <th>lat</th>\n",
" <th>long</th>\n",
" <th>sqft_living15</th>\n",
" <th>sqft_lot15</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>7129300520</td>\n",
" <td>20141013T000000</td>\n",
" <td>221900.0</td>\n",
" <td>3</td>\n",
" <td>1.00</td>\n",
" <td>1180</td>\n",
" <td>5650</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>7</td>\n",
" <td>1180</td>\n",
" <td>0</td>\n",
" <td>1955</td>\n",
" <td>0</td>\n",
" <td>98178</td>\n",
" <td>47.5112</td>\n",
" <td>-122.257</td>\n",
" <td>1340</td>\n",
" <td>5650</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>6414100192</td>\n",
" <td>20141209T000000</td>\n",
" <td>538000.0</td>\n",
" <td>3</td>\n",
" <td>2.25</td>\n",
" <td>2570</td>\n",
" <td>7242</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>7</td>\n",
" <td>2170</td>\n",
" <td>400</td>\n",
" <td>1951</td>\n",
" <td>1991</td>\n",
" <td>98125</td>\n",
" <td>47.7210</td>\n",
" <td>-122.319</td>\n",
" <td>1690</td>\n",
" <td>7639</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>5631500400</td>\n",
" <td>20150225T000000</td>\n",
" <td>180000.0</td>\n",
" <td>2</td>\n",
" <td>1.00</td>\n",
" <td>770</td>\n",
" <td>10000</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>6</td>\n",
" <td>770</td>\n",
" <td>0</td>\n",
" <td>1933</td>\n",
" <td>0</td>\n",
" <td>98028</td>\n",
" <td>47.7379</td>\n",
" <td>-122.233</td>\n",
" <td>2720</td>\n",
" <td>8062</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2487200875</td>\n",
" <td>20141209T000000</td>\n",
" <td>604000.0</td>\n",
" <td>4</td>\n",
" <td>3.00</td>\n",
" <td>1960</td>\n",
" <td>5000</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>7</td>\n",
" <td>1050</td>\n",
" <td>910</td>\n",
" <td>1965</td>\n",
" <td>0</td>\n",
" <td>98136</td>\n",
" <td>47.5208</td>\n",
" <td>-122.393</td>\n",
" <td>1360</td>\n",
" <td>5000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1954400510</td>\n",
" <td>20150218T000000</td>\n",
" <td>510000.0</td>\n",
" <td>3</td>\n",
" <td>2.00</td>\n",
" <td>1680</td>\n",
" <td>8080</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>8</td>\n",
" <td>1680</td>\n",
" <td>0</td>\n",
" <td>1987</td>\n",
" <td>0</td>\n",
" <td>98074</td>\n",
" <td>47.6168</td>\n",
" <td>-122.045</td>\n",
" <td>1800</td>\n",
" <td>7503</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21608</th>\n",
" <td>263000018</td>\n",
" <td>20140521T000000</td>\n",
" <td>360000.0</td>\n",
" <td>3</td>\n",
" <td>2.50</td>\n",
" <td>1530</td>\n",
" <td>1131</td>\n",
" <td>3.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>8</td>\n",
" <td>1530</td>\n",
" <td>0</td>\n",
" <td>2009</td>\n",
" <td>0</td>\n",
" <td>98103</td>\n",
" <td>47.6993</td>\n",
" <td>-122.346</td>\n",
" <td>1530</td>\n",
" <td>1509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21609</th>\n",
" <td>6600060120</td>\n",
" <td>20150223T000000</td>\n",
" <td>400000.0</td>\n",
" <td>4</td>\n",
" <td>2.50</td>\n",
" <td>2310</td>\n",
" <td>5813</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>8</td>\n",
" <td>2310</td>\n",
" <td>0</td>\n",
" <td>2014</td>\n",
" <td>0</td>\n",
" <td>98146</td>\n",
" <td>47.5107</td>\n",
" <td>-122.362</td>\n",
" <td>1830</td>\n",
" <td>7200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21610</th>\n",
" <td>1523300141</td>\n",
" <td>20140623T000000</td>\n",
" <td>402101.0</td>\n",
" <td>2</td>\n",
" <td>0.75</td>\n",
" <td>1020</td>\n",
" <td>1350</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>7</td>\n",
" <td>1020</td>\n",
" <td>0</td>\n",
" <td>2009</td>\n",
" <td>0</td>\n",
" <td>98144</td>\n",
" <td>47.5944</td>\n",
" <td>-122.299</td>\n",
" <td>1020</td>\n",
" <td>2007</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21611</th>\n",
" <td>291310100</td>\n",
" <td>20150116T000000</td>\n",
" <td>400000.0</td>\n",
" <td>3</td>\n",
" <td>2.50</td>\n",
" <td>1600</td>\n",
" <td>2388</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>8</td>\n",
" <td>1600</td>\n",
" <td>0</td>\n",
" <td>2004</td>\n",
" <td>0</td>\n",
" <td>98027</td>\n",
" <td>47.5345</td>\n",
" <td>-122.069</td>\n",
" <td>1410</td>\n",
" <td>1287</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21612</th>\n",
" <td>1523300157</td>\n",
" <td>20141015T000000</td>\n",
" <td>325000.0</td>\n",
" <td>2</td>\n",
" <td>0.75</td>\n",
" <td>1020</td>\n",
" <td>1076</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>7</td>\n",
" <td>1020</td>\n",
" <td>0</td>\n",
" <td>2008</td>\n",
" <td>0</td>\n",
" <td>98144</td>\n",
" <td>47.5941</td>\n",
" <td>-122.299</td>\n",
" <td>1020</td>\n",
" <td>1357</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>21613 rows × 21 columns</p>\n",
"</div>"
],
"text/plain": [
" id date price bedrooms bathrooms \\\n",
"0 7129300520 20141013T000000 221900.0 3 1.00 \n",
"1 6414100192 20141209T000000 538000.0 3 2.25 \n",
"2 5631500400 20150225T000000 180000.0 2 1.00 \n",
"3 2487200875 20141209T000000 604000.0 4 3.00 \n",
"4 1954400510 20150218T000000 510000.0 3 2.00 \n",
"... ... ... ... ... ... \n",
"21608 263000018 20140521T000000 360000.0 3 2.50 \n",
"21609 6600060120 20150223T000000 400000.0 4 2.50 \n",
"21610 1523300141 20140623T000000 402101.0 2 0.75 \n",
"21611 291310100 20150116T000000 400000.0 3 2.50 \n",
"21612 1523300157 20141015T000000 325000.0 2 0.75 \n",
"\n",
" sqft_living sqft_lot floors waterfront view ... grade \\\n",
"0 1180 5650 1.0 0 0 ... 7 \n",
"1 2570 7242 2.0 0 0 ... 7 \n",
"2 770 10000 1.0 0 0 ... 6 \n",
"3 1960 5000 1.0 0 0 ... 7 \n",
"4 1680 8080 1.0 0 0 ... 8 \n",
"... ... ... ... ... ... ... ... \n",
"21608 1530 1131 3.0 0 0 ... 8 \n",
"21609 2310 5813 2.0 0 0 ... 8 \n",
"21610 1020 1350 2.0 0 0 ... 7 \n",
"21611 1600 2388 2.0 0 0 ... 8 \n",
"21612 1020 1076 2.0 0 0 ... 7 \n",
"\n",
" sqft_above sqft_basement yr_built yr_renovated zipcode lat \\\n",
"0 1180 0 1955 0 98178 47.5112 \n",
"1 2170 400 1951 1991 98125 47.7210 \n",
"2 770 0 1933 0 98028 47.7379 \n",
"3 1050 910 1965 0 98136 47.5208 \n",
"4 1680 0 1987 0 98074 47.6168 \n",
"... ... ... ... ... ... ... \n",
"21608 1530 0 2009 0 98103 47.6993 \n",
"21609 2310 0 2014 0 98146 47.5107 \n",
"21610 1020 0 2009 0 98144 47.5944 \n",
"21611 1600 0 2004 0 98027 47.5345 \n",
"21612 1020 0 2008 0 98144 47.5941 \n",
"\n",
" long sqft_living15 sqft_lot15 \n",
"0 -122.257 1340 5650 \n",
"1 -122.319 1690 7639 \n",
"2 -122.233 2720 8062 \n",
"3 -122.393 1360 5000 \n",
"4 -122.045 1800 7503 \n",
"... ... ... ... \n",
"21608 -122.346 1530 1509 \n",
"21609 -122.362 1830 7200 \n",
"21610 -122.299 1020 2007 \n",
"21611 -122.069 1410 1287 \n",
"21612 -122.299 1020 1357 \n",
"\n",
"[21613 rows x 21 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var6 = pd.read_csv(\"./datasets/var6/kc_house_data.csv\")\n",
"var6"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"id int64\n",
"date object\n",
"price float64\n",
"bedrooms int64\n",
"bathrooms float64\n",
"sqft_living int64\n",
"sqft_lot int64\n",
"floors float64\n",
"waterfront int64\n",
"view int64\n",
"condition int64\n",
"grade int64\n",
"sqft_above int64\n",
"sqft_basement int64\n",
"yr_built int64\n",
"yr_renovated int64\n",
"zipcode int64\n",
"lat float64\n",
"long float64\n",
"sqft_living15 int64\n",
"sqft_lot15 int64\n",
"dtype: object"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var6.dtypes"
]
},
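{
"cell_type": "markdown",
"metadata": {},
"source": [
"`var6.dtypes` reports `date` as `object`, and the preview shows values such as `20141013T000000`. One possible way to parse them, assuming every value follows that pattern; `sale_dates` is an illustrative name, and the original column is left unchanged."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Parse strings like \"20141013T000000\"; the format is inferred from the preview above\n",
"sale_dates = pd.to_datetime(var6[\"date\"], format=\"%Y%m%dT%H%M%S\")\n",
"\n",
"# Number of sales per calendar year\n",
"print(sale_dates.dt.year.value_counts().sort_index())"
]
},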
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Mobile device prices\n",
"Each row of the dataset describes one mobile device, which makes it possible to analyze the data and build models that predict its price."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>Name</th>\n",
" <th>Rating</th>\n",
" <th>Spec_score</th>\n",
" <th>No_of_sim</th>\n",
" <th>Ram</th>\n",
" <th>Battery</th>\n",
" <th>Display</th>\n",
" <th>Camera</th>\n",
" <th>External_Memory</th>\n",
" <th>Android_version</th>\n",
" <th>Price</th>\n",
" <th>company</th>\n",
" <th>Inbuilt_memory</th>\n",
" <th>fast_charging</th>\n",
" <th>Screen_resolution</th>\n",
" <th>Processor</th>\n",
" <th>Processor_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>Samsung Galaxy F14 5G</td>\n",
" <td>4.65</td>\n",
" <td>68</td>\n",
" <td>Dual Sim, 3G, 4G, 5G, VoLTE,</td>\n",
" <td>4 GB RAM</td>\n",
" <td>6000 mAh Battery</td>\n",
" <td>6.6 inches</td>\n",
" <td>50 MP + 2 MP Dual Rear &amp;amp; 13 MP Front Camera</td>\n",
" <td>Memory Card Supported, upto 1 TB</td>\n",
" <td>13</td>\n",
" <td>9,999</td>\n",
" <td>Samsung</td>\n",
" <td>128 GB inbuilt</td>\n",
" <td>25W Fast Charging</td>\n",
" <td>2408 x 1080 px Display with Water Drop Notch</td>\n",
" <td>Octa Core Processor</td>\n",
" <td>Exynos 1330</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>Samsung Galaxy A11</td>\n",
" <td>4.20</td>\n",
" <td>63</td>\n",
" <td>Dual Sim, 3G, 4G, VoLTE,</td>\n",
" <td>2 GB RAM</td>\n",
" <td>4000 mAh Battery</td>\n",
" <td>6.4 inches</td>\n",
" <td>13 MP + 5 MP + 2 MP Triple Rear &amp;amp; 8 MP Fro...</td>\n",
" <td>Memory Card Supported, upto 512 GB</td>\n",
" <td>10</td>\n",
" <td>9,990</td>\n",
" <td>Samsung</td>\n",
" <td>32 GB inbuilt</td>\n",
" <td>15W Fast Charging</td>\n",
" <td>720 x 1560 px Display with Punch Hole</td>\n",
" <td>1.8 GHz Processor</td>\n",
" <td>Octa Core</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2</td>\n",
" <td>Samsung Galaxy A13</td>\n",
" <td>4.30</td>\n",
" <td>75</td>\n",
" <td>Dual Sim, 3G, 4G, VoLTE,</td>\n",
" <td>4 GB RAM</td>\n",
" <td>5000 mAh Battery</td>\n",
" <td>6.6 inches</td>\n",
" <td>50 MP Quad Rear &amp;amp; 8 MP Front Camera</td>\n",
" <td>Memory Card Supported, upto 1 TB</td>\n",
" <td>12</td>\n",
" <td>11,999</td>\n",
" <td>Samsung</td>\n",
" <td>64 GB inbuilt</td>\n",
" <td>25W Fast Charging</td>\n",
" <td>1080 x 2408 px Display with Water Drop Notch</td>\n",
" <td>2 GHz Processor</td>\n",
" <td>Octa Core</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" <td>Samsung Galaxy F23</td>\n",
" <td>4.10</td>\n",
" <td>73</td>\n",
" <td>Dual Sim, 3G, 4G, VoLTE,</td>\n",
" <td>4 GB RAM</td>\n",
" <td>6000 mAh Battery</td>\n",
" <td>6.4 inches</td>\n",
" <td>48 MP Quad Rear &amp;amp; 13 MP Front Camera</td>\n",
" <td>Memory Card Supported, upto 1 TB</td>\n",
" <td>12</td>\n",
" <td>11,999</td>\n",
" <td>Samsung</td>\n",
" <td>64 GB inbuilt</td>\n",
" <td>NaN</td>\n",
" <td>720 x 1600 px</td>\n",
" <td>Octa Core</td>\n",
" <td>Helio G88</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4</td>\n",
" <td>Samsung Galaxy A03s (4GB RAM + 64GB)</td>\n",
" <td>4.10</td>\n",
" <td>69</td>\n",
" <td>Dual Sim, 3G, 4G, VoLTE,</td>\n",
" <td>4 GB RAM</td>\n",
" <td>5000 mAh Battery</td>\n",
" <td>6.5 inches</td>\n",
" <td>13 MP + 2 MP + 2 MP Triple Rear &amp;amp; 5 MP Fro...</td>\n",
" <td>Memory Card Supported, upto 1 TB</td>\n",
" <td>11</td>\n",
" <td>11,999</td>\n",
" <td>Samsung</td>\n",
" <td>64 GB inbuilt</td>\n",
" <td>15W Fast Charging</td>\n",
" <td>720 x 1600 px Display with Water Drop Notch</td>\n",
" <td>Octa Core</td>\n",
" <td>Helio P35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1365</th>\n",
" <td>1365</td>\n",
" <td>TCL 40R</td>\n",
" <td>4.05</td>\n",
" <td>75</td>\n",
" <td>Dual Sim, 3G, 4G, 5G, VoLTE,</td>\n",
" <td>4 GB RAM</td>\n",
" <td>5000 mAh Battery</td>\n",
" <td>6.6 inches</td>\n",
" <td>50 MP + 2 MP + 2 MP Triple Rear &amp;amp; 8 MP Fro...</td>\n",
" <td>Memory Card (Hybrid)</td>\n",
" <td>12</td>\n",
" <td>18,999</td>\n",
" <td>TCL</td>\n",
" <td>64 GB inbuilt</td>\n",
" <td>15W Fast Charging</td>\n",
" <td>720 x 1612 px</td>\n",
" <td>Octa Core</td>\n",
" <td>Dimensity 700 5G</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1366</th>\n",
" <td>1366</td>\n",
" <td>TCL 50 XL NxtPaper 5G</td>\n",
" <td>4.10</td>\n",
" <td>80</td>\n",
" <td>Dual Sim, 3G, 4G, VoLTE,</td>\n",
" <td>8 GB RAM</td>\n",
" <td>5000 mAh Battery</td>\n",
" <td>6.8 inches</td>\n",
" <td>50 MP + 2 MP Dual Rear &amp;amp; 16 MP Front Camera</td>\n",
" <td>Memory Card (Hybrid)</td>\n",
" <td>14</td>\n",
" <td>24,990</td>\n",
" <td>TCL</td>\n",
" <td>128 GB inbuilt</td>\n",
" <td>33W Fast Charging</td>\n",
" <td>1200 x 2400 px</td>\n",
" <td>Octa Core</td>\n",
" <td>Dimensity 7050</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1367</th>\n",
" <td>1367</td>\n",
" <td>TCL 50 XE NxtPaper 5G</td>\n",
" <td>4.00</td>\n",
" <td>80</td>\n",
" <td>Dual Sim, 3G, 4G, 5G, VoLTE,</td>\n",
" <td>6 GB RAM</td>\n",
" <td>5000 mAh Battery</td>\n",
" <td>6.6 inches</td>\n",
" <td>50 MP + 2 MP Dual Rear &amp;amp; 16 MP Front Camera</td>\n",
" <td>Memory Card Supported, upto 1 TB</td>\n",
" <td>13</td>\n",
" <td>23,990</td>\n",
" <td>TCL</td>\n",
" <td>256 GB inbuilt</td>\n",
" <td>18W Fast Charging</td>\n",
" <td>720 x 1612 px</td>\n",
" <td>Octa Core</td>\n",
" <td>Dimensity 6080</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1368</th>\n",
" <td>1368</td>\n",
" <td>TCL 40 NxtPaper 5G</td>\n",
" <td>4.50</td>\n",
" <td>79</td>\n",
" <td>Dual Sim, 3G, 4G, 5G, VoLTE,</td>\n",
" <td>6 GB RAM</td>\n",
" <td>5000 mAh Battery</td>\n",
" <td>6.6 inches</td>\n",
" <td>50 MP + 2 MP + 2 MP Triple Rear &amp;amp; 8 MP Fro...</td>\n",
" <td>Memory Card Supported, upto 1 TB</td>\n",
" <td>13</td>\n",
" <td>22,499</td>\n",
" <td>TCL</td>\n",
" <td>256 GB inbuilt</td>\n",
" <td>15W Fast Charging</td>\n",
" <td>720 x 1612 px</td>\n",
" <td>Octa Core</td>\n",
" <td>Dimensity 6020</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1369</th>\n",
" <td>1369</td>\n",
" <td>TCL Trifold</td>\n",
" <td>4.65</td>\n",
" <td>93</td>\n",
" <td>Dual Sim, 3G, 4G, 5G, VoLTE, Vo5G,</td>\n",
" <td>12 GB RAM</td>\n",
" <td>4600 mAh Battery</td>\n",
" <td>10 inches</td>\n",
" <td>Foldable Display, Dual Display</td>\n",
" <td>50 MP + 48 MP + 8 MP Triple Rear &amp;amp; 32 MP F...</td>\n",
" <td>13</td>\n",
" <td>1,19,990</td>\n",
" <td>TCL</td>\n",
" <td>256 GB inbuilt</td>\n",
" <td>67W Fast Charging</td>\n",
" <td>1916 x 2160 px</td>\n",
" <td>Octa Core</td>\n",
" <td>Snapdragon 8 Gen2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1370 rows × 18 columns</p>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 Name Rating Spec_score \\\n",
"0 0 Samsung Galaxy F14 5G 4.65 68 \n",
"1 1 Samsung Galaxy A11 4.20 63 \n",
"2 2 Samsung Galaxy A13 4.30 75 \n",
"3 3 Samsung Galaxy F23 4.10 73 \n",
"4 4 Samsung Galaxy A03s (4GB RAM + 64GB) 4.10 69 \n",
"... ... ... ... ... \n",
"1365 1365 TCL 40R 4.05 75 \n",
"1366 1366 TCL 50 XL NxtPaper 5G 4.10 80 \n",
"1367 1367 TCL 50 XE NxtPaper 5G 4.00 80 \n",
"1368 1368 TCL 40 NxtPaper 5G 4.50 79 \n",
"1369 1369 TCL Trifold 4.65 93 \n",
"\n",
" No_of_sim Ram Battery \\\n",
"0 Dual Sim, 3G, 4G, 5G, VoLTE, 4 GB RAM 6000 mAh Battery \n",
"1 Dual Sim, 3G, 4G, VoLTE, 2 GB RAM 4000 mAh Battery \n",
"2 Dual Sim, 3G, 4G, VoLTE, 4 GB RAM 5000 mAh Battery \n",
"3 Dual Sim, 3G, 4G, VoLTE, 4 GB RAM 6000 mAh Battery \n",
"4 Dual Sim, 3G, 4G, VoLTE, 4 GB RAM 5000 mAh Battery \n",
"... ... ... ... \n",
"1365 Dual Sim, 3G, 4G, 5G, VoLTE, 4 GB RAM 5000 mAh Battery \n",
"1366 Dual Sim, 3G, 4G, VoLTE, 8 GB RAM 5000 mAh Battery \n",
"1367 Dual Sim, 3G, 4G, 5G, VoLTE, 6 GB RAM 5000 mAh Battery \n",
"1368 Dual Sim, 3G, 4G, 5G, VoLTE, 6 GB RAM 5000 mAh Battery \n",
"1369 Dual Sim, 3G, 4G, 5G, VoLTE, Vo5G, 12 GB RAM 4600 mAh Battery \n",
"\n",
" Display Camera \\\n",
"0 6.6 inches 50 MP + 2 MP Dual Rear &amp; 13 MP Front Camera \n",
"1 6.4 inches 13 MP + 5 MP + 2 MP Triple Rear &amp; 8 MP Fro... \n",
"2 6.6 inches 50 MP Quad Rear &amp; 8 MP Front Camera \n",
"3 6.4 inches 48 MP Quad Rear &amp; 13 MP Front Camera \n",
"4 6.5 inches 13 MP + 2 MP + 2 MP Triple Rear &amp; 5 MP Fro... \n",
"... ... ... \n",
"1365 6.6 inches 50 MP + 2 MP + 2 MP Triple Rear &amp; 8 MP Fro... \n",
"1366 6.8 inches 50 MP + 2 MP Dual Rear &amp; 16 MP Front Camera \n",
"1367 6.6 inches 50 MP + 2 MP Dual Rear &amp; 16 MP Front Camera \n",
"1368 6.6 inches 50 MP + 2 MP + 2 MP Triple Rear &amp; 8 MP Fro... \n",
"1369 10 inches Foldable Display, Dual Display \n",
"\n",
" External_Memory Android_version \\\n",
"0 Memory Card Supported, upto 1 TB 13 \n",
"1 Memory Card Supported, upto 512 GB 10 \n",
"2 Memory Card Supported, upto 1 TB 12 \n",
"3 Memory Card Supported, upto 1 TB 12 \n",
"4 Memory Card Supported, upto 1 TB 11 \n",
"... ... ... \n",
"1365 Memory Card (Hybrid) 12 \n",
"1366 Memory Card (Hybrid) 14 \n",
"1367 Memory Card Supported, upto 1 TB 13 \n",
"1368 Memory Card Supported, upto 1 TB 13 \n",
"1369 50 MP + 48 MP + 8 MP Triple Rear &amp; 32 MP F... 13 \n",
"\n",
" Price company Inbuilt_memory fast_charging \\\n",
"0 9,999 Samsung 128 GB inbuilt 25W Fast Charging \n",
"1 9,990 Samsung 32 GB inbuilt 15W Fast Charging \n",
"2 11,999 Samsung 64 GB inbuilt 25W Fast Charging \n",
"3 11,999 Samsung 64 GB inbuilt NaN \n",
"4 11,999 Samsung 64 GB inbuilt 15W Fast Charging \n",
"... ... ... ... ... \n",
"1365 18,999 TCL 64 GB inbuilt 15W Fast Charging \n",
"1366 24,990 TCL 128 GB inbuilt 33W Fast Charging \n",
"1367 23,990 TCL 256 GB inbuilt 18W Fast Charging \n",
"1368 22,499 TCL 256 GB inbuilt 15W Fast Charging \n",
"1369 1,19,990 TCL 256 GB inbuilt 67W Fast Charging \n",
"\n",
" Screen_resolution Processor \\\n",
"0 2408 x 1080 px Display with Water Drop Notch Octa Core Processor \n",
"1 720 x 1560 px Display with Punch Hole 1.8 GHz Processor \n",
"2 1080 x 2408 px Display with Water Drop Notch 2 GHz Processor \n",
"3 720 x 1600 px Octa Core \n",
"4 720 x 1600 px Display with Water Drop Notch Octa Core \n",
"... ... ... \n",
"1365 720 x 1612 px Octa Core \n",
"1366 1200 x 2400 px Octa Core \n",
"1367 720 x 1612 px Octa Core \n",
"1368 720 x 1612 px Octa Core \n",
"1369 1916 x 2160 px Octa Core \n",
"\n",
" Processor_name \n",
"0 Exynos 1330 \n",
"1 Octa Core \n",
"2 Octa Core \n",
"3 Helio G88 \n",
"4 Helio P35 \n",
"... ... \n",
"1365 Dimensity 700 5G \n",
"1366 Dimensity 7050 \n",
"1367 Dimensity 6080 \n",
"1368 Dimensity 6020 \n",
"1369 Snapdragon 8 Gen2 \n",
"\n",
"[1370 rows x 18 columns]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var18 = pd.read_csv(\"./datasets/var18/mobile_phone_price_prediction.csv\")\n",
"var18"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Unnamed: 0 int64\n",
"Name object\n",
"Rating float64\n",
"Spec_score int64\n",
"No_of_sim object\n",
"Ram object\n",
"Battery object\n",
"Display object\n",
"Camera object\n",
"External_Memory object\n",
"Android_version object\n",
"Price object\n",
"company object\n",
"Inbuilt_memory object\n",
"fast_charging object\n",
"Screen_resolution object\n",
"Processor object\n",
"Processor_name object\n",
"dtype: object"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var18.dtypes"
]
},
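{
"cell_type": "markdown",
"metadata": {},
"source": [
"Almost every column of `var18` has dtype `object`, including `Price` (for example `9,999`) and `Ram` (for example `4 GB RAM`). A hedged sketch of converting these two columns to numbers, assuming the formats seen in the preview; values that do not match become `NaN`. The names `price_num` and `ram_gb` are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Remove thousands separators (\"1,19,990\" -> \"119990\") and convert to numbers\n",
"price_num = pd.to_numeric(\n",
"    var18[\"Price\"].str.replace(\",\", \"\", regex=False), errors=\"coerce\"\n",
")\n",
"\n",
"# Take the number that precedes \"GB\" in strings such as \"4 GB RAM\"\n",
"ram_gb = pd.to_numeric(\n",
"    var18[\"Ram\"].str.extract(r\"(\\d+(?:\\.\\d+)?)\\s*GB\", expand=False), errors=\"coerce\"\n",
")\n",
"\n",
"# Summary statistics of the converted columns\n",
"print(pd.DataFrame({\"price_num\": price_num, \"ram_gb\": ram_gb}).describe())"
]
},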
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Analyze the contents of each dataset. What are the objects of observation? What are their attributes? Are there relationships between the objects?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}