MAI_ISE-31_Andrikhov-A-S/lab1.ipynb
2024-10-19 13:14:28 +04:00


  1. Core features of the pandas library
In [27]:
import pandas as pd

Loading and saving data

In [28]:
df = pd.read_csv("./datasets/var2/2022/heart_2022_no_nans.csv")
In [58]:
df.tail(10)
Out[58]:
State Sex GeneralHealth PhysicalHealthDays MentalHealthDays LastCheckupTime PhysicalActivities SleepHours RemovedTeeth HadHeartAttack ... WeightInKilograms BMI AlcoholDrinkers HIVTesting FluVaxLast12 PneumoVaxEver TetanusLast10Tdap HighRiskLastYear CovidPos SleepHours-HeightInMeters
246012 Virgin Islands Male Fair 7.0 30.0 Within past year (anytime less than 12 months ... No 4.0 None of them Yes ... 117.93 33.38 Yes Yes No No No, did not receive any tetanus shot in the pa... No Yes 2.12
246013 Virgin Islands Male Excellent 0.0 7.0 Within past year (anytime less than 12 months ... No 4.0 None of them No ... 49.90 18.30 Yes No No No No, did not receive any tetanus shot in the pa... No No 2.35
246014 Virgin Islands Female Good 0.0 0.0 Within past year (anytime less than 12 months ... Yes 12.0 1 to 5 No ... 52.16 19.14 No No No Yes Yes, received Tdap No No 10.35
246015 Virgin Islands Female Very good 0.0 0.0 Within past year (anytime less than 12 months ... Yes 7.0 1 to 5 No ... 77.11 28.29 Yes Yes No No No, did not receive any tetanus shot in the pa... No No 5.35
246016 Virgin Islands Male Good 0.0 0.0 Within past year (anytime less than 12 months ... No 6.0 1 to 5 Yes ... 118.84 36.54 Yes Yes Yes No Yes, received tetanus shot but not sure what type No No 4.20
246017 Virgin Islands Male Very good 0.0 0.0 Within past 2 years (1 year but less than 2 ye... Yes 6.0 None of them No ... 102.06 32.28 Yes No No No Yes, received tetanus shot but not sure what type No No 4.22
246018 Virgin Islands Female Fair 0.0 7.0 Within past year (anytime less than 12 months ... Yes 7.0 None of them No ... 90.72 24.34 No No No No No, did not receive any tetanus shot in the pa... No Yes 5.07
246019 Virgin Islands Male Good 0.0 15.0 Within past year (anytime less than 12 months ... Yes 7.0 1 to 5 No ... 83.91 29.86 Yes Yes Yes Yes Yes, received tetanus shot but not sure what type No Yes 5.32
246020 Virgin Islands Female Excellent 2.0 2.0 Within past year (anytime less than 12 months ... Yes 7.0 None of them No ... 83.01 28.66 No Yes Yes No Yes, received tetanus shot but not sure what type No No 5.30
246021 Virgin Islands Male Very good 0.0 0.0 Within past year (anytime less than 12 months ... No 5.0 None of them Yes ... 108.86 32.55 No Yes Yes Yes No, did not receive any tetanus shot in the pa... No Yes 3.17

10 rows × 41 columns

In [57]:
df.head(10)
Out[57]:
State Sex GeneralHealth PhysicalHealthDays MentalHealthDays LastCheckupTime PhysicalActivities SleepHours RemovedTeeth HadHeartAttack ... WeightInKilograms BMI AlcoholDrinkers HIVTesting FluVaxLast12 PneumoVaxEver TetanusLast10Tdap HighRiskLastYear CovidPos SleepHours-HeightInMeters
0 Alabama Female Very good 4.0 0.0 Within past year (anytime less than 12 months ... Yes 9.0 None of them No ... 71.67 27.99 No No Yes Yes Yes, received Tdap No No 7.40
1 Alabama Male Very good 0.0 0.0 Within past year (anytime less than 12 months ... Yes 6.0 None of them No ... 95.25 30.13 No No Yes Yes Yes, received tetanus shot but not sure what type No No 4.22
2 Alabama Male Very good 0.0 0.0 Within past year (anytime less than 12 months ... No 8.0 6 or more, but not all No ... 108.86 31.66 Yes No No Yes No, did not receive any tetanus shot in the pa... No Yes 6.15
3 Alabama Female Fair 5.0 0.0 Within past year (anytime less than 12 months ... Yes 9.0 None of them No ... 90.72 31.32 No No Yes Yes No, did not receive any tetanus shot in the pa... No Yes 7.30
4 Alabama Female Good 3.0 15.0 Within past year (anytime less than 12 months ... Yes 5.0 1 to 5 No ... 79.38 33.07 No No Yes Yes No, did not receive any tetanus shot in the pa... No No 3.45
5 Alabama Male Good 0.0 0.0 Within past year (anytime less than 12 months ... Yes 7.0 None of them No ... 120.20 34.96 Yes Yes Yes No Yes, received tetanus shot but not sure what type No No 5.15
6 Alabama Female Good 3.0 0.0 Within past year (anytime less than 12 months ... Yes 8.0 6 or more, but not all No ... 88.00 33.30 No No Yes Yes No, did not receive any tetanus shot in the pa... No No 6.37
7 Alabama Male Fair 5.0 0.0 Within past year (anytime less than 12 months ... Yes 8.0 1 to 5 Yes ... 74.84 24.37 No Yes Yes Yes No, did not receive any tetanus shot in the pa... No Yes 6.25
8 Alabama Male Good 2.0 0.0 5 or more years ago No 6.0 None of them No ... 78.02 26.94 No No No No No, did not receive any tetanus shot in the pa... No Yes 4.30
9 Alabama Female Very good 0.0 0.0 Within past year (anytime less than 12 months ... Yes 7.0 None of them No ... 63.50 22.60 No No Yes Yes No, did not receive any tetanus shot in the pa... No No 5.32

10 rows × 41 columns

In [30]:
df.to_csv("new.csv", index=False)
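
The save and load steps above can be checked as a round trip on a small frame. This is a minimal sketch, not the notebook's own code: the sample rows are made up, and an in-memory `io.StringIO` buffer stands in for the real file.

```python
import io

import pandas as pd

# Made-up sample rows; the column names follow the dataset above.
sample = pd.DataFrame(
    {"State": ["Alabama", "Alaska"], "SleepHours": [9.0, 6.0]}
)

# index=False keeps the row index out of the written text.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)

# Reading the text back restores the same columns and values.
buffer.seek(0)
restored = pd.read_csv(buffer)

# With the default index=True, the index is written as an unnamed first column.
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
restored_with_index = pd.read_csv(with_index)

print(list(restored.columns))
print(list(restored_with_index.columns))
```

Passing `index=False`, as the cell above does, avoids carrying a redundant index column into the next `read_csv` call.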

Getting information about the DataFrame

In [31]:
df.describe()
Out[31]:
PhysicalHealthDays MentalHealthDays SleepHours HeightInMeters WeightInKilograms BMI
count 246022.000000 246022.000000 246022.000000 246022.000000 246022.000000 246022.000000
mean 4.119026 4.167140 7.021331 1.705150 83.615179 28.668136
std 8.405844 8.102687 1.440681 0.106654 21.323156 6.513973
min 0.000000 0.000000 1.000000 0.910000 28.120000 12.020000
25% 0.000000 0.000000 6.000000 1.630000 68.040000 24.270000
50% 0.000000 0.000000 7.000000 1.700000 81.650000 27.460000
75% 3.000000 4.000000 8.000000 1.780000 95.250000 31.890000
max 30.000000 30.000000 24.000000 2.410000 292.570000 97.650000
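
By default `describe()` summarizes only the numeric columns, which is why the table above has 6 columns. Text columns can be summarized separately. A minimal sketch on made-up sample rows:

```python
import pandas as pd

# Made-up sample rows; the column names follow the dataset above.
sample = pd.DataFrame(
    {
        "Sex": ["Female", "Male", "Male", "Female", "Male"],
        "SleepHours": [9.0, 6.0, 8.0, 9.0, 5.0],
    }
)

# On a mixed frame, describe() keeps only the numeric columns.
numeric_summary = sample.describe()

# On a text column, describe() reports count, unique, top and freq instead.
text_summary = sample["Sex"].describe()

# value_counts() gives the full frequency table behind top/freq.
sex_counts = sample["Sex"].value_counts()

print(numeric_summary)
print(text_summary)
```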
In [32]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 246022 entries, 0 to 246021
Data columns (total 40 columns):
 #   Column                     Non-Null Count   Dtype  
---  ------                     --------------   -----  
 0   State                      246022 non-null  object 
 1   Sex                        246022 non-null  object 
 2   GeneralHealth              246022 non-null  object 
 3   PhysicalHealthDays         246022 non-null  float64
 4   MentalHealthDays           246022 non-null  float64
 5   LastCheckupTime            246022 non-null  object 
 6   PhysicalActivities         246022 non-null  object 
 7   SleepHours                 246022 non-null  float64
 8   RemovedTeeth               246022 non-null  object 
 9   HadHeartAttack             246022 non-null  object 
 10  HadAngina                  246022 non-null  object 
 11  HadStroke                  246022 non-null  object 
 12  HadAsthma                  246022 non-null  object 
 13  HadSkinCancer              246022 non-null  object 
 14  HadCOPD                    246022 non-null  object 
 15  HadDepressiveDisorder      246022 non-null  object 
 16  HadKidneyDisease           246022 non-null  object 
 17  HadArthritis               246022 non-null  object 
 18  HadDiabetes                246022 non-null  object 
 19  DeafOrHardOfHearing        246022 non-null  object 
 20  BlindOrVisionDifficulty    246022 non-null  object 
 21  DifficultyConcentrating    246022 non-null  object 
 22  DifficultyWalking          246022 non-null  object 
 23  DifficultyDressingBathing  246022 non-null  object 
 24  DifficultyErrands          246022 non-null  object 
 25  SmokerStatus               246022 non-null  object 
 26  ECigaretteUsage            246022 non-null  object 
 27  ChestScan                  246022 non-null  object 
 28  RaceEthnicityCategory      246022 non-null  object 
 29  AgeCategory                246022 non-null  object 
 30  HeightInMeters             246022 non-null  float64
 31  WeightInKilograms          246022 non-null  float64
 32  BMI                        246022 non-null  float64
 33  AlcoholDrinkers            246022 non-null  object 
 34  HIVTesting                 246022 non-null  object 
 35  FluVaxLast12               246022 non-null  object 
 36  PneumoVaxEver              246022 non-null  object 
 37  TetanusLast10Tdap          246022 non-null  object 
 38  HighRiskLastYear           246022 non-null  object 
 39  CovidPos                   246022 non-null  object 
dtypes: float64(6), object(34)
memory usage: 75.1+ MB
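
The `+` in `memory usage: 75.1+ MB` marks a lower bound: for `object` columns pandas does not count the strings themselves unless asked to. The sketch below is an illustration on made-up data, not a measurement of the real dataset. It compares an `object` column with the same values stored as `category`.

```python
import pandas as pd

# Made-up column with heavily repeated labels, similar to the survey answers above.
sex = pd.Series(["Female", "Male"] * 1000, dtype=object, name="Sex")

# deep=True also measures the Python strings, not only the references to them.
object_bytes = sex.memory_usage(deep=True)

# A categorical column stores each distinct label once, plus small integer codes.
as_category = sex.astype("category")
category_bytes = as_category.memory_usage(deep=True)

print(object_bytes, category_bytes)
```

On the full frame, `df.info(memory_usage="deep")` reports the exact figure, at the cost of a slower call.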

Getting information about the DataFrame's columns

In [33]:
df.columns
Out[33]:
Index(['State', 'Sex', 'GeneralHealth', 'PhysicalHealthDays',
       'MentalHealthDays', 'LastCheckupTime', 'PhysicalActivities',
       'SleepHours', 'RemovedTeeth', 'HadHeartAttack', 'HadAngina',
       'HadStroke', 'HadAsthma', 'HadSkinCancer', 'HadCOPD',
       'HadDepressiveDisorder', 'HadKidneyDisease', 'HadArthritis',
       'HadDiabetes', 'DeafOrHardOfHearing', 'BlindOrVisionDifficulty',
       'DifficultyConcentrating', 'DifficultyWalking',
       'DifficultyDressingBathing', 'DifficultyErrands', 'SmokerStatus',
       'ECigaretteUsage', 'ChestScan', 'RaceEthnicityCategory', 'AgeCategory',
       'HeightInMeters', 'WeightInKilograms', 'BMI', 'AlcoholDrinkers',
       'HIVTesting', 'FluVaxLast12', 'PneumoVaxEver', 'TetanusLast10Tdap',
       'HighRiskLastYear', 'CovidPos'],
      dtype='object')

Selecting individual rows and columns from the DataFrame

In [34]:
df[["Sex", "HadHeartAttack", "WeightInKilograms"]]
Out[34]:
Sex HadHeartAttack WeightInKilograms
0 Female No 71.67
1 Male No 95.25
2 Male No 108.86
3 Female No 90.72
4 Female No 79.38
... ... ... ...
246017 Male No 102.06
246018 Female No 90.72
246019 Male No 83.91
246020 Female No 83.01
246021 Male Yes 108.86

246022 rows × 3 columns

In [35]:
df.iloc[3:6]
Out[35]:
State Sex GeneralHealth PhysicalHealthDays MentalHealthDays LastCheckupTime PhysicalActivities SleepHours RemovedTeeth HadHeartAttack ... HeightInMeters WeightInKilograms BMI AlcoholDrinkers HIVTesting FluVaxLast12 PneumoVaxEver TetanusLast10Tdap HighRiskLastYear CovidPos
3 Alabama Female Fair 5.0 0.0 Within past year (anytime less than 12 months ... Yes 9.0 None of them No ... 1.70 90.72 31.32 No No Yes Yes No, did not receive any tetanus shot in the pa... No Yes
4 Alabama Female Good 3.0 15.0 Within past year (anytime less than 12 months ... Yes 5.0 1 to 5 No ... 1.55 79.38 33.07 No No Yes Yes No, did not receive any tetanus shot in the pa... No No
5 Alabama Male Good 0.0 0.0 Within past year (anytime less than 12 months ... Yes 7.0 None of them No ... 1.85 120.20 34.96 Yes Yes Yes No Yes, received tetanus shot but not sure what type No No

3 rows × 40 columns

In [36]:
df[df['WeightInKilograms'] > 100]
Out[36]:
State Sex GeneralHealth PhysicalHealthDays MentalHealthDays LastCheckupTime PhysicalActivities SleepHours RemovedTeeth HadHeartAttack ... HeightInMeters WeightInKilograms BMI AlcoholDrinkers HIVTesting FluVaxLast12 PneumoVaxEver TetanusLast10Tdap HighRiskLastYear CovidPos
2 Alabama Male Very good 0.0 0.0 Within past year (anytime less than 12 months ... No 8.0 6 or more, but not all No ... 1.85 108.86 31.66 Yes No No Yes No, did not receive any tetanus shot in the pa... No Yes
5 Alabama Male Good 0.0 0.0 Within past year (anytime less than 12 months ... Yes 7.0 None of them No ... 1.85 120.20 34.96 Yes Yes Yes No Yes, received tetanus shot but not sure what type No No
10 Alabama Male Very good 0.0 0.0 Within past year (anytime less than 12 months ... Yes 8.0 1 to 5 No ... 1.83 122.47 36.62 Yes No Yes Yes Yes, received Tdap No No
11 Alabama Female Good 3.0 4.0 Within past year (anytime less than 12 months ... Yes 5.0 None of them No ... 1.52 108.86 46.87 No No No No Yes, received tetanus shot, but not Tdap No Yes
12 Alabama Male Good 5.0 0.0 Within past year (anytime less than 12 months ... Yes 5.0 6 or more, but not all Yes ... 1.88 115.67 32.74 No No Yes Yes Yes, received tetanus shot but not sure what type No No
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
246002 Virgin Islands Male Good 0.0 0.0 Within past year (anytime less than 12 months ... Yes 6.0 1 to 5 No ... 1.88 106.59 30.17 Yes No No No Yes, received tetanus shot but not sure what type No Tested positive using home test without a heal...
246012 Virgin Islands Male Fair 7.0 30.0 Within past year (anytime less than 12 months ... No 4.0 None of them Yes ... 1.88 117.93 33.38 Yes Yes No No No, did not receive any tetanus shot in the pa... No Yes
246016 Virgin Islands Male Good 0.0 0.0 Within past year (anytime less than 12 months ... No 6.0 1 to 5 Yes ... 1.80 118.84 36.54 Yes Yes Yes No Yes, received tetanus shot but not sure what type No No
246017 Virgin Islands Male Very good 0.0 0.0 Within past 2 years (1 year but less than 2 ye... Yes 6.0 None of them No ... 1.78 102.06 32.28 Yes No No No Yes, received tetanus shot but not sure what type No No
246021 Virgin Islands Male Very good 0.0 0.0 Within past year (anytime less than 12 months ... No 5.0 None of them Yes ... 1.83 108.86 32.55 No Yes Yes Yes No, did not receive any tetanus shot in the pa... No Yes

44646 rows × 40 columns
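
The three selection styles above (column list, positional slice, boolean mask) can be combined. A minimal sketch on made-up sample rows, assuming the default integer index:

```python
import pandas as pd

# Made-up sample rows; the column names follow the dataset above.
sample = pd.DataFrame(
    {
        "Sex": ["Female", "Male", "Male", "Female"],
        "HadHeartAttack": ["No", "No", "Yes", "No"],
        "WeightInKilograms": [71.67, 95.25, 108.86, 120.20],
    }
)

# Positional slice: rows at positions 1 and 2 (the end position is excluded).
by_position = sample.iloc[1:3]

# Label slice: .loc includes the end label, so this returns labels 1, 2 and 3.
by_label = sample.loc[1:3]

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
heavy_men = sample[(sample["WeightInKilograms"] > 100) & (sample["Sex"] == "Male")]

# .loc accepts a row mask and a column list in a single call.
heavy_weights = sample.loc[
    sample["WeightInKilograms"] > 100, ["Sex", "WeightInKilograms"]
]

print(heavy_men)
print(heavy_weights)
```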

Grouping and aggregating data in the DataFrame

In [37]:
group = df.groupby(['State'])['WeightInKilograms'].mean()
group.to_frame()
Out[37]:
WeightInKilograms
State
Alabama 85.225899
Alaska 83.937201
Arizona 82.626862
Arkansas 85.361796
California 81.334135
Colorado 80.805505
Connecticut 82.192881
Delaware 84.224436
District of Columbia 78.593038
Florida 83.155785
Georgia 84.332240
Guam 77.294261
Hawaii 76.419335
Idaho 84.648567
Illinois 83.459467
Indiana 85.703237
Iowa 86.970651
Kansas 85.864583
Kentucky 86.781960
Louisiana 85.162787
Maine 82.949232
Maryland 83.543344
Massachusetts 80.591010
Michigan 83.629868
Minnesota 84.954303
Mississippi 88.322797
Missouri 85.836119
Montana 84.231140
Nebraska 85.961696
Nevada 82.784771
New Hampshire 80.702764
New Jersey 81.270844
New Mexico 80.529087
New York 80.960180
North Carolina 83.730953
North Dakota 85.924972
Ohio 86.938279
Oklahoma 85.517429
Oregon 83.802043
Pennsylvania 83.831872
Puerto Rico 79.152187
Rhode Island 80.675832
South Carolina 84.046443
South Dakota 86.868195
Tennessee 86.237325
Texas 84.894035
Utah 83.888474
Vermont 80.557657
Virgin Islands 82.131440
Virginia 83.822634
Washington 83.077369
West Virginia 86.697505
Wisconsin 86.167571
Wyoming 83.844357
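
A single `groupby` can also return several statistics at once and order the groups by one of them. A minimal sketch on made-up sample rows (the weights are illustrative, not taken from the dataset):

```python
import pandas as pd

# Made-up sample rows; the column names follow the dataset above.
sample = pd.DataFrame(
    {
        "State": ["Alabama", "Alabama", "Alaska", "Alaska", "Alaska"],
        "WeightInKilograms": [70.0, 100.0, 60.0, 80.0, 100.0],
    }
)

# agg with a list of names produces one column per statistic.
summary = (
    sample.groupby("State")["WeightInKilograms"]
    .agg(["mean", "median", "count"])
    .sort_values("mean", ascending=False)
)

print(summary)
```

Including `count` next to `mean` helps judge how much weight a group average deserves.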

Sorting data in the DataFrame

In [38]:
sorted_df = df.sort_values(by='WeightInKilograms', ascending=False)
sorted_df
Out[38]:
State Sex GeneralHealth PhysicalHealthDays MentalHealthDays LastCheckupTime PhysicalActivities SleepHours RemovedTeeth HadHeartAttack ... HeightInMeters WeightInKilograms BMI AlcoholDrinkers HIVTesting FluVaxLast12 PneumoVaxEver TetanusLast10Tdap HighRiskLastYear CovidPos
9060 Arizona Male Fair 15.0 15.0 Within past year (anytime less than 12 months ... No 8.0 None of them No ... 1.85 292.57 85.10 No No No No Yes, received Tdap No No
48969 Hawaii Male Poor 30.0 30.0 Within past year (anytime less than 12 months ... Yes 4.0 None of them No ... 1.93 276.24 74.13 No No No No No, did not receive any tetanus shot in the pa... No Yes
75697 Kentucky Male Very good 0.0 0.0 5 or more years ago No 7.0 None of them No ... 1.91 273.52 75.37 No No No No Yes, received tetanus shot but not sure what type No No
143147 New York Male Very good 3.0 1.0 Within past year (anytime less than 12 months ... Yes 8.0 None of them No ... 1.88 273.06 77.29 Yes No No No Yes, received tetanus shot but not sure what type No No
76244 Kentucky Male Very good 0.0 0.0 Within past year (anytime less than 12 months ... Yes 7.0 None of them No ... 1.83 272.16 81.37 No Yes No No Yes, received tetanus shot but not sure what type No Yes
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
203695 Vermont Female Poor 30.0 3.0 Within past 2 years (1 year but less than 2 ye... No 18.0 All No ... 1.60 30.84 12.05 No No No No No, did not receive any tetanus shot in the pa... No No
242632 Puerto Rico Female Fair 30.0 7.0 Within past year (anytime less than 12 months ... No 7.0 6 or more, but not all No ... 1.35 30.39 16.77 No No Yes Yes No, did not receive any tetanus shot in the pa... No No
11614 Arkansas Female Poor 30.0 30.0 Within past year (anytime less than 12 months ... No 8.0 All No ... 1.52 29.48 12.69 No No Yes Yes No, did not receive any tetanus shot in the pa... No Yes
127404 Nebraska Female Poor 30.0 0.0 Within past year (anytime less than 12 months ... Yes 6.0 None of them No ... 1.52 29.48 12.69 No No No Yes Yes, received tetanus shot but not sure what type No No
179326 South Carolina Female Very good 0.0 0.0 5 or more years ago No 8.0 None of them No ... 1.52 28.12 12.11 No No No No No, did not receive any tetanus shot in the pa... No No

246022 rows × 40 columns
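
Sorting also works on several keys with a separate direction for each, and `nlargest` is a shortcut when only the top rows are needed. A minimal sketch on made-up sample rows:

```python
import pandas as pd

# Made-up sample rows; the column names follow the dataset above.
sample = pd.DataFrame(
    {
        "State": ["Arizona", "Hawaii", "Arizona", "Hawaii"],
        "WeightInKilograms": [85.0, 120.0, 95.0, 70.0],
    }
)

# Sort by state A-Z, then by weight from heaviest to lightest within each state.
by_state_then_weight = sample.sort_values(
    by=["State", "WeightInKilograms"], ascending=[True, False]
)

# nlargest returns the two heaviest rows, ordered from heaviest down.
top_two = sample.nlargest(2, "WeightInKilograms")

print(by_state_then_weight)
print(top_two)
```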

Dropping rows/columns

In [39]:
df_dropped_columns = df.drop(columns=['AlcoholDrinkers', 'BMI'])  # Drop the 'AlcoholDrinkers' and 'BMI' columns
In [40]:
df_dropped_columns
Out[40]:
State Sex GeneralHealth PhysicalHealthDays MentalHealthDays LastCheckupTime PhysicalActivities SleepHours RemovedTeeth HadHeartAttack ... RaceEthnicityCategory AgeCategory HeightInMeters WeightInKilograms HIVTesting FluVaxLast12 PneumoVaxEver TetanusLast10Tdap HighRiskLastYear CovidPos
0 Alabama Female Very good 4.0 0.0 Within past year (anytime less than 12 months ... Yes 9.0 None of them No ... White only, Non-Hispanic Age 65 to 69 1.60 71.67 No Yes Yes Yes, received Tdap No No
1 Alabama Male Very good 0.0 0.0 Within past year (anytime less than 12 months ... Yes 6.0 None of them No ... White only, Non-Hispanic Age 70 to 74 1.78 95.25 No Yes Yes Yes, received tetanus shot but not sure what type No No
2 Alabama Male Very good 0.0 0.0 Within past year (anytime less than 12 months ... No 8.0 6 or more, but not all No ... White only, Non-Hispanic Age 75 to 79 1.85 108.86 No No Yes No, did not receive any tetanus shot in the pa... No Yes
3 Alabama Female Fair 5.0 0.0 Within past year (anytime less than 12 months ... Yes 9.0 None of them No ... White only, Non-Hispanic Age 80 or older 1.70 90.72 No Yes Yes No, did not receive any tetanus shot in the pa... No Yes
4 Alabama Female Good 3.0 15.0 Within past year (anytime less than 12 months ... Yes 5.0 1 to 5 No ... White only, Non-Hispanic Age 80 or older 1.55 79.38 No Yes Yes No, did not receive any tetanus shot in the pa... No No
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
246017 Virgin Islands Male Very good 0.0 0.0 Within past 2 years (1 year but less than 2 ye... Yes 6.0 None of them No ... White only, Non-Hispanic Age 60 to 64 1.78 102.06 No No No Yes, received tetanus shot but not sure what type No No
246018 Virgin Islands Female Fair 0.0 7.0 Within past year (anytime less than 12 months ... Yes 7.0 None of them No ... Black only, Non-Hispanic Age 25 to 29 1.93 90.72 No No No No, did not receive any tetanus shot in the pa... No Yes
246019 Virgin Islands Male Good 0.0 15.0 Within past year (anytime less than 12 months ... Yes 7.0 1 to 5 No ... Multiracial, Non-Hispanic Age 65 to 69 1.68 83.91 Yes Yes Yes Yes, received tetanus shot but not sure what type No Yes
246020 Virgin Islands Female Excellent 2.0 2.0 Within past year (anytime less than 12 months ... Yes 7.0 None of them No ... Black only, Non-Hispanic Age 50 to 54 1.70 83.01 Yes Yes No Yes, received tetanus shot but not sure what type No No
246021 Virgin Islands Male Very good 0.0 0.0 Within past year (anytime less than 12 months ... No 5.0 None of them Yes ... Black only, Non-Hispanic Age 70 to 74 1.83 108.86 Yes Yes Yes No, did not receive any tetanus shot in the pa... No Yes

246022 rows × 38 columns

In [41]:
df_dropped_rows = df.drop([0, 1])  # Drop the rows with index labels 0 and 1
df_dropped_rows
Out[41]:
State Sex GeneralHealth PhysicalHealthDays MentalHealthDays LastCheckupTime PhysicalActivities SleepHours RemovedTeeth HadHeartAttack ... HeightInMeters WeightInKilograms BMI AlcoholDrinkers HIVTesting FluVaxLast12 PneumoVaxEver TetanusLast10Tdap HighRiskLastYear CovidPos
2 Alabama Male Very good 0.0 0.0 Within past year (anytime less than 12 months ... No 8.0 6 or more, but not all No ... 1.85 108.86 31.66 Yes No No Yes No, did not receive any tetanus shot in the pa... No Yes
3 Alabama Female Fair 5.0 0.0 Within past year (anytime less than 12 months ... Yes 9.0 None of them No ... 1.70 90.72 31.32 No No Yes Yes No, did not receive any tetanus shot in the pa... No Yes
4 Alabama Female Good 3.0 15.0 Within past year (anytime less than 12 months ... Yes 5.0 1 to 5 No ... 1.55 79.38 33.07 No No Yes Yes No, did not receive any tetanus shot in the pa... No No
5 Alabama Male Good 0.0 0.0 Within past year (anytime less than 12 months ... Yes 7.0 None of them No ... 1.85 120.20 34.96 Yes Yes Yes No Yes, received tetanus shot but not sure what type No No
6 Alabama Female Good 3.0 0.0 Within past year (anytime less than 12 months ... Yes 8.0 6 or more, but not all No ... 1.63 88.00 33.30 No No Yes Yes No, did not receive any tetanus shot in the pa... No No
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
246017 Virgin Islands Male Very good 0.0 0.0 Within past 2 years (1 year but less than 2 ye... Yes 6.0 None of them No ... 1.78 102.06 32.28 Yes No No No Yes, received tetanus shot but not sure what type No No
246018 Virgin Islands Female Fair 0.0 7.0 Within past year (anytime less than 12 months ... Yes 7.0 None of them No ... 1.93 90.72 24.34 No No No No No, did not receive any tetanus shot in the pa... No Yes
246019 Virgin Islands Male Good 0.0 15.0 Within past year (anytime less than 12 months ... Yes 7.0 1 to 5 No ... 1.68 83.91 29.86 Yes Yes Yes Yes Yes, received tetanus shot but not sure what type No Yes
246020 Virgin Islands Female Excellent 2.0 2.0 Within past year (anytime less than 12 months ... Yes 7.0 None of them No ... 1.70 83.01 28.66 No Yes Yes No Yes, received tetanus shot but not sure what type No No
246021 Virgin Islands Male Very good 0.0 0.0 Within past year (anytime less than 12 months ... No 5.0 None of them Yes ... 1.83 108.86 32.55 No Yes Yes Yes No, did not receive any tetanus shot in the pa... No Yes

246020 rows × 40 columns
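
Both `drop` calls above return a new frame and leave `df` untouched, and dropped rows leave a gap in the index labels. A minimal sketch on made-up sample rows:

```python
import pandas as pd

# Made-up sample rows; the column names follow the dataset above.
sample = pd.DataFrame(
    {"BMI": [27.99, 30.13, 31.66], "SleepHours": [9.0, 6.0, 8.0]}
)

# drop returns a new frame; the original keeps all of its columns.
without_bmi = sample.drop(columns=["BMI"])

# Rows are dropped by index label, so the surviving labels are not renumbered.
without_first = sample.drop(index=[0])

# reset_index(drop=True) renumbers the rows from 0 when that is needed.
renumbered = without_first.reset_index(drop=True)

print(without_first.index.tolist())
print(renumbered.index.tolist())
```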

Creating new columns from existing DataFrame columns

In [42]:
df['SleepHours-HeightInMeters'] = df['SleepHours'] - df['HeightInMeters']
In [43]:
df
Out[43]:
State Sex GeneralHealth PhysicalHealthDays MentalHealthDays LastCheckupTime PhysicalActivities SleepHours RemovedTeeth HadHeartAttack ... WeightInKilograms BMI AlcoholDrinkers HIVTesting FluVaxLast12 PneumoVaxEver TetanusLast10Tdap HighRiskLastYear CovidPos SleepHours-HeightInMeters
0 Alabama Female Very good 4.0 0.0 Within past year (anytime less than 12 months ... Yes 9.0 None of them No ... 71.67 27.99 No No Yes Yes Yes, received Tdap No No 7.40
1 Alabama Male Very good 0.0 0.0 Within past year (anytime less than 12 months ... Yes 6.0 None of them No ... 95.25 30.13 No No Yes Yes Yes, received tetanus shot but not sure what type No No 4.22
2 Alabama Male Very good 0.0 0.0 Within past year (anytime less than 12 months ... No 8.0 6 or more, but not all No ... 108.86 31.66 Yes No No Yes No, did not receive any tetanus shot in the pa... No Yes 6.15
3 Alabama Female Fair 5.0 0.0 Within past year (anytime less than 12 months ... Yes 9.0 None of them No ... 90.72 31.32 No No Yes Yes No, did not receive any tetanus shot in the pa... No Yes 7.30
4 Alabama Female Good 3.0 15.0 Within past year (anytime less than 12 months ... Yes 5.0 1 to 5 No ... 79.38 33.07 No No Yes Yes No, did not receive any tetanus shot in the pa... No No 3.45
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
246017 Virgin Islands Male Very good 0.0 0.0 Within past 2 years (1 year but less than 2 ye... Yes 6.0 None of them No ... 102.06 32.28 Yes No No No Yes, received tetanus shot but not sure what type No No 4.22
246018 Virgin Islands Female Fair 0.0 7.0 Within past year (anytime less than 12 months ... Yes 7.0 None of them No ... 90.72 24.34 No No No No No, did not receive any tetanus shot in the pa... No Yes 5.07
246019 Virgin Islands Male Good 0.0 15.0 Within past year (anytime less than 12 months ... Yes 7.0 1 to 5 No ... 83.91 29.86 Yes Yes Yes Yes Yes, received tetanus shot but not sure what type No Yes 5.32
246020 Virgin Islands Female Excellent 2.0 2.0 Within past year (anytime less than 12 months ... Yes 7.0 None of them No ... 83.01 28.66 No Yes Yes No Yes, received tetanus shot but not sure what type No No 5.30
246021 Virgin Islands Male Very good 0.0 0.0 Within past year (anytime less than 12 months ... No 5.0 None of them Yes ... 108.86 32.55 No Yes Yes Yes No, did not receive any tetanus shot in the pa... No Yes 3.17

246022 rows × 41 columns
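
A derived column can also carry a physical meaning. The sketch below recomputes BMI as weight divided by height squared for rows 3 to 5 (values copied from the `df.iloc[3:6]` output above) and compares it with the recorded `BMI`. The names `ComputedBMI` and `BMIDifference` are hypothetical and not part of the dataset. The small differences presumably come from rounding in the published heights.

```python
import pandas as pd

# Rows 3 to 5 of the dataset, as shown in the df.iloc[3:6] output above.
sample = pd.DataFrame(
    {
        "HeightInMeters": [1.70, 1.55, 1.85],
        "WeightInKilograms": [90.72, 79.38, 120.20],
        "BMI": [31.32, 33.07, 34.96],
    }
)

# assign returns a new frame with the extra column and leaves `sample` unchanged.
# BMI is weight in kilograms divided by the square of height in meters.
with_bmi = sample.assign(
    ComputedBMI=lambda d: d["WeightInKilograms"] / d["HeightInMeters"] ** 2
)

# Difference between the recomputed and the recorded value.
with_bmi["BMIDifference"] = (with_bmi["ComputedBMI"] - with_bmi["BMI"]).round(2)

print(with_bmi)
```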

Dropping rows with missing values

In [44]:
print(df.isna().sum())
State                        0
Sex                          0
GeneralHealth                0
PhysicalHealthDays           0
MentalHealthDays             0
LastCheckupTime              0
PhysicalActivities           0
SleepHours                   0
RemovedTeeth                 0
HadHeartAttack               0
HadAngina                    0
HadStroke                    0
HadAsthma                    0
HadSkinCancer                0
HadCOPD                      0
HadDepressiveDisorder        0
HadKidneyDisease             0
HadArthritis                 0
HadDiabetes                  0
DeafOrHardOfHearing          0
BlindOrVisionDifficulty      0
DifficultyConcentrating      0
DifficultyWalking            0
DifficultyDressingBathing    0
DifficultyErrands            0
SmokerStatus                 0
ECigaretteUsage              0
ChestScan                    0
RaceEthnicityCategory        0
AgeCategory                  0
HeightInMeters               0
WeightInKilograms            0
BMI                          0
AlcoholDrinkers              0
HIVTesting                   0
FluVaxLast12                 0
PneumoVaxEver                0
TetanusLast10Tdap            0
HighRiskLastYear             0
CovidPos                     0
SleepHours-HeightInMeters    0
dtype: int64
In [45]:
df.dropna()  # There are no missing values, so no rows are dropped
Out[45]:
State Sex GeneralHealth PhysicalHealthDays MentalHealthDays LastCheckupTime PhysicalActivities SleepHours RemovedTeeth HadHeartAttack ... WeightInKilograms BMI AlcoholDrinkers HIVTesting FluVaxLast12 PneumoVaxEver TetanusLast10Tdap HighRiskLastYear CovidPos SleepHours-HeightInMeters
0 Alabama Female Very good 4.0 0.0 Within past year (anytime less than 12 months ... Yes 9.0 None of them No ... 71.67 27.99 No No Yes Yes Yes, received Tdap No No 7.40
1 Alabama Male Very good 0.0 0.0 Within past year (anytime less than 12 months ... Yes 6.0 None of them No ... 95.25 30.13 No No Yes Yes Yes, received tetanus shot but not sure what type No No 4.22
2 Alabama Male Very good 0.0 0.0 Within past year (anytime less than 12 months ... No 8.0 6 or more, but not all No ... 108.86 31.66 Yes No No Yes No, did not receive any tetanus shot in the pa... No Yes 6.15
3 Alabama Female Fair 5.0 0.0 Within past year (anytime less than 12 months ... Yes 9.0 None of them No ... 90.72 31.32 No No Yes Yes No, did not receive any tetanus shot in the pa... No Yes 7.30
4 Alabama Female Good 3.0 15.0 Within past year (anytime less than 12 months ... Yes 5.0 1 to 5 No ... 79.38 33.07 No No Yes Yes No, did not receive any tetanus shot in the pa... No No 3.45
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
246017 Virgin Islands Male Very good 0.0 0.0 Within past 2 years (1 year but less than 2 ye... Yes 6.0 None of them No ... 102.06 32.28 Yes No No No Yes, received tetanus shot but not sure what type No No 4.22
246018 Virgin Islands Female Fair 0.0 7.0 Within past year (anytime less than 12 months ... Yes 7.0 None of them No ... 90.72 24.34 No No No No No, did not receive any tetanus shot in the pa... No Yes 5.07
246019 Virgin Islands Male Good 0.0 15.0 Within past year (anytime less than 12 months ... Yes 7.0 1 to 5 No ... 83.91 29.86 Yes Yes Yes Yes Yes, received tetanus shot but not sure what type No Yes 5.32
246020 Virgin Islands Female Excellent 2.0 2.0 Within past year (anytime less than 12 months ... Yes 7.0 None of them No ... 83.01 28.66 No Yes Yes No Yes, received tetanus shot but not sure what type No No 5.30
246021 Virgin Islands Male Very good 0.0 0.0 Within past year (anytime less than 12 months ... No 5.0 None of them Yes ... 108.86 32.55 No Yes Yes Yes No, did not receive any tetanus shot in the pa... No Yes 3.17

246022 rows × 41 columns

In [46]:
# df.fillna(df.mean(numeric_only=True), inplace=True)
# df.fillna(df.median(numeric_only=True), inplace=True)

Missing values are handled separately for each column.

In a numeric column, gaps can be filled with the mean or the median.

The mean is appropriate when the column has no outliers; otherwise the median is the safer choice.

In a categorical column, gaps can be filled with the mode (the most frequent value).

If only a few values are missing, the affected rows can simply be dropped.
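
The per-column rules above can be sketched as follows. The sample frame and its gaps are made up for illustration, since the real dataset contains no missing values.

```python
import numpy as np
import pandas as pd

# Made-up sample with gaps; the real dataset above has none.
sample = pd.DataFrame(
    {
        "SleepHours": [7.0, np.nan, 8.0, 24.0],  # 24.0 acts as an outlier
        "HeightInMeters": [1.60, 1.70, np.nan, 1.80],
        "Sex": ["Female", None, "Female", "Male"],
    }
)

filled = sample.copy()

# Numeric column with an outlier: the median is less sensitive to it than the mean.
filled["SleepHours"] = filled["SleepHours"].fillna(filled["SleepHours"].median())

# Numeric column without outliers: the mean is a reasonable fill value.
filled["HeightInMeters"] = filled["HeightInMeters"].fillna(
    filled["HeightInMeters"].mean()
)

# Categorical column: mode() returns a Series, so take its first entry.
filled["Sex"] = filled["Sex"].fillna(filled["Sex"].mode()[0])

# Alternative when gaps are rare: drop every row that has at least one.
dropped = sample.dropna()

print(filled)
print(dropped)
```

Assigning the result back to the column, rather than calling `fillna(..., inplace=True)` on it, keeps each step explicit.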

  2. Visualization capabilities
In [47]:
import matplotlib.pyplot as plt
In [48]:
# Line plot
plt.figure(figsize=(10, 5))
df['WeightInKilograms'].plot(title='Line Plot (WeightInKilograms)')
plt.show()
[Figure: line plot of WeightInKilograms by row index]
In [49]:
# Histogram
df.plot.hist(column=["SleepHours"], bins=80, figsize=(8, 5))
plt.show()
[Figure: histogram of SleepHours, 80 bins]
In [50]:
plt.figure(figsize=(8, 5))
df['AgeCategory'].value_counts().plot(kind='bar', title='Bar Plot (AgeCategory)')
plt.show()
[Figure: bar plot of AgeCategory counts]
In [53]:
plt.figure(figsize=(8, 5))
df["BMI"].plot(kind="box", title='Box Plot (BMI)')
plt.show()
[Figure: box plot of BMI]
In [ ]:
# Only numeric columns are plotted, so the categorical 'AgeCategory' is skipped and the chart shows BMI alone.
df[['AgeCategory', 'BMI']].plot(kind='area', alpha=0.2, figsize=(8, 5), title='Area Plot (AgeCategory, BMI)')
plt.show()
[Figure: area plot titled 'Area Plot (AgeCategory, BMI)']
In [ ]:
df.plot.scatter(x="BMI", y="WeightInKilograms")
Out[ ]:
<Axes: xlabel='BMI', ylabel='WeightInKilograms'>
[Figure: scatter plot of BMI (x) against WeightInKilograms (y)]
In [ ]:
plt.figure(figsize=(8, 5))
df['AgeCategory'].value_counts().plot(kind='pie', autopct='%1.1f%%', title='Pie Chart (AgeCategory)')
plt.show()
[Figure: pie chart of AgeCategory shares]
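
`DataFrame.plot` opens its own figure unless it is given an `ax`, so a preceding `plt.figure(...)` call can leave an empty figure behind. One way to keep control is to create the figure and axes explicitly and pass them to each plot. A minimal sketch on made-up sample values; the `Agg` backend is selected only so that the code also runs outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for running outside a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample values; the column names follow the dataset above.
sample = pd.DataFrame(
    {
        "SleepHours": [6.0, 7.0, 7.0, 8.0, 5.0, 7.0],
        "BMI": [27.99, 30.13, 31.66, 31.32, 33.07, 34.96],
    }
)

# One figure with two axes; every plot is told explicitly where to draw.
fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

sample["SleepHours"].plot(
    kind="hist", bins=4, ax=ax_hist, title="Histogram (SleepHours)"
)
sample["BMI"].plot(kind="box", ax=ax_box, title="Box Plot (BMI)")

ax_hist.set_xlabel("SleepHours")
fig.tight_layout()
```

In a notebook, the `matplotlib.use` line can be dropped and the cell finished with `plt.show()`.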