- Core features of the pandas library
import pandas as pd
Loading and saving data
df = pd.read_csv("./datasets/var2/2022/heart_2022_no_nans.csv")
df.tail(10)
| | State | Sex | GeneralHealth | PhysicalHealthDays | MentalHealthDays | LastCheckupTime | PhysicalActivities | SleepHours | RemovedTeeth | HadHeartAttack | ... | WeightInKilograms | BMI | AlcoholDrinkers | HIVTesting | FluVaxLast12 | PneumoVaxEver | TetanusLast10Tdap | HighRiskLastYear | CovidPos | SleepHours-HeightInMeters |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
246012 | Virgin Islands | Male | Fair | 7.0 | 30.0 | Within past year (anytime less than 12 months ... | No | 4.0 | None of them | Yes | ... | 117.93 | 33.38 | Yes | Yes | No | No | No, did not receive any tetanus shot in the pa... | No | Yes | 2.12 |
246013 | Virgin Islands | Male | Excellent | 0.0 | 7.0 | Within past year (anytime less than 12 months ... | No | 4.0 | None of them | No | ... | 49.90 | 18.30 | Yes | No | No | No | No, did not receive any tetanus shot in the pa... | No | No | 2.35 |
246014 | Virgin Islands | Female | Good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 12.0 | 1 to 5 | No | ... | 52.16 | 19.14 | No | No | No | Yes | Yes, received Tdap | No | No | 10.35 |
246015 | Virgin Islands | Female | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | 1 to 5 | No | ... | 77.11 | 28.29 | Yes | Yes | No | No | No, did not receive any tetanus shot in the pa... | No | No | 5.35 |
246016 | Virgin Islands | Male | Good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | No | 6.0 | 1 to 5 | Yes | ... | 118.84 | 36.54 | Yes | Yes | Yes | No | Yes, received tetanus shot but not sure what type | No | No | 4.20 |
246017 | Virgin Islands | Male | Very good | 0.0 | 0.0 | Within past 2 years (1 year but less than 2 ye... | Yes | 6.0 | None of them | No | ... | 102.06 | 32.28 | Yes | No | No | No | Yes, received tetanus shot but not sure what type | No | No | 4.22 |
246018 | Virgin Islands | Female | Fair | 0.0 | 7.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | None of them | No | ... | 90.72 | 24.34 | No | No | No | No | No, did not receive any tetanus shot in the pa... | No | Yes | 5.07 |
246019 | Virgin Islands | Male | Good | 0.0 | 15.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | 1 to 5 | No | ... | 83.91 | 29.86 | Yes | Yes | Yes | Yes | Yes, received tetanus shot but not sure what type | No | Yes | 5.32 |
246020 | Virgin Islands | Female | Excellent | 2.0 | 2.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | None of them | No | ... | 83.01 | 28.66 | No | Yes | Yes | No | Yes, received tetanus shot but not sure what type | No | No | 5.30 |
246021 | Virgin Islands | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | No | 5.0 | None of them | Yes | ... | 108.86 | 32.55 | No | Yes | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | Yes | 3.17 |
10 rows × 41 columns
df.head(10)
| | State | Sex | GeneralHealth | PhysicalHealthDays | MentalHealthDays | LastCheckupTime | PhysicalActivities | SleepHours | RemovedTeeth | HadHeartAttack | ... | WeightInKilograms | BMI | AlcoholDrinkers | HIVTesting | FluVaxLast12 | PneumoVaxEver | TetanusLast10Tdap | HighRiskLastYear | CovidPos | SleepHours-HeightInMeters |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Alabama | Female | Very good | 4.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 9.0 | None of them | No | ... | 71.67 | 27.99 | No | No | Yes | Yes | Yes, received Tdap | No | No | 7.40 |
1 | Alabama | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 6.0 | None of them | No | ... | 95.25 | 30.13 | No | No | Yes | Yes | Yes, received tetanus shot but not sure what type | No | No | 4.22 |
2 | Alabama | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | No | 8.0 | 6 or more, but not all | No | ... | 108.86 | 31.66 | Yes | No | No | Yes | No, did not receive any tetanus shot in the pa... | No | Yes | 6.15 |
3 | Alabama | Female | Fair | 5.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 9.0 | None of them | No | ... | 90.72 | 31.32 | No | No | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | Yes | 7.30 |
4 | Alabama | Female | Good | 3.0 | 15.0 | Within past year (anytime less than 12 months ... | Yes | 5.0 | 1 to 5 | No | ... | 79.38 | 33.07 | No | No | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | No | 3.45 |
5 | Alabama | Male | Good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | None of them | No | ... | 120.20 | 34.96 | Yes | Yes | Yes | No | Yes, received tetanus shot but not sure what type | No | No | 5.15 |
6 | Alabama | Female | Good | 3.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 8.0 | 6 or more, but not all | No | ... | 88.00 | 33.30 | No | No | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | No | 6.37 |
7 | Alabama | Male | Fair | 5.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 8.0 | 1 to 5 | Yes | ... | 74.84 | 24.37 | No | Yes | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | Yes | 6.25 |
8 | Alabama | Male | Good | 2.0 | 0.0 | 5 or more years ago | No | 6.0 | None of them | No | ... | 78.02 | 26.94 | No | No | No | No | No, did not receive any tetanus shot in the pa... | No | Yes | 4.30 |
9 | Alabama | Female | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | None of them | No | ... | 63.50 | 22.60 | No | No | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | No | 5.32 |
10 rows × 41 columns
df.to_csv("new.csv", index=False)
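The load/save round trip can be checked on a small frame. The sketch below is only an illustration: it uses a hypothetical three-row table (`small`) and an in-memory buffer instead of a file on disk. Passing `index=False` keeps the row index from being written as an extra column.

```python
import io

import pandas as pd

# Hypothetical miniature table (names borrowed from the dataset, values illustrative)
small = pd.DataFrame(
    {
        "State": ["Alabama", "Alaska", "Arizona"],
        "SleepHours": [9.0, 6.0, 8.0],
    }
)

# Write CSV text to an in-memory buffer; index=False omits the row index
buffer = io.StringIO()
small.to_csv(buffer, index=False)

# Rewind the buffer, read the same text back and compare with the original
buffer.seek(0)
restored = pd.read_csv(buffer)
round_trip_ok = restored.equals(small)
```

Without `index=False`, the restored frame would gain an extra `Unnamed: 0` column holding the old index.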
Getting information about the DataFrame
df.describe()
| | PhysicalHealthDays | MentalHealthDays | SleepHours | HeightInMeters | WeightInKilograms | BMI |
---|---|---|---|---|---|---|
count | 246022.000000 | 246022.000000 | 246022.000000 | 246022.000000 | 246022.000000 | 246022.000000 |
mean | 4.119026 | 4.167140 | 7.021331 | 1.705150 | 83.615179 | 28.668136 |
std | 8.405844 | 8.102687 | 1.440681 | 0.106654 | 21.323156 | 6.513973 |
min | 0.000000 | 0.000000 | 1.000000 | 0.910000 | 28.120000 | 12.020000 |
25% | 0.000000 | 0.000000 | 6.000000 | 1.630000 | 68.040000 | 24.270000 |
50% | 0.000000 | 0.000000 | 7.000000 | 1.700000 | 81.650000 | 27.460000 |
75% | 3.000000 | 4.000000 | 8.000000 | 1.780000 | 95.250000 | 31.890000 |
max | 30.000000 | 30.000000 | 24.000000 | 2.410000 | 292.570000 | 97.650000 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 246022 entries, 0 to 246021
Data columns (total 40 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 State 246022 non-null object
1 Sex 246022 non-null object
2 GeneralHealth 246022 non-null object
3 PhysicalHealthDays 246022 non-null float64
4 MentalHealthDays 246022 non-null float64
5 LastCheckupTime 246022 non-null object
6 PhysicalActivities 246022 non-null object
7 SleepHours 246022 non-null float64
8 RemovedTeeth 246022 non-null object
9 HadHeartAttack 246022 non-null object
10 HadAngina 246022 non-null object
11 HadStroke 246022 non-null object
12 HadAsthma 246022 non-null object
13 HadSkinCancer 246022 non-null object
14 HadCOPD 246022 non-null object
15 HadDepressiveDisorder 246022 non-null object
16 HadKidneyDisease 246022 non-null object
17 HadArthritis 246022 non-null object
18 HadDiabetes 246022 non-null object
19 DeafOrHardOfHearing 246022 non-null object
20 BlindOrVisionDifficulty 246022 non-null object
21 DifficultyConcentrating 246022 non-null object
22 DifficultyWalking 246022 non-null object
23 DifficultyDressingBathing 246022 non-null object
24 DifficultyErrands 246022 non-null object
25 SmokerStatus 246022 non-null object
26 ECigaretteUsage 246022 non-null object
27 ChestScan 246022 non-null object
28 RaceEthnicityCategory 246022 non-null object
29 AgeCategory 246022 non-null object
30 HeightInMeters 246022 non-null float64
31 WeightInKilograms 246022 non-null float64
32 BMI 246022 non-null float64
33 AlcoholDrinkers 246022 non-null object
34 HIVTesting 246022 non-null object
35 FluVaxLast12 246022 non-null object
36 PneumoVaxEver 246022 non-null object
37 TetanusLast10Tdap 246022 non-null object
38 HighRiskLastYear 246022 non-null object
39 CovidPos 246022 non-null object
dtypes: float64(6), object(34)
memory usage: 75.1+ MB
Getting information about the DataFrame columns
df.columns
Index(['State', 'Sex', 'GeneralHealth', 'PhysicalHealthDays', 'MentalHealthDays', 'LastCheckupTime', 'PhysicalActivities', 'SleepHours', 'RemovedTeeth', 'HadHeartAttack', 'HadAngina', 'HadStroke', 'HadAsthma', 'HadSkinCancer', 'HadCOPD', 'HadDepressiveDisorder', 'HadKidneyDisease', 'HadArthritis', 'HadDiabetes', 'DeafOrHardOfHearing', 'BlindOrVisionDifficulty', 'DifficultyConcentrating', 'DifficultyWalking', 'DifficultyDressingBathing', 'DifficultyErrands', 'SmokerStatus', 'ECigaretteUsage', 'ChestScan', 'RaceEthnicityCategory', 'AgeCategory', 'HeightInMeters', 'WeightInKilograms', 'BMI', 'AlcoholDrinkers', 'HIVTesting', 'FluVaxLast12', 'PneumoVaxEver', 'TetanusLast10Tdap', 'HighRiskLastYear', 'CovidPos'], dtype='object')
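To see which columns are numeric and which hold text, `select_dtypes` can split the column list by type. A minimal sketch on a hypothetical two-row frame (`sample`) that borrows three column names from the dataset:

```python
import pandas as pd

# Hypothetical two-row sample with one text column and two numeric columns
sample = pd.DataFrame(
    {
        "Sex": ["Female", "Male"],
        "BMI": [27.99, 30.13],
        "SleepHours": [9.0, 6.0],
    }
)

# Columns with a numeric dtype (int or float)
numeric_cols = sample.select_dtypes(include="number").columns.tolist()

# Everything else (here: the text column)
text_cols = sample.select_dtypes(exclude="number").columns.tolist()
```

The same two calls on the full `df` would presumably separate the 6 `float64` columns from the 34 `object` columns reported by `df.info()`.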
Selecting individual rows and columns from the DataFrame
df[["Sex", "HadHeartAttack", "WeightInKilograms"]]
| | Sex | HadHeartAttack | WeightInKilograms |
---|---|---|---|
0 | Female | No | 71.67 |
1 | Male | No | 95.25 |
2 | Male | No | 108.86 |
3 | Female | No | 90.72 |
4 | Female | No | 79.38 |
... | ... | ... | ... |
246017 | Male | No | 102.06 |
246018 | Female | No | 90.72 |
246019 | Male | No | 83.91 |
246020 | Female | No | 83.01 |
246021 | Male | Yes | 108.86 |
246022 rows × 3 columns
df.iloc[3:6]
| | State | Sex | GeneralHealth | PhysicalHealthDays | MentalHealthDays | LastCheckupTime | PhysicalActivities | SleepHours | RemovedTeeth | HadHeartAttack | ... | HeightInMeters | WeightInKilograms | BMI | AlcoholDrinkers | HIVTesting | FluVaxLast12 | PneumoVaxEver | TetanusLast10Tdap | HighRiskLastYear | CovidPos |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | Alabama | Female | Fair | 5.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 9.0 | None of them | No | ... | 1.70 | 90.72 | 31.32 | No | No | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | Yes |
4 | Alabama | Female | Good | 3.0 | 15.0 | Within past year (anytime less than 12 months ... | Yes | 5.0 | 1 to 5 | No | ... | 1.55 | 79.38 | 33.07 | No | No | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | No |
5 | Alabama | Male | Good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | None of them | No | ... | 1.85 | 120.20 | 34.96 | Yes | Yes | Yes | No | Yes, received tetanus shot but not sure what type | No | No |
3 rows × 40 columns
df[df['WeightInKilograms'] > 100]
| | State | Sex | GeneralHealth | PhysicalHealthDays | MentalHealthDays | LastCheckupTime | PhysicalActivities | SleepHours | RemovedTeeth | HadHeartAttack | ... | HeightInMeters | WeightInKilograms | BMI | AlcoholDrinkers | HIVTesting | FluVaxLast12 | PneumoVaxEver | TetanusLast10Tdap | HighRiskLastYear | CovidPos |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | Alabama | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | No | 8.0 | 6 or more, but not all | No | ... | 1.85 | 108.86 | 31.66 | Yes | No | No | Yes | No, did not receive any tetanus shot in the pa... | No | Yes |
5 | Alabama | Male | Good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | None of them | No | ... | 1.85 | 120.20 | 34.96 | Yes | Yes | Yes | No | Yes, received tetanus shot but not sure what type | No | No |
10 | Alabama | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 8.0 | 1 to 5 | No | ... | 1.83 | 122.47 | 36.62 | Yes | No | Yes | Yes | Yes, received Tdap | No | No |
11 | Alabama | Female | Good | 3.0 | 4.0 | Within past year (anytime less than 12 months ... | Yes | 5.0 | None of them | No | ... | 1.52 | 108.86 | 46.87 | No | No | No | No | Yes, received tetanus shot, but not Tdap | No | Yes |
12 | Alabama | Male | Good | 5.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 5.0 | 6 or more, but not all | Yes | ... | 1.88 | 115.67 | 32.74 | No | No | Yes | Yes | Yes, received tetanus shot but not sure what type | No | No |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
246002 | Virgin Islands | Male | Good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 6.0 | 1 to 5 | No | ... | 1.88 | 106.59 | 30.17 | Yes | No | No | No | Yes, received tetanus shot but not sure what type | No | Tested positive using home test without a heal... |
246012 | Virgin Islands | Male | Fair | 7.0 | 30.0 | Within past year (anytime less than 12 months ... | No | 4.0 | None of them | Yes | ... | 1.88 | 117.93 | 33.38 | Yes | Yes | No | No | No, did not receive any tetanus shot in the pa... | No | Yes |
246016 | Virgin Islands | Male | Good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | No | 6.0 | 1 to 5 | Yes | ... | 1.80 | 118.84 | 36.54 | Yes | Yes | Yes | No | Yes, received tetanus shot but not sure what type | No | No |
246017 | Virgin Islands | Male | Very good | 0.0 | 0.0 | Within past 2 years (1 year but less than 2 ye... | Yes | 6.0 | None of them | No | ... | 1.78 | 102.06 | 32.28 | Yes | No | No | No | Yes, received tetanus shot but not sure what type | No | No |
246021 | Virgin Islands | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | No | 5.0 | None of them | Yes | ... | 1.83 | 108.86 | 32.55 | No | Yes | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | Yes |
44646 rows × 40 columns
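Boolean filters like the one above can be combined, and `.loc` can pick rows and columns in a single step. The sketch below assumes a hypothetical four-row frame (`people`) with the same column names; the thresholds are illustrative.

```python
import pandas as pd

# Hypothetical four-row sample using column names from the dataset
people = pd.DataFrame(
    {
        "Sex": ["Female", "Male", "Male", "Female"],
        "HadHeartAttack": ["No", "No", "No", "No"],
        "WeightInKilograms": [71.67, 95.25, 108.86, 90.72],
    }
)

# Each condition needs its own parentheses; & means "and", | means "or"
mask = (people["WeightInKilograms"] > 90) & (people["Sex"] == "Male")

# .loc takes the row mask first and the list of columns second
heavy_men = people.loc[mask, ["Sex", "WeightInKilograms"]]
```

Using Python's `and`/`or` instead of `&`/`|` raises an error here, because the conditions are whole Series rather than single booleans.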
Grouping and aggregating data in the DataFrame
group = df.groupby(['State'])['WeightInKilograms'].mean()
group.to_frame()
State | WeightInKilograms |
---|---|
Alabama | 85.225899 |
Alaska | 83.937201 |
Arizona | 82.626862 |
Arkansas | 85.361796 |
California | 81.334135 |
Colorado | 80.805505 |
Connecticut | 82.192881 |
Delaware | 84.224436 |
District of Columbia | 78.593038 |
Florida | 83.155785 |
Georgia | 84.332240 |
Guam | 77.294261 |
Hawaii | 76.419335 |
Idaho | 84.648567 |
Illinois | 83.459467 |
Indiana | 85.703237 |
Iowa | 86.970651 |
Kansas | 85.864583 |
Kentucky | 86.781960 |
Louisiana | 85.162787 |
Maine | 82.949232 |
Maryland | 83.543344 |
Massachusetts | 80.591010 |
Michigan | 83.629868 |
Minnesota | 84.954303 |
Mississippi | 88.322797 |
Missouri | 85.836119 |
Montana | 84.231140 |
Nebraska | 85.961696 |
Nevada | 82.784771 |
New Hampshire | 80.702764 |
New Jersey | 81.270844 |
New Mexico | 80.529087 |
New York | 80.960180 |
North Carolina | 83.730953 |
North Dakota | 85.924972 |
Ohio | 86.938279 |
Oklahoma | 85.517429 |
Oregon | 83.802043 |
Pennsylvania | 83.831872 |
Puerto Rico | 79.152187 |
Rhode Island | 80.675832 |
South Carolina | 84.046443 |
South Dakota | 86.868195 |
Tennessee | 86.237325 |
Texas | 84.894035 |
Utah | 83.888474 |
Vermont | 80.557657 |
Virgin Islands | 82.131440 |
Virginia | 83.822634 |
Washington | 83.077369 |
West Virginia | 86.697505 |
Wisconsin | 86.167571 |
Wyoming | 83.844357 |
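A grouped column can be summarized with several functions at once through `agg`. A minimal sketch on a hypothetical four-row frame (`weights`); the numbers are illustrative and not taken from the dataset:

```python
import pandas as pd

# Hypothetical data: two states with two respondents each
weights = pd.DataFrame(
    {
        "State": ["Alabama", "Alabama", "Hawaii", "Hawaii"],
        "WeightInKilograms": [70.0, 90.0, 60.0, 80.0],
    }
)

# One row per state, one column per aggregate function
summary = weights.groupby("State")["WeightInKilograms"].agg(["mean", "max", "count"])

# Order the states by mean weight, heaviest first
heaviest_first = summary.sort_values("mean", ascending=False)
```

Sorting the aggregated result is often more readable than the alphabetical order that `groupby` produces by default.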
Sorting data in the DataFrame
sorted_df = df.sort_values(by='WeightInKilograms', ascending=False)
sorted_df
| | State | Sex | GeneralHealth | PhysicalHealthDays | MentalHealthDays | LastCheckupTime | PhysicalActivities | SleepHours | RemovedTeeth | HadHeartAttack | ... | HeightInMeters | WeightInKilograms | BMI | AlcoholDrinkers | HIVTesting | FluVaxLast12 | PneumoVaxEver | TetanusLast10Tdap | HighRiskLastYear | CovidPos |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9060 | Arizona | Male | Fair | 15.0 | 15.0 | Within past year (anytime less than 12 months ... | No | 8.0 | None of them | No | ... | 1.85 | 292.57 | 85.10 | No | No | No | No | Yes, received Tdap | No | No |
48969 | Hawaii | Male | Poor | 30.0 | 30.0 | Within past year (anytime less than 12 months ... | Yes | 4.0 | None of them | No | ... | 1.93 | 276.24 | 74.13 | No | No | No | No | No, did not receive any tetanus shot in the pa... | No | Yes |
75697 | Kentucky | Male | Very good | 0.0 | 0.0 | 5 or more years ago | No | 7.0 | None of them | No | ... | 1.91 | 273.52 | 75.37 | No | No | No | No | Yes, received tetanus shot but not sure what type | No | No |
143147 | New York | Male | Very good | 3.0 | 1.0 | Within past year (anytime less than 12 months ... | Yes | 8.0 | None of them | No | ... | 1.88 | 273.06 | 77.29 | Yes | No | No | No | Yes, received tetanus shot but not sure what type | No | No |
76244 | Kentucky | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | None of them | No | ... | 1.83 | 272.16 | 81.37 | No | Yes | No | No | Yes, received tetanus shot but not sure what type | No | Yes |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
203695 | Vermont | Female | Poor | 30.0 | 3.0 | Within past 2 years (1 year but less than 2 ye... | No | 18.0 | All | No | ... | 1.60 | 30.84 | 12.05 | No | No | No | No | No, did not receive any tetanus shot in the pa... | No | No |
242632 | Puerto Rico | Female | Fair | 30.0 | 7.0 | Within past year (anytime less than 12 months ... | No | 7.0 | 6 or more, but not all | No | ... | 1.35 | 30.39 | 16.77 | No | No | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | No |
11614 | Arkansas | Female | Poor | 30.0 | 30.0 | Within past year (anytime less than 12 months ... | No | 8.0 | All | No | ... | 1.52 | 29.48 | 12.69 | No | No | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | Yes |
127404 | Nebraska | Female | Poor | 30.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 6.0 | None of them | No | ... | 1.52 | 29.48 | 12.69 | No | No | No | Yes | Yes, received tetanus shot but not sure what type | No | No |
179326 | South Carolina | Female | Very good | 0.0 | 0.0 | 5 or more years ago | No | 8.0 | None of them | No | ... | 1.52 | 28.12 | 12.11 | No | No | No | No | No, did not receive any tetanus shot in the pa... | No | No |
246022 rows × 40 columns
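Sorting also works on several columns with a separate direction for each, and `nlargest` is a shortcut when only the top rows are needed. The sketch below uses a hypothetical four-row frame (`records`); the 85.0 value is invented for illustration.

```python
import pandas as pd

# Hypothetical sample; 85.0 is an invented value
records = pd.DataFrame(
    {
        "State": ["Vermont", "Arizona", "Arizona", "Hawaii"],
        "WeightInKilograms": [30.84, 292.57, 85.0, 276.24],
    }
)

# Sort by state A to Z, and within each state by weight in descending order
by_state = records.sort_values(
    by=["State", "WeightInKilograms"], ascending=[True, False]
)

# Return only the two rows with the largest weight
top2 = records.nlargest(2, "WeightInKilograms")
```

Both calls return new frames and keep the original index labels, which is why the index in sorted output looks shuffled.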
Dropping rows/columns
df_dropped_columns = df.drop(columns=['AlcoholDrinkers', 'BMI'])  # Drop the 'AlcoholDrinkers' and 'BMI' columns
df_dropped_columns
| | State | Sex | GeneralHealth | PhysicalHealthDays | MentalHealthDays | LastCheckupTime | PhysicalActivities | SleepHours | RemovedTeeth | HadHeartAttack | ... | RaceEthnicityCategory | AgeCategory | HeightInMeters | WeightInKilograms | HIVTesting | FluVaxLast12 | PneumoVaxEver | TetanusLast10Tdap | HighRiskLastYear | CovidPos |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Alabama | Female | Very good | 4.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 9.0 | None of them | No | ... | White only, Non-Hispanic | Age 65 to 69 | 1.60 | 71.67 | No | Yes | Yes | Yes, received Tdap | No | No |
1 | Alabama | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 6.0 | None of them | No | ... | White only, Non-Hispanic | Age 70 to 74 | 1.78 | 95.25 | No | Yes | Yes | Yes, received tetanus shot but not sure what type | No | No |
2 | Alabama | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | No | 8.0 | 6 or more, but not all | No | ... | White only, Non-Hispanic | Age 75 to 79 | 1.85 | 108.86 | No | No | Yes | No, did not receive any tetanus shot in the pa... | No | Yes |
3 | Alabama | Female | Fair | 5.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 9.0 | None of them | No | ... | White only, Non-Hispanic | Age 80 or older | 1.70 | 90.72 | No | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | Yes |
4 | Alabama | Female | Good | 3.0 | 15.0 | Within past year (anytime less than 12 months ... | Yes | 5.0 | 1 to 5 | No | ... | White only, Non-Hispanic | Age 80 or older | 1.55 | 79.38 | No | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | No |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
246017 | Virgin Islands | Male | Very good | 0.0 | 0.0 | Within past 2 years (1 year but less than 2 ye... | Yes | 6.0 | None of them | No | ... | White only, Non-Hispanic | Age 60 to 64 | 1.78 | 102.06 | No | No | No | Yes, received tetanus shot but not sure what type | No | No |
246018 | Virgin Islands | Female | Fair | 0.0 | 7.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | None of them | No | ... | Black only, Non-Hispanic | Age 25 to 29 | 1.93 | 90.72 | No | No | No | No, did not receive any tetanus shot in the pa... | No | Yes |
246019 | Virgin Islands | Male | Good | 0.0 | 15.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | 1 to 5 | No | ... | Multiracial, Non-Hispanic | Age 65 to 69 | 1.68 | 83.91 | Yes | Yes | Yes | Yes, received tetanus shot but not sure what type | No | Yes |
246020 | Virgin Islands | Female | Excellent | 2.0 | 2.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | None of them | No | ... | Black only, Non-Hispanic | Age 50 to 54 | 1.70 | 83.01 | Yes | Yes | No | Yes, received tetanus shot but not sure what type | No | No |
246021 | Virgin Islands | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | No | 5.0 | None of them | Yes | ... | Black only, Non-Hispanic | Age 70 to 74 | 1.83 | 108.86 | Yes | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | Yes |
246022 rows × 38 columns
df_dropped_rows = df.drop([0, 1])  # Drop the rows with index labels 0 and 1
df_dropped_rows
| | State | Sex | GeneralHealth | PhysicalHealthDays | MentalHealthDays | LastCheckupTime | PhysicalActivities | SleepHours | RemovedTeeth | HadHeartAttack | ... | HeightInMeters | WeightInKilograms | BMI | AlcoholDrinkers | HIVTesting | FluVaxLast12 | PneumoVaxEver | TetanusLast10Tdap | HighRiskLastYear | CovidPos |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | Alabama | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | No | 8.0 | 6 or more, but not all | No | ... | 1.85 | 108.86 | 31.66 | Yes | No | No | Yes | No, did not receive any tetanus shot in the pa... | No | Yes |
3 | Alabama | Female | Fair | 5.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 9.0 | None of them | No | ... | 1.70 | 90.72 | 31.32 | No | No | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | Yes |
4 | Alabama | Female | Good | 3.0 | 15.0 | Within past year (anytime less than 12 months ... | Yes | 5.0 | 1 to 5 | No | ... | 1.55 | 79.38 | 33.07 | No | No | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | No |
5 | Alabama | Male | Good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | None of them | No | ... | 1.85 | 120.20 | 34.96 | Yes | Yes | Yes | No | Yes, received tetanus shot but not sure what type | No | No |
6 | Alabama | Female | Good | 3.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 8.0 | 6 or more, but not all | No | ... | 1.63 | 88.00 | 33.30 | No | No | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | No |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
246017 | Virgin Islands | Male | Very good | 0.0 | 0.0 | Within past 2 years (1 year but less than 2 ye... | Yes | 6.0 | None of them | No | ... | 1.78 | 102.06 | 32.28 | Yes | No | No | No | Yes, received tetanus shot but not sure what type | No | No |
246018 | Virgin Islands | Female | Fair | 0.0 | 7.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | None of them | No | ... | 1.93 | 90.72 | 24.34 | No | No | No | No | No, did not receive any tetanus shot in the pa... | No | Yes |
246019 | Virgin Islands | Male | Good | 0.0 | 15.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | 1 to 5 | No | ... | 1.68 | 83.91 | 29.86 | Yes | Yes | Yes | Yes | Yes, received tetanus shot but not sure what type | No | Yes |
246020 | Virgin Islands | Female | Excellent | 2.0 | 2.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | None of them | No | ... | 1.70 | 83.01 | 28.66 | No | Yes | Yes | No | Yes, received tetanus shot but not sure what type | No | No |
246021 | Virgin Islands | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | No | 5.0 | None of them | Yes | ... | 1.83 | 108.86 | 32.55 | No | Yes | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | Yes |
246020 rows × 40 columns
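`drop` returns a new DataFrame and leaves the original untouched, which is why the results above are assigned to new variables. A minimal sketch on a hypothetical three-row frame (`frame`), which also shows how to renumber the index after rows are removed:

```python
import pandas as pd

# Hypothetical three-row sample
frame = pd.DataFrame(
    {
        "Sex": ["Female", "Male", "Male"],
        "BMI": [27.99, 30.13, 31.66],
        "SleepHours": [9.0, 6.0, 8.0],
    }
)

# Dropping a column gives a new frame; `frame` itself still has "BMI"
without_bmi = frame.drop(columns=["BMI"])

# Dropping a row by label leaves a gap in the index; reset_index renumbers from 0
without_first = frame.drop(index=[0]).reset_index(drop=True)
```

Whether to renumber is a design choice: keeping the old labels makes it easy to trace rows back to the source file.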
Creating new columns from existing DataFrame columns
df['SleepHours-HeightInMeters'] = df['SleepHours'] - df['HeightInMeters']
df
| | State | Sex | GeneralHealth | PhysicalHealthDays | MentalHealthDays | LastCheckupTime | PhysicalActivities | SleepHours | RemovedTeeth | HadHeartAttack | ... | WeightInKilograms | BMI | AlcoholDrinkers | HIVTesting | FluVaxLast12 | PneumoVaxEver | TetanusLast10Tdap | HighRiskLastYear | CovidPos | SleepHours-HeightInMeters |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Alabama | Female | Very good | 4.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 9.0 | None of them | No | ... | 71.67 | 27.99 | No | No | Yes | Yes | Yes, received Tdap | No | No | 7.40 |
1 | Alabama | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 6.0 | None of them | No | ... | 95.25 | 30.13 | No | No | Yes | Yes | Yes, received tetanus shot but not sure what type | No | No | 4.22 |
2 | Alabama | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | No | 8.0 | 6 or more, but not all | No | ... | 108.86 | 31.66 | Yes | No | No | Yes | No, did not receive any tetanus shot in the pa... | No | Yes | 6.15 |
3 | Alabama | Female | Fair | 5.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 9.0 | None of them | No | ... | 90.72 | 31.32 | No | No | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | Yes | 7.30 |
4 | Alabama | Female | Good | 3.0 | 15.0 | Within past year (anytime less than 12 months ... | Yes | 5.0 | 1 to 5 | No | ... | 79.38 | 33.07 | No | No | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | No | 3.45 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
246017 | Virgin Islands | Male | Very good | 0.0 | 0.0 | Within past 2 years (1 year but less than 2 ye... | Yes | 6.0 | None of them | No | ... | 102.06 | 32.28 | Yes | No | No | No | Yes, received tetanus shot but not sure what type | No | No | 4.22 |
246018 | Virgin Islands | Female | Fair | 0.0 | 7.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | None of them | No | ... | 90.72 | 24.34 | No | No | No | No | No, did not receive any tetanus shot in the pa... | No | Yes | 5.07 |
246019 | Virgin Islands | Male | Good | 0.0 | 15.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | 1 to 5 | No | ... | 83.91 | 29.86 | Yes | Yes | Yes | Yes | Yes, received tetanus shot but not sure what type | No | Yes | 5.32 |
246020 | Virgin Islands | Female | Excellent | 2.0 | 2.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | None of them | No | ... | 83.01 | 28.66 | No | Yes | Yes | No | Yes, received tetanus shot but not sure what type | No | No | 5.30 |
246021 | Virgin Islands | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | No | 5.0 | None of them | Yes | ... | 108.86 | 32.55 | No | Yes | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | Yes | 3.17 |
246022 rows × 41 columns
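Column arithmetic is element-wise, so any formula over existing columns can produce a new one. As a more interpretable illustration than a plain difference, the sketch below recomputes body mass index as weight divided by height squared on a hypothetical two-row frame (`body`). The column name `BMIComputed` is an assumption introduced here, and the result need not match the dataset's stored `BMI` exactly.

```python
import pandas as pd

# Hypothetical two-row sample with height in metres and weight in kilograms
body = pd.DataFrame(
    {
        "HeightInMeters": [1.60, 1.78],
        "WeightInKilograms": [71.67, 95.25],
    }
)

# BMI = weight / height^2, computed for every row at once and rounded to 2 decimals
body["BMIComputed"] = (
    body["WeightInKilograms"] / body["HeightInMeters"] ** 2
).round(2)
```

Assigning to `body["..."]` modifies the frame in place, just like the `df['SleepHours-HeightInMeters']` assignment above.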
Dropping rows with missing values
print(df.isna().sum())
State 0
Sex 0
GeneralHealth 0
PhysicalHealthDays 0
MentalHealthDays 0
LastCheckupTime 0
PhysicalActivities 0
SleepHours 0
RemovedTeeth 0
HadHeartAttack 0
HadAngina 0
HadStroke 0
HadAsthma 0
HadSkinCancer 0
HadCOPD 0
HadDepressiveDisorder 0
HadKidneyDisease 0
HadArthritis 0
HadDiabetes 0
DeafOrHardOfHearing 0
BlindOrVisionDifficulty 0
DifficultyConcentrating 0
DifficultyWalking 0
DifficultyDressingBathing 0
DifficultyErrands 0
SmokerStatus 0
ECigaretteUsage 0
ChestScan 0
RaceEthnicityCategory 0
AgeCategory 0
HeightInMeters 0
WeightInKilograms 0
BMI 0
AlcoholDrinkers 0
HIVTesting 0
FluVaxLast12 0
PneumoVaxEver 0
TetanusLast10Tdap 0
HighRiskLastYear 0
CovidPos 0
SleepHours-HeightInMeters 0
dtype: int64
df.dropna()  # No values are missing, so no rows are removed
| | State | Sex | GeneralHealth | PhysicalHealthDays | MentalHealthDays | LastCheckupTime | PhysicalActivities | SleepHours | RemovedTeeth | HadHeartAttack | ... | WeightInKilograms | BMI | AlcoholDrinkers | HIVTesting | FluVaxLast12 | PneumoVaxEver | TetanusLast10Tdap | HighRiskLastYear | CovidPos | SleepHours-HeightInMeters |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Alabama | Female | Very good | 4.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 9.0 | None of them | No | ... | 71.67 | 27.99 | No | No | Yes | Yes | Yes, received Tdap | No | No | 7.40 |
1 | Alabama | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 6.0 | None of them | No | ... | 95.25 | 30.13 | No | No | Yes | Yes | Yes, received tetanus shot but not sure what type | No | No | 4.22 |
2 | Alabama | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | No | 8.0 | 6 or more, but not all | No | ... | 108.86 | 31.66 | Yes | No | No | Yes | No, did not receive any tetanus shot in the pa... | No | Yes | 6.15 |
3 | Alabama | Female | Fair | 5.0 | 0.0 | Within past year (anytime less than 12 months ... | Yes | 9.0 | None of them | No | ... | 90.72 | 31.32 | No | No | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | Yes | 7.30 |
4 | Alabama | Female | Good | 3.0 | 15.0 | Within past year (anytime less than 12 months ... | Yes | 5.0 | 1 to 5 | No | ... | 79.38 | 33.07 | No | No | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | No | 3.45 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
246017 | Virgin Islands | Male | Very good | 0.0 | 0.0 | Within past 2 years (1 year but less than 2 ye... | Yes | 6.0 | None of them | No | ... | 102.06 | 32.28 | Yes | No | No | No | Yes, received tetanus shot but not sure what type | No | No | 4.22 |
246018 | Virgin Islands | Female | Fair | 0.0 | 7.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | None of them | No | ... | 90.72 | 24.34 | No | No | No | No | No, did not receive any tetanus shot in the pa... | No | Yes | 5.07 |
246019 | Virgin Islands | Male | Good | 0.0 | 15.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | 1 to 5 | No | ... | 83.91 | 29.86 | Yes | Yes | Yes | Yes | Yes, received tetanus shot but not sure what type | No | Yes | 5.32 |
246020 | Virgin Islands | Female | Excellent | 2.0 | 2.0 | Within past year (anytime less than 12 months ... | Yes | 7.0 | None of them | No | ... | 83.01 | 28.66 | No | Yes | Yes | No | Yes, received tetanus shot but not sure what type | No | No | 5.30 |
246021 | Virgin Islands | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ... | No | 5.0 | None of them | Yes | ... | 108.86 | 32.55 | No | Yes | Yes | Yes | No, did not receive any tetanus shot in the pa... | No | Yes | 3.17 |
246022 rows × 41 columns
# df.fillna(df.mean(numeric_only=True), inplace=True)
# df.fillna(df.median(numeric_only=True), inplace=True)
Missing values are handled for each column separately.
For a numeric column, gaps can be filled with the mean or the median.
The mean is appropriate when the column has no outliers; otherwise the median is the safer choice.
For a categorical column, gaps can be filled with the mode (the most frequent value).
If only a few values are missing, the affected rows can simply be dropped.
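The per-column rules above can be sketched as a short loop. This is only an illustration on a hypothetical four-row frame (`gaps`) with artificially inserted missing values; the actual dataset has none. The median is used for the numeric column because the sample contains an outlier.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps; 24.0 plays the role of an outlier
gaps = pd.DataFrame(
    {
        "SleepHours": [7.0, np.nan, 8.0, 24.0],
        "GeneralHealth": ["Good", "Good", None, "Fair"],
    }
)

filled = gaps.copy()
for col in filled.columns:
    if pd.api.types.is_numeric_dtype(filled[col]):
        # Numeric column: the median is robust to the outlier
        filled[col] = filled[col].fillna(filled[col].median())
    else:
        # Categorical column: use the most frequent value (the mode)
        filled[col] = filled[col].fillna(filled[col].mode().iloc[0])
```

Working on a copy keeps the original frame available for comparing results before and after imputation.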
- Visualization features
import matplotlib.pyplot as plt
# Line plot
plt.figure(figsize=(10, 5))
df['WeightInKilograms'].plot(title='Line Plot (WeightInKilograms)')
plt.show()
# Histogram
# DataFrame.plot creates its own figure, so the size is passed via figsize
df.plot.hist(column=["SleepHours"], bins=80, figsize=(8, 5))
plt.show()
# Bar chart
plt.figure(figsize=(8, 5))
df['AgeCategory'].value_counts().plot(kind='bar', title='Bar Plot (AgeCategory)')
plt.show()
# Box plot
plt.figure(figsize=(8, 5))
df["BMI"].plot(kind="box", title='Box Plot (BMI)')
plt.show()
# Area plot
# AgeCategory is a text column, so pandas plots only the numeric BMI column here
df[['AgeCategory', 'BMI']].plot(kind='area', alpha=0.2, figsize=(8, 5), title='Area Plot (AgeCategory, BMI)')
plt.show()
# Scatter plot
df.plot.scatter(x="BMI", y="WeightInKilograms")
<Axes: xlabel='BMI', ylabel='WeightInKilograms'>
# Pie chart
plt.figure(figsize=(8, 5))
df['AgeCategory'].value_counts().plot(kind='pie', autopct='%1.1f%%', title='Pie Chart (AgeCategory)')
plt.show()
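When a plotting method is called on a whole DataFrame without an `ax` argument, pandas opens its own figure, so a preceding `plt.figure(...)` can be left empty. One way to avoid this is to create the figure and axes explicitly and pass `ax` to the plot call. The sketch below assumes a hypothetical eight-value frame (`sleep`) and selects the non-interactive `Agg` backend so that it runs as a plain script; in a notebook that line is not needed.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: script run without a display

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical small sample of sleep durations in hours
sleep = pd.DataFrame({"SleepHours": [9.0, 6.0, 8.0, 9.0, 5.0, 7.0, 8.0, 8.0]})

# Create one figure with one axes and draw the histogram onto it
fig, ax = plt.subplots(figsize=(8, 5))
sleep["SleepHours"].plot(kind="hist", bins=5, ax=ax, title="Histogram (SleepHours)")
ax.set_xlabel("SleepHours")

# The figure holds exactly the axes that was drawn on
n_axes = len(fig.axes)
```

Holding on to `fig` and `ax` also makes it straightforward to add labels or save the figure later with `fig.savefig(...)`.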