AIM-PIbd-31-Rodionov-I-A/lab_7/lab7.ipynb
2025-02-08 22:19:23 +04:00


Load the dataset for the assigned variant and display its contents and columns:

In [71]:
import pandas as pd

df = pd.read_csv("../../static/csv/healthcare-dataset-stroke-data.csv")

df
Out[71]:
         id  gender   age  hypertension  heart_disease  ever_married      work_type  Residence_type  avg_glucose_level   bmi   smoking_status  stroke
0      9046    Male  67.0             0              1           Yes        Private           Urban             228.69  36.6  formerly smoked       1
1     51676  Female  61.0             0              0           Yes  Self-employed           Rural             202.21   NaN     never smoked       1
2     31112    Male  80.0             0              1           Yes        Private           Rural             105.92  32.5     never smoked       1
3     60182  Female  49.0             0              0           Yes        Private           Urban             171.23  34.4           smokes       1
4      1665  Female  79.0             1              0           Yes  Self-employed           Rural             174.12  24.0     never smoked       1
...     ...     ...   ...           ...            ...           ...            ...             ...                ...   ...              ...     ...
5105  18234  Female  80.0             1              0           Yes        Private           Urban              83.75   NaN     never smoked       0
5106  44873  Female  81.0             0              0           Yes  Self-employed           Urban             125.20  40.0     never smoked       0
5107  19723  Female  35.0             0              0           Yes  Self-employed           Rural              82.99  30.6     never smoked       0
5108  37544    Male  51.0             0              0           Yes        Private           Rural             166.29  25.6  formerly smoked       0
5109  44679  Female  44.0             0              0           Yes       Govt_job           Urban              85.28  26.2          Unknown       0

5110 rows × 12 columns

Replace the missing values in the bmi column with the median:

In [72]:
df["bmi"] = df["bmi"].fillna(df["bmi"].median())
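As a small illustration of this imputation step, the following dependency-free sketch fills gaps with the median of the observed values. The sample values and the helper name `fill_with_median` are hypothetical and are not taken from the dataset; the sketch mirrors the fact that pandas' `Series.median()` skips NaN by default.

```python
import math
from statistics import median


def fill_with_median(values):
    """Replace NaN entries with the median of the non-missing entries."""
    observed = [v for v in values if not math.isnan(v)]
    mid = median(observed)
    return [mid if math.isnan(v) else v for v in values]


# Hypothetical bmi sample with two gaps (illustrative values only).
sample_bmi = [36.6, math.nan, 32.5, 34.4, 24.0, math.nan]
filled_bmi = fill_with_median(sample_bmi)
print(filled_bmi)
```

The median is used rather than the mean because it is less sensitive to extreme bmi values, so a few outliers do not shift the fill value.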

Create the linguistic variables

Inputs: age (the patient's age) and bmi (the patient's body mass index)

Output: avg_glucose_level (the average blood glucose level)

In [73]:
import numpy as np
from skfuzzy import control as ctrl

age = ctrl.Antecedent(np.arange(df["age"].min(), df["age"].max() + 1, 1), "age")
bmi = ctrl.Antecedent(np.arange(df["bmi"].min(), df["bmi"].max() + 0.1, 0.1), "bmi")
avg_glucose_level = ctrl.Consequent(np.arange(df["avg_glucose_level"].min(), df["avg_glucose_level"].max() + 0.01, 0.01), "avg_glucose_level")

Next, define their membership functions:

In [74]:
import skfuzzy as fuzz

age["young"] = fuzz.zmf(age.universe, 7, 30)
age["middle-aged"] = fuzz.trapmf(age.universe, [18, 30, 48, 60])
age["old"] = fuzz.smf(age.universe, 48, 60)
age.view()

bmi["low"] = fuzz.zmf(bmi.universe, 12, 18)
bmi["normal"] = fuzz.trapmf(bmi.universe, [16, 19, 24, 27])
bmi["high"] = fuzz.smf(bmi.universe, 25, 27)
bmi.view()

avg_glucose_level["low"] = fuzz.zmf(avg_glucose_level.universe, 60, 75)
avg_glucose_level["normal"] = fuzz.trapmf(avg_glucose_level.universe, [65, 75, 120, 135])
avg_glucose_level["high"] = fuzz.smf(avg_glucose_level.universe, 120, 160)
avg_glucose_level.view()
[Figures: membership function plots for age, bmi, and avg_glucose_level]
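To make the shapes of these terms concrete, here is a minimal pure-Python sketch of the three membership functions, written from their usual piecewise definitions rather than copied from scikit-fuzzy. It evaluates the terms at age = 25 and bmi = 26. The last lines rest on an assumption that is not shown in the notebook: that the age universe starts at 0.08 (so the grid points are 24.08, 25.08, and so on) and that the library interpolates linearly between grid points.

```python
def zmf(x, a, b):
    """Z-shaped function: 1 at or below a, 0 at or above b, spline in between."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    if x < (a + b) / 2:
        return 1.0 - 2.0 * ((x - a) / (b - a)) ** 2
    return 2.0 * ((x - b) / (b - a)) ** 2


def smf(x, a, b):
    """S-shaped function: the mirror image of zmf."""
    return 1.0 - zmf(x, a, b)


def trapmf(x, a, b, c, d):
    """Trapezoid with feet a, d and shoulders b, c (assumes a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


age_25 = {
    "young": zmf(25, 7, 30),
    "middle-aged": trapmf(25, 18, 30, 48, 60),
    "old": smf(25, 48, 60),
}
bmi_26 = {
    "low": zmf(26, 12, 18),
    "normal": trapmf(26, 16, 19, 24, 27),
    "high": smf(26, 25, 27),
}

# ASSUMPTION: grid points 24.08 and 25.08 (universe starting at 0.08, step 1).
left, right = 24.08, 25.08
weight = (25 - left) / (right - left)
young_on_grid = zmf(left, 7, 30) + weight * (zmf(right, 7, 30) - zmf(left, 7, 30))

print(age_25, bmi_26, young_on_grid)
```

Under that assumption the interpolated value for `young` lands on roughly 0.0948 instead of the closed-form 0.0945, which would account for the slightly different degree scikit-fuzzy reports for age = 25.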

Build the fuzzy rule base:

In [75]:
rule1 = ctrl.Rule(age["young"] & bmi["low"], avg_glucose_level["low"])
rule2 = ctrl.Rule(age["young"] & bmi["normal"], avg_glucose_level["normal"])
rule3 = ctrl.Rule(age["young"] & bmi["high"], avg_glucose_level["normal"])

rule4 = ctrl.Rule(age["middle-aged"] & bmi["low"], avg_glucose_level["normal"])
rule5 = ctrl.Rule(age["middle-aged"] & bmi["normal"], avg_glucose_level["normal"])
rule6 = ctrl.Rule(age["middle-aged"] & bmi["high"], avg_glucose_level["high"])

rule7 = ctrl.Rule(age["old"] & bmi["low"], avg_glucose_level["low"])
rule8 = ctrl.Rule(age["old"] & bmi["normal"], avg_glucose_level["normal"])
rule9 = ctrl.Rule(age["old"] & bmi["high"], avg_glucose_level["high"])

rule1.view()
Out[75]:
(<Figure size 640x480 with 1 Axes>, <Axes: >)
[Figure: graph view of rule1]

Create the fuzzy system and add the rules defined above to its knowledge base:

In [76]:
glucose_ctrl = ctrl.ControlSystem(
    [
        rule1,
        rule2,
        rule3,
        rule4,
        rule5,
        rule6,
        rule7,
        rule8,
        rule9
    ]
)

glucose_simulation = ctrl.ControlSystemSimulation(glucose_ctrl)

glucose_ctrl.view()
[Figure: graph of the control system with its variables and rules]

Now we can evaluate the quality of the resulting fuzzy system.

First, check how the system behaves on a single example:

In [77]:
glucose_simulation.input["age"] = 25
glucose_simulation.input["bmi"] = 26

glucose_simulation.compute()

glucose_simulation.print_state()
glucose_simulation.output["avg_glucose_level"]
=============
 Antecedents 
=============
Antecedent: age                     = 25
  - young                           : 0.09479621928166351
  - middle-aged                     : 0.5833333333333334
  - old                             : 0.0
Antecedent: bmi                     = 26
  - low                             : 0.0
  - normal                          : 0.33333333333333337
  - high                            : 0.49999999999999717

=======
 Rules 
=======
RULE #0:
  IF age[young] AND bmi[low] THEN avg_glucose_level[low]
	AND aggregation function : fmin
	OR aggregation function  : fmax

  Aggregation (IF-clause):
  - age[young]                                             : 0.09479621928166351
  - bmi[low]                                               : 0.0
                                   age[young] AND bmi[low] = 0.0
  Activation (THEN-clause):
                                    avg_glucose_level[low] : 0.0

RULE #1:
  IF age[young] AND bmi[normal] THEN avg_glucose_level[normal]
	AND aggregation function : fmin
	OR aggregation function  : fmax

  Aggregation (IF-clause):
  - age[young]                                             : 0.09479621928166351
  - bmi[normal]                                            : 0.33333333333333337
                                age[young] AND bmi[normal] = 0.09479621928166351
  Activation (THEN-clause):
                                 avg_glucose_level[normal] : 0.09479621928166351

RULE #2:
  IF age[young] AND bmi[high] THEN avg_glucose_level[normal]
	AND aggregation function : fmin
	OR aggregation function  : fmax

  Aggregation (IF-clause):
  - age[young]                                             : 0.09479621928166351
  - bmi[high]                                              : 0.49999999999999717
                                  age[young] AND bmi[high] = 0.09479621928166351
  Activation (THEN-clause):
                                 avg_glucose_level[normal] : 0.09479621928166351

RULE #3:
  IF age[middle-aged] AND bmi[low] THEN avg_glucose_level[normal]
	AND aggregation function : fmin
	OR aggregation function  : fmax

  Aggregation (IF-clause):
  - age[middle-aged]                                       : 0.5833333333333334
  - bmi[low]                                               : 0.0
                             age[middle-aged] AND bmi[low] = 0.0
  Activation (THEN-clause):
                                 avg_glucose_level[normal] : 0.0

RULE #4:
  IF age[middle-aged] AND bmi[normal] THEN avg_glucose_level[normal]
	AND aggregation function : fmin
	OR aggregation function  : fmax

  Aggregation (IF-clause):
  - age[middle-aged]                                       : 0.5833333333333334
  - bmi[normal]                                            : 0.33333333333333337
                          age[middle-aged] AND bmi[normal] = 0.33333333333333337
  Activation (THEN-clause):
                                 avg_glucose_level[normal] : 0.33333333333333337

RULE #5:
  IF age[middle-aged] AND bmi[high] THEN avg_glucose_level[high]
	AND aggregation function : fmin
	OR aggregation function  : fmax

  Aggregation (IF-clause):
  - age[middle-aged]                                       : 0.5833333333333334
  - bmi[high]                                              : 0.49999999999999717
                            age[middle-aged] AND bmi[high] = 0.49999999999999717
  Activation (THEN-clause):
                                   avg_glucose_level[high] : 0.49999999999999717

RULE #6:
  IF age[old] AND bmi[low] THEN avg_glucose_level[low]
	AND aggregation function : fmin
	OR aggregation function  : fmax

  Aggregation (IF-clause):
  - age[old]                                               : 0.0
  - bmi[low]                                               : 0.0
                                     age[old] AND bmi[low] = 0.0
  Activation (THEN-clause):
                                    avg_glucose_level[low] : 0.0

RULE #7:
  IF age[old] AND bmi[normal] THEN avg_glucose_level[normal]
	AND aggregation function : fmin
	OR aggregation function  : fmax

  Aggregation (IF-clause):
  - age[old]                                               : 0.0
  - bmi[normal]                                            : 0.33333333333333337
                                  age[old] AND bmi[normal] = 0.0
  Activation (THEN-clause):
                                 avg_glucose_level[normal] : 0.0

RULE #8:
  IF age[old] AND bmi[high] THEN avg_glucose_level[high]
	AND aggregation function : fmin
	OR aggregation function  : fmax

  Aggregation (IF-clause):
  - age[old]                                               : 0.0
  - bmi[high]                                              : 0.49999999999999717
                                    age[old] AND bmi[high] = 0.0
  Activation (THEN-clause):
                                   avg_glucose_level[high] : 0.0


==============================
 Intermediaries and Conquests 
==============================
Consequent: avg_glucose_level        = 178.22779760948296
  low:
    Accumulate using accumulation_max : 0.0
  normal:
    Accumulate using accumulation_max : 0.33333333333333337
  high:
    Accumulate using accumulation_max : 0.49999999999999717

Out[77]:
np.float64(178.22779760948296)
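The activation values in the printed state can be reproduced by hand. The sketch below is a simplified illustration, not the library's code: it applies min for AND within each rule and max to accumulate rules that share an output term, using the membership degrees from the printout rounded to four decimals.

```python
# Membership degrees for age = 25 and bmi = 26, rounded from the printout.
age_deg = {"young": 0.0948, "middle-aged": 0.5833, "old": 0.0}
bmi_deg = {"low": 0.0, "normal": 0.3333, "high": 0.5}

# The nine rules as (age term, bmi term, output term).
rules = [
    ("young", "low", "low"),
    ("young", "normal", "normal"),
    ("young", "high", "normal"),
    ("middle-aged", "low", "normal"),
    ("middle-aged", "normal", "normal"),
    ("middle-aged", "high", "high"),
    ("old", "low", "low"),
    ("old", "normal", "normal"),
    ("old", "high", "high"),
]

activation = {"low": 0.0, "normal": 0.0, "high": 0.0}
for age_term, bmi_term, out_term in rules:
    strength = min(age_deg[age_term], bmi_deg[bmi_term])  # AND -> minimum
    activation[out_term] = max(activation[out_term], strength)  # accumulate -> maximum

print(activation)
```

Only the `normal` and `high` output terms end up active, in agreement with the accumulated values listed at the end of the printout.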
In [78]:
avg_glucose_level.view(sim=glucose_simulation)
[Figure: avg_glucose_level membership functions with the aggregated output for the example]
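The defuzzified value can be approximated with a short sketch of centroid defuzzification: each output term is clipped at its activation level, the clipped terms are merged with max, and the centre of gravity of the result is taken. This is a simplified numerical version, not scikit-fuzzy's implementation. The universe bounds 55.12 and 271.74 are assumptions standing in for the minimum and maximum of avg_glucose_level, which the notebook does not print.

```python
def zmf(x, a, b):
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    if x < (a + b) / 2:
        return 1.0 - 2.0 * ((x - a) / (b - a)) ** 2
    return 2.0 * ((x - b) / (b - a)) ** 2


def smf(x, a, b):
    return 1.0 - zmf(x, a, b)


def trapmf(x, a, b, c, d):
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# ASSUMPTION: universe bounds (column min and max) are not shown in the notebook.
LO, HI, STEP = 55.12, 271.74, 0.01
GRID = [LO + STEP * i for i in range(round((HI - LO) / STEP) + 1)]


def centroid(activation):
    """Centre of gravity of the clipped and max-aggregated output terms."""
    num = den = 0.0
    for x in GRID:
        mu = max(
            min(activation["low"], zmf(x, 60, 75)),
            min(activation["normal"], trapmf(x, 65, 75, 120, 135)),
            min(activation["high"], smf(x, 120, 160)),
        )
        num += x * mu
        den += mu
    return num / den


example = centroid({"low": 0.0, "normal": 1 / 3, "high": 0.5})
only_high = centroid({"low": 0.0, "normal": 0.0, "high": 1.0})
print(round(example, 2), round(only_high, 2))
```

If the assumed bounds are right, the first value should land close to the 178.23 computed by the system. The `high` term stays flat all the way to the universe maximum, so even a moderate activation of it pulls the centroid far above the 120 to 160 range where the term begins.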

Next, compare the system's output with the real data:

In [79]:
def fuzzy_pred(row):
    glucose_simulation.input["age"] = row["age"]
    glucose_simulation.input["bmi"] = row["bmi"]
    glucose_simulation.compute()
    return glucose_simulation.output["avg_glucose_level"]
In [80]:
result = df.copy()

result["avg_glucose_level_pred"] = result.apply(fuzzy_pred, axis=1)

result.loc[240:260, ["age", "bmi", "avg_glucose_level", "avg_glucose_level_pred"]]
Out[80]:
      age   bmi  avg_glucose_level  avg_glucose_level_pred
240  66.0  21.2              76.46               98.840580
241  57.0  34.5             197.28              204.737613
242  68.0  42.4             233.94              205.616976
243  68.0  40.5             247.51              205.616976
244  57.0  36.7              84.96              204.737613
245  14.0  30.9              57.93               99.039674
246  75.0  29.3              78.80              205.616976
247  71.0  28.1              87.80              205.616976
248  78.0  19.6              78.81               98.840580
249   3.0  18.0              95.12               99.204204
250  58.0  39.2              87.96              205.196699
251   8.0  17.6             110.89               99.254419
252  70.0  35.9              69.04              205.616976
253  14.0  19.1             161.28               99.039674
254  47.0  50.1             210.95              205.616976
255  52.0  17.7              77.59               99.263808
256  75.0  27.0             243.53              205.616976
257  32.0  32.3              77.67              205.616976
258  74.0  54.6             205.84              205.616976
259  79.0  35.0              77.08              205.616976
260  79.0  22.0              57.08               98.840580

And evaluate these results using regression metrics:

In [81]:
import math
from sklearn import metrics

rmetrics = {}
rmetrics["RMSE"] = math.sqrt(
    metrics.mean_squared_error(result["avg_glucose_level"], result["avg_glucose_level_pred"])
)
# Note: "RMAE" here is the square root of MAE (a non-standard metric); MAE itself is RMAE ** 2
rmetrics["RMAE"] = math.sqrt(
    metrics.mean_absolute_error(result["avg_glucose_level"], result["avg_glucose_level_pred"])
)
rmetrics["R2"] = metrics.r2_score(
    result["avg_glucose_level"], result["avg_glucose_level_pred"]
)

rmetrics
Out[81]:
{'RMSE': 83.49030705918618,
 'RMAE': 8.214434960116987,
 'R2': -2.3999770642194247}
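For reference, the three metrics can be written out directly. The sketch below uses plain Python and applies the formulas to rows 240 to 244 of the comparison table, so its numbers describe only that five-row slice and not the full dataset. The function names are illustrative. Note that the notebook's "RMAE" is the square root of MAE; squaring the reported 8.2144 gives an MAE of about 67.48.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Rows 240-244 of the comparison table (actual vs. predicted glucose).
y_true = [76.46, 197.28, 233.94, 247.51, 84.96]
y_pred = [98.840580, 204.737613, 205.616976, 205.616976, 204.737613]
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))

# A constant prediction equal to the mean scores R2 = 0 by definition,
# so any negative R2 means the model does worse than that baseline.
baseline = [sum(y_true) / len(y_true)] * len(y_true)
print(r2(y_true, baseline))
```

This is why a negative R2 is a useful warning sign: it compares the model against the simplest possible predictor rather than against zero error.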

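As the metrics show, the fuzzy system with these variables and membership settings performs poorly: R2 is about -2.40, meaning it predicts worse than a constant equal to the mean glucose level, and RMSE is about 83.5.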