2025-02-07 22:11:21 +04:00

207 KiB
Raw Blame History

загрузка данных

In [5]:
import pandas as pd
df = pd.read_csv("C://Users//annal//aim//static//csv//Forbes_Billionaires.csv")
df
Out[5]:
Rank Name Networth Age Country Source Industry
0 1 Elon Musk 219.0 50 United States Tesla, SpaceX Automotive
1 2 Jeff Bezos 171.0 58 United States Amazon Technology
2 3 Bernard Arnault & family 158.0 73 France LVMH Fashion & Retail
3 4 Bill Gates 129.0 66 United States Microsoft Technology
4 5 Warren Buffett 118.0 91 United States Berkshire Hathaway Finance & Investments
... ... ... ... ... ... ... ...
2595 2578 Jorge Gallardo Ballart 1.0 80 Spain pharmaceuticals Healthcare
2596 2578 Nari Genomal 1.0 82 Philippines apparel Fashion & Retail
2597 2578 Ramesh Genomal 1.0 71 Philippines apparel Fashion & Retail
2598 2578 Sunder Genomal 1.0 68 Philippines garments Fashion & Retail
2599 2578 Horst-Otto Gerberding 1.0 69 Germany flavors and fragrances Food & Beverage

2600 rows × 7 columns

входные и выходные переменные:

входные: age и country

выходные: networth

создание лингвистических переменных

In [6]:
import skfuzzy as fuzz
from skfuzzy import control as ctrl
age_cat = ctrl.Antecedent(df["Age"].sort_values(), "age_cat")
networth = ctrl.Antecedent(df["Networth"].sort_values(), "networth")
rank = ctrl.Consequent(df["Rank "].sort_values(), "rank")

формирование нечетких переменных для лингвистических переменных и их визуализация¶

In [7]:
age_cat.automf(3, variable_type= "quant")
age_cat.view()
networth.automf(3, variable_type="quant")
networth.view()
rank.automf(5, variable_type="quant")
rank.view()
c:\Users\annal\aim\.venv\Lib\site-packages\skfuzzy\control\fuzzyvariable.py:125: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
  fig.show()
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

формирование и визуализация нечётких правил

In [13]:
import skfuzzy as fuzz
from skfuzzy import control as ctrl 
rule1 = ctrl.Rule(age_cat['low'] & networth['low'], rank['lower'])
rule2 = ctrl.Rule(age_cat['low'] & networth['average'], rank['low'])
rule3 = ctrl.Rule(age_cat['low'] & networth['high'], rank['average'])
rule4 = ctrl.Rule(age_cat['average'] & networth['low'], rank['lower'])
rule5 = ctrl.Rule(age_cat['average'] & networth['average'], rank['average'])
rule6 = ctrl.Rule(age_cat['average'] & networth['high'], rank['high'])
rule7 = ctrl.Rule(age_cat['high'] & networth['low'], rank['average'])
rule8 = ctrl.Rule(age_cat['high'] & networth['average'], rank['high'])
rule9 = ctrl.Rule(age_cat['high'] & networth['high'], rank['higher'])

rule1.view()
Out[13]:
(<Figure size 640x480 with 1 Axes>, <Axes: >)
No description has been provided for this image

создание нечеткой системы и добавление нечетких правил в базу знаний нечеткой системы

In [14]:
rank_ctrl = ctrl.ControlSystem(
    [
        rule1,
        rule2,
        rule3,
        rule4,
        rule5,
        rule6,
        rule7,
        rule8,
        rule9,
    ]
)

ranks = ctrl.ControlSystemSimulation(rank_ctrl)

пример расчёта rank на основе age_cat и networth

In [15]:
ranks.input['age_cat'] = 50
ranks.input['networth'] = 219
ranks.compute()
ranks.print_state()
print(ranks.output['rank'])
=============
 Antecedents 
=============
Antecedent: age_cat                 = 50
  - low                             : 0.2345679012345679
  - average                         : 0.7654320987654321
  - high                            : 0.0
Antecedent: networth                = 219
  - low                             : 0.0
  - average                         : 0.0
  - high                            : 1.0

=======
 Rules 
=======
RULE #0:
  IF age_cat[low] AND networth[low] THEN rank[lower]
	AND aggregation function : fmin
	OR aggregation function  : fmax

  Aggregation (IF-clause):
  - age_cat[low]                                           : 0.2345679012345679
  - networth[low]                                          : 0.0
                            age_cat[low] AND networth[low] = 0.0
  Activation (THEN-clause):
                                               rank[lower] : 0.0

RULE #1:
  IF age_cat[low] AND networth[average] THEN rank[low]
	AND aggregation function : fmin
	OR aggregation function  : fmax

  Aggregation (IF-clause):
  - age_cat[low]                                           : 0.2345679012345679
  - networth[average]                                      : 0.0
                        age_cat[low] AND networth[average] = 0.0
  Activation (THEN-clause):
                                                 rank[low] : 0.0

RULE #2:
  IF age_cat[low] AND networth[high] THEN rank[average]
	AND aggregation function : fmin
	OR aggregation function  : fmax

  Aggregation (IF-clause):
  - age_cat[low]                                           : 0.2345679012345679
  - networth[high]                                         : 1.0
                           age_cat[low] AND networth[high] = 0.2345679012345679
  Activation (THEN-clause):
                                             rank[average] : 0.2345679012345679

RULE #3:
  IF age_cat[average] AND networth[low] THEN rank[lower]
	AND aggregation function : fmin
	OR aggregation function  : fmax

  Aggregation (IF-clause):
  - age_cat[average]                                       : 0.7654320987654321
  - networth[low]                                          : 0.0
                        age_cat[average] AND networth[low] = 0.0
  Activation (THEN-clause):
                                               rank[lower] : 0.0

RULE #4:
  IF age_cat[average] AND networth[average] THEN rank[average]
	AND aggregation function : fmin
	OR aggregation function  : fmax

  Aggregation (IF-clause):
  - age_cat[average]                                       : 0.7654320987654321
  - networth[average]                                      : 0.0
                    age_cat[average] AND networth[average] = 0.0
  Activation (THEN-clause):
                                             rank[average] : 0.0

RULE #5:
  IF age_cat[average] AND networth[high] THEN rank[high]
	AND aggregation function : fmin
	OR aggregation function  : fmax

  Aggregation (IF-clause):
  - age_cat[average]                                       : 0.7654320987654321
  - networth[high]                                         : 1.0
                       age_cat[average] AND networth[high] = 0.7654320987654321
  Activation (THEN-clause):
                                                rank[high] : 0.7654320987654321

RULE #6:
  IF age_cat[high] AND networth[low] THEN rank[average]
	AND aggregation function : fmin
	OR aggregation function  : fmax

  Aggregation (IF-clause):
  - age_cat[high]                                          : 0.0
  - networth[low]                                          : 0.0
                           age_cat[high] AND networth[low] = 0.0
  Activation (THEN-clause):
                                             rank[average] : 0.0

RULE #7:
  IF age_cat[high] AND networth[average] THEN rank[high]
	AND aggregation function : fmin
	OR aggregation function  : fmax

  Aggregation (IF-clause):
  - age_cat[high]                                          : 0.0
  - networth[average]                                      : 0.0
                       age_cat[high] AND networth[average] = 0.0
  Activation (THEN-clause):
                                                rank[high] : 0.0

RULE #8:
  IF age_cat[high] AND networth[high] THEN rank[higher]
	AND aggregation function : fmin
	OR aggregation function  : fmax

  Aggregation (IF-clause):
  - age_cat[high]                                          : 0.0
  - networth[high]                                         : 1.0
                          age_cat[high] AND networth[high] = 0.0
  Activation (THEN-clause):
                                              rank[higher] : 0.0


==============================
 Intermediaries and Conquests 
==============================
Consequent: rank                     = 1756.4538971994773
  lower:
    Accumulate using accumulation_max : 0.0
  low:
    Accumulate using accumulation_max : 0.0
  average:
    Accumulate using accumulation_max : 0.2345679012345679
  high:
    Accumulate using accumulation_max : 0.7654320987654321
  higher:
    Accumulate using accumulation_max : 0.0

1756.4538971994773

тестирование нечёткой системы

In [16]:
def fuzzy_pred(row):
    ranks.input['age_cat'] = row['Age']
    ranks.input['networth'] = row['Networth']
    ranks.compute()
    return ranks.output['rank']

res = df[['Age', 'Networth', 'Rank ']].head(100)

res['Pred'] = res.apply(fuzzy_pred, axis=1)

res.head(15)
Out[16]:
Age Networth Rank Pred
0 50 219.0 1 1756.453897
1 58 171.0 2 1604.640029
2 73 158.0 3 1622.578064
3 66 129.0 4 1441.738174
4 91 118.0 5 1768.807248
5 49 111.0 6 1106.377953
6 48 107.0 7 1083.278263
7 77 106.0 8 1541.073986
8 66 91.4 9 1283.707578
9 64 90.7 10 1245.266049
10 59 90.0 11 1143.677808
11 80 82.0 12 1412.560349
12 82 81.2 13 1429.370589
13 68 74.8 14 1227.428422
14 37 67.3 15 889.877260

оценка результатов (метрики для задачи регрессии)

In [17]:
import math
from sklearn import metrics


rmetrics = {}
rmetrics["RMSE"] = math.sqrt(metrics.mean_squared_error(res['Rank '], res['Pred']))
rmetrics["RMAE"] = math.sqrt(metrics.mean_absolute_error(res['Rank '], res['Pred']))
rmetrics["R2"] = metrics.r2_score(res['Rank '], res['Pred'])

rmetrics
Out[17]:
{'RMSE': 1028.1501824418408,
 'RMAE': 31.426056535415803,
 'R2': -1272.901233304339}

p.s. менять датасет нет смысла. все подобные датасеты имеют одинаковые колонки. какой не возьми, резульат не хороший