Compare commits

...

92 Commits

Author SHA1 Message Date
fc5942cdb1 basharin_sevastyan_lab_7 is ready 2023-12-14 22:33:48 +04:00
71b16e78b7 in progress 2023-12-07 22:35:52 +04:00
bcc00fa6a5 Merge pull request 'martysheva_tamara_lab_3' (#172) from martysheva_tamara_lab_3 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/172
2023-12-05 23:24:36 +04:00
6599b19d25 Merge pull request 'zhukova_alina_lab_2 is ready' (#157) from zhukova_alina_lab_2 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/157
2023-12-05 23:23:51 +04:00
b4d9dfaa00 Merge pull request 'malkova_anastasia_lab_3 ready' (#148) from malkova_anastasia_lab_3 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/148
2023-12-05 23:23:16 +04:00
d734997760 Merge pull request 'zhukova_alina_lab_3 is ready' (#158) from zhukova_alina_lab_3 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/158
2023-12-05 23:21:56 +04:00
0b0dc13465 Merge pull request 'sergeev_evgenii_lab_3' (#144) from sergeev_evgenii_lab_3 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/144
2023-12-05 23:21:28 +04:00
8fe64134c0 Merge pull request 'mashkova_margarita_lab_3 ready' (#184) from mashkova_margarita_lab_3 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/184
2023-12-05 23:17:36 +04:00
0ce6657922 Merge pull request 'Laboratory work 3' (#186) from orlov_artem_lab_3 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/186
2023-12-05 23:17:06 +04:00
8b39205604 Merge pull request 'laba 2 ready!!!' (#187) from verina_daria_lab_2 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/187
2023-12-05 23:16:36 +04:00
ac83aa892a Merge pull request 'kochkareva_elizaveta_lab_3 is ready' (#195) from kochkareva_elizaveta_lab_3 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/195
2023-12-05 23:16:14 +04:00
2648aac11a Merge pull request 'verina_daria_lab_3' (#188) from verina_daria_lab_3 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/188
2023-12-05 23:16:04 +04:00
f3d73b433f Merge pull request 'kondrashin_mikhail_lab_3_ready' (#204) from kondrashin_mikhail_lab_3 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/204
2023-12-05 23:14:01 +04:00
794389f861 Merge pull request 'lab3' (#212) from simonov_nikita_lab_3 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/212
2023-12-05 23:13:40 +04:00
d6d9068a03 Merge pull request 'tepechin_kirill_lab_3' (#216) from tepechin_kirill_lab_3 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/216
2023-12-05 23:13:16 +04:00
64d87ef5f7 Merge pull request 'shestakova_maria_lab_3 is ready' (#222) from shestakova_maria_lab_3 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/222
2023-12-05 23:12:37 +04:00
1dca1eb91b Merge pull request 'tepechin_kirill_lab_4' (#227) from tepechin_kirill_lab_4 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/227
2023-12-05 23:12:05 +04:00
7be972dbc7 Merge pull request 'arzamaskina_milana_lab_3 is ready' (#232) from arzamaskina_milana_lab_3 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/232
2023-12-05 23:11:31 +04:00
9bcd4fdbf0 Merge pull request 'volkov_rafael_lab_3 is done' (#251) from volkov_rafael_lab_3 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/251
2023-12-05 23:11:15 +04:00
131cb584dd Merge pull request 'malkova_anastasia_lab_2 ready' (#147) from malkova_anastasia_lab_2 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/147
2023-12-05 23:02:56 +04:00
8241b9b429 Merge pull request 'kutygin_andrey_lab_2_ready' (#150) from kutygin_andrey_lab_2 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/150
2023-12-05 23:02:30 +04:00
2d9998d681 Merge pull request 'arzamaskina_milana_lab_2 is ready' (#180) from arzamaskina_milana_lab_2 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/180
2023-12-05 23:02:11 +04:00
5886f99b30 Merge pull request 'mashkova_margarita_lab_2' (#183) from mashkova_margarita_lab_2 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/183
2023-12-05 23:01:27 +04:00
50e0780960 Merge pull request 'Laboratory work 2' (#185) from orlov_artem_lab_2 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/185
2023-12-05 23:00:40 +04:00
e64084bb6e Merge pull request 'kochkareva_elizaveta_lab_2 is ready' (#194) from kochkareva_elizaveta_lab_2 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/194
2023-12-05 22:59:34 +04:00
076449fd0b Merge pull request 'kondrashin_mikhail_lab_2_ready' (#203) from kondrashin_mikhail_lab_2 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/203
2023-12-05 22:58:59 +04:00
c2101d3e00 Merge pull request 'romanova_adelina_lab_2 is ready' (#209) from romanova_adelina_lab_2 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/209
2023-12-05 22:57:27 +04:00
7e5a16e38b Merge pull request 'tepechin_kirill_lab_2' (#211) from tepechin_kirill_lab_2 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/211
2023-12-05 22:57:13 +04:00
cad64a19f6 Merge pull request 'shestakova_maria_lab_2 is ready' (#221) from shestakova_maria_lab_2 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/221
2023-12-05 22:57:07 +04:00
c2e32b2ef2 Merge pull request 'degtyarev_mikhail_lab_2_is_ready' (#245) from degtyarev_mikhail_lab_2 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/245
2023-12-05 22:55:44 +04:00
32a53d4be5 Merge pull request 'volkov_rafael_lab_2 is done' (#250) from volkov_rafael_lab_2 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/250
2023-12-05 22:55:25 +04:00
4748853b67 Merge pull request 'volkov_rafael_lab_1 is done' (#249) from volkov_rafael_lab_1 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/249
2023-12-05 22:52:25 +04:00
8a297e2542 Merge pull request 'tepechin_kirill_lab_1' (#210) from tepechin_kirill_lab_1 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/210
2023-12-05 22:52:05 +04:00
23e1642819 Merge pull request 'kondrashin_mikhail_lab_1_ready' (#202) from kondrashin_mikhail_lab_1 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/202
2023-12-05 22:52:01 +04:00
e943a38726 Merge pull request 'degtyarev_mikhail_lab_1' (#234) from degtyarev_mikhail_lab_1 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/234
2023-12-05 22:51:04 +04:00
205e558e12 Merge pull request 'shestakova_maria_lab_1 is ready' (#220) from shestakova_maria_lab_1 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/220
2023-12-05 22:50:49 +04:00
5fa2de6b57 Merge pull request 'mashkova_margarita_lab_1 ready' (#182) from mashkova_margarita_lab_1 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/182
2023-12-05 22:49:49 +04:00
1a80ebbe76 Merge pull request 'kochkareva_elizaveta_lab_1 is ready' (#193) from kochkareva_elizaveta_lab_1 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/193
2023-12-05 22:49:40 +04:00
a8fe7f1c3e Merge pull request 'laba 1 ready!!!' (#181) from verina_daria_lab_1 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/181
2023-12-05 22:48:47 +04:00
3b74c70f50 Merge pull request 'arzamaskina_milana_lab_1 is ready' (#179) from arzamaskina_milana_lab_1 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/179
2023-12-05 22:48:30 +04:00
37e90b4c6c Merge pull request 'kutygin_lab_1_ready' (#149) from kutygin_andrey_lab_1 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/149
2023-12-05 22:44:56 +04:00
44023b7305 Merge pull request 'Laboratory work 1' (#145) from orlov_artem_lab_1 into main
Reviewed-on: http://student.git.athene.tech/Alexey/IIS_2023_1/pulls/145
2023-12-05 22:44:31 +04:00
Rafael Volkov 2c7a1a1e18 volkov_rafael_lab_3 is done 2023-12-05 12:27:52 +04:00
Rafael Volkov b661ebcb41 volkov_rafael_lab_2 is done 2023-12-05 12:27:36 +04:00
Rafael Volkov 5b27113150 volkov_rafael_lab_1 is done 2023-12-05 12:27:16 +04:00
3e32d676e0 degtyarev_mikhail_lab_2_is_ready 2023-12-03 15:05:18 +04:00
79ae40d608 degtyarev_mikhail_lab_1 2023-12-02 00:15:42 +04:00
b3c9da8f15 arzamaskina_milana_lab_3 is ready 2023-12-01 22:12:53 +04:00
9ac110d9ab tepechin_kirill_lab_4 2023-12-01 13:27:39 +04:00
d2627b6a38 tepechin_kirill_lab_3 2023-11-29 23:09:58 +04:00
Мария Ш de0a8ee5bc shestakova_maria_lab_3 is ready 2023-11-29 20:32:10 +03:00
a4aa458bc5 lab3 2023-11-29 19:36:57 +04:00
c24c21caf3 tepechin_kirill_lab_2 2023-11-29 17:43:31 +04:00
c47e1e3012 Merge remote-tracking branch 'origin/tepechin_kirill_lab_1' into tepechin_kirill_lab_1 2023-11-29 15:54:51 +04:00
5729ed64a9 Merge remote-tracking branch 'origin/tepechin_kirill_lab_1' into tepechin_kirill_lab_1 2023-11-29 15:53:25 +04:00
49d703350e Merge remote-tracking branch 'origin/tepechin_kirill_lab_1' into tepechin_kirill_lab_1 2023-11-29 15:50:12 +04:00
d29da45383 tepechin_kirill_lab1 2023-11-29 15:49:59 +04:00
90954dfa89 tepechin_kirill_lab1 2023-11-29 15:43:19 +04:00
Мария Ш 4c08267e74 shestakova_maria_lab_2 is ready 2023-11-28 21:24:32 +03:00
Мария Ш bc33af764d shestakova_maria_lab_1 is ready 2023-11-28 20:52:20 +03:00
2b2eb9b72d romanova_adelina_lab_2 is ready 2023-11-28 01:00:12 +04:00
3b1bd034a7 kondrashin_mikhail_lab_3_ready 2023-11-26 19:54:27 +04:00
2a1d17c98f kondrashin_mikhail_lab_2_ready 2023-11-25 17:47:27 +04:00
d62368540e kondrashin_mikhail_lab_1_ready 2023-11-25 16:34:00 +04:00
Kochkareva 2dbadb666a kochkareva_elizaveta_lab_3 is ready 2023-11-24 15:25:28 +04:00
Kochkareva 9789f9772b kochkareva_elizaveta_lab_2 is ready 2023-11-24 15:23:08 +04:00
Kochkareva 2670375d98 kochkareva_elizaveta_lab_1 is ready 2023-11-24 15:19:29 +04:00
altteam 690fd745de laba 3 economica file 2023-11-23 00:37:33 +04:00
altteam cc021ad78a laba 3 ready!!! 2023-11-23 00:35:34 +04:00
altteam cc1802b4f0 laba 2 ready!!! 2023-11-22 22:52:44 +04:00
artem.orlov 53818e12e5 Laboratory work 3 2023-11-22 21:28:17 +04:00
artem.orlov 7c63fc79f6 Laboratory work 2 2023-11-22 21:23:29 +04:00
88f5268b31 mashkova_margarita_lab_3 ready 2023-11-22 06:33:50 +04:00
3a316d94a1 mashkova_margarita_lab_2 change md 2023-11-22 00:40:05 +04:00
481a18c68d mashkova_margarita_lab_2 change md 2023-11-22 00:35:41 +04:00
27d25c8f14 mashkova_margarita_lab_2 ready 2023-11-22 00:31:35 +04:00
8eb27373e0 mashkova_margarita_lab_1 ready 2023-11-21 05:50:29 +04:00
altteam f0334fdc44 laba 1 ready!!! 2023-11-20 19:05:36 +04:00
7eda432471 arzamaskina_milana_lab_2 is ready 2023-11-20 16:21:41 +04:00
ebf64afcf2 martysheva lab3 done 2023-11-19 14:28:42 +04:00
637083470c martysheva lab3 done 2023-11-19 14:25:06 +04:00
c3aa36cc7b arzamaskina_milana_lab_1 is ready 2023-11-16 13:54:53 +04:00
e80da1c4ce zhukova_alina_lab_3 is ready 2023-11-15 17:00:31 +04:00
2506e7cd95 zhukova_alina_lab_2 is ready 2023-11-15 16:54:40 +04:00
6298d561f8 kutygin_andrey_lab_2_ready 2023-11-13 20:34:13 +04:00
294764b582 kutygin_lab_1_ready 2023-11-13 20:28:24 +04:00
54eea76599 lab3 ready 2023-11-11 22:55:33 +04:00
4c8da5afb3 fix readme file 2023-11-11 18:52:08 +04:00
21552b4c19 lab2 ready 2023-11-11 18:47:17 +04:00
artem.orlov 220b176be4 Laboratory work 1 2023-11-09 23:41:58 +04:00
sergeevevgen 0d1e5a83f4 done 2023-11-06 23:11:22 +04:00
Евгений Сергеев c4bd132891 + 2023-10-29 21:02:40 +04:00
184 changed files with 3013033 additions and 0 deletions

View File

@@ -0,0 +1,52 @@
## Task
Work with standard datasets and different models.
Generate the specified type of data and compare 3 models on it.
Variant No. 2
Data: make_circles(noise=0.2, factor=0.5, random_state=1)
Models (a construction sketch follows this list):
+ Linear regression
+ Polynomial regression (degree=3)
+ Ridge regression (degree=3, alpha=1.0)
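A minimal sketch, assuming scikit-learn, of how the three listed models can be assembled (variable names are illustrative; the repo's main.py below builds the same pipelines):
```python
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Plain linear regression on the raw 2-D circle coordinates
linear = LinearRegression()
# "Polynomial regression (degree=3)": degree-3 feature expansion + least squares
poly = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
# "Ridge regression (degree=3, alpha=1.0)": the same expansion + L2-regularized fit
ridge = make_pipeline(PolynomialFeatures(degree=3), Ridge(alpha=1.0))
```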
## Technologies used
The lab uses the following libraries:
+ matplotlib - used for plotting
+ sklearn - used for the machine learning models
## How to run
Run main.py; it performs the required operations on the models
and shows the plots on screen.
## What the program does
Generates a circles-type dataset and splits it into training and test sets.
It then trains 3 models on the training set, one after another:
a linear regression model, a polynomial regression model of degree 3, and
a ridge regression model of degree 3 with alpha=1.0.
After training, the models' predictions are checked on the test data.
Four plots are drawn; the first shows the original training and test data, where:
`o` - training-set points of the first and second class.
`x` - test-set points of the first and second class.
There is also one plot per model, where:
`o` - test-set points of the first and second class.
Finally, the program prints the models' error scores. The obtained values (mean squared error):
+ Linear regression - 0.268
+ Polynomial regression of degree 3 - 0.134
+ Ridge regression of degree 3, alpha=1.0 - 0.131
## Screenshots
The plot of the original training and test data and
the resulting class-assignment plots:
Linear regression - Polynomial regression (degree=3) - Ridge regression (degree=3, alpha=1.0)
![img.png](img_screen_1.png)
Output of the model quality analysis:
![img.png](img_screen_2.png)
## Conclusion
Since the reported scores are mean squared errors (lower is better), the best result was shown by the ridge regression model (degree=3, alpha=1.0); plain linear regression performed worst on this non-linearly separable dataset.

Binary file not shown (added; 332 KiB).

Binary file not shown (added; 64 KiB).

View File

@@ -0,0 +1,87 @@
from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.datasets import make_circles
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Non-linear generator: produces two feature classes such that the points
# of one class geometrically surround the points of the other
X, y = make_circles(noise=0.2, factor=0.5, random_state=1)
X = StandardScaler().fit_transform(X)
# Split into training and test sets (40% of the data is used for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4, random_state=42)

# Build, fit and evaluate the models
def models():
    # Linear regression
    linear_regression = LinearRegression()
    linear_regression.fit(X_train, y_train)
    # Polynomial regression (degree=3)
    poly_regression = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
    poly_regression.fit(X_train, y_train)
    # Ridge regression (degree=3, alpha=1.0)
    ridge_regression = make_pipeline(PolynomialFeatures(degree=3), Ridge(alpha=1.0))
    ridge_regression.fit(X_train, y_train)
    models = [linear_regression, poly_regression, ridge_regression]
    # Predicted y values
    linear_predict = linear_regression.predict(X_test)
    poly_predict = poly_regression.predict(X_test)
    ridge_predict = ridge_regression.predict(X_test)
    pred = [linear_predict, poly_predict, ridge_predict]
    # Mean squared errors
    lin_mse = mean_squared_error(y_test, linear_predict)
    poly_mse = mean_squared_error(y_test, poly_predict)
    rr_mse = mean_squared_error(y_test, ridge_predict)
    mse = [lin_mse, poly_mse, rr_mse]
    grafics(pred, mse, models)

# Plots
def grafics(pred, mse, models):
    plt.figure(figsize=(8, 8))
    cm_color1 = ListedColormap(['r', 'g'])
    plt.suptitle('Laboratory work 1. Variant 2.', fontweight='bold')
    # Data plot
    plt.subplot(2, 2, 1)
    plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_color1, marker='o', label='training data')
    plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_color1, marker='x', label='test data')
    plt.title('circles dataset', fontsize=10, loc='left')
    plt.legend(loc='upper left')
    # Linear model plot
    plt.subplot(2, 2, 2)
    plt.scatter(X_test[:, 0], X_test[:, 1], c=pred[0], cmap=cm_color1)
    plt.title('Linear regression', fontsize=10, loc='left')
    # Polynomial model plot (degree=3)
    plt.subplot(2, 2, 3)
    plt.scatter(X_test[:, 0], X_test[:, 1], c=pred[1], cmap=cm_color1)
    plt.title('Polynomial regression (degree=3)', fontsize=10, loc='left')
    plt.xlabel('X')
    plt.ylabel('Y')
    # Ridge model plot (degree=3, alpha=1.0)
    plt.subplot(2, 2, 4)
    plt.scatter(X_test[:, 0], X_test[:, 1], c=pred[2], cmap=cm_color1)
    plt.title('Ridge regression (degree=3, alpha=1.0)', fontsize=10, loc='left')
    plt.show()
    # Quality comparison (MSE: lower is better)
    print('Linear MSE:', mse[0])
    print('Polynomial (degree=3) MSE:', mse[1])
    print('Ridge (degree=3, alpha=1.0) MSE:', mse[2])

models()

View File

@@ -0,0 +1,50 @@
## Laboratory work No. 2
### Feature ranking
Variant No. 2
## Task:
Using the code from [1] (section "Solving the feature ranking problem", p. 205),
rank the features with the models specified for your variant.
Display the resulting value/score of every feature for each method/model, plus the mean score.
Analyse the results.
Which four features turned out to be the most important by mean score?
(The names/indices of those features are the answer to the task.)
Models (a ranking sketch follows this list):
+ Linear regression (LinearRegression)
+ Recursive feature elimination (RFE)
+ Feature selection with random trees (Random Forest Regressor)
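A minimal sketch of the ranking-and-averaging procedure (illustrative, not the repo's main.py, which normalizes with MinMaxScaler instead of max-scaling): each model yields a per-feature score in [0, 1], and the final ranking is the mean of the three.
```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

def rank_features(X, y):
    scores = []
    # 1) magnitude of linear-regression coefficients, scaled to [0, 1]
    coef = np.abs(LinearRegression().fit(X, y).coef_)
    scores.append(coef / coef.max())
    # 2) RFE ranks (1 = best); map ranks to [0, 1] with 1.0 for the best feature
    rfe = RFE(LinearRegression(), n_features_to_select=1).fit(X, y)
    scores.append(1 - (rfe.ranking_ - 1) / (rfe.ranking_.max() - 1))
    # 3) random-forest feature importances, scaled to [0, 1]
    imp = RandomForestRegressor().fit(X, y).feature_importances_
    scores.append(imp / imp.max())
    return np.mean(scores, axis=0)  # mean score per feature
```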
### Technologies used:
Libraries used:
* numpy
* pandas
* sklearn
### How to run:
* install python with numpy, pandas and sklearn
* run the project (entry point - main.py)
### What the program does:
* Generates data and fits the models: LinearRegression, Recursive Feature Elimination (RFE), Random Forest Regressor
* Ranks the features with each model
* Displays the results: the per-model feature scores and the 4 most important features by mean score
### The 4 most important features by mean score
* Feature No. 1: 0.887
* Feature No. 4: 0.821
* Feature No. 2: 0.741
* Feature No. 11: 0.600
(In the generated data, features 11-14 are noisy copies of features 1-4, so feature No. 11 scoring close to feature No. 1 is expected.)
#### Program output:
![Result1](img_result_1.png)
![Result2](img_result_2.png)
![Result3](img_result_3.png)
![Result4](img_result_4.png)

Binary file not shown (added; 148 KiB).

Binary file not shown (added; 146 KiB).

Binary file not shown (added; 155 KiB).

Binary file not shown (added; 66 KiB).

View File

@@ -0,0 +1,84 @@
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import MinMaxScaler

# Using the code from [1] (section "Solving the feature ranking problem", p. 205),
# rank the features with the models specified for the variant.
# Display the resulting value/score of every feature for each method/model plus the mean score.
# Analyse the results.
# Which four features turned out to be the most important by mean score?
# (The names/indices of those features are the answer to the task.)
# Linear regression (LinearRegression)
# Recursive feature elimination (RFE)
# Feature selection with random trees (Random Forest Regressor)

# Models
model_LR = LinearRegression()
model_RFE = RFE(LinearRegression(), n_features_to_select=1)
model_RFR = RandomForestRegressor()
# Model scores
model_scores = {}

# Random regression data
def generation_data_and_start():
    np.random.seed(0)
    size = 750
    X_ = np.random.uniform(0, 1, (size, 14))
    # Friedman-style target: only the first five features actually influence y
    y_ = (10 * np.sin(np.pi * X_[:, 0] * X_[:, 1]) + 20 * (X_[:, 2] - .5) ** 2 +
          10 * X_[:, 3] + 5 * X_[:, 4] ** 5 + np.random.normal(0, 1))
    # Features 11-14 are noisy copies of features 1-4
    X_[:, 10:] = X_[:, :4] + np.random.normal(0, .025, (size, 4))
    # DataFrame for the data
    data = pd.DataFrame(X_)
    data['y'] = y_
    models_study_and_scores(data.drop('y', axis=1), data['y'])
    print_scores()

# Fit and score the models
def models_study_and_scores(X, y):
    # Linear regression
    model_LR.fit(X, y)
    # Normalize the feature coefficients to [0, 1]
    norm_coef = MinMaxScaler().fit_transform(np.abs(model_LR.coef_).reshape(-1, 1))
    model_scores["Linear regression"] = norm_coef.flatten()
    # Recursive feature elimination
    model_RFE.fit(X, y)
    # Normalize the ranks (rank 1 -> score 1.0, worst rank -> 0.0)
    norm_rank = 1 - (model_RFE.ranking_ - 1) / (np.max(model_RFE.ranking_) - 1)
    model_scores["Recursive feature elimination"] = norm_rank
    # Feature selection with random trees
    model_RFR.fit(X, y)
    # Normalize the feature importances
    norm_imp = MinMaxScaler().fit_transform(model_RFR.feature_importances_.reshape(-1, 1))
    model_scores["Random-tree feature selection"] = norm_imp.flatten()

# Print the scores
def print_scores():
    print()
    print("---- Feature scores ----")
    print()
    for name, scores in model_scores.items():
        print(f"{name}:")
        for feature, score in enumerate(scores, start=1):
            print(f"Feature No. {feature}: {score:.3f}")
        print(f"Mean feature score: {np.mean(scores):.3f}")
        print()
    # The 4 most important features by mean score
    scores = np.mean(list(model_scores.values()), axis=0)
    sorted_f = sorted(enumerate(scores, start=1), key=lambda x: x[1], reverse=True)
    imp_features = sorted_f[:4]
    print("The four most important features:")
    for f, score in imp_features:
        print(f"Feature No. {f}: {score:.3f}")

generation_data_and_start()

File diff suppressed because it is too large.

View File

@@ -0,0 +1,83 @@
## Laboratory work No. 3
### Decision trees
## Task:
+ Regression task: predict a country's (Country) total CO2 emissions (Total) for a given year (Year).
+ Classification task: predict the percentage of a country's (Country) CO2 emissions that comes from oil production ('procent oil'),
given the total emissions (Total) for a given year (Year)
(i.e., what share of the emissions falls on oil production).
## Data:
This dataset provides an in-depth, country-level analysis of global CO2 emissions, giving a better understanding of
how much each country contributes to the cumulative human impact on climate.
It contains information on total emissions as well as on emissions from coal, oil, gas, cement production, flaring and other sources.
The data also break down per-capita CO2 emissions by country, showing
which countries lead in pollution levels and identifying potential areas
where emission-reduction efforts should be focused.
The dataset is useful to anyone who wants to understand their environmental impact
or to research international development trends.
The data are organized into the following columns:
+ Country: country name
+ ISO 3166-1 alpha-3: three-letter country code
+ Year: year of the observation
+ Total: total CO2 emitted by the country in that year
+ Coal: CO2 emitted from coal in that year
+ Oil: CO2 emitted from oil
+ Gas: CO2 emitted from gas
+ Cement: CO2 emitted from cement production
+ Flaring: emissions from gas flaring
+ Other: other sources, such as industrial processes
+ Per Capita: emissions per capita
### Technologies used:
Libraries used:
* math (standard library)
* pandas
* sklearn
### How to run:
* install python with pandas and sklearn
* run the project (entry point - main.py)
### What the program does (a sketch of the regression side follows this list):
* Loads the dataset from 'CO2.csv', which contains countries' yearly CO2 emissions from various industrial activities.
* Cleans the dataset by dropping rows with missing values.
* Adds a column with a hash code of the country name.
* Adds a 'procent oil' column - the share of a country's yearly emissions that comes from oil production, in percent (to make classification possible).
* Selects the set of features used to train the regression and classification models.
* Defines the regression task, with 'Total' as the target variable, and the classification task, with 'procent oil' as the target.
* Splits the data into training and test sets for both tasks using train_test_split. The test set is 1% of the data.
* Creates and trains decision trees for regression and classification using DecisionTreeRegressor and DecisionTreeClassifier.
* Predicts the target variable on the test sets of both tasks.
* Evaluates the models with the score method (R^2 for the regressor, accuracy for the classifier).
* Prints the feature importances for both tasks.
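A minimal sketch of the regression side described above, assuming the column names in CO2.csv (illustrative; the full main.py appears below):
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

data = pd.read_csv('CO2.csv').dropna()
# numeric stand-in for the country name, as in main.py below
data['hashcode'] = data['Country'].map(hash)

X, y = data[['Year', 'hashcode']], data['Total']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.01, random_state=250)

tree = DecisionTreeRegressor(random_state=250).fit(X_train, y_train)
print('R^2 on the test set:', tree.score(X_test, y_test))
```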
#### Program output:
![Result](img.png)
### Conclusion:
For the regression task, predicting a country's total CO2 emissions for a given year, the decision tree reached a score (R^2) of 0.99. That is a very good result: the model predicts the emissions of a given country in a given year quite acceptably.
For the classification task, predicting what share of the emissions falls on oil production, the decision tree showed much lower accuracy - 18%. This means the classification model cannot predict the share of emissions from oil production from the chosen features.
The low accuracy indicates that the model needs to be improved, or that other methods should be chosen for this classification task.
The feature-importance analysis for the regression task showed that the largest contribution to predicting a country's yearly emissions comes from the 'hashcode' ('Country') feature:
the country name has the greatest influence on the model's result.
From this we can conclude that a country's CO2 emissions do not change much over time -
each country produces roughly the same amount of CO2 every year, which may be related to its fossil-fuel deposits.
For the classification task, the largest contributions to predicting the oil-emissions share come from the 'Year' and 'Total' features.
These features matter most when assigning classes by the share of emissions from oil production.

Binary file not shown (added; 173 KiB).

View File

@@ -0,0 +1,96 @@
import math
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import train_test_split

### Decision trees for regression and classification
### using the DecisionTreeRegressor and DecisionTreeClassifier models

# Load the data
data = pd.read_csv('CO2.csv')
data = data.dropna()

# Hash the country names
# (note: Python's built-in hash() for strings is salted per process,
# so the hashcode values are not reproducible between runs)
countries = {}
for country in data['Country']:
    countries[country] = hash(country)
hash_column = []
for country in data['Country']:
    hash_column.append(countries[country])
data.insert(loc=0, column='hashcode', value=hash_column)

# Add an "oil share of the country's yearly emissions, in percent" column for classification
procent_oil = []
oils = []
totals = []
for oil in data['Oil']:
    oils.append(oil)
for total in data['Total']:
    totals.append(total)
for i in range(len(oils)):
    procent_oil.append(math.ceil(oils[i]/totals[i]*100))
data.insert(loc=0, column='procent oil', value=procent_oil)

#------ Decision tree for regression ------#
# TASK: predict a country's total CO2 emissions for a given year.
# Features for the regression tree
features_for_regr = data[['Year', 'hashcode']]
# Target of the regression tree
task_regr = data['Total']
# Split the data into training and test sets for regression
X_train_r, X_test_r, \
    y_train_r, y_test_r = train_test_split(features_for_regr, task_regr, test_size=0.01, random_state=250)
# Create and fit the regression decision tree
model_regr = DecisionTreeRegressor(random_state=250)
model_regr.fit(X_train_r, y_train_r)
# Predict on the regression test set
y_pred_r = model_regr.predict(X_test_r)
# Model score (R^2 for the regressor)
score_r = model_regr.score(X_test_r, y_test_r)
print("\n\nRegression tree score:", score_r)
# Feature importances of the regression tree
imp_regr = model_regr.feature_importances_
print("Feature importances of the regression tree:")
print("Importance of 'Year':", imp_regr[0])
print("Importance of 'hashcode':", imp_regr[1], "\n\n")

#------ Decision tree for classification ------#
# TASK: predict the percentage of a country's CO2 emissions that comes from oil production
# for a given year, given the total emissions for that year
# (i.e., what share of the emissions falls on oil production).
# Features for the classification tree
features_for_class = data[['Total', 'Year', 'hashcode']]
# Target of the classification tree
task_class = data['procent oil']
# Split the data into training and test sets for classification
X_train_c, X_test_c, \
    y_train_c, y_test_c = train_test_split(features_for_class, task_class, test_size=0.01, random_state=250)
# Create and fit the classification decision tree
model_class = DecisionTreeClassifier(random_state=250)
model_class.fit(X_train_c, y_train_c)
# Predict on the classification test set
y_pred_c = model_class.predict(X_test_c)
# Model score (accuracy for the classifier)
score_c = model_class.score(X_test_c, y_test_c)
print("Classification tree score:", score_c)
# Feature importances of the classification tree
imp_class = model_class.feature_importances_
print("Feature importances of the classification tree:")
print("Importance of 'Total':", imp_class[0])
print("Importance of 'Year':", imp_class[1])
print("Importance of 'hashcode':", imp_class[2])

View File

@@ -0,0 +1,136 @@
Annotation
The Fellowship of the Ring is the first part of J.R.R. Tolkien's epic adventure, The Lord of the Rings.
Sauron, the Dark Lord, has gathered to him all the Rings of Power - the means by which he intends to rule Middle-earth. All he lacks in his plans for dominion is the One Ring - the ring that rules them all - which has fallen into the hands of the hobbit Bilbo Baggins.
In a sleepy village in the Shire, young Frodo Baggins finds himself faced with an immense task, as his elderly cousin Bilbo entrusts the Ring to his care. Frodo must leave his home and make a perilous journey across Middle-earth to the Cracks of Doom, there to destroy the Ring and foil the Dark Lord in his evil purpose.
* * *
J.R.R. Tolkien. The Lord of the Rings 1 - The Fellowship of the Ring
Table of Contents
Foreword
This tale grew in the telling, until it became a history of the Great War of the Ring and included many glimpses of the yet more ancient history that preceded it. It was begun soon after The Hobbit was written and before its publication in 1937; but I did not go on with this sequel, for I wished first to complete and set in order the mythology and legends of the Elder Days, which had then been taking shape for some years. I desired to do this for my own satisfaction, and I had little hope that other people would be interested in this work, especially since it was primarily linguistic in inspiration and was begun in order to provide the necessary background of 'history' for Elvish tongues.
When those whose advice and opinion I sought corrected little hope to no hope, I went back to the sequel, encouraged by requests from readers for more information concerning hobbits and their adventures. But the story was drawn irresistibly towards the older world, and became an account, as it were, of its end and passing away before its beginning and middle had been told. The process had begun in the writing of The Hobbit, in which there were already some references to the older matter: Elrond, Gondolin, the High-elves, and the orcs, as well as glimpses that had arisen unbidden of things higher or deeper or darker than its surface: Durin, Moria, Gandalf, the Necromancer, the Ring. The discovery of the significance of these glimpses and of their relation to the ancient histories revealed the Third Age and its culmination in the War of the Ring.
Those who had asked for more information about hobbits eventually got it, but they had to wait a long time; for the composition of The Lord of the Rings went on at intervals during the years 1936 to 1949, a period in which I had many duties that I did not neglect, and many other interests as a learner and teacher that often absorbed me. The delay was, of course, also increased by the outbreak of war in 1939, by the end of which year the tale had not yet reached the end of Book One. In spite of the darkness of the next five years I found that the story could not now be wholly abandoned, and I plodded on, mostly by night, till I stood by Balin's tomb in Moria. There I halted for a long while. It was almost a year later when I went on and so came to Lothlorien and the Great River late in 1941. In the next year I wrote the first drafts of the matter that now stands as Book Three, and the beginnings of chapters I and III of Book Five; and there as the beacons flared in Anorien and Theoden came to Harrowdale I stopped. Foresight had failed and there was no time for thought.
It was during 1944 that, leaving the loose ends and perplexities of a war which it was my task to conduct, or at least to report, I forced myself to tackle the journey of Frodo to Mordor. These chapters, eventually to become Book Four, were written and sent out as a serial to my son, Christopher, then in South Africa with the RAF. Nonetheless it took another five years before the tale was brought to its present end; in that time I changed my house, my chair, and my college, and the days though less dark were no less laborious. Then when the 'end' had at last been reached the whole story had to be revised, and indeed largely re-written backwards. And it had to be typed, and re-typed: by me; the cost of professional typing by the ten-fingered was beyond my means.
The Lord of the Rings has been read by many people since it finally appeared in print; and I should like to say something here with reference to the many opinions or guesses that I have received or have read concerning the motives and meaning of the tale. The prime motive was the desire of a tale-teller to try his hand at a really long story that would hold the attention of readers, amuse them, delight them, and at times maybe excite them or deeply move them. As a guide I had only my own feelings for what is appealing or moving, and for many the guide was inevitably often at fault. Some who have read the book, or at any rate have reviewed it, have found it boring, absurd, or contemptible; and I have no cause to complain, since I have similar opinions of their works, or of the kinds of writing that they evidently prefer. But even from the points of view of many who have enjoyed my story there is much that fails to please. It is perhaps not possible in a long tale to please everybody at all points, nor to displease everybody at the same points; for I find from the letters that I have received that the passages or chapters that are to some a blemish are all by others specially approved. The most critical reader of all, myself, now finds many defects, minor and major, but being fortunately under no obligation either to review the book or to write it again, he will pass over these in silence, except one that has been noted by others: the book is too short.
As for any inner meaning or 'message', it has in the intention of the author none. It is neither allegorical nor topical. As the story grew it put down roots (into the past) and threw out unexpected branches: but its main theme was settled from the outset by the inevitable choice of the Ring as the link between it and The Hobbit. The crucial chapter, 'The Shadow of the Past', is one of the oldest parts of the tale. It was written long before the foreshadow of 1939 had yet become a threat of inevitable disaster, and from that point the story would have developed along essentially the same lines, if that disaster had been averted. Its sources are things long before in mind, or in some cases already written, and little or nothing in it was modified by the war that began in 1939 or its sequels.
The real war does not resemble the legendary war in its process or its conclusion. If it had inspired or directed the development of the legend, then certainly the Ring would have been seized and used against Sauron; he would not have been annihilated but enslaved, and Barad-dur would not have been destroyed but occupied. Saruman, failing to get possession of the Ring, would in the confusion and treacheries of the time have found in Mordor the missing links in his own researches into Ring-lore, and before long he would have made a Great Ring of his own with which to challenge the self-styled Ruler of Middle-earth. In that conflict both sides would have held hobbits in hatred and contempt: they would not long have survived even as slaves.
Other arrangements could be devised according to the tastes or views of those who like allegory or topical reference. But I cordially dislike allegory in all its manifestations, and always have done so since I grew old and wary enough to detect its presence. I much prefer history, true or feigned, with its varied applicability to the thought and experience of readers. I think that many confuse 'applicability' with 'allegory'; but the one resides in the freedom of the reader, and the other in the purposed domination of the author.
An author cannot of course remain wholly unaffected by his experience, but the ways in which a story-germ uses the soil of experience are extremely complex, and attempts to define the process are at best guesses from evidence that is inadequate and ambiguous. It is also false, though naturally attractive, when the lives of an author and critic have overlapped, to suppose that the movements of thought or the events of times common to both were necessarily the most powerful influences. One has indeed personally to come under the shadow of war to feel fully its oppression; but as the years go by it seems now often forgotten that to be caught in youth by 1914 was no less hideous an experience than to be involved in 1939 and the following years. By 1918 all but one of my close friends were dead. Or to take a less grievous matter: it has been supposed by some that "The Scouring of the Shire' reflects the situation in England at the time when I was finishing my tale. It does not. It is an essential part of the plot, foreseen from the outset, though in the event modified by the character of Saruman as developed in the story without, need I say, any allegorical significance or contemporary political reference whatsoever. It has indeed some basis in experience, though slender (for the economic situation was entirely different), and much further back. The country in which I lived in childhood was being shabbily destroyed before I was ten, in days when motor-cars were rare objects (I had never seen one) and men were still building suburban railways. Recently I saw in a paper a picture of the last decrepitude of the once thriving corn-mill beside its pool that long ago seemed to me so important. I never liked the looks of the Young miller, but his father, the Old miller, had a black beard, and he was not named Sandyman.
The Lord of the Rings is now issued in a new edition, and the opportunity has been taken of revising it. A number of errors and inconsistencies that still remained in the text have been corrected, and an attempt has been made to provide information on a few points which attentive readers have raised. I have considered all their comments and enquiries, and if some seem to have been passed over that may be because I have failed to keep my notes in order; but many enquiries could only be answered by additional appendices, or indeed by the production of an accessory volume containing much of the material that I did not include in the original edition, in particular more detailed linguistic information. In the meantime this edition offers this Foreword, an addition to the Prologue, some notes, and an index of the names of persons and places. This index is in intention complete in items but not in references, since for the present purpose it has been necessary to reduce its bulk. A complete index, making full use of the material prepared for me by Mrs. N. Smith, belongs rather to the accessory volume.
Prologue
This book is largely concerned with Hobbits, and from its pages a reader may discover much of their character and a little of their history. Further information will also be found in the selection from the Red Book of Westmarch that has already been published, under the title of The Hobbit. That story was derived from the earlier chapters of the Red Book, composed by Bilbo himself, the first Hobbit to become famous in the world at large, and called by him There and Back Again, since they told of his journey into the East and his return: an adventure which later involved all the Hobbits in the great events of that Age that are here related.
Many, however, may wish to know more about this remarkable people from the outset, while some may not possess the earlier book. For such readers a few notes on the more important points are here collected from Hobbit-lore, and the first adventure is briefly recalled.
Hobbits are an unobtrusive but very ancient people, more numerous formerly than they are today; for they love peace and quiet and good tilled earth: a well-ordered and well-farmed countryside was their favourite haunt. They do not and did not understand or like machines more complicated than a forge-bellows, a water-mill, or a hand-loom, though they were skilful with tools. Even in ancient days they were, as a rule, shy of 'the Big Folk', as they call us, and now they avoid us with dismay and are becoming hard to find. They are quick of hearing and sharp-eyed, and though they are inclined to be fat and do not hurry unnecessarily, they are nonetheless nimble and deft in their movements. They possessed from the first the art of disappearing swiftly and silently, when large folk whom they do not wish to meet come blundering by; and this art they have developed until to Men it may seem magical. But Hobbits have never, in fact, studied magic of any kind, and their elusiveness is due solely to a professional skill that heredity and practice, and a close friendship with the earth, have rendered inimitable by bigger and clumsier races.
For they are a little people, smaller than Dwarves: less stout and stocky, that is, even when they are not actually much shorter. Their height is variable, ranging between two and four feet of our measure. They seldom now reach three feet; but they have dwindled, they say, and in ancient days they were taller. According to the Red Book, Bandobras Took (Bullroarer), son of Isengrim the Second, was four foot five and able to ride a horse. He was surpassed in all Hobbit records only by two famous characters of old; but that curious matter is dealt with in this book.
As for the Hobbits of the Shire, with whom these tales are concerned, in the days of their peace and prosperity they were a merry folk. They dressed in bright colours, being notably fond of yellow and green; but they seldom wore shoes, since their feet had tough leathery soles and were clad in a thick curling hair, much like the hair of their heads, which was commonly brown. Thus, the only craft little practised among them was shoe-making; but they had long and skilful fingers and could make many other useful and comely things. Their faces were as a rule good-natured rather than beautiful, broad, bright-eyed, red-cheeked, with mouths apt to laughter, and to eating and drinking. And laugh they did, and eat, and drink, often and heartily, being fond of simple jests at all times, and of six meals a day (when they could get them). They were hospitable and delighted in parties, and in presents, which they gave away freely and eagerly accepted.
It is plain indeed that in spite of later estrangement Hobbits are relatives of ours: far nearer to us than Elves, or even than Dwarves. Of old they spoke the languages of Men, after their own fashion, and liked and disliked much the same things as Men did. But what exactly our relationship is can no longer be discovered. The beginning of Hobbits lies far back in the Elder Days that are now lost and forgotten. Only the Elves still preserve any records of that vanished time, and their traditions are concerned almost entirely with their own history, in which Men appear seldom and Hobbits are not mentioned at all. Yet it is clear that Hobbits had, in fact, lived quietly in Middle-earth for many long years before other folk became even aware of them. And the world being after all full of strange creatures beyond count, these little people seemed of very little importance. But in the days of Bilbo, and of Frodo his heir, they suddenly became, by no wish of their own, both important and renowned, and troubled the counsels of the Wise and the Great.
Those days, the Third Age of Middle-earth, are now long past, and the shape of all lands has been changed; but the regions in which Hobbits then lived were doubtless the same as those in which they still linger: the North-West of the Old World, east of the Sea. Of their original home the Hobbits in Bilbo's time preserved no knowledge. A love of learning (other than genealogical lore) was far from general among them, but there remained still a few in the older families who studied their own books, and even gathered reports of old times and distant lands from Elves, Dwarves, and Men. Their own records began only after the settlement of the Shire, and their most ancient legends hardly looked further back than their Wandering Days. It is clear, nonetheless, from these legends, and from the evidence of their peculiar words and customs, that like many other folk Hobbits had in the distant past moved westward. Their earliest tales seem to glimpse a time when they dwelt in the upper vales of Anduin, between the eaves of Greenwood the Great and the Misty Mountains. Why they later undertook the hard and perilous crossing of the mountains into Eriador is no longer certain. Their own accounts speak of the multiplying of Men in the land, and of a shadow that fell on the forest, so that it became darkened and its new name was Mirkwood.
Before the crossing of the mountains the Hobbits had already become divided into three somewhat different breeds: Harfoots, Stoors, and Fallohides. The Harfoots were browner of skin, smaller, and shorter, and they were beardless and bootless; their hands and feet were neat and nimble; and they preferred highlands and hillsides. The Stoors were broader, heavier in build; their feet and hands were larger, and they preferred flat lands and riversides. The Fallohides were fairer of skin and also of hair, and they were taller and slimmer than the others; they were lovers of trees and of woodlands.
The Harfoots had much to do with Dwarves in ancient times, and long lived in the foothills of the mountains. They moved westward early, and roamed over Eriador as far as Weathertop while the others were still in the Wilderland. They were the most normal and representative variety of Hobbit, and far the most numerous. They were the most inclined to settle in one place, and longest preserved their ancestral habit of living in tunnels and holes.
The Stoors lingered long by the banks of the Great River Anduin, and were less shy of Men. They came west after the Harfoots and followed the course of the Loudwater southwards; and there many of them long dwelt between Tharbad and the borders of Dunland before they moved north again.
The Fallohides, the least numerous, were a northerly branch. They were more friendly with Elves than the other Hobbits were, and had more skill in language and song than in handicrafts; and of old they preferred hunting to tilling. They crossed the mountains north of Rivendell and came down the River Hoarwell. In Eriador they soon mingled with the other kinds that had preceded them, but being somewhat bolder and more adventurous, they were often found as leaders or chieftains among clans of Harfoots or Stoors. Even in Bilbo's time the strong Fallohidish strain could still be noted among the greater families, such as the Tooks and the Masters of Buckland.
In the westlands of Eriador, between the Misty Mountains and the Mountains of Lune, the Hobbits found both Men and Elves. Indeed, a remnant still dwelt there of the Dunedain, the kings of Men that came over the Sea out of Westernesse; but they were dwindling fast and the lands of their North Kingdom were falling far and wide into waste. There was room and to spare for incomers, and ere long the Hobbits began to settle in ordered communities. Most of their earlier settlements had long disappeared and been forgotten in Bilbo's time; but one of the first to become important still endured, though reduced in size; this was at Bree and in the Chetwood that lay round about, some forty miles east of the Shire.
It was in these early days, doubtless, that the Hobbits learned their letters and began to write after the manner of the Dunedain, who had in their turn long before learned the art from the Elves. And in those days also they forgot whatever languages they had used before, and spoke ever after the Common Speech, the Westron as it was named, that was current through all the lands of the kings from Arnor to Gondor, and about all the coasts of the Sea from Belfalas to Lune. Yet they kept a few words of their own, as well as their own names of months and days, and a great store of personal names out of the past.
About this time legend among the Hobbits first becomes history with a reckoning of years. For it was in the one thousand six hundred and first year of the Third Age that the Fallohide brothers, Marcho and Blanco, set out from Bree; and having obtained permission from the high king at Fornost, they crossed the brown river Baranduin with a great following of Hobbits. They passed over the Bridge of Stonebows, that had been built in the days of the power of the North Kingdom, and they took all the land beyond to dwell in, between the river and the Far Downs. All that was demanded of them was that they should keep the Great Bridge in repair, and all other bridges and roads, speed the king's messengers, and acknowledge his lordship.
Thus began the Shire-reckoning, for the year of the crossing of the Brandywine (as the Hobbits turned the name) became Year One of the Shire, and all later dates were reckoned from it. At once the western Hobbits fell in love with their new land, and they remained there, and soon passed once more out of the history of Men and of Elves. While there was still a king they were in name his subjects, but they were, in fact, ruled by their own chieftains and meddled not at all with events in the world outside. To the last battle at Fornost with the Witch-lord of Angmar they sent some bowmen to the aid of the king, or so they maintained, though no tales of Men record it. But in that war the North Kingdom ended; and then the Hobbits took the land for their own, and they chose from their own chiefs a Thain to hold the authority of the king that was gone. There for a thousand years they were little troubled by wars, and they prospered and multiplied after the Dark Plague (S.R. 37) until the disaster of the Long Winter and the famine that followed it. Many thousands then perished, but the Days of Dearth (1158-60) were at the time of this tale long past and the Hobbits had again become accustomed to plenty. The land was rich and kindly, and though it had long been deserted when they entered it, it had before been well tilled, and there the king had once had many farms, cornlands, vineyards, and woods.
Forty leagues it stretched from the Far Downs to the Brandywine Bridge, and fifty from the northern moors to the marshes in the south. The Hobbits named it the Shire, as the region of the authority of their Thain, and a district of well-ordered business; and there in that pleasant corner of the world they plied their well-ordered business of living, and they heeded less and less the world outside where dark things moved, until they came to think that peace and plenty were the rule in Middle-earth and the right of all sensible folk. They forgot or ignored what little they had ever known of the Guardians, and of the labours of those that made possible the long peace of the Shire. They were, in fact, sheltered, but they had ceased to remember it.
At no time had Hobbits of any kind been warlike, and they had never fought among themselves. In olden days they had, of course, been often obliged to fight to maintain themselves in a hard world; but in Bilbo's time that was very ancient history. The last battle, before this story opens, and indeed the only one that had ever been fought within the borders of the Shire, was beyond living memory: the Battle of Greenfields, S.R. 1147, in which Bandobras Took routed an invasion of Orcs. Even the weathers had grown milder, and the wolves that had once come ravening out of the North in bitter white winters were now only a grandfather's tale. So, though there was still some store of weapons in the Shire, these were used mostly as trophies, hanging above hearths or on walls, or gathered into the museum at Michel Delving. The Mathom-house it was called; for anything that Hobbits had no immediate use for, but were unwilling to throw away, they called a mathom. Their dwellings were apt to become rather crowded with mathoms, and many of the presents that passed from hand to hand were of that sort.
Nonetheless, ease and peace had left this people still curiously tough. They were, if it came to it, difficult to daunt or to kill; and they were, perhaps, so unwearyingly fond of good things not least because they could, when put to it, do without them, and could survive rough handling by grief, foe, or weather in a way that astonished those who did not know them well and looked no further than their bellies and their well-fed faces. Though slow to quarrel, and for sport killing nothing that lived, they were doughty at bay, and at need could still handle arms. They shot well with the bow, for they were keen-eyed and sure at the mark. Not only with bows and arrows. If any Hobbit stooped for a stone, it was well to get quickly under cover, as all trespassing beasts knew very well.
All Hobbits had originally lived in holes in the ground, or so they believed, and in such dwellings they still felt most at home; but in the course of time they had been obliged to adopt other forms of abode. Actually in the Shire in Bilbo's days it was, as a rule, only the richest and the poorest Hobbits that maintained the old custom. The poorest went on living in burrows of the most primitive kind, mere holes indeed, with only one window or none; while the well-to-do still constructed more luxurious versions of the simple diggings of old. But suitable sites for these large and ramifying tunnels (or smials as they called them) were not everywhere to be found; and in the flats and the low-lying districts the Hobbits, as they multiplied, began to build above ground. Indeed, even in the hilly regions and the older villages, such as Hobbiton or Tuckborough, or in the chief township of the Shire, Michel Delving on the White Downs, there were now many houses of wood, brick, or stone. These were specially favoured by millers, smiths, ropers, and cartwrights, and others of that sort; for even when they had holes to live in, Hobbits had long been accustomed to build sheds and workshops.
The habit of building farmhouses and barns was said to have begun among the inhabitants of the Marish down by the Brandywine. The Hobbits of that quarter, the Eastfarthing, were rather large and heavy-legged, and they wore dwarf-boots in muddy weather. But they were well known to be Stoors in a large part of their blood, as indeed was shown by the down that many grew on their chins. No Harfoot or Fallohide had any trace of a beard. Indeed, the folk of the Marish, and of Buckland, east of the River, which they afterwards occupied, came for the most part later into the Shire up from south-away; and they still had many peculiar names and strange words not found elsewhere in the Shire.
It is probable that the craft of building, as many other crafts beside, was derived from the Dunedain. But the Hobbits may have learned it direct from the Elves, the teachers of Men in their youth. For the Elves of the High Kindred had not yet forsaken Middle-earth, and they dwelt still at that time at the Grey Havens away to the west, and in other places within reach of the Shire. Three Elf-towers of immemorial age were still to be seen on the Tower Hills beyond the western marches. They shone far off in the moonlight. The tallest was furthest away, standing alone upon a green mound. The Hobbits of the Westfarthing said that one could see the Sea from the top of that tower; but no Hobbit had ever been known to climb it. Indeed, few Hobbits had ever seen or sailed upon the Sea, and fewer still had ever returned to report it. Most Hobbits regarded even rivers and small boats with deep misgivings, and not many of them could swim. And as the days of the Shire lengthened they spoke less and less with the Elves, and grew afraid of them, and distrustful of those that had dealings with them; and the Sea became a word of fear among them, and a token of death, and they turned their faces away from the hills in the west.
The craft of building may have come from Elves or Men, but the Hobbits used it in their own fashion. They did not go in for towers. Their houses were usually long, low, and comfortable. The oldest kind were, indeed, no more than built imitations of smials, thatched with dry grass or straw, or roofed with turves, and having walls somewhat bulged. That stage, however, belonged to the early days of the Shire, and hobbit-building had long since been altered, improved by devices learned from Dwarves, or discovered by themselves. A preference for round windows, and even round doors, was the chief remaining peculiarity of hobbit-architecture.
The houses and the holes of Shire-hobbits were often large, and inhabited by large families. (Bilbo and Frodo Baggins were as bachelors very exceptional, as they were also in many other ways, such as their friendship with the Elves.) Sometimes, as in the case of the Tooks of Great Smials, or the Brandybucks of Brandy Hall, many generations of relatives lived in (comparative) peace together in one ancestral and many-tunnelled mansion. All Hobbits were, in any case, clannish and reckoned up their relationships with great care. They drew long and elaborate family-trees with innumerable branches. In dealing with Hobbits it is important to remember who is related to whom, and in what degree. It would be impossible in this book to set out a family-tree that included even the more important members of the more important families at the time which these tales tell of. The genealogical trees at the end of the Red Book of Westmarch are a small book in themselves, and all but Hobbits would find them exceedingly dull. Hobbits delighted in such things, if they were accurate: they liked to have books filled with things that they already knew, set out fair and square with no contradictions.
There is another astonishing thing about Hobbits of old that must be mentioned, an astonishing habit: they imbibed or inhaled, through pipes of clay or wood, the smoke of the burning leaves of a herb, which they called pipe-weed or leaf, a variety probably of Nicotiana. A great deal of mystery surrounds the origin of this peculiar custom, or 'art' as the Hobbits preferred to call it. All that could be discovered about it in antiquity was put together by Meriadoc Brandybuck (later Master of Buckland), and since he and the tobacco of the Southfarthing play a part in the history that follows, his remarks in the introduction to his Herblore of the Shire may be quoted.
"This," he says, 'is the one art that we can certainly claim to be our own invention. When Hobbits first began to smoke is not known, all the legends and family histories take it for granted; for ages folk in the Shire smoked various herbs, some fouler, some sweeter. But all accounts agree that Tobold Hornblower of Longbottom in the Southfarthing first grew the true pipe-weed in his gardens in the days of Isengrim the Second, about the year 1070 of Shire-reckoning. The best home-grown still comes from that district, especially the varieties now known as Longbottom Leaf, Old Toby, and Southern Star.
"How Old Toby came by the plant is not recorded, for to his dying day he would not tell. He knew much about herbs, but he was no traveller. It is said that in his youth he went often to Bree, though he certainly never went further from the Shire than that. It is thus quite possible that he learned of this plant in Bree, where now, at any rate, it grows well on the south slopes of the hill. The Bree-hobbits claim to have been the first actual smokers of the pipe-weed. They claim, of course, to have done everything before the people of the Shire, whom they refer to as "colonists"; but in this case their claim is, I think, likely to be true. And certainly it was from Bree that the art of smoking the genuine weed spread in the recent centuries among Dwarves and such other folk, Rangers, Wizards, or wanderers, as still passed to and fro through that ancient road-meeting. The home and centre of the an is thus to be found in the old inn of Bree,The Prancing Pony, that has been kept by the family of Butterbur from time beyond record.
"All the same, observations that I have made on my own many journeys south have convinced me that the weed itself is not native to our parts of the world, but came northward from the lower Anduin, whither it was, I suspect, originally brought over Sea by the Men of Westernesse. It grows abundantly in Gondor, and there is richer and larger than in the North, where it is never found wild, and flourishes only in warm sheltered places like Longbottom. The Men of Gondor call itsweet galenas, and esteem it only for the fragrance of its flowers. From that land it must have been carried up the Greenway during the long centuries between the coming of Elendil and our own day. But even the Dunedain of Gondor allow us this credit: Hobbits first put it into pipes. Not even the Wizards first thought of that before we did. Though one Wizard that I knew took up the art long ago, and became as skilful in it as in all other things that he put his mind to."
The Shire was divided into four quarters, the Farthings already referred to, North, South, East, and West; and these again each into a number of folklands, which still bore the names of some of the old leading families, although by the time of this history these names were no longer found only in their proper folklands. Nearly all Tooks still lived in the Tookland, but that was not true of many other families, such as the Bagginses or the Boffins. Outside the Farthings were the East and West Marches: the Buckland (see beginning of Chapter V, Book I); and the Westmarch added to the Shire in S.R. 1462.
The Shire at this time had hardly any 'government'. Families for the most part managed their own affairs. Growing food and eating it occupied most of their time. In other matters they were, as a rule, generous and not greedy, but contented and moderate, so that estates, farms, workshops, and small trades tended to remain unchanged for generations.
There remained, of course, the ancient tradition concerning the high king at Fornost, or Norbury as they called it, away north of the Shire. But there had been no king for nearly a thousand years, and even the ruins of Kings' Norbury were covered with grass. Yet the Hobbits still said of wild folk and wicked things (such as trolls) that they had not heard of the king. For they attributed to the king of old all their essential laws; and usually they kept the laws of free will, because they were The Rules (as they said), both ancient and just.
It is true that the Took family had long been pre-eminent; for the office of Thain had passed to them (from the Oldbucks) some centuries before, and the chief Took had borne that title ever since. The Thain was the master of the Shire-moot, and captain of the Shire-muster and the Hobbitry-in-arms, but as muster and moot were only held in times of emergency, which no longer occurred, the Thainship had ceased to be more than a nominal dignity. The Took family was still, indeed, accorded a special respect, for it remained both numerous and exceedingly wealthy, and was liable to produce in every generation strong characters of peculiar habits and even adventurous temperament. The latter qualities, however, were now rather tolerated (in the rich) than generally approved. The custom endured, nonetheless, of referring to the head of the family as The Took, and of adding to his name, if required, a number: such as Isengrim the Second, for instance.
The only real official in the Shire at this date was the Mayor of Michel Delving (or of the Shire), who was elected every seven years at the Free Fair on the White Downs at the Lithe, that is at Midsummer. As mayor almost his only duty was to preside at banquets, given on the Shire-holidays, which occurred at frequent intervals. But the offices of Postmaster and First Shirriff were attached to the mayoralty, so that he managed both the Messenger Service and the Watch. These were the only Shire-services, and the Messengers were the most numerous, and much the busier of the two. By no means all Hobbits were lettered, but those who were wrote constantly to all their friends (and a selection of their relations) who lived further off than an afternoon's walk.
The Shirriffs was the name that the Hobbits gave to their police, or the nearest equivalent that they possessed. They had, of course, no uniforms (such things being quite unknown), only a feather in their caps; and they were in practice rather haywards than policemen, more concerned with the strayings of beasts than of people. There were in all the Shire only twelve of them, three in each Farthing, for Inside Work. A rather larger body, varying at need, was employed to 'beat the bounds', and to see that Outsiders of any kind, great or small, did not make themselves a nuisance.
At the time when this story begins the Bounders, as they were called, had been greatly increased. There were many reports and complaints of strange persons and creatures prowling about the borders, or over them: the first sign that all was not quite as it should be, and always had been except in tales and legends of long ago. Few heeded the sign, and not even Bilbo yet had any notion of what it portended. Sixty years had passed since he set out on his memorable journey, and he was old even for Hobbits, who reached a hundred as often as not; but much evidently still remained of the considerable wealth that he had brought back. How much or how little he revealed to no one, not even to Frodo his favourite 'nephew'. And he still kept secret the ring that he had found.
As is told in The Hobbit, there came one day to Bilbo's door the great Wizard, Gandalf the Grey, and thirteen dwarves with him: none other, indeed, than Thorin Oakenshield, descendant of kings, and his twelve companions in exile. With them he set out, to his own lasting astonishment, on a morning of April, it being then the year 1341 Shire-reckoning, on a quest of great treasure, the dwarf-hoards of the Kings under the Mountain, beneath Erebor in Dale, far off in the East. The quest was successful, and the Dragon that guarded the hoard was destroyed. Yet, though before all was won the Battle of Five Armies was fought, and Thorin was slain, and many deeds of renown were done, the matter would scarcely have concerned later history, or earned more than a note in the long annals of the Third Age, but for an 'accident' by the way. The party was assailed by Orcs in a high pass of the Misty Mountains as they went towards Wilderland; and so it happened that Bilbo was lost for a while in the black orc-mines deep under the mountains, and there, as he groped in vain in the dark, he put his hand on a ring, lying on the floor of a tunnel. He put it in his pocket. It seemed then like mere luck.
Trying to find his way out, Bilbo went on down to the roots of the mountains, until he could go no further. At the bottom of the tunnel lay a cold lake far from the light, and on an island of rock in the water lived Gollum. He was a loathsome little creature: he paddled a small boat with his large flat feet, peering with pale luminous eyes and catching blind fish with his long fingers, and eating them raw. He ate any living thing, even orc, if he could catch it and strangle it without a struggle. He possessed a secret treasure that had come to him long ages ago, when he still lived in the light: a ring of gold that made its wearer invisible. It was the one thing he loved, his 'precious', and he talked to it, even when it was not with him. For he kept it hidden safe in a hole on his island, except when he was hunting or spying on the orcs of the mines.
Maybe he would have attacked Bilbo at once, if the ring had been on him when they met; but it was not, and the hobbit held in his hand an Elvish knife, which served him as a sword. So to gain time Gollum challenged Bilbo to the Riddle-game, saying that if he asked a riddle which Bilbo could not guess, then he would kill him and eat him; but if Bilbo defeated him, then he would do as Bilbo wished: he would lead him to a way out of the tunnels.
Since he was lost in the dark without hope, and could neither go on nor back, Bilbo accepted the challenge; and they asked one another many riddles. In the end Bilbo won the game, more by luck (as it seemed) than by wits; for he was stumped at last for a riddle to ask, and cried out, as his hand came upon the ring he had picked up and forgotten: What have I got in my pocket? This Gollum failed to answer, though he demanded three guesses.
The Authorities, it is true, differ whether this last question was a mere 'question' and not a 'riddle' according to the strict rules of the Game; but all agree that, after accepting it and trying to guess the answer, Gollum was bound by his promise. And Bilbo pressed him to keep his word; for the thought came to him that this slimy creature might prove false, even though such promises were held sacred, and of old all but the wickedest things feared to break them. But after ages alone in the dark Gollum's heart was black, and treachery was in it. He slipped away, and returned to the island, of which Bilbo knew nothing, not far off in the dark water. There, he thought, lay his ring. He was hungry now, and angry, and once his 'precious' was with him he would not fear any weapon at all.
But the ring was not on the island; he had lost it, it was gone. His screech sent a shiver down Bilbo's back, though he did not yet understand what had happened. But Gollum had at last leaped to a guess, too late. What has it got in its pocketses? he cried. The light in his eyes was like a green flame as he sped back to murder the hobbit and recover his 'precious'. Just in time Bilbo saw his peril, and he fled blindly up the passage away from the water; and once more he was saved by his luck. For just as he ran he put his hand in his pocket, and the ring slipped quietly on to his finger. So it was that Gollum passed him without seeing him, and went to guard the way out, lest the 'thief' should escape. Warily Bilbo followed him, as he went along, cursing, and talking to himself about his 'precious'; from which talk at last even Bilbo guessed the truth, and hope came to him in the darkness: he himself had found the marvellous ring and a chance of escape from the orcs and from Gollum.
At length they came to a halt before an unseen opening that led to the lower gates of the mines, on the eastward side of the mountains. There Gollum crouched at bay, smelling and listening; and Bilbo was tempted to slay him with his sword. But pity stayed him, and though he kept the ring, in which his only hope lay, he would not use it to help him kill the wretched creature at a disadvantage. In the end, gathering his courage, he leaped over Gollum in the dark, and fled away down the passage, pursued by his enemy's cries of hate and despair: Thief, thief! Baggins! We hates it for ever!
Now it is a curious fact that this is not the story as Bilbo first told it to his companions. To them his account was that Gollum had promised to give him a present, if he won the game; but when Gollum went to fetch it from his island he found the treasure was gone: a magic ring, which had been given to him long ago on his birthday. Bilbo guessed that this was the very ring that he had found, and as he had won the game, it was already his by right. But being in a tight place, he said nothing about it, and made Gollum show him the way out, as a reward instead of a present. This account Bilbo set down in his memoirs, and he seems never to have altered it himself, not even after the Council of Elrond. Evidently it still appeared in the original Red Book, as it did in several of the copies and abstracts. But many copies contain the true account (as an alternative), derived no doubt from notes by Frodo or Samwise, both of whom learned the truth, though they seem to have been unwilling to delete anything actually written by the old hobbit himself.
Gandalf, however, disbelieved Bilbo's first story, as soon as he heard it, and he continued to be very curious about the ring. Eventually he got the true tale out of Bilbo after much questioning, which for a while strained their friendship; but the wizard seemed to think the truth important. Though he did not say so to Bilbo, he also thought it important, and disturbing, to find that the good hobbit had not told the truth from the first: quite contrary to his habit. The idea of a 'present' was not mere hobbitlike invention, all the same. It was suggested to Bilbo, as he confessed, by Gollum's talk that he overheard; for Gollum did, in fact, call the ring his 'birthday present', many times. That also Gandalf thought strange and suspicious; but he did not discover the truth in this point for many more years, as will be seen in this book.
Of Bilbo's later adventures little more need be said here. With the help of the ring he escaped from the orc-guards at the gate and rejoined his companions. He used the ring many times on his quest, chiefly for the help of his friends; but he kept it secret from them as long as he could. After his return to his home he never spoke of it again to anyone, save Gandalf and Frodo; and no one else in the Shire knew of its existence, or so he believed. Only to Frodo did he show the account of his Journey that he was writing.
His sword, Sting, Bilbo hung over his fireplace, and his coat of marvellous mail, the gift of the Dwarves from the Dragon-hoard, he lent to a museum, to the Michel Delving Mathom-house in fact. But he kept in a drawer at Bag End the old cloak and hood that he had worn on his travels; and the ring, secured by a fine chain, remained in his pocket.
He returned to his home at Bag End on June the 22nd in his fifty-second year (S.R. 1342), and nothing very notable occurred in the Shire until Mr. Baggins began the preparations for the celebration of his hundred-and-eleventh birthday (S.R. 1401). At this point this History begins.
At the end of the Third Age the part played by the Hobbits in the great events that led to the inclusion of the Shire in the Reunited Kingdom awakened among them a more widespread interest in their own history; and many of their traditions, up to that time still mainly oral, were collected and written down. The greater families were also concerned with events in the Kingdom at large, and many of their members studied its ancient histories and legends. By the end of the first century of the Fourth Age there were already to be found in the Shire several libraries that contained many historical books and records.
The largest of these collections were probably at Undertowers, at Great Smials, and at Brandy Hall. This account of the end of the Third Age is drawn mainly from the Red Book of Westmarch. That most important source for the history of the War of the Ring was so called because it was long preserved at Undertowers, the home of the Fairbairns, Wardens of the Westmarch. It was in origin Bilbo's private diary, which he took with him to Rivendell. Frodo brought it back to the Shire, together with many loose leaves of notes, and during S.R. 1420-1 he nearly filled its pages with his account of the War. But annexed to it and preserved with it, probably in a single red case, were the three large volumes, bound in red leather, that Bilbo gave to him as a parting gift. To these four volumes there was added in Westmarch a fifth containing commentaries, genealogies, and various other matter concerning the hobbit members of the Fellowship.
The original Red Book has not been preserved, but many copies were made, especially of the first volume, for the use of the descendants of the children of Master Samwise. The most important copy, however, has a different history. It was kept at Great Smials, but it was written in Gondor, probably at the request of the great-grandson of Peregrin, and completed in S.R. 1592 (F.A. 172). Its southern scribe appended this note: Findegil, King's Writer, finished this work in IV 172. It is an exact copy in all details of the Thain's Book in Minas Tirith. That book was a copy, made at the request of King Elessar, of the Red Book of the Periannath, and was brought to him by the Thain Peregrin when he retired to Gondor in IV 64.
The Thain's Book was thus the first copy made of the Red Book and contained much that was later omitted or lost. In Minas Tirith it received much annotation, and many corrections, especially of names, words, and quotations in the Elvish languages; and there was added to it an abbreviated version of those parts of The Tale of Aragorn and Arwen which lie outside the account of the War. The full tale is stated to have been written by Barahir, grandson of the Steward Faramir, some time after the passing of the King. But the chief importance of Findegil's copy is that it alone contains the whole of Bilbo's 'Translations from the Elvish'. These three volumes were found to be a work of great skill and learning in which, between 1403 and 1418, he had used all the sources available to him in Rivendell, both living and written. But since they were little used by Frodo, being almost entirely concerned with the Elder Days, no more is said of them here.
Since Meriadoc and Peregrin became the heads of their great families, and at the same time kept up their connexions with Rohan and Gondor, the libraries at Bucklebury and Tuckborough contained much that did not appear in the Red Book. In Brandy Hall there were many works dealing with Eriador and the history of Rohan. Some of these were composed or begun by Meriadoc himself, though in the Shire he was chiefly remembered for his Herblore of the Shire, and for his Reckoning of Years in which he discussed the relation of the calendars of the Shire and Bree to those of Rivendell, Gondor, and Rohan. He also wrote a short treatise on Old Words and Names in the Shire, having special interest in discovering the kinship with the language of the Rohirrim of such 'shire-words' as mathom and old elements in place names.
At Great Smials the books were of less interest to Shire-folk, though more important for larger history. None of them was written by Peregrin, but he and his successors collected many manuscripts written by scribes of Gondor: mainly copies or summaries of histories or legends relating to Elendil and his heirs. Only here in the Shire were to be found extensive materials for the history of Numenor and the arising of Sauron. It was probably at Great Smials that The Tale of Years was put together, with the assistance of material collected by Meriadoc. Though the dates given are often conjectural, especially for the Second Age, they deserve attention. It is probable that Meriadoc obtained assistance and information from Rivendell, which he visited more than once. There, though Elrond had departed, his sons long remained, together with some of the High-elven folk. It is said that Celeborn went to dwell there after the departure of Galadriel; but there is no record of the day when at last he sought the Grey Havens, and with him went the last living memory of the Elder Days in Middle-earth.

View File

@@ -0,0 +1,21 @@
## Laboratory work 7. Variant 5.
### Task
Choose a literary text (even variants take a Russian-language one, odd variants an English-language one) and train a recurrent
neural network on it to solve a text-generation task. Tune the architecture and parameters so as to get as close as possible
to a meaningful result.
Finally, pick a compromise architecture that handles both kinds of text reasonably well.
### Work done
The prologue of The Lord of the Rings was taken for the English model. Although this model turned out better than the
Russian one, training took a little more than an hour.
#### Result (rus)
здесь был человек прежде всего всего обманывает самого себя ибо он думает что успешно соврал а люди поняли и из
деликатности промолчали промолчали промолчали промолчали промолчали какие его неудачи могут его постигнуть не тому
помочь много ли людей не нуждаются в помощи помощи было врать врать врать молчания молчания а внести то
#### Result (eng)
the harfoots were browner of skin smaller and shorter and they were beardless and bootless their hands and feet were
neat and nimble and they preferred highlands and hillsides the stoors were broader heavier in build their feet and
hands were larger and they preferred flat lands and riversides
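The greedy argmax decoding used in the accompanying main.py tends to lock onto high-frequency words, which shows up as the repeated words in the Russian sample above. One common way to trade such repetition for variety is temperature sampling. The sketch below is an illustrative addition rather than part of the lab code; it assumes the trained `model`, `tokenizer`, and `max_sequence_length` objects from main.py, and the `temperature` parameter is a tuning knob introduced here.
```python
import numpy as np
from keras.preprocessing.sequence import pad_sequences

def generate_text_sampled(seed_text, next_words, model_, tokenizer_, max_sequence_length, temperature=0.8):
    """Sample the next word from the softmax distribution instead of taking the argmax.
    temperature < 1.0 sharpens the distribution; temperature > 1.0 flattens it."""
    for _ in range(next_words):
        token_list = tokenizer_.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_length - 1, padding='pre')
        probs = model_.predict(token_list, verbose=0)[0]
        # Rescale the log-probabilities by the temperature and renormalize
        logits = np.log(probs + 1e-9) / temperature
        probs = np.exp(logits) / np.sum(np.exp(logits))
        predicted_index = int(np.random.choice(len(probs), p=probs))
        # Index 0 is the padding index and has no word attached to it
        output_word = tokenizer_.index_word.get(predicted_index, '')
        if not output_word:
            break
        seed_text += ' ' + output_word
    return seed_text

# Example: generated_text = generate_text_sampled("здесь был", 50, model, tokenizer, max_sequence_length)
```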

View File

@@ -0,0 +1,70 @@
import numpy as np
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from keras.utils import to_categorical

with open('ru.txt', "r", encoding='utf-8') as file:
    text = file.read()

# Build the vocabulary: map words to indices and back (preprocessing depends on your task)
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])
total_words = len(tokenizer.word_index) + 1

# Prepare the training data: every n-gram prefix of each line becomes a sample
input_sequences = []
for line in text.split('\n'):
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i + 1]
        input_sequences.append(n_gram_sequence)

max_sequence_length = max(len(x) for x in input_sequences)
input_sequences = pad_sequences(input_sequences, maxlen=max_sequence_length, padding='pre')

# The last token of each sequence is the prediction target
X, y = input_sequences[:, :-1], input_sequences[:, -1]
y = to_categorical(y, num_classes=total_words)

# Model architecture: embedding -> LSTM -> softmax over the vocabulary
model = Sequential()
model.add(Embedding(total_words, 50, input_length=max_sequence_length - 1))
model.add(LSTM(100))
model.add(Dense(total_words, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=100, verbose=2)

# Text generation with the trained model (greedy decoding: argmax at each step)
def generate_text(seed_text, next_words, model_, max_sequence_length):
    for _ in range(next_words):
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_length - 1, padding='pre')
        predicted_probs = model_.predict(token_list, verbose=0)[0]
        predicted_index = int(np.argmax(predicted_probs))
        # Look up the word for the predicted index
        output_word = tokenizer.index_word.get(predicted_index, "")
        seed_text += " " + output_word
    return seed_text

# Example generation (replace seed_text and next_words with your own values)
seed_text = "здесь был"
next_words = 50
generated_text = generate_text(seed_text, next_words, model, max_sequence_length)
print(generated_text)

View File

@@ -0,0 +1,9 @@
Когда человек сознательно или интуитивно выбирает себе в жизни какую-то цель, жизненную задачу, он невольно дает себе оценку. По тому, ради чего человек живет, можно судить и о его самооценке - низкой или высокой.
Если человек живет, чтобы приносить людям добро, облегчать их страдания, давать людям радость, то он оценивает себя на уровне этой своей человечности. Он ставит себе цель, достойную человека.
Только такая цель позволяет человеку прожить свою жизнь с достоинством и получить настоящую радость. Да, радость! Подумайте: если человек ставит себе задачей увеличивать в жизни добро, приносить людям счастье, какие неудачи могут его постигнуть? Не тому помочь? Но много ли людей не нуждаются в помощи?
Если жить только для себя, своими мелкими заботами о собственном благополучии, то от прожитого не останется и следа. Если же жить для других, то другие сберегут то, чему служил, чему отдавал силы.
Можно по-разному определять цель своего существования, но цель должна быть. Надо иметь и принципы в жизни. Одно правило в жизни должно быть у каждого человека, в его цели жизни, в его принципах жизни, в его поведении: надо прожить жизнь с достоинством, чтобы не стыдно было вспоминать.
Достоинство требует доброты, великодушия, умения не быть эгоистом, быть правдивым, хорошим другом, находить радость в помощи другим.
Ради достоинства жизни надо уметь отказываться от мелких удовольствий и немалых тоже… Уметь извиняться, признавать перед другими ошибку - лучше, чем врать.
Обманывая, человек прежде всего обманывает самого себя, ибо он думает, что успешно соврал, а люди поняли и из деликатности промолчали.
Жизнь - прежде всего творчество, но это не значит, что каждый человек, чтобы жить, должен родиться художником, балериной или ученым. Можно творить просто добрую атмосферу вокруг себя. Человек может принести с собой атмосферу подозрительности, какого-то тягостного молчания, а может внести сразу радость, свет. Вот это и есть творчество.

View File

@@ -0,0 +1,57 @@
# Laboratory work 1
## Task
Generate a specific type of data and compare 3 models on it (variant 9). Plot the results, show the quality of the models, and explain the outcome.
## Data
make_classification (n_samples=500, n_features=2, n_redundant=0, n_informative=2, random_state=rs, n_clusters_per_class=1)
- Models:
- - Perceptron
- - Multilayer perceptron with 10 neurons in the hidden layer (alpha = 0.01)
- - Multilayer perceptron with 100 neurons in the hidden layer (alpha = 0.01)
## Program description
### Libraries used
- scikit-learn
- numpy
- matplotlib
### Program steps
1. **Data generation:**
- The `make_classification` function from scikit-learn is used.
- Two features are created, and the data is split into two classes.
- 500 samples are used.
2. **Data split:**
- The data is split into training and test sets with `train_test_split` from scikit-learn.
- The test set is 20% of the total size.
3. **Model creation:**
- Three models are created with scikit-learn:
- Perceptron
- Multilayer perceptron with 10 neurons in the hidden layer
- Multilayer perceptron with 100 neurons in the hidden layer
4. **Training and evaluation:**
- Each model is trained on the training set.
- Each model is evaluated on the test set using the accuracy metric (`accuracy`).
5. **Visualizing the data and decision boundaries:**
- For each model, a plot is built showing the test points and the model's decision boundary.
- Each plot is titled with the model name and its accuracy.
### How to run
- Clone or download the code `main.py`.
- Run the file in an environment that can execute Python.
### Results
- The accuracy values on the plots show that the most accurate of the three models is the multilayer perceptron with 100 neurons in the hidden layer:
- Multilayer perceptron with 100 neurons: 0.96
- Multilayer perceptron with 10 neurons: 0.90
- Perceptron: 0.86

View File

@@ -0,0 +1,54 @@
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Fix random_state so the results are reproducible
rs = 42

# Data generation
X, y = make_classification(
    n_samples=500, n_features=2, n_redundant=0, n_informative=2,
    random_state=rs, n_clusters_per_class=1
)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=rs)

# Model creation
models = [
    ('Perceptron', Perceptron(random_state=rs)),
    ('MLP (10 neurons)', MLPClassifier(hidden_layer_sizes=(10,), alpha=0.01, random_state=rs)),
    ('MLP (100 neurons)', MLPClassifier(hidden_layer_sizes=(100,), alpha=0.01, random_state=rs))
]

# Train and evaluate the models
results = {}
plt.figure(figsize=(15, 5))
for i, (name, model) in enumerate(models, 1):
    plt.subplot(1, 3, i)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    results[name] = accuracy
    # Scatter the test points coloured by their true class
    plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=plt.cm.Paired, edgecolors='k')
    # Draw the decision boundary of each model on a dense grid
    h = .02  # grid step
    x_min, x_max = X_test[:, 0].min() - 1, X_test[:, 0].max() + 1
    y_min, y_max = X_test[:, 1].min() - 1, X_test[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
    plt.title(f'{name}\nAccuracy: {accuracy:.2f}')
plt.show()

Binary file not shown. (added image, 25 KiB)

Binary file not shown. (added image, 25 KiB)

Binary file not shown. (added image, 23 KiB)

View File

@@ -0,0 +1,60 @@
# Laboratory work 2
## Variant 9
## Task
Rank the features using the models specified by the variant. Display the resulting values/scores of each feature for each method/model and the mean score. Analyze the results.
## Models
- Lasso
- Feature selection with random trees (Random Forest Regressor)
- Linear correlation (f_regression)
## Program description
The program solves the problem of ranking features in a regression task using three different models: Lasso, random trees (Random Forest), and linear correlation (f_regression). Each model ranks the features according to their importance, and then the mean rank of each feature over all models is computed.
### Libraries used
- `numpy`: for working with arrays and computations.
- `scikit-learn`: a machine-learning library providing the regression models and the feature-ranking methods.
### Program steps
1. Source data: random data for a regression task with 750 rows and 14 features is generated.
2. Models:
   - Lasso: a linear Lasso model with alpha equal to 0.05.
   - Random Forest: an ensemble of 100 random trees.
   - Linear correlation (f_regression): correlation scores between the features and the target variable.
3. Feature ranking:
   - Each model ranks the features according to their importance.
   - MinMaxScaler is used to normalize the rank values (see the note below).
4. Mean ranking: for each feature, the mean of its ranks over all models is computed.
5. Output:
   - The mean ranking of each feature is printed.
   - The ranking results of each model are shown.
   - The top 4 features with their values based on the mean ranking are printed.
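For reference, MinMaxScaler rescales each model's scores linearly to the range [0, 1] via x' = (x − x_min) / (x_max − x_min), so within every model the strongest feature gets a score of 1.0 and the weakest 0.0, which makes the per-model scores comparable before averaging.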
### How to run
- Clone or download the code `main.py`.
- Run the file in an environment that can execute Python: `python main.py`
### Results
- Lasso
{'x1': 0.69, 'x2': 0.72, 'x3': 0.0, 'x4': 1.0, 'x5': 0.29, 'x6': 0.0, 'x7': 0.0, 'x8': 0.0, 'x9': 0.0, 'x10': 0.0, 'x11': 0.0, 'x12': 0.0, 'x13': 0.0, 'x14': 0.0}
- Random Forest
{'x1': 0.66, 'x2': 0.76, 'x3': 0.1, 'x4': 0.55, 'x5': 0.23, 'x6': 0.0, 'x7': 0.01, 'x8': 0.0, 'x9': 0.0, 'x10': 0.0, 'x11': 0.29, 'x12': 0.28, 'x13': 0.09, 'x14': 1.0}
- Correlation
{'x1': 0.3, 'x2': 0.45, 'x3': 0.0, 'x4': 1.0, 'x5': 0.04, 'x6': 0.0, 'x7': 0.01, 'x8': 0.02, 'x9': 0.01, 'x10': 0.0, 'x11': 0.29, 'x12': 0.44, 'x13': 0.0, 'x14': 0.98}
- Mean
{'x1': 0.55, 'x2': 0.64, 'x3': 0.03, 'x4': 0.85, 'x5': 0.19, 'x6': 0.0, 'x7': 0.01, 'x8': 0.01, 'x9': 0.0, 'x10': 0.0, 'x11': 0.19, 'x12': 0.24, 'x13': 0.03, 'x14': 0.66}
- Top 4 features with their values based on the mean ranking:
1. **x4:** 0.85
2. **x14:** 0.66
3. **x2:** 0.64
4. **x1:** 0.55

View File

@@ -0,0 +1,71 @@
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import f_regression
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Generate the source data: 750 rows, 14 features (Friedman regression problem)
np.random.seed(0)
size = 750
X = np.random.uniform(0, 1, (size, 14))
Y = (10 * np.sin(np.pi*X[:, 0]*X[:, 1]) + 20*(X[:, 2] - .5)**2 +
     10*X[:, 3] + 5*X[:, 4]**5 + np.random.normal(0, 1))
# Make the last four features depend on the first four
X[:, 10:] = X[:, :4] + np.random.normal(0, .025, (size, 4))

# Lasso
lasso = Lasso(alpha=0.05)
lasso.fit(X, Y)

# Random trees
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, Y)

# Linear correlation (f_regression)
correlation_coeffs, _ = f_regression(X, Y)

# Normalize the scores to [0, 1] with MinMaxScaler and map them to feature names
def rank_to_dict(ranks, names):
    ranks = np.abs(ranks)
    minmax = MinMaxScaler()
    ranks = minmax.fit_transform(np.array(ranks).reshape(14, 1)).ravel()
    ranks = map(lambda x: round(x, 2), ranks)
    return dict(zip(names, ranks))

# Ranking for each model
ranks = {}
names = ["x%s" % i for i in range(1, 15)]
ranks["Lasso"] = rank_to_dict(lasso.coef_, names)
ranks["Random Forest"] = rank_to_dict(rf.feature_importances_, names)
ranks["Correlation"] = rank_to_dict(correlation_coeffs, names)

# Accumulate the scores of every feature over all models
mean = {}
for key, value in ranks.items():
    for item in value.items():
        if item[0] not in mean:
            mean[item[0]] = 0
        mean[item[0]] += item[1]

# Average the accumulated score of each feature
for key, value in mean.items():
    res = value / len(ranks)
    mean[key] = round(res, 2)

# Print the mean scores
mean_dict = dict(mean)
print("MEAN")
print(mean_dict)

# Print the ranking results of each model
for key, value in ranks.items():
    print(key)
    print(value)

# Print the top 4 features with their values
top_features = sorted(mean.items(), key=lambda x: x[1], reverse=True)[:4]
print("Top 4 features with values:")
for feature, value in top_features:
    print(f"{feature}: {value}")

Binary file not shown. (added image, 92 KiB)

Binary file not shown. (added image, 106 KiB)

Binary file not shown. (added image, 93 KiB)

View File

@@ -0,0 +1,111 @@
# Laboratory work 1. Variant 15
### Task
Generate the data:
`
make_classification (n_samples=500, n_features=2,
n_redundant=0, n_informative=2, random_state=rs, n_clusters_per_class=1)
`
Compare 3 models on it:
- Linear regression
- Polynomial regression (degree 4)
- Perceptron
### How to run
To run the program, run the following from the command line in the project root directory:
```
python main.py
```
### Technologies used
- The *numpy* library for working with arrays.
- The *matplotlib pyplot* library for data visualization.
- The *sklearn* library:
- *make_classification* for creating synthetic datasets.
- *LinearRegression* for creating and working with a linear-regression model.
- *Perceptron* for creating and working with a perceptron.
- *accuracy_score* for computing classification accuracy.
- *train_test_split* for splitting the dataset into training and test sets.
- *PolynomialFeatures* for creating a transformer that generates polynomial features from the original ones.
### Description of the laboratory work
#### Data generation
The program creates a synthetic dataset where the variable `X` holds the feature matrix of shape `(n_samples, n_features)` and the variable `y` holds the target vector of shape `(n_samples,)`.
```
X, y = make_classification(n_samples=500, n_features=2, n_redundant=0,
                           n_informative=2, random_state=None,
                           n_clusters_per_class=1)
```
It then adds noise to the data by shifting the feature matrix `X` by random values from a uniform distribution multiplied by 2. A variable holding the tuple of the feature matrix `X` and the target vector `y` is created, and the data is split into a training set `(X_train, y_train)` and a test set `(X_test, y_test)` with `train_test_split`. The training set is 60% of the source data, and 40% is used for testing the models `(test_size=.4)`.
```python
rng = np.random.RandomState(2)
X += 2 * rng.uniform(size=X.shape)
linearly_dataset = (X, y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4, random_state=42)
```
#### Working with the linear-regression model
We create a linear-regression model with the `LinearRegression()` class, train it on the training data `X_train` and `y_train` with the `fit()` method, and then use the trained model to predict the targets for the test set `X_test` with `predict()`; the predictions are stored in `y_pred`. Finally, the coefficient of determination (R squared) is computed with `score()` to assess the quality of the regression model on the test set.
```python
# Linear-regression model
model = LinearRegression()
# Train on the training data
model.fit(X_train, y_train)
# Make a prediction
y_pred = model.predict(X_test)
# Compute the coefficient of determination
r_sq = model.score(X_test, y_test)
```
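For reference, the `score()` method of `LinearRegression` returns the coefficient of determination R² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)², i.e. the share of the variance of `y` explained by the model; a value of 1.0 would be a perfect fit, and 0.0 means the model does no better than predicting the mean.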
The resulting plot:
![Linear-regression plot](LinearRegressionChart.png)
#### Working with the polynomial-regression model (degree 4)
We create a `PolynomialFeatures` instance to generate polynomial features with degree 4 and the parameter `include_bias=False` to avoid adding an extra column of ones (the bias). The training set `X_train` and the test set `X_test` are transformed into polynomial features with `fit_transform()` and stored in `X_poly_train` and `X_poly_test` respectively. We create a linear-regression model with `LinearRegression()`, train it on `X_poly_train` and `y_train` with `fit()`, use it to predict the targets for `X_poly_test` with `predict()`, and compute the coefficient of determination (R squared) on the test set with `score()`.
```python
pf = PolynomialFeatures(degree=4, include_bias=False)
# Transform the source dataset X_train into polynomial features
X_poly_train = pf.fit_transform(X_train)
# Transform the source dataset X_test into polynomial features
X_poly_test = pf.fit_transform(X_test)
# Linear-regression model
model = LinearRegression()
# Train the linear-regression model on the transformed polynomial features
model.fit(X_poly_train, y_train)
# Make a prediction
y_pred = model.predict(X_poly_test)
# Compute the coefficient of determination
r_sq = model.score(X_poly_test, y_test)
```
The resulting plot:
![Polynomial-regression plot](PolynomialRegressionChart.png)
#### Working with the perceptron
We create a perceptron model `model = Perceptron()` and train it on the training data with `fit()`. After training, we predict on the test data with `predict()`. To assess the perceptron's accuracy we use the `accuracy_score` function, which compares the predicted classes `y_pred` with the true classes `y_test` and returns the share of correctly classified samples.
```python
# Perceptron model
model = Perceptron()
# Train on the training data
model.fit(X_train, y_train)
# Make a prediction
y_pred = model.predict(X_test)
# Compute the perceptron's accuracy
accuracy = accuracy_score(y_test, y_pred)
```
The resulting plot:
![Perceptron plot](PerceptronChart.png)
### Conclusion
Based on the plots, the following conclusions can be drawn:
1. The coefficient of determination for polynomial regression (0.56) is higher than for linear regression (0.52), which means the polynomial model explains the variability in the data better than the linear one. Still, a value of 0.56 indicates only a moderate relationship between the predicted variable and the independent variables, leaving room for further improvement of the model.
2. The share of samples correctly classified by the perceptron (0.845) is also high, which means the perceptron handled the classification task well and separated the samples into the correct classes.
Overall, both polynomial regression and the perceptron perform better on the generated data than linear regression.

View File

@@ -0,0 +1,102 @@
import os.path
import numpy as np
from matplotlib import pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression, Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

picfld = os.path.join('static', 'charts')

# make_classification generates a random classification problem.
# n_samples - number of samples
# n_features - number of features (dimensions) per sample
# n_informative - number of informative features
# n_redundant - number of redundant features that add no extra information
# random_state - optional seed for the random-number generator
# n_clusters_per_class - number of clusters per class
# The function returns two values:
# X: array of shape [n_samples, n_features] with the generated features
# y: array of shape [n_samples] with the generated targets (classes)
X, y = make_classification(n_samples=500, n_features=2, n_redundant=0,
                           n_informative=2, random_state=None,
                           n_clusters_per_class=1)

rng = np.random.RandomState(2)
# Add noise to the data
X += 2 * rng.uniform(size=X.shape)
linearly_dataset = (X, y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4, random_state=42)

# Model: linear regression
def linear_regression():
    # Linear-regression model
    model = LinearRegression()
    # Train on the training data
    model.fit(X_train, y_train)
    # Make a prediction
    y_pred = model.predict(X_test)
    # Compute the coefficient of determination
    r_sq = model.score(X_test, y_test)
    # Build the plot
    plt.plot(y_test, c="#bd0000", label="\"y\" actual")
    plt.plot(y_pred, c="#00BFFF", label="\"y\" predicted \n" "R^2 = " + str(r_sq))
    plt.title("Linear regression")
    plt.legend(loc='lower left')
    plt.savefig('static/charts/LinearRegressionChart.png')
    plt.close()

# Model: polynomial regression (degree 4)
def polynomial_regression():
    # Build the polynomial-features transformer,
    # where degree is the polynomial degree and
    # include_bias controls the bias column of the polynomial features
    pf = PolynomialFeatures(degree=4, include_bias=False)
    # Transform the source dataset X_train into polynomial features
    X_poly_train = pf.fit_transform(X_train)
    # Transform the source dataset X_test into polynomial features
    X_poly_test = pf.fit_transform(X_test)
    # Linear-regression model
    model = LinearRegression()
    # Train the linear-regression model on the transformed polynomial features
    model.fit(X_poly_train, y_train)
    # Make a prediction
    y_pred = model.predict(X_poly_test)
    # Compute the coefficient of determination
    r_sq = model.score(X_poly_test, y_test)
    # Build the plot
    plt.plot(y_test, c="#bd0000", label="\"y\" actual")
    plt.plot(y_pred, c="#00BFFF",
             label="\"y\" predicted \n" "R^2 = " + str(r_sq))
    plt.legend(loc='lower left')
    plt.title("Polynomial regression")
    plt.savefig('static/charts/PolynomialRegressionChart.png')
    plt.close()

# Model: perceptron
def perceptron():
    # Perceptron model
    model = Perceptron()
    # Train on the training data
    model.fit(X_train, y_train)
    # Make a prediction
    y_pred = model.predict(X_test)
    # Compute the perceptron's accuracy
    accuracy = accuracy_score(y_test, y_pred)
    # Build the plot
    plt.plot(y_test, c="#bd0000", label="\"y\" actual")
    plt.plot(y_pred, c="#00BFFF",
             label="\"y\" predicted \n" "Accuracy = " + str(accuracy))
    plt.legend(loc='lower left')
    plt.title("Perceptron")
    plt.savefig('static/charts/PerceptronChart.png')
    plt.close()

if __name__ == '__main__':
    linear_regression()
    polynomial_regression()
    perceptron()

Binary file not shown. (added image, 92 KiB)

Binary file not shown. (added image, 106 KiB)

Binary file not shown. (added image, 93 KiB)

View File

@@ -0,0 +1,136 @@
# Laboratory work 2. Variant 15
### Task
Rank the features using the models specified by the variant. Display the resulting values/scores of each
feature for each method/model and the mean score.
3 models:
- Randomized Lasso (RandomizedLasso)
- Recursive Feature Elimination (RFE)
- Linear correlation (f_regression)
### How to run
To run the program, run the following from the command line in the project root directory:
```
python main.py
```
### Technologies used
- The *numpy* library for working with arrays.
- The *itemgetter* function for selecting elements from a collection.
- The *sklearn* library:
- *LinearRegression* for creating and working with a linear-regression model.
- *Ridge* for creating and working with a linear-regression model with regularization.
- *MinMaxScaler* for normalizing the data by scaling feature values to the range from 0 to 1.
- *RFE, f_regression*: RFE is used for recursive feature elimination, and f_regression measures the linear relationship between each feature and the target variable.
### Description of the laboratory work
#### Data generation
We create an array `X` of shape (750, 14), where each element is drawn from a uniform distribution on the interval from 0 to 1. We then compute the array `Y` using a formula that defines the dependence of `Y` on the values in `X` plus random noise.
The values of the last four columns of `X` are then replaced with the sum of the first four columns and random noise. We create a list `names` containing the feature names and an empty dictionary `ranks` that will store the feature-importance values.
```python
np.random.seed(0)
size = 750
X = np.random.uniform(0, 1, (size, 14))
# Define the output function: the Friedman regression problem
Y = (10 * np.sin(np.pi*X[:, 0]*X[:, 1]) + 20*(X[:, 2] - .5)**2 + 10*X[:, 3] + 5*X[:, 4]**5 + np.random.normal(0, 1))
# Add dependence between the features
X[:, 10:] = X[:, :4] + np.random.normal(0, .025, (size, 4))
names = ["x%s" % i for i in range(1, 15)]  # feature names: ['x1', 'x2', 'x3', ..., 'x14']
ranks = dict()
```
#### Processing the results
We define a function `rank_to_dict` that takes two arguments: `ranks` (the feature-importance scores) and `names` (the feature names). The function builds a dictionary by zipping the feature names with the rounded importance scores: the names become the keys and the rounded scores the values.
```python
def rank_to_dict(ranks, names):
    # take the absolute values of the scores
    ranks = np.abs(ranks)
    minmax = MinMaxScaler()
    # scale the data
    ranks = minmax.fit_transform(np.array(ranks).reshape(14, 1)).ravel()
    # round the array elements
    ranks = map(lambda x: round(x, 2), ranks)
    # build the dictionary
    return dict(zip(names, ranks))
```
#### The Ridge model
Since the task requires the deprecated Randomized Lasso (RandomizedLasso) model, we use Ridge regression (Ridge Regression) instead.
We create a *Ridge* instance, which performs regression with a linear combination of the features and applies L2 regularization to reduce the effect of multicollinearity in the data.
We train the *Ridge* model with `.fit(X, Y)`, where `X` is the feature matrix (independent variables) and `Y` is the target vector (dependent variable). The model learns the optimal regression coefficients from these data. The coefficients are then passed to the `rank_to_dict` method to convert them into the required form.
```python
ridge_model = Ridge()
ridge_model.fit(X, Y)
ranks['Ridge'] = rank_to_dict(ridge_model.coef_, names)
```
#### The Recursive Feature Elimination (RFE) model
To work with recursive feature elimination we define two functions: `recursive_feature_elimination()` and `rank_to_dict_rfe(ranking, names)`.
In `recursive_feature_elimination()` we create a *LinearRegression* instance named `estimator`, then an *RFE (Recursive Feature Elimination)* instance named `rfe_model`, passing `estimator` as the constructor argument. We fit `rfe_model` on the data `X` and `Y`, and then add the feature ranks produced by `rfe_model` to the `ranks` dictionary under the key *'Recursive Feature Elimination'*, using the function `rank_to_dict_rfe()`.
```python
def recursive_feature_elimination():
    # create the LinearRegression model
    estimator = LinearRegression()
    # create the RFE model
    rfe_model = RFE(estimator)
    rfe_model.fit(X, Y)
    ranks['Recursive Feature Elimination'] = rank_to_dict_rfe(rfe_model.ranking_, names)
```
In `rank_to_dict_rfe(ranking, names)` we take the reciprocal of each feature rank, dividing 1 by the rank, round the elements of the resulting array to two decimal places with `round()`, and return a dictionary where the feature names from `names` are the keys and the rounded reciprocal ranks the values.
```python
def rank_to_dict_rfe(ranking, names):
    # take the reciprocals of the ranks
    n_ranks = [float(1 / i) for i in ranking]
    # round the array elements
    n_ranks = map(lambda x: round(x, 2), n_ranks)
    # build the dictionary
    return dict(zip(names, n_ranks))
```
#### Linear correlation (f_regression)
For the linear correlation we apply the `f_regression()` function to the feature matrix `X` and the target `Y`. It returns two arrays: `correlation` contains the F-statistic of each feature, which measures the strength of its linear relationship with `Y`, and `p_values` contains the corresponding p-values. A new entry is then added to the `ranks` dictionary under the key *'linear correlation'*.
```python
correlation, p_values = f_regression(X, Y)
ranks['linear correlation'] = rank_to_dict(correlation, names)
```
### Conclusion
According to the task setup, the significant features are *x1, x2, x3, x4, x5*, and *x10, x11, x12, x13, x14* depend on them.
After sorting the models' results we can see the following:
```
Ridge
[('x4', 1.0), ('x1', 0.98), ('x2', 0.8), ('x14', 0.61), ('x5', 0.54), ('x12', 0.39), ('x3', 0.25), ('x13', 0.19), ('x11', 0.16), ('x6', 0.08), ('x8', 0.07), ('x7', 0.02), ('x10', 0.02), ('x9', 0.0)]
Recursive Feature Elimination
[('x1', 1.0), ('x2', 1.0), ('x3', 1.0), ('x4', 1.0), ('x5', 1.0), ('x11', 1.0), ('x13', 1.0), ('x12', 0.5), ('x14', 0.33), ('x8', 0.25), ('x6', 0.2), ('x10', 0.17), ('x7', 0.14), ('x9', 0.12)]
linear correlation
[('x4', 1.0), ('x14', 0.98), ('x2', 0.45), ('x12', 0.44), ('x1', 0.3), ('x11', 0.29), ('x5', 0.04), ('x8', 0.02), ('x7', 0.01), ('x9', 0.01), ('x3', 0.0), ('x6', 0.0), ('x10', 0.0), ('x13', 0.0)]
```
As can be seen, in the *Ridge* model the features *x1, x2, x4, x5* have the highest importance, which matches the task setup, but the feature *x3* was missed: its importance came out low.
The *Recursive Feature Elimination* model correctly identified all of the most significant features: *x1, x2, x3, x4, x5*.
In the linear-correlation model the features *'x4'* and *'x14'* have the highest score, 1.0, while *'x3', 'x6', 'x10' and 'x13'* have the lowest, 0.0, making it the worst result of the three models.
Averaged over the three models, the feature *'x4'* has the highest importance, 1.0, and the feature *'x9'* the lowest, 0.04:
```
Mean
[('x4', 1.0), ('x1', 0.76), ('x2', 0.75), ('x14', 0.64), ('x5', 0.53), ('x11', 0.48), ('x12', 0.44), ('x3', 0.42), ('x13', 0.4), ('x8', 0.11), ('x6', 0.09), ('x7', 0.06), ('x10', 0.06), ('x9', 0.04)]
```
Thus, the *Recursive Feature Elimination* model performed best, and the order of feature importance can vary somewhat from model to model.

View File

@@ -0,0 +1,88 @@
from operator import itemgetter
import numpy as np
from sklearn.feature_selection import RFE, f_regression
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import Ridge
from sklearn.linear_model import LinearRegression
# Generate the input data: 750 observation rows and 14 feature columns
np.random.seed(0)
size = 750
X = np.random.uniform(0, 1, (size, 14))
# Define the target: the Friedman regression problem
Y = (10 * np.sin(np.pi*X[:, 0]*X[:, 1]) + 20*(X[:, 2] - .5)**2 + 10*X[:, 3] + 5*X[:, 4]**5 + np.random.normal(0, 1))
# Add feature dependence: x11-x14 are noisy copies of x1-x4
X[:, 10:] = X[:, :4] + np.random.normal(0, .025, (size, 4))
names = ["x%s" % i for i in range(1, 15)]  # feature names: ['x1', 'x2', ..., 'x14']
ranks = dict()
def rank_to_dict(ranks, names):
    # take absolute values of the scores
    ranks = np.abs(ranks)
    minmax = MinMaxScaler()
    # scale the scores to the [0, 1] range
    ranks = minmax.fit_transform(np.array(ranks).reshape(14, 1)).ravel()
    # round the array elements
    ranks = map(lambda x: round(x, 2), ranks)
    # map feature names to the resulting scores
    return dict(zip(names, ranks))
# Model: Randomized Lasso is deprecated, so Ridge regression is used instead
def ridge_regressions():
    # create and fit a Ridge model
    ridge_model = Ridge()
    ridge_model.fit(X, Y)
    ranks['Ridge'] = rank_to_dict(ridge_model.coef_, names)
# Model: Recursive Feature Elimination (RFE)
def recursive_feature_elimination():
    # create a LinearRegression estimator
    estimator = LinearRegression()
    # create and fit the RFE model
    rfe_model = RFE(estimator)
    rfe_model.fit(X, Y)
    ranks['Recursive Feature Elimination'] = rank_to_dict_rfe(rfe_model.ranking_, names)
def rank_to_dict_rfe(ranking, names):
    # take the reciprocal of each rank
    n_ranks = [float(1 / i) for i in ranking]
    # round the array elements
    n_ranks = map(lambda x: round(x, 2), n_ranks)
    # map feature names to the resulting scores
    return dict(zip(names, n_ranks))
# Model: linear correlation (f_regression)
def linear_correlation():
    # compute the linear correlation between X and Y
    correlation, p_values = f_regression(X, Y)
    ranks['linear correlation'] = rank_to_dict(correlation, names)
if __name__ == '__main__':
    ridge_regressions()
    recursive_feature_elimination()
    linear_correlation()
    # sort each model's normalized feature-importance scores
    for key, value in ranks.items():
        ranks[key] = sorted(value.items(), key=itemgetter(1), reverse=True)
    for key, value in ranks.items():
        print(key)
        print(value)
    mean = {}  # average the importance scores over the 3 models
    for key, value in ranks.items():
        for item in value:
            if item[0] not in mean:
                mean[item[0]] = 0
            mean[item[0]] += item[1]
    for key, value in mean.items():
        res = value / len(ranks)
        mean[key] = round(res, 2)
    mean = sorted(mean.items(), key=itemgetter(1), reverse=True)
    print("Mean")
    print(mean)

@@ -0,0 +1,123 @@
# Laboratory work 3. Variant 15
### Task
Perform feature ranking and, using the library implementation of a decision tree, solve a classification task on 99% of the data from the course project. Check the model's performance on the remaining percent and draw a conclusion.
Model:
- the DecisionTreeClassifier decision tree.
### How to run the laboratory work
To run the program, execute the following from the command line in the root directory of the project files:
```
python main.py
```
### Technologies used
- The *numpy* library for working with arrays.
- The *pandas* library for working with data and tables.
- The *sklearn* library:
  - *train_test_split* - for splitting the data into training and test sets.
  - *MinMaxScaler* - for normalizing the data by scaling feature values into the range from 0 to 1.
  - *DecisionTreeClassifier* - for using the decision tree algorithm for the classification task.
### Description of the laboratory work
#### Description of the dataset
The dataset used is the *"Job Dataset"*, a collection of job postings.
The dataset consists of the following columns:
- Job Id - a unique identifier for each job posting.
- Experience - the years of experience required or preferred for the position.
- Qualifications - the level of education required for the job.
- Salary Range - the range of salary or compensation offered for the position.
- Location - the city or area where the job is located.
- Country - the country where the job is located.
- Latitude - the latitude coordinate of the job location.
- Longitude - the longitude coordinate of the job location.
- Work Type - the type of employment (e.g., full-time, part-time, contract).
- Company Size - the approximate size or scale of the hiring company.
- Job Posting Date - the date the job posting was published.
- Preference - special preferences or requirements for candidates (e.g., male only, female only, or both).
- Contact Person - the name of the contact person or recruiter for the job.
- Contact - contact information for job inquiries.
- Job Title - the title of the position.
- Role - the role or category of the job (e.g., software developer, marketing manager).
- Job Portal - the platform or website where the job was posted.
- Job Description - a detailed description of the job duties and requirements.
- Benefits - information about the benefits offered with the job (e.g., health insurance, retirement plans).
- Skills - the skills or qualifications required for the job.
- Responsibilities - the specific duties associated with the job.
- Company Name - the name of the hiring company.
- Company Profile - a brief overview of the company's history and mission.
Link to the dataset page on Kaggle: [Job Dataset](https://www.kaggle.com/datasets/ravindrasinghrana/job-description-dataset)
#### Data preparation
To ensure high-quality data analysis and accurate machine learning models, the data must be preprocessed. The following preprocessing was performed in this project:
- The insignificant columns *"Job Id", "latitude", "longitude", "Contact Person", "Contact", "Job Description", "Responsibilities"* were removed.
```python
df_job.drop(["Job Id", "latitude", "longitude", "Contact Person", "Contact", "Job Description", "Responsibilities"], axis=1,
inplace=True)
```
- Categorical features in the columns *'location', 'Country', 'Work Type', 'Preference', 'Job Title', 'Role', 'Job Portal', 'skills', 'Company', 'Sector'* were encoded by mapping each unique value to a numeric identifier per column, so that the machine learning model can work with them. An example of encoding a categorical feature:
```python
# Create a dictionary that maps unique values to numeric identifiers
qualifications_dict = {qual: i for i, qual in enumerate(df_job['Qualifications'].unique())}
# Replace the values in the "Qualifications" column with their numeric identifiers
df_job['Qualifications'] = df_job['Qualifications'].map(qualifications_dict)
```
- The *'Experience'* and *'Salary Range'* columns were split into the additional columns *'Min Experience', 'Max Experience', 'Min Salary', 'Max Salary'*, and the original *'Experience'* and *'Salary Range'* columns were removed.
An example of the split:
```python
# Strip the 'Years' suffix, then split the values into a minimum and a maximum
df_job['Experience'] = df_job['Experience'].apply(lambda x: str(x).replace('Years', '') if x is not None else x)
df_job[['Min Experience', 'Max Experience']] = df_job['Experience'].str.split(' to ', expand=True)
# Convert the values to a numeric format
df_job['Min Experience'] = pd.to_numeric(df_job['Min Experience'])
df_job['Max Experience'] = pd.to_numeric(df_job['Max Experience'])
```
- The *'Job Posting Date'* column was split into the additional columns *'year', 'month', 'day'*, and the *'Job Posting Date'* column itself was removed (a sketch of this step and the next one follows this list).
- The cells of the *'Company Profile'* column have a structure of the form *{"Sector":"Diversified","Industry":"Diversified Financials","City":"Sunny Isles Beach","State":"Florida","Zip":"33160","Website":"www.ielp.com","Ticker":"IEP","CEO":"David Willetts"}*, so they were split into the additional columns *'Sector', 'Industry', 'City', 'State', 'Ticker'*, which were then encoded to avoid categorical features, while the *'Zip', 'Website', 'CEO'* fields were dropped as the least important. The *'Company Profile'* column itself was also removed.
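For illustration, here is a minimal sketch of these last two steps (assuming the `df_job` DataFrame from the snippets above; the full version, including cleanup of the key prefixes and quotes left in the split values, is in `main.py` below):
```python
# Split 'Job Posting Date' (YYYY-MM-DD) into numeric year/month/day columns
df_job[['year', 'month', 'day']] = df_job['Job Posting Date'].str.split('-', expand=True)
df_job[['year', 'month', 'day']] = df_job[['year', 'month', 'day']].apply(pd.to_numeric)
df_job.drop('Job Posting Date', axis=1, inplace=True)
# Split the JSON-like 'Company Profile' string into separate columns
profile = df_job['Company Profile'].str.split('",', expand=True)
profile.columns = ['Sector', 'Industry', 'City', 'State', 'Zip', 'Website', 'Ticker', 'CEO']
# Drop the least important fields and merge the rest back in
profile.drop(['Zip', 'Website', 'CEO'], axis=1, inplace=True)
df_job = pd.concat([df_job, profile], axis=1).drop('Company Profile', axis=1)
```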
#### Identifying significant features
We create a variable y holding the values of the target variable *"Qualifications"* from our prepared dataset `data`. We split the data into training and test sets, where `corr.values` contains the feature values used to train the model, `y.values` contains the target values, and `test_size=0.2` specifies that 20% of the data is used for testing. We then create a `DecisionTreeClassifier` instance and fit it on the training data, after which we read the feature importances from the trained model; they show how strongly each feature influences the model's predictions.
```python
# define the target variable
y = data['Qualifications']
# split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(corr.values, y.values, test_size=0.2)
# create a decision tree classifier instance
clf = DecisionTreeClassifier(random_state=241)
# fit the model on the training set
clf.fit(X_train, y_train)
# predict the classes of the test set
y_pred = clf.predict(X_test)
importances = clf.feature_importances_
print("Feature importances: ")
print(importances)
print("Sorted feature importances: ")
conversion_ratings(importances)
```
To obtain a sorted list of the feature importances and their values, we add a helper method `conversion_ratings` with the same sorting logic as in laboratory work 2.
After running the program, we get the following result:
```
Feature importances:
[0.04535517 0.04576875 0.03236705 0.07819966 0.02279837 0.0608208
0.04189454 0.04985896 0.0418959 0.03571376 0.03675038 0.04229454
0.04054691 0.05188657 0.03849015 0.04226668 0.04105321 0.03616932
0.03535738 0.01584379 0.04569225 0.0588709 0.00620841 0.00620682
0.00606359 0.00595985 0.00568906 0.00345068 0.00343211 0.00491702
0.00614867 0.00568446 0.00634429]
Sorted feature importances:
{'Company Size': 1.0, 'Job Title': 0.77, 'day': 0.74, 'Max Salary': 0.65, 'Job Portal': 0.62, 'Country': 0.57, 'month': 0.57, 'location': 0.56, 'Max Experience': 0.52, 'Industry': 0.52, 'Role': 0.51, 'skills': 0.51, 'Min Salary': 0.5, 'City': 0.5, 'Sector': 0.47, 'Min Experience': 0.45, 'State': 0.44, 'Company': 0.43, 'Ticker': 0.43, 'Work Type': 0.39, 'Preference': 0.26, 'year': 0.17, "'Casual Dress Code, Social and Recreational Activities, Employee Referral Programs, Health and Wellness Facilities, Life and Disability Insurance'": 0.04, "'Childcare Assistance, Paid Time Off (PTO), Relocation Assistance, Flexible Work Arrangements, Professional Development'": 0.04, "'Employee Assistance Programs (EAP), Tuition Reimbursement, Profit-Sharing, Transportation Benefits, Parental Leave'": 0.04, "'Life and Disability Insurance, Stock Options or Equity Grants, Employee Recognition Programs, Health Insurance, Social and Recreational Activities'": 0.04, "'Tuition Reimbursement, Stock Options or Equity Grants, Parental Leave, Wellness Programs, Childcare Assistance'": 0.04, "'Employee Referral Programs, Financial Counseling, Health and Wellness Facilities, Casual Dress Code, Flexible Spending Accounts (FSAs)'": 0.03, "'Flexible Spending Accounts (FSAs), Relocation Assistance, Legal Assistance, Employee Recognition Programs, Financial Counseling'": 0.03, "'Transportation Benefits, Professional Development, Bonuses and Incentive Programs, Profit-Sharing, Employee Discounts'": 0.03, "'Legal Assistance, Bonuses and Incentive Programs, Wellness Programs, Employee Discounts, Retirement Plans'": 0.02, "'Health Insurance, Retirement Plans, Flexible Work Arrangements, Employee Assistance Programs (EAP), Bonuses and Incentive Programs'": 0.0, "'Health Insurance, Retirement Plans, Paid Time Off (PTO), Flexible Work Arrangements, Employee Assistance Programs (EAP)'": 0.0}
```
### Conclusion
Thus, we can conclude that the most important feature is "Company Size" with an importance of 1.0, followed by "Job Title" (0.77), "day" (0.74) and "Max Salary" (0.65). Judging by the importance values, both numeric and categorical features contribute to predicting the target variable.
Overall, the results of this laboratory work make it possible to assess the importance of each feature in predicting the target variable and to understand which features should be taken into account in data analysis and decision making.

@@ -0,0 +1,194 @@
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier
def data_preprocessing():
    df_job_orig = pd.read_csv('D:/Интеллектуальные информационные системы/Dataset/job_descriptions.csv')
    df_job_orig = pd.DataFrame(df_job_orig)
    desired_rows = int(0.99 * len(df_job_orig))
    df_job = df_job_orig.copy()
    df_job = df_job[:desired_rows]
    df_job.drop(["Job Id", "latitude", "longitude", "Contact Person", "Contact", "Job Description", "Responsibilities"],
                axis=1, inplace=True)
    # digitization
    # --------------------------'Years'------------------------
    # Strip the 'Years' suffix, then split the values into a minimum and a maximum
    df_job['Experience'] = df_job['Experience'].apply(lambda x: str(x).replace('Years', '') if x is not None else x)
    df_job[['Min Experience', 'Max Experience']] = df_job['Experience'].str.split(' to ', expand=True)
    # Convert the values to a numeric format
    df_job['Min Experience'] = pd.to_numeric(df_job['Min Experience'])
    df_job['Max Experience'] = pd.to_numeric(df_job['Max Experience'])
    # --------------------------'Salary Range'------------------------
    # Remove the currency symbols and other characters
    df_job['Salary Range'] = df_job['Salary Range'].str.replace('$', '').str.replace('K', '000')
    # Split the values into a minimum and a maximum
    df_job[['Min Salary', 'Max Salary']] = df_job['Salary Range'].str.split('-', expand=True)
    # Convert the values to a numeric format
    df_job['Min Salary'] = pd.to_numeric(df_job['Min Salary'])
    df_job['Max Salary'] = pd.to_numeric(df_job['Max Salary'])
    # --------------------------'Qualifications'------------------------
    # Create a dictionary that maps unique values to numeric identifiers
    qualifications_dict = {qual: i for i, qual in enumerate(df_job['Qualifications'].unique())}
    # Replace the values in the "Qualifications" column with their numeric identifiers
    df_job['Qualifications'] = df_job['Qualifications'].map(qualifications_dict)
    # --------------------------'location'------------------------
    # Create a dictionary that maps unique values to numeric identifiers
    locations_dict = {locat: i for i, locat in enumerate(df_job['location'].unique())}
    # Replace the values in the "location" column with their numeric identifiers
    df_job['location'] = df_job['location'].map(locations_dict)
    # --------------------------'Country'-------------------------
    # Create a dictionary that maps unique values to numeric identifiers
    countries_dict = {countr: i for i, countr in enumerate(df_job['Country'].unique())}
    # Replace the values in the "Country" column with their numeric identifiers
    df_job['Country'] = df_job['Country'].map(countries_dict)
    # --------------------------'Work Type'-------------------------
    # Create a dictionary that maps unique values to numeric identifiers
    wt_dict = {wt: i for i, wt in enumerate(df_job['Work Type'].unique())}
    # Replace the values in the "Work Type" column with their numeric identifiers
    df_job['Work Type'] = df_job['Work Type'].map(wt_dict)
    # --------------------------'Preference gender'-------------------------
    # Create a dictionary that maps unique values to numeric identifiers
    gender_dict = {gender: i for i, gender in enumerate(df_job['Preference'].unique())}
    # Replace the values in the "Preference" column with their numeric identifiers
    df_job['Preference'] = df_job['Preference'].map(gender_dict)
    # --------------------------'Job Title'-------------------------
    # Create a dictionary that maps unique values to numeric identifiers
    jt_dict = {jt: i for i, jt in enumerate(df_job['Job Title'].unique())}
    # Replace the values in the "Job Title" column with their numeric identifiers
    df_job['Job Title'] = df_job['Job Title'].map(jt_dict)
    # --------------------------'Role'-------------------------
    # Create a dictionary that maps unique values to numeric identifiers
    role_dict = {role: i for i, role in enumerate(df_job['Role'].unique())}
    # Replace the values in the "Role" column with their numeric identifiers
    df_job['Role'] = df_job['Role'].map(role_dict)
    # --------------------------'Job Portal'-------------------------
    # Create a dictionary that maps unique values to numeric identifiers
    jp_dict = {jp: i for i, jp in enumerate(df_job['Job Portal'].unique())}
    # Replace the values in the "Job Portal" column with their numeric identifiers
    df_job['Job Portal'] = df_job['Job Portal'].map(jp_dict)
    # --------------------------'Company'-------------------------
    # Create a dictionary that maps unique values to numeric identifiers
    comp_dict = {comp: i for i, comp in enumerate(df_job['Company'].unique())}
    # Replace the values in the "Company" column with their numeric identifiers
    df_job['Company'] = df_job['Company'].map(comp_dict)
    # --------------------------'Company Profile'-------------------------
    df_company_profile = df_job['Company Profile'].str.split('",', expand=True)
    df_company_profile.columns = ['Sector', 'Industry', 'City', 'State', 'Zip', 'Website', 'Ticker', 'CEO']
    df_company_profile = df_company_profile.apply(
        lambda x: x.str.replace('{', '').str.replace('"', '').str.replace('}', '')
        .str.replace('Sector', '').str.replace('Industry', '').str.replace('City', '')
        .str.replace('State', '').str.replace('Zip', '').str.replace('Website', '')
        .str.replace('Ticker', '').str.replace('CEO', '').str.replace(':', ''))
    df_company_profile.drop(["CEO", "Website", "Zip"], axis=1, inplace=True)
    # --------------------------'Sector'-------------------------
    # Create a dictionary that maps unique values to numeric identifiers
    comp_dict = {sector: i for i, sector in enumerate(df_company_profile['Sector'].unique())}
    # Replace the values in the "Sector" column with their numeric identifiers
    df_company_profile['Sector'] = df_company_profile['Sector'].map(comp_dict)
    # --------------------------'Industry'-------------------------
    # Create a dictionary that maps unique values to numeric identifiers
    comp_dict = {industry: i for i, industry in enumerate(df_company_profile['Industry'].unique())}
    # Replace the values in the "Industry" column with their numeric identifiers
    df_company_profile['Industry'] = df_company_profile['Industry'].map(comp_dict)
    # --------------------------'City'-------------------------
    # Create a dictionary that maps unique values to numeric identifiers
    comp_dict = {city: i for i, city in enumerate(df_company_profile['City'].unique())}
    # Replace the values in the "City" column with their numeric identifiers
    df_company_profile['City'] = df_company_profile['City'].map(comp_dict)
    # --------------------------'State'-------------------------
    # Create a dictionary that maps unique values to numeric identifiers
    comp_dict = {state: i for i, state in enumerate(df_company_profile['State'].unique())}
    # Replace the values in the "State" column with their numeric identifiers
    df_company_profile['State'] = df_company_profile['State'].map(comp_dict)
    # --------------------------'Ticker'-------------------------
    # Create a dictionary that maps unique values to numeric identifiers
    comp_dict = {ticker: i for i, ticker in enumerate(df_company_profile['Ticker'].unique())}
    # Replace the values in the "Ticker" column with their numeric identifiers
    df_company_profile['Ticker'] = df_company_profile['Ticker'].map(comp_dict)
    # Merge the transformed columns back into the original dataset
    df_job = pd.concat([df_job, df_company_profile], axis=1)
    # --------------------------'Job Posting Date'-------------------------
    df_job[['year', 'month', 'day']] = df_job['Job Posting Date'].str.split('-', expand=True)
    df_job['year'] = pd.to_numeric(df_job['year'])
    df_job['month'] = pd.to_numeric(df_job['month'])
    df_job['day'] = pd.to_numeric(df_job['day'])
    # --------------------------'Benefits'-------------------------
    df_job['Benefits'] = df_job['Benefits'].str.replace('{', '').str.replace('}', '')
    # One-hot encode the 'Benefits' column with get_dummies
    benefits_encoded = pd.get_dummies(df_job['Benefits'], dtype=int)
    # Join the encoded columns back to the original DataFrame
    df_job = pd.concat([df_job, benefits_encoded], axis=1)
    # --------------------------'skills'-------------------------
    # Create a dictionary that maps unique values to numeric identifiers
    comp_dict = {skill: i for i, skill in enumerate(df_job['skills'].unique())}
    # Replace the values in the "skills" column with their numeric identifiers
    df_job['skills'] = df_job['skills'].map(comp_dict)
    df_job.drop(["Company Profile", "Experience", "Salary Range", "Benefits", "Job Posting Date"], axis=1, inplace=True)
    print(df_job.dtypes)
    df_job.to_csv('D:/Интеллектуальные информационные системы/Dataset/updated_job_descriptions.csv', index=False)
def decision_tree_classifier():
    data = pd.read_csv('D:/Интеллектуальные информационные системы/Dataset/updated_job_descriptions.csv')
    corr = data[['location', 'Country', 'Work Type', 'Company Size', 'Preference', 'Job Title', 'Role', 'Job Portal', 'skills', 'Company', 'Min Experience', 'Max Experience', 'Min Salary',
'Max Salary', 'Sector', 'Industry', 'City', 'State', 'Ticker', 'year', 'month', 'day',
"'Casual Dress Code, Social and Recreational Activities, Employee Referral Programs, Health and Wellness Facilities, Life and Disability Insurance'",
"'Childcare Assistance, Paid Time Off (PTO), Relocation Assistance, Flexible Work Arrangements, Professional Development'",
"'Employee Assistance Programs (EAP), Tuition Reimbursement, Profit-Sharing, Transportation Benefits, Parental Leave'",
"'Employee Referral Programs, Financial Counseling, Health and Wellness Facilities, Casual Dress Code, Flexible Spending Accounts (FSAs)'",
"'Flexible Spending Accounts (FSAs), Relocation Assistance, Legal Assistance, Employee Recognition Programs, Financial Counseling'",
"'Health Insurance, Retirement Plans, Flexible Work Arrangements, Employee Assistance Programs (EAP), Bonuses and Incentive Programs'",
"'Health Insurance, Retirement Plans, Paid Time Off (PTO), Flexible Work Arrangements, Employee Assistance Programs (EAP)'",
"'Legal Assistance, Bonuses and Incentive Programs, Wellness Programs, Employee Discounts, Retirement Plans'",
"'Life and Disability Insurance, Stock Options or Equity Grants, Employee Recognition Programs, Health Insurance, Social and Recreational Activities'",
"'Transportation Benefits, Professional Development, Bonuses and Incentive Programs, Profit-Sharing, Employee Discounts'",
"'Tuition Reimbursement, Stock Options or Equity Grants, Parental Leave, Wellness Programs, Childcare Assistance'"]]
    print(corr.head())
    # define the target variable
    y = data['Qualifications']
    # split the data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(corr.values, y.values, test_size=0.2)
    # create a decision tree classifier instance
    clf = DecisionTreeClassifier(random_state=241)
    # fit the model on the training set
    clf.fit(X_train, y_train)
    # predict the classes of the test set
    y_pred = clf.predict(X_test)
    importances = clf.feature_importances_
    print("Feature importances: ")
    print(importances)
    print("Sorted feature importances: ")
    conversion_ratings(importances)
def conversion_ratings(rank):
    column_names = ['location', 'Country', 'Work Type', 'Company Size', 'Preference', 'Job Title', 'Role', 'Job Portal', 'skills', 'Company', 'Min Experience', 'Max Experience', 'Min Salary',
'Max Salary', 'Sector', 'Industry', 'City', 'State', 'Ticker', 'year', 'month', 'day',
"'Casual Dress Code, Social and Recreational Activities, Employee Referral Programs, Health and Wellness Facilities, Life and Disability Insurance'",
"'Childcare Assistance, Paid Time Off (PTO), Relocation Assistance, Flexible Work Arrangements, Professional Development'",
"'Employee Assistance Programs (EAP), Tuition Reimbursement, Profit-Sharing, Transportation Benefits, Parental Leave'",
"'Employee Referral Programs, Financial Counseling, Health and Wellness Facilities, Casual Dress Code, Flexible Spending Accounts (FSAs)'",
"'Flexible Spending Accounts (FSAs), Relocation Assistance, Legal Assistance, Employee Recognition Programs, Financial Counseling'",
"'Health Insurance, Retirement Plans, Flexible Work Arrangements, Employee Assistance Programs (EAP), Bonuses and Incentive Programs'",
"'Health Insurance, Retirement Plans, Paid Time Off (PTO), Flexible Work Arrangements, Employee Assistance Programs (EAP)'",
"'Legal Assistance, Bonuses and Incentive Programs, Wellness Programs, Employee Discounts, Retirement Plans'",
"'Life and Disability Insurance, Stock Options or Equity Grants, Employee Recognition Programs, Health Insurance, Social and Recreational Activities'",
"'Transportation Benefits, Professional Development, Bonuses and Incentive Programs, Profit-Sharing, Employee Discounts'",
"'Tuition Reimbursement, Stock Options or Equity Grants, Parental Leave, Wellness Programs, Childcare Assistance'"]
    ranks = dict()
    ranks = np.abs(rank)
    minmax = MinMaxScaler()
    ranks = minmax.fit_transform(np.array(ranks).reshape(33, 1)).ravel()  # scale the scores to [0, 1]
    ranks = map(lambda x: round(x, 2), ranks)  # round the array elements
    my_dict = dict(zip(column_names, ranks))
    sorted_dict = dict(sorted(my_dict.items(), key=lambda x: x[1], reverse=True))
    print(sorted_dict)
if __name__ == '__main__':
    # data_preprocessing()
    decision_tree_classifier()

@@ -0,0 +1,44 @@
#### Mikhail Kondrashin, PIbd-41
## Laboratory work 1. Working with standard datasets and various models
### Task:
**Data:** make_classification (n_samples=500, n_features=2, n_redundant=0, n_informative=2, random_state=rs,
n_clusters_per_class=1)
**Models:**
* Linear regression
* Polynomial regression (degree 3)
* Ridge polynomial regression (degree 4, alpha = 1.0)
### Running the laboratory work:
* install `python`, `numpy`, `matplotlib`, `sklearn`
* run the project (the entry point is `main.py`)
### Technologies used:
* The `Python` programming language,
* The `numpy`, `matplotlib`, `sklearn` libraries
* The `IntelliJ IDEA` IDE (the "Ultimate edition" supports Python)
### Description of the solution:
* The program generates data with make_classification (n_samples=500, n_features=2, n_redundant=0, n_informative=2,
random_state=rs, n_clusters_per_class=1)
* It compares three types of models: linear, polynomial, and ridge polynomial regression
* It outputs plots and quality scores (the coefficient of determination) for each model
### Result:
![Linear](images/linear.png)
![Polynomial](images/polynomial.png)
![Greb](images/greb_polynom.png)
* The computed quality scores:
![Result](images/result.png)
Based on the quality scores, **polynomial regression** achieved the highest score

@@ -0,0 +1,47 @@
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

def lin(x_train, x_test, y_train, y_test):
    plt.scatter(x_test, y_test)
    model = LinearRegression().fit(x_train, y_train)
    y_predict = model.intercept_ + model.coef_ * x_test
    plt.plot(x_test, y_predict, color='red')
    plt.title('Linear regression')
    plt.savefig('images/linear.png')
    plt.show()
    print('Linear regression')
    print('Score:', model.score(x_train, y_train))

def polynom(x_train, y_train):
    plt.scatter(x_train, y_train)
    x_poly = PolynomialFeatures(degree=4).fit_transform(x_train)
    pol_reg = LinearRegression()
    model = pol_reg.fit(x_poly, y_train)
    y_predict = pol_reg.predict(x_poly)
    plt.plot(x_train, y_predict, color='green')
    plt.title('Polynomial regression')
    plt.savefig('images/polynomial.png')
    plt.show()
    print('Polynomial regression')
    print('Score:', model.score(x_poly, y_train))

def greb_polynom(x_train, x_test, y_train, y_test):
    plt.scatter(x_test, y_test)
    pipeline = Pipeline([("polynomial_features", PolynomialFeatures(degree=4)), ("ridge", Ridge(alpha=1.0))])
    model = pipeline.fit(x_train, y_train)
    y_predict = pipeline.predict(x_test)
    plt.plot(x_test, y_predict, color='blue')
    plt.title('Ridge polynomial regression')
    plt.savefig('images/greb_polynom.png')
    plt.show()
    print('Ridge polynomial regression')
    print('Score:', model.score(x_train, y_train))

@@ -0,0 +1,16 @@
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from funcs import *
x, y = make_classification(n_samples=500, n_features=2, n_redundant=0, n_informative=2, random_state=0,
n_clusters_per_class=1)
x = x[:, np.newaxis, 1]
x_train, x_test, y_train, y_test = train_test_split(x, y)
lin(x_train, x_test, y_train, y_test)
polynom(x_train, y_train)
greb_polynom(x_train, x_test, y_train, y_test)

@@ -0,0 +1,36 @@
#### Mikhail Kondrashin, PIbd-41
## Laboratory work 2. Feature ranking
### Task:
* Linear regression (LinearRegression)
* Feature reduction with random forests (Random Forest Regressor)
* Linear correlation (f_regression)
### Running the laboratory work:
* install `python`, `numpy`, `matplotlib`, `sklearn`
* run the project (the entry point is `main.py`)
### Technologies used:
* The `Python` programming language,
* The `numpy`, `matplotlib`, `sklearn` libraries
* The `IntelliJ IDEA` IDE (the "Ultimate edition" supports Python)
### Description of the solution:
The program ranks the features of a regression model using:
* Linear regression (LinearRegression)
* Feature reduction with random forests (Random Forest Regressor)
* Linear correlation (f_regression)
* 14 features
* 750 observations
### Result:
![Result](images/result.png)
* The linear correlation method performed best (x4, x14, x2, x12). Although features x1 and x3 were not identified, their influence can be accounted for through the correlated features x12 and x14.
* The most important features by mean score: x1, x4, x2, x11

@@ -0,0 +1,12 @@
import numpy as np
def generate_data():
    size = 750
    np.random.seed(0)
    X = np.random.uniform(0, 1, (size, 14))
    Y = (10 * np.sin(np.pi * X[:, 0] * X[:, 1]) + 20 * (X[:, 2] - .5) ** 2 + 10 * X[:, 3] + 5 * X[:, 4] ** 5 + np.random.normal(0, 1))
    X[:, 10:] = X[:, :4] + np.random.normal(0, .025, (size, 4))
    return X, Y

@@ -0,0 +1,22 @@
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import f_regression
from sklearn.linear_model import LinearRegression
from data import generate_data
from ranks import mean_calc_and_sort, get_ranks
if __name__ == '__main__':
    x, y = generate_data()
    linear = LinearRegression()
    linear.fit(x, y)
    rfr = RandomForestRegressor(bootstrap=True)
    rfr.fit(x, y)
    f, p_val = f_regression(x, y, center=True)
    ranks = get_ranks(linear, rfr, f)
    print("mean", mean_calc_and_sort(ranks))

@@ -0,0 +1,40 @@
import numpy as np
from sklearn.preprocessing import MinMaxScaler
def get_ranks(linear, rfr, f):
    ranks = dict()
    features = ["x%s" % i for i in range(1, 15)]
    ranks['Linear'] = rank_to_dict(linear.coef_, features)
    ranks['RFR'] = rank_to_dict(rfr.feature_importances_, features)
    ranks['f_reg'] = rank_to_dict(f, features)
    return ranks

def rank_to_dict(ranks, names):
    ranks = np.abs(ranks)
    minmax = MinMaxScaler()
    ranks = minmax.fit_transform(np.array(ranks).reshape(14, 1)).ravel()
    ranks = map(lambda x: round(x, 2), ranks)
    return dict(zip(names, ranks))

def mean_calc_and_sort(ranks):
    mean = {}
    for key, value in ranks.items():
        print(key, value)
        for item in value.items():
            if item[0] not in mean:
                mean[item[0]] = 0
            mean[item[0]] += item[1]
    for key, value in mean.items():
        res = value / len(ranks)
        mean[key] = round(res, 2)
    return mean

@@ -0,0 +1,26 @@
#### Mikhail Kondrashin, PIbd-41
## Laboratory work 3. Decision trees
### Running the laboratory work:
* install `python`, `numpy`, `matplotlib`, `sklearn`
* run the project (the entry point is `main.py`)
### Technologies used:
* The `Python` programming language,
* The `numpy`, `matplotlib`, `sklearn` libraries
* The `IntelliJ IDEA` IDE (the "Ultimate edition" supports Python)
### Description of the solution:
* Ranks the features of a regression model
* Using the "WindData" dataset, solves a classification task (with a decision tree) to determine which statistical wind parameters influence the classification into each turbulence intensity class according to the international IEC-61400-1 ed.3 classification.
### Result:
![Result](images/result.png)
As the program output shows, wind speed is the key parameter for every turbulence intensity class; the effect is most pronounced for class "C" and least pronounced for class "B"

File diff suppressed because it is too large.

@@ -0,0 +1,41 @@
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import pandas as pd
import numpy as np
pd.options.mode.chained_assignment = None
FILE_PATH = "WindData.csv"
REQUIRED_COLUMNS = ['TI1', 'V1']
TARGET_COLUMN_1 = 'TurbulenceIntensityClassA'
TARGET_COLUMN_2 = 'TurbulenceIntensityClassB'
TARGET_COLUMN_3 = 'TurbulenceIntensityClassC'
def print_classifier_info(feature_importance):
    feature_names = REQUIRED_COLUMNS
    embarked_score = feature_importance[-3:].sum()
    scores = np.append(feature_importance[:2], embarked_score)
    scores = map(lambda score: round(score, 2), scores)
    print(dict(zip(feature_names, scores)))

def actions(target_column):
    data = pd.read_csv(FILE_PATH)
    X = data[REQUIRED_COLUMNS]
    y = data[target_column]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=100)
    classifier_tree = DecisionTreeClassifier(random_state=100)
    classifier_tree.fit(X_train, y_train)
    print_classifier_info(classifier_tree.feature_importances_)
    print("Classification quality score for ", target_column, " - ", classifier_tree.score(X_test, y_test))

if __name__ == '__main__':
    actions(TARGET_COLUMN_1)
    actions(TARGET_COLUMN_2)
    actions(TARGET_COLUMN_3)

@@ -0,0 +1,44 @@
**Task**
***
Variant 16
***Data:*** make_moons (noise=0.3, random_state=rs)
***Models:***
· Linear regression
· Multilayer perceptron with 10 neurons in the hidden layer (alpha = 0.01)
· Perceptron
***How to run the laboratory work:***
***
Run the file lab1.py
***Technologies used***
***
The numpy, matplotlib, sklearn libraries and their components
***Description of the laboratory work (program)***
***
The use of NumPy, Matplotlib, and scikit-learn in this code makes it possible to build and train machine learning models on the make_moons dataset.
The first step is generating the make_moons dataset, a two-dimensional set of points arranged in the shape of two interleaving half-moons. It is a dataset used for classification tasks.
Then the linear regression, multilayer perceptron, and perceptron models are trained on this dataset. During training the models adapt their internal parameters to predict the target values.
After training, a grid of points is created and the models' predictions are computed over it. This makes it possible to visualize how the models classify the point space.
At the end of the code, the data and the models' predictions are visualized in plots. This allows us to compare and assess how well each model handles the classification task and how they differ from each other.
***Result***
***
The program outputs plots and training performance scores obtained via model.score from the sklearn library.
Linear regression: 0.5453845246295626
Multilayer perceptron: 0.10895145407108087
Perceptron: 0.5

@@ -0,0 +1,55 @@
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.metrics import accuracy_score, mean_squared_error
X, y = make_moons(noise=0.3, random_state=42)
# Linear regression
lr = LinearRegression()
lr.fit(X, y)
# Multilayer perceptron
mlp = MLPRegressor(hidden_layer_sizes=(10,), alpha=0.01, random_state=42)
mlp.fit(X, y)
# Perceptron
perceptron = MLPClassifier(hidden_layer_sizes=(1,), random_state=42)
perceptron.fit(X, y)
# Create a grid of points for the models' predictions
xx, yy = np.meshgrid(np.linspace(-2, 3, 1000), np.linspace(-2, 2, 1000))
Z_lr = lr.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
Z_mlp = mlp.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
Z_perceptron = perceptron.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
# Plot the data and the models' predictions
plt.figure(figsize=(18, 6))
# Linear regression plot
plt.subplot(1, 3, 1)
plt.contourf(xx, yy, Z_lr, cmap=plt.cm.RdBu, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdBu, edgecolors='k')
plt.title("Linear regression")
# Multilayer perceptron plot
plt.subplot(1, 3, 2)
plt.contourf(xx, yy, Z_mlp, cmap=plt.cm.RdBu, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdBu, edgecolors='k')
plt.title("Multilayer perceptron")
# Perceptron plot
plt.subplot(1, 3, 3)
plt.contourf(xx, yy, Z_perceptron, cmap=plt.cm.RdBu, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdBu, edgecolors='k')
plt.title("Perceptron")
print("Linear regression:", lr.score(X, y))
print("Multilayer perceptron:", mlp.score(X, y))
print("Perceptron:", perceptron.score(X, y))
# Show the plots
plt.show()

@@ -0,0 +1,71 @@
**Task**
***
Using the code from the section "Solving the feature ranking task", rank the features with the models specified by the variant. Display each feature's score from each model and the mean score. Analyze the results. Which four features turned out to be the most important by mean score? (The feature names/indices are the answer to the task.)
**Variant 16:**
Linear regression (LinearRegression)
Randomized Lasso (RandomizedLasso)
Linear correlation (f_regression)
**How to run the laboratory work**
***
Run the file main.py
**Technologies used**
***
The numpy and scikit-learn libraries and their components
**Description of the laboratory work (program)**
***
This code demonstrates how to rank features in a regression task using the Linear Regression and Random Forest Regression models together with the f_regression method.
The first step is creating random data with the make_regression function. We then fit a Linear Regression model on this data and store the feature weight scores. We do the same with a Random Forest Regression model, storing the feature "importance" values it produces. In addition, we apply the f_regression method to obtain an importance score for each feature.
Next, we compute the mean of the feature scores from the three methods/models. We then print all the feature importance scores.
At the end of the code we select the four most important features based on the mean scores and print their values.
A feature's importance is determined by its score/value, where higher values indicate greater importance. Naturally, the most important features are those with the highest scores/values.
**Result**
***
The result is the following:
Feature 0: 0.8672604223819891
Feature 1: 0.7708510602186707
Feature 2: 0.03116023013554309
Feature 3: 0.6998726361290992
Feature 4: 1.0
Feature 5: 0.08986896281166205
Feature 6: 0.669155851030746
Feature 7: 0.1410044322180913
Feature 8: 0.043892111747763814
Feature 9: 0.5011547461825057
The 4 most significant features:
Feature 3: 0.6998726361290992
Feature 1: 0.7708510602186707
Feature 0: 0.8672604223819891
Feature 4: 1.0
Conclusion: Based on the executed code, we obtained feature importance scores for a regression task using the Linear Regression and Random Forest Regression models and the f_regression method.
The most important features, determined from the mean scores, turned out to be: feature 3, feature 1, feature 0 and feature 4.
These features have the greatest influence on the outcome of the regression task, and special attention should be paid to them when analyzing the data and making decisions.

@@ -0,0 +1,42 @@
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import f_regression
from sklearn.preprocessing import MinMaxScaler
# Create random data
X, y = make_regression(n_samples=100, n_features=10, random_state=42)
# Scale the features
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
# Rank the features with Linear Regression
linreg = LinearRegression()
linreg.fit(X_scaled, y)
linreg_scores = np.abs(linreg.coef_)
# Rank the features with Random Forest Regression
rfreg = RandomForestRegressor()
rfreg.fit(X_scaled, y)
rfreg_scores = rfreg.feature_importances_
# Rank the features with f_regression
freg_scores, _ = f_regression(X_scaled, y)
# Compute the mean score
avg_scores = np.mean([linreg_scores, rfreg_scores, freg_scores], axis=0)
# Scale the scores into the range from 0 to 1
scaled_scores = avg_scores / np.max(avg_scores)
# Print the results
for i, score in enumerate(scaled_scores):
    print(f"Feature {i}: {score}")
# Get the indices of the four most important features
top_features_indices = np.argsort(scaled_scores)[-4:]
print("The 4 most significant features:")
for idx in top_features_indices:
    print(f"Feature {idx}: {scaled_scores[idx]}")

@@ -0,0 +1,57 @@
# Laboratory work No. 2
> Feature ranking
### How to run the laboratory work
1. Install python, numpy, sklearn
2. Run the command `python main.py` in the project root
### Technologies used
* The `python` programming language
* The `numpy, sklearn` libraries
* The `PyCharm` IDE
### What does the program do?
It ranks the 14 features of the Friedman regression problem using the following models:
- Lasso
- Recursive Feature Elimination (RFE)
- Linear correlation (f_regression)
Several experiments with different model parameters were carried out to assess their influence on the final result (a sketch of such an experiment loop follows the test results):
#### Test 1
![alt text](exp_1.png "Experiment 1")
![alt text](exp_console_1.png "Result 1")
#### Test 2
![alt text](exp_2.png "Experiment 2")
![alt text](exp_console_2.png "Result 2")
#### Test 3
![alt text](exp_3.png "Experiment 3")
![alt text](exp_console_3.png "Result 3")
#### Test 4
![alt text](exp_4.png "Experiment 4")
![alt text](exp_console_4.png "Result 4")
The first two experiments showed that features x4, x2, x1, x5 were the most important by mean score.
The other two experiments showed that features x4, x2, x1, x11 were the most important by mean score.
Since we know from the start that our function depends on features x1, x2, x3, x4, and that x11, x12, x13, x14 in turn depend on them, these features would be the best outcome.
But no experiment was able to identify them exactly. Experiments 3 and 4 turned out best, since unlike experiments 1 and 2 they also identified feature x11 instead of x5, which does not influence our function at all.
Of these models, Lasso with alpha=0.001 identified the features best
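For reference, such an experiment might be scripted roughly as follows (a sketch assuming the `generate_dataset`, `get_ranks` and `calc_mean` helpers from the files below; the actual experiments were presumably run by editing the parameters by hand):
```python
from sklearn.linear_model import Lasso
from sklearn.feature_selection import RFE, f_regression

from dataset import generate_dataset
from ranks import calc_mean, get_ranks

x, y = generate_dataset()
# Try several regularization strengths and compare the resulting mean rankings
for alpha in (0.1, 0.01, 0.001):
    lasso = Lasso(alpha=alpha).fit(x, y)
    rfe = RFE(lasso, step=2).fit(x, y)
    f, _ = f_regression(x, y, center=False)
    print(f"alpha={alpha}", calc_mean(get_ranks(lasso, rfe, f)))
```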

@@ -0,0 +1,5 @@
LASSO_TITLE = 'LASSO'
RFE_TITLE = 'RFE'
F_REGRESSION_TITLE = 'f_regression'
FEATURES_AMOUNT = 14

@@ -0,0 +1,14 @@
import config
import numpy as np
def generate_dataset():
    np.random.seed(0)
    size = 750
    x = np.random.uniform(0, 1, (size, config.FEATURES_AMOUNT))
    # Define the target: the Friedman regression problem
    y = (10 * np.sin(np.pi * x[:, 0] * x[:, 1]) + 20 * (x[:, 2] - .5)**2 +
         10*x[:, 3] + 5*x[:, 4]**5 + np.random.normal(0, 1))
    # Add feature dependence
    x[:, 10:] = x[:, :4] + np.random.normal(0, .025, (size, 4))
    return x, y

@@ -0,0 +1,15 @@
from sklearn.linear_model import Lasso
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE, f_regression
def fit_models(x, y):
    lasso = Lasso(alpha=0.001)
    lasso.fit(x, y)
    rfe = RFE(lasso, step=2)
    rfe.fit(x, y)
    f, val = f_regression(x, y, center=False)
    return lasso, rfe, f

@@ -0,0 +1,13 @@
from dataset import generate_dataset
from fit import fit_models
from ranks import calc_mean, get_ranks
x, y = generate_dataset()
lasso, rfe, f = fit_models(x, y)
ranks = get_ranks(lasso, rfe, f)
mean = calc_mean(ranks)
print("MEAN", mean)

@@ -0,0 +1,42 @@
import config
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from operator import itemgetter
def rank_to_dict(ranks, names):
    ranks = np.abs(ranks)
    minmax = MinMaxScaler()
    ranks = minmax.fit_transform(
        np.array(ranks).reshape(config.FEATURES_AMOUNT, 1)).ravel()
    ranks = map(lambda x: round(x, 2), ranks)
    return dict(zip(names, ranks))

def flip_array(arr):
    return -1 * arr + np.max(arr)

def get_ranks(lasso, rfe, f):
    ranks = dict()
    names = ["x%s" % i for i in range(1, config.FEATURES_AMOUNT + 1)]
    ranks[config.LASSO_TITLE] = rank_to_dict(lasso.coef_, names)
    ranks[config.RFE_TITLE] = rank_to_dict(flip_array(rfe.ranking_), names)
    ranks[config.F_REGRESSION_TITLE] = rank_to_dict(f, names)
    return ranks

def calc_mean(ranks):
    mean = {}
    for key, value in ranks.items():
        print(key, value)
        for item in value.items():
            if item[0] not in mean:
                mean[item[0]] = 0
            mean[item[0]] += item[1]
    for key, value in mean.items():
        res = value / len(ranks)
        mean[key] = round(res, 2)
    return sorted(mean.items(), key=itemgetter(1), reverse=True)

@@ -0,0 +1,83 @@
# Laboratory work No. 3
> Decision trees
### How to run the laboratory work
1. Install python, numpy, sklearn
1. To run on the dataset of the first task: `python titanic.py`
1. To run on the dataset of the second task: `python cars.py`
### Technologies used
* The `python` programming language
* The `numpy, sklearn` libraries
* The `PyCharm` IDE
### What does the program do?
#### Part 1
Using the data on Titanic passengers, solve a classification task (with a decision tree) in which, based on various passenger characteristics, you must find the two most important of the three features under consideration (per the variant) for the surviving passengers.
Variant 18: Pclass, Age, Ticket.
The DecisionTreeClassifier model was used
#### The titanic.csv dataset
![alt text](titanic.png "titanic results")
Model score: 0.68
The 2 key features selected by the model: Age, Ticket (Fare)
#### Part 2
Using the library implementation of a decision tree, solve the task from the laboratory work "Decision Tree web service" from the course "Methods of Artificial Intelligence" on 99% of your data. Check the model on the remaining percent and draw a conclusion.
#### Data
A dataset of used cars on the secondary market.
> Link to the dataset: https://www.kaggle.com/datasets/harikrishnareddyb/used-car-price-predictions
#### Goal
Classify car prices using a decision tree
#### Model
The model used in the experiment is DecisionTreeClassifier from the sklearn package
#### The true_car_listings.csv dataset
![alt text](cars.png "cars results")
The initial set of selected features:
- Mileage
- Year
- Model
**Amount of data:**
[30000 rows x 3 columns]
**Score:**
0.01
**Feature importances:**
[0.8780813 0.04707369 0.074845 ]
The quality is unsatisfactory.
The feature with the highest importance: Mileage
### Conclusion
The main conclusion of this work is that the DecisionTreeClassifier model is not suitable for solving part 2 of this task, so the solution cannot be applied in practice.
The reason for the model's low accuracy is that a used car's price is a continuous quantity that depends not only on mileage but also on many other factors, such as the number of accidents, the overall condition of the car, and the economic situation on the market, and these factors can have just as substantial an effect on the final price. Nevertheless, conclusions can be drawn about the influence of a car's mileage on its price and used later when implementing such tasks.

@@ -0,0 +1,30 @@
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
slice_size = 30000
data = pd.read_csv('true_car_listings.csv', index_col='Vin')[:slice_size]
unique_numbers = list(set(data['Model']))
data['Model'] = data['Model'].apply(unique_numbers.index)
clf = DecisionTreeClassifier(random_state=341)
# Select the features
Y = data['Price']
X = data[['Mileage', 'Year', 'Model']]
print(X)
# Split the dataset into training and test data
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42)
# Fit on the training data
clf.fit(X_train, y_train)
# Model accuracy
print(f'Score: {clf.score(X_test, y_test)}')
# Feature importances
importances = clf.feature_importances_
print(f'Means {importances}')

File diff suppressed because it is too large.

@@ -0,0 +1,26 @@
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
data = pd.read_csv('titanic.csv', index_col='Passengerid')
clf = DecisionTreeClassifier(random_state=241)
# Select the features
Y = data['2urvived']
X = data[['Pclass', 'Age', 'Fare', ]]
print(X)
# Split the dataset into training and test data
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.05, random_state=42)
# Fit on the training data
clf.fit(X_train, y_train)
# Model accuracy
print(f'Score: {clf.score(X_test, y_test)}')
# Feature importances
importances = clf.feature_importances_
print(f'Means: {importances}')

File diff suppressed because it is too large.

@@ -0,0 +1,43 @@
# Laboratory work 3. Decision trees
### Variant No. 18
Using the library implementation of a decision tree, solve the task
from the laboratory work "Decision Tree web service" from the course
"Methods of Artificial Intelligence" on 99% of your data.
Check the model on the remaining percent and draw a conclusion.
***
## *How to run the laboratory work:*
To run the program, open the lab3 file in PyCharm and click the green triangle in the upper right corner.
***
## *Technologies used:*
**Scikit-learn** is one of the most widely used Python packages for Data Science and Machine Learning. It supports many operations and provides many algorithms.
**Pandas** is an open-source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
***
## *What the laboratory work does:*
This work analyzes the DecisionTreeClassifier decision tree and solves the task of classifying respondents into those who exercise regularly
and those who do not, based on their characteristics. There is a dataset (clean_data)
containing the results of a survey of volunteers about their health. Five features were selected:
* age - the respondent's age;
* weight - the respondent's weight;
* work - the level of physical activity at work;
* phy_health - self-rated health;
* gymtime - time spent at the gym.
Among them we must identify the 2 most important features for the target variable exercise_reg - whether or not the respondent exercises regularly -
built from the phy_ex feature, a rating of the importance of physical exercise. exercise_reg = 1 if phy_ex >= 7, and 0 otherwise.
The model must be trained on 99% of the data and its quality evaluated on the remaining percent.
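A minimal sketch of this setup (assuming the cleaned survey is available as `clean_data.csv` with the columns listed above; the actual lab script may differ):
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv('clean_data.csv')
# Binary target: 1 if the respondent rates the importance of exercise at 7 or higher
data['exercise_reg'] = (data['phy_ex'] >= 7).astype(int)
# 'work' and 'gymtime' are categorical in the raw survey, so encode them numerically
for col in ('work', 'gymtime'):
    data[col] = pd.factorize(data[col])[0]
X = data[['age', 'weight', 'work', 'phy_health', 'gymtime']]
y = data['exercise_reg']
# Train on 99% of the data, keep 1% for the quality check
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.01, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(dict(zip(X.columns, clf.feature_importances_.round(2))))
print('Score:', clf.score(X_test, y_test))
```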
**The result of the program** is: the first 15 rows of the prepared data, the feature importance scores, and the model quality score (printed to the console).
***
## *Example output:*
>Console output:
![](https://sun9-15.userapi.com/impg/Wq3qiVmaNYVI4CUX6SmFpRMJn3UZDJIbniFUMA/nsSbL7Xjcn4.jpg?size=492x421&quality=96&sign=772aa2b9ae8b708139b75a1ccc46d524&type=album)
***
**Conclusion**:
The DecisionTreeClassifier model identified the 2 most important features, namely **weight** and **the rating of the importance of physical exercise**, while the remaining
features have minimal importance, although age is quite close to weight in importance. The model quality score indicates that the
model predicts the classes from the characteristics fairly accurately.

View File

@@ -0,0 +1,262 @@
"","ts","age","sex","work","phy_ff","phy_health","phy_bw","phy_ex","meal","height","weight","exercise","fruit","veg","cook","spend","income","gymtime","disease","review","rate"
"2","2021-10-19 20:05:43",21,"Male","Moderate",7,8,6,7,"5",182.88,76,"Others",1,2,1,1000,50000,"Not at all","No","",4
"3","2021-10-19 20:18:40",23,"Male","Moderate",6,9,8,8,"5",170.688,70,"Others",1,2,4,300,6500,"1 Hour","No","",5
"4","2021-10-20 11:03:47",21,"Male","Moderate",7,6,7,8,"5+",182.88,77,"Walk, Run, Outdoor Games, Others",1,2,2,3000,50000,"Not at all","No","",5
"5","2021-10-20 11:06:04",23,"Male","Moderate",6,9,8,8,"5",170.688,70,"Walk, Weight lifting, Outdoor Games, Others",0,2,3,500,6500,"1 Hour","No","It's really good.",5
"6","2021-10-20 11:19:54",22,"Male","Moderate",2,2,2,2,"2",170.688,85,"Walk, Do not interested",1,1,1,0,100000,"Not at all","Nahi","",5
"7","2021-10-20 11:44:06",22,"Male","Moderate",3,4,3,4,"5",182.88,101,"Run, Outdoor Games",0,3,1,500,35000,"Not at all","No","",5
"8","2021-10-20 11:44:44",21,"Male","Moderate",9,9,10,8,"5",173.736,60,"Run, Outdoor Games",1,2,1,2000,50000,"Not at all","No","",5
"9","2021-10-20 11:45:38",21,"Male","Hardwork",2,3,3,2,"4",155.448,55,"Walk",1,0,1,0,35000,"Not at all","No","",4
"10","2021-10-20 11:46:49",21,"Male","Moderate",3,2,3,1,"5+",173.736,69,"Walk, Run, Outdoor Games",1,2,1,0,1000000,"Not at all","No","",4
"11","2021-10-20 11:47:29",22,"Male","Moderate",6,7,10,10,"5",182.88,78,"Others",4,4,3,0,200000,"Not at all","No","Welcome",5
"12","2021-10-20 11:48:16",23,"Male","Moderate",7,6,6,8,"5",176.784,65,"Walk, Run, Outdoor Games",4,4,2,1000,30000,"Not at all","No","",3
"13","2021-10-20 11:48:28",21,"Male","Moderate",3,4,4,4,"4",170.688,80,"Others",1,2,0,0,150000,"Not at all","Cough and cold","It's nearly impossible for someone to know their exact family income. So that should have been in groups like
Below 10,000
10,000-30,000
30,000-60,000
Above 60,000",3
"14","2021-10-20 11:50:53",22,"Male","Moderate",4,6,7,5,"5",179.832,58,"Walk, Others",1,2,0,500,150000,"Not at all","Alopecia Totalis","",4
"15","2021-10-20 11:55:03",26,"Male","Moderate",3,2,4,3,"4",158.496,53,"Outdoor Games, Others",0,2,1,0,50000,"Not at all","Cholesterol","",4
"16","2021-10-20 11:57:30",22,"Male","Moderate",2,2,3,2,"5",173.736,72,"Walk, Outdoor Games",1,2,1,0,50000,"Not at all","Hyperthyroidism","",4
"17","2021-10-20 12:01:13",23,"Male","Moderate",3,2,2,3,"4",182.88,72,"Run, Outdoor Games, Others",1,2,0,0,10000,"Not at all","No","",4
"18","2021-10-20 12:04:42",23,"Male","Moderate",8,6,7,4,"5",125.2728,48,"Walk, Outdoor Games",1,3,1,0,5000,"Not at all","No","",4
"19","2021-10-20 12:07:32",22,"Female","Moderate",1,4,4,1,"5+",161.544,72,"Walk, Others",1,2,3,0,50000,"Not at all","No","You should've given income range. ",3
"20","2021-10-20 12:07:40",24,"Male","Moderate",7,5,10,10,"4",167.64,82,"Walk, Run, Weight lifting, Outdoor Games",1,3,0,2000,32000,"1 Hour","Asthma","Questionnaire forming is good",4
"21","2021-10-20 12:15:47",22,"Male","Sedentary",4,7,7,5,"4",176.784,63,"Walk, Outdoor Games, Others",1,2,3,0,10000,"Not at all","Nope","",3
"22","2021-10-20 12:20:34",22,"Male","Moderate",4,5,1,3,"5",176.784,71,"Walk, Run, Swim",1,2,5,0,26000,"Not at all","Stomach problems, basically Digestion problem.","much good",5
"23","2021-10-20 12:25:37",21,"Male","Hardwork",3,4,4,3,"5",179.832,74,"Walk, Others",1,2,0,500,30000,"Not at all","Lower back pain","Home exercise should be considered in the fitness. Training option",4
"24","2021-10-20 12:30:44",22,"Male","Moderate",7,10,10,10,"4",173.736,78,"Walk, Run, Weight lifting",0,2,1,1000,80000,"2 Hour","Common cold","Go ahead w your project . All the best .",4
"25","2021-10-20 13:00:29",20,"Male","Moderate",7,5,7,6,"5",173.736,60,"Walk, Run, Outdoor Games",1,3,3,0,40000,"Not at all","no","",4
"26","2021-10-20 13:01:48",23,"Male","Moderate",7,6,9,9,"4",188.976,78,"Run, Weight lifting",1,2,1,500,85000,"1 Hour","No","",5
"27","2021-10-20 13:08:54",20,"Male","Moderate",5,4,8,5,"4",155.448,55,"Walk, Others",1,1,0,0,22500,"Not at all","Hyperthyroidism","",5
"28","2021-10-20 13:17:33",23,"Male","Moderate",8,3,8,7,"4",176.784,70,"Walk, Run, Weight lifting",1,2,1,1000,5000,"Not at all","No","",5
"29","2021-10-20 14:27:29",22,"Male","Hardwork",2,2,3,3,"5",155.7528,64,"Walk, Run, Swim",2,2,2,500,15000,"Not at all","Appendicitis","",5
"30","2021-10-20 14:50:44",20,"Male","Moderate",8,8,6,9,"5",155.7528,95,"Walk",1,2,0,0,35000,"Not at all","No","",5
"31","2021-10-20 14:53:47",21,"Male","Moderate",3,3,3,2,"2",179.832,65,"Walk, Outdoor Games",1,2,1,0,0,"Not at all","No","",3
"32","2021-10-20 14:56:35",21,"Male","Moderate",2,5,6,2,"5",164.592,52,"Walk, Run, Swim",4,4,0,500,40000,"Not at all","Cold","It's a good experience",5
"33","2021-10-20 15:02:49",21,"Male","Moderate",3,4,1,3,"2",179.832,95,"Walk",1,1,2,0,80000,"Not at all","No","",5
"34","2021-10-20 15:34:32",28,"Female","Hardwork",3,3,4,4,"4",121.92,40,"Walk, Run, Others",3,3,0,10000,50000,"More than 3 hours","Migraine","Thanks",5
"35","2021-10-20 15:36:13",24,"Male","Sedentary",3,3,2,2,"5",158.496,50,"Run, Outdoor Games",1,2,1,100,30000,"Not at all","No","",4
"36","2021-10-20 15:41:09",22,"Male","Moderate",4,6,8,4,"4",164.592,88,"Do not interested",0,3,1,300,16000,"Not at all","No","I would specify there should be an others section in gender .",4
"37","2021-10-20 15:54:06",22,"Female","Moderate",2,4,3,2,"5",164.592,65,"Dance",2,2,1,0,35000,"Not at all","No","Interesting ",4
"38","2021-10-20 17:17:30",18,"Female","Hardwork",3,2,4,3,"5",170.688,56,"Walk, Others",2,3,0,5000,2200000,"1 Hour","No","",5
"39","2021-10-20 19:16:57",23,"Male","Moderate",6,9,10,10,"4",179.832,84,"Weight lifting, Outdoor Games",0,1,2,2000,30000,"2 Hour","No","Good",4
"40","2021-10-20 21:04:27",22,"Male","Moderate",3,3,4,3,"5",173.736,90,"Run",3,3,1,500,40000,"Not at all","No","",5
"41","2021-10-21 09:01:29",21,"Male","Moderate",5,7,6,4,"5+",155.448,95,"Do not interested",0,4,2,0,35500,"Not at all","Weak Stomach","",4
"42","2021-10-21 09:20:34",20,"Male","Moderate",9,8,10,10,"5+",155.7528,70,"Walk, Outdoor Games, Others",3,4,4,200,5000,"Not at all","No","",5
"43","2021-10-21 11:24:37",21,"Female","Sedentary",5,5,5,5,"5",161.544,49,"Dance",2,4,3,100,20000,"Not at all","No","No ",5
"44","2021-10-23 18:45:38",21,"Male","Moderate",7,5,6,7,"4",182.88,72,"Outdoor Games",0,3,3,0,11000,"Not at all","no","",5
"45","2021-10-23 23:17:57",22,"Male","Moderate",7,2,9,10,"5+",173.736,60,"Walk, Run, Outdoor Games, Others",2,3,1,3000,27000,"1 Hour","No","",5
"46","2021-10-20 11:25:38",24,"Male","Moderate",2,2,3,3,"4",173.736,73,"Others",0,1,1,1000,12000,"Not at all","No","",4
"47","2021-10-20 11:26:15",24,"Female","Moderate",2,3,2,3,"5",164.592,69,"Walk, Dance, Outdoor Games, Others",0,2,1,0,10000,"Not at all","No","",3
"48","2021-10-20 11:30:49",20,"Male","Hardwork",5,10,10,10,"5",167.64,70,"Walk, Run",3,5,5,0,30000,"Not at all","No","Very goodod survey it was.",5
"49","2021-10-20 11:39:07",20,"Male","Moderate",3,3,2,1,"5+",155.7528,85,"Walk, Weight lifting",1,2,1,200,90000,"Not at all","No","✌🏻",5
"50","2021-10-20 11:39:28",20,"Male","Moderate",2,2,4,4,"4",170.688,58,"Walk, Weight lifting, Others",1,2,0,0,10000,"1 Hour","Sinusitis ","",3
"51","2021-10-20 11:39:41",21,"Male","Hardwork",9,8,9,9,"3",182.88,85,"Walk, Outdoor Games",4,4,4,2000,0,"2 Hour","No","Thanks",5
"52","2021-10-20 11:41:26",25,"Male","Moderate",5,6,7,6,"5",164.592,55,"Walk, Run",1,2,1,200,20000,"Not at all","No","",2
"53","2021-10-20 11:43:39",23,"Male","Moderate",2,3,4,4,"5",164.592,48,"Walk, Outdoor Games",0,1,2,1000,35000,"1 Hour","Pancriatitis","",4
"54","2021-10-20 11:43:48",21,"Male","Moderate",10,5,10,8,"5",180.4416,90,"Walk, Outdoor Games",1,4,2,0,45000,"Not at all","No","",5
"55","2021-10-20 11:44:26",21,"Male","Moderate",2,2,3,3,"5",182.88,76,"Swim",0,1,1,1000,30000,"Not at all","No","",5
"56","2021-10-20 11:47:21",22,"Female","Sedentary",6,9,10,6,"4",170.688,60,"Walk",1,3,1,0,150000,"Not at all","No","",3
"57","2021-10-20 11:47:38",23,"Male","Moderate",1,1,2,1,"5",167.64,70,"Walk",0,1,0,0,10000,"Not at all","No","",3
"58","2021-10-20 11:48:08",23,"Female","Moderate",3,1,3,3,"5+",152.4,65,"Others",1,3,0,500,50000,"1 Hour","PCOS","",4
"59","2021-10-20 11:51:59",20,"Female","Moderate",5,4,3,2,"5",155.448,60,"Walk",2,3,1,200,7000,"Not at all","No","",4
"60","2021-10-20 11:52:34",23,"Female","Hardwork",2,2,2,2,"5",152.4,51,"Dance",1,1,2,500,20000,"Not at all","No","Eat healthy stay healthy ",4
"61","2021-10-20 11:54:14",26,"Male","Moderate",3,1,3,3,"5",167,63,"Walk",0,1,1,0,3600,"Not at all","No","",4
"62","2021-10-20 11:54:31",18,"Female","Hardwork",2,2,2,1,"4",176.784,57,"Walk, Run, Outdoor Games",1,1,1,2300,56000,"Not at all","No","",5
"63","2021-10-20 11:54:41",56,"Male","Hardwork",4,4,4,4,"5+",152.4,65,"Dance",3,3,3,25000,300000,"3 Hours","No","Yes",5
"64","2021-10-20 11:57:15",23,"Male","Sedentary",7,9,9,8,"3",185.928,56,"Outdoor Games, Others",0,2,2,0,180000,"Not at all","No","Thank you",4
"65","2021-10-20 11:58:30",24,"Female","Moderate",7,7,9,8,"4",164.592,65,"Walk, Outdoor Games, Others",2,2,1,500,20000,"Not at all","No","",4
"66","2021-10-20 11:59:36",20,"Male","Moderate",10,10,10,10,"2",173.736,71.4,"Weight lifting",2,3,1,3000,40000,"2 Hour","No","Pretty good survey ",4
"67","2021-10-20 11:59:42",20,"Male","Hardwork",2,1,1,3,"5",155.7528,65,"Swim",1,2,2,2500,100000,"3 Hours","No","",5
"68","2021-10-20 11:59:49",18,"Female","Moderate",5,10,5,7,"5",164.592,56,"Run, Dance, Outdoor Games",1,3,1,500,400000,"Not at all","No.","",4
"69","2021-10-20 12:00:54",24,"Male","Moderate",3,2,2,3,"5",173.736,69,"Walk, Run, Outdoor Games, Others",1,2,1,1500,25000,"Not at all","no","",4
"70","2021-10-20 12:01:03",20,"Female","Moderate",3,4,3,2,"5",164.592,45,"Walk",3,1,3,2500,40000,"Not at all","Thyroid ","",5
"71","2021-10-20 12:03:22",21,"Male","Hardwork",2,2,4,4,"4",170.688,85,"Walk",1,2,1,0,5009,"Not at all","No","Excellent",5
"72","2021-10-20 12:06:20",21,"Male","Hardwork",4,2,4,4,"4",182.88,80,"Run, Swim, Outdoor Games",1,3,3,1000,50000,"Not at all","no","",5
"73","2021-10-20 12:06:21",23,"Female","Moderate",9,6,8,8,"5",152.4,54,"Walk, Dance, Others",1,3,1,700,50000,"Not at all","No","",4
"74","2021-10-20 12:06:58",22,"Male","Moderate",7,8,10,9,"4",167.64,56,"Walk, Others",1,4,1,300,8000,"Not at all","Yes, gastric","",4
"75","2021-10-20 12:08:13",20,"Female","Moderate",2,1,2,3,"4",146.304,50,"Walk",1,2,0,500,5000,"Not at all","PCOD ","",5
"76","2021-10-20 12:08:20",23,"Male","Moderate",6,3,10,10,"5+",176.784,70,"Walk, Weight lifting",1,2,1,3000,35000,"1 Hour","Crohn's disease","",4
"77","2021-10-20 12:09:33",24,"Female","Moderate",7,10,10,9,"5",161.544,38,"Others",1,3,1,3000,14000,"Not at all","No","Interesting survey..",5
"78","2021-10-20 12:10:50",21,"Female","Moderate",7,4,9,6,"4",170.688,58,"Walk",0,2,1,0,40000,"Not at all","No","",4
"79","2021-10-20 12:11:02",21,"Female","Sedentary",2,3,3,3,"5+",161.544,49,"Others",2,1,1,1000,70000,"Not at all","Anemia","It's ammezing",4
"80","2021-10-20 12:11:30",23,"Female","Moderate",7,10,9,8,"5",161.544,70,"Walk",1,2,1,2500,35000,"2 Hour","Diabetes","",5
"81","2021-10-20 12:11:46",25,"Male","Hardwork",8,7,9,10,"5",176.784,60.2,"Walk, Run",4,3,4,250,32500,"Not at all","Back pain.","Please correct some spelling mistakes and write the meaning of each ratings. It will be helpful for the respondents.
A little suggestion: try to put more questions about time of exercise, type of exercises, food habits etc.
Wishing you a great success in this survey as well as analysis. ",3
"82","2021-10-20 12:11:49",20,"Female","Sedentary",2,2,2,2,"5",161.544,56,"Walk",1,1,1,3000,140000,"Not at all","no","",4
"83","2021-10-20 12:11:54",20,"Female","Sedentary",2,1,1,3,"4",167.64,55,"Walk",1,2,2,1500,50000,"1 Hour","No","Good.",5
"84","2021-10-20 12:13:57",21,"Male","Moderate",2,2,4,3,"5",161.544,70,"Walk, Outdoor Games",2,2,1,100,15000,"Not at all","No","",5
"85","2021-10-20 12:18:59",21,"Male","Hardwork",3,4,3,2,"4",179.832,82,"Others",1,1,0,0,40000,"Not at all","sneezing","Nice",4
"86","2021-10-20 12:19:37",20,"Female","Sedentary",2,10,10,10,"5+",164.592,65,"Walk, Dance",1,2,1,2500,60000,"2 Hour","No","Very good",4
"87","2021-10-20 12:20:00",24,"Female","Moderate",7,9,9,8,"4",155,45,"Walk",3,4,1,200,100000,"Not at all","Psoriasis","",4
"88","2021-10-20 12:24:37",23,"Female","Moderate",8,5,6,7,"4",173.4312,51,"Walk",1,5,2,500,40000,"Not at all","No","",4
"89","2021-10-20 12:30:14",26,"Female","Hardwork",3,2,4,4,"5",164.592,56,"Walk, Others",2,3,1,1000,100000,"Not at all","No","Hope your study completes soon, and come out with good result.",4
"90","2021-10-20 12:34:50",25,"Female","Moderate",4,4,4,3,"4",167.64,53,"Walk",0,2,3,3000,0,"Not at all","No ","",3
"91","2021-10-20 12:35:23",23,"Male","Moderate",8,2,10,10,"5+",176.784,64,"Weight lifting",1,2,2,3000,80000,"2 Hour","No","",4
"92","2021-10-20 12:40:53",21,"Female","Moderate",5,4,4,8,"5",152.4,51,"Others",2,3,4,1000,30000,"2 Hour","No","Very good",5
"93","2021-10-20 12:45:36",24,"Male","Hardwork",5,5,5,4,"4",173.736,56,"Walk",1,1,2,3000,200000,"Not at all","No","❤️",5
"94","2021-10-20 12:50:18",25,"Female","Moderate",2,3,1,1,"4",155.448,49,"Walk, Dance",0,1,1,0,400000,"Not at all","No","",5
"95","2021-10-20 12:53:40",21,"Male","Moderate",9,1,9,9,"3",173.736,70,"Outdoor Games",0,4,2,1000,20000,"2 Hour","No","",4
"96","2021-10-20 12:55:11",20,"Female","Moderate",7,8,10,7,"4",152.4,38,"Walk",3,2,5,500,6000,"Not at all","No","",5
"97","2021-10-20 12:57:19",26,"Female","Moderate",4,2,4,3,"5",161.544,55,"Walk, Run",1,3,0,500,90000,"1 Hour","No","",5
"98","2021-10-20 12:57:36",20,"Male","Hardwork",4,3,4,4,"5",182.88,77,"Run, Weight lifting",2,3,3,500,45000,"3 Hours","No","Very good",5
"99","2021-10-20 12:59:28",30,"Male","Moderate",9,2,7,7,"4",170.688,60,"Walk",1,4,1,0,35000,"Not at all","No","Na",4
"100","2021-10-20 12:59:29",24,"Female","Hardwork",3,4,3,3,"5",152.4,46,"Walk, Dance, Swim",1,2,1,100,30000,"Not at all","Ni","Good initiative",5
"101","2021-10-20 13:04:42",23,"Female","Moderate",1,2,1,3,"4",149.352,57,"Do not interested",0,1,0,0,35000,"Not at all","No","Great questionnaire",5
"102","2021-10-20 13:05:49",22,"Male","Hardwork",4,3,4,4,"5",176.784,69,"Run, Weight lifting, Outdoor Games",1,3,1,1500,500000,"1 Hour","NO ","",5
"103","2021-10-20 13:05:51",24,"Male","Moderate",2,1,2,2,"5",176.784,70,"Run",0,1,0,0,10000,"Not at all","No","",5
"104","2021-10-20 13:10:16",22,"Male","Moderate",2,3,3,1,"5+",173.736,60,"Walk",2,2,2,500,7000,"Not at all","Cough and cold ","",5
"105","2021-10-20 13:19:48",20,"Female","Moderate",6,10,7,5,"5",176.784,57,"Walk, Run, Dance",3,5,1,0,15000,"Not at all","No","",3
"106","2021-10-20 13:20:27",22,"Male","Moderate",1,4,3,3,"5",173.736,65,"Walk",1,2,1,0,50000,"Not at all","No","",3
"107","2021-10-20 13:21:21",24,"Female","Moderate",2,4,4,4,"4",167.64,58,"Walk",3,3,1,1000,80000,"Not at all","No","",5
"108","2021-10-20 13:26:17",22,"Female","Moderate",7,9,10,8,"5",152.4,48,"Do not interested",0,2,4,1000,20000,"Not at all","No","",5
"109","2021-10-20 13:26:48",24,"Male","Hardwork",8,8,10,2,"4",172.5168,47,"Walk",0,3,4,0,8000,"Not at all","No","Good",4
"110","2021-10-20 13:27:14",26,"Male","Moderate",4,2,3,3,"5",176.784,72,"Walk, Run, Outdoor Games, Others",3,3,1,100,25000,"Not at all","No","",4
"111","2021-10-20 13:29:56",20,"Male","Moderate",2,3,2,1,"2",152.4,58,"Walk, Run",0,1,2,400,10000,"Not at all","Vomiting","It's a very good idea for health concern",5
"112","2021-10-20 13:37:24",20,"Male","Moderate",2,3,3,3,"3",167.64,67.5,"Walk, Run",0,1,1,0,3000,"Not at all","No","Yeah this is a good thing to do anylysis data",3
"113","2021-10-20 13:37:24",23,"Female","Moderate",2,3,2,1,"5",161.544,62,"Do not interested",0,1,2,500,15000,"Not at all","No","",5
"114","2021-10-20 13:38:30",25,"Male","Moderate",2,1,3,2,"3",173.736,65,"Walk",1,2,0,3000,60000,"Not at all","No","",4
"115","2021-10-20 13:43:32",21,"Male","Moderate",7,9,9,10,"5+",173.736,68,"Swim, Outdoor Games",3,4,3,500,7000,"Not at all","No","",4
"116","2021-10-20 13:49:25",19,"Female","Hardwork",3,2,4,3,"5+",161.544,57,"Walk",1,2,1,500,25000,"Not at all","No","",4
"117","2021-10-20 13:59:13",24,"Female","Moderate",7,9,9,10,"4",164.592,65,"Dance",2,4,1,0,25000,"Not at all","No","",4
"118","2021-10-20 14:03:25",20,"Female","Moderate",3,4,2,4,"5",161.544,65,"Walk",1,2,3,7000,30000,"2 Hour","No","",5
"119","2021-10-20 14:05:11",23,"Female","Sedentary",3,2,1,1,"4",170.688,53,"Weight lifting, Outdoor Games",0,0,0,0,10000,"Not at all","No","",5
"120","2021-10-20 14:13:12",22,"Female","Moderate",2,4,3,4,"4",152.4,43,"Others",0,1,0,200,10000,"Not at all","Spondilytis, dry eyes, pcos","",4
"121","2021-10-20 14:15:36",20,"Female","Moderate",4,2,4,4,"5+",158.496,50,"Others",1,1,2,0,65000,"Not at all","No","",4
"122","2021-10-20 14:17:26",20,"Female","Moderate",2,4,3,3,"4",164.592,50,"Dance",1,1,2,1000,5000,"Not at all","No","",4
"123","2021-10-20 14:19:59",20,"Male","Moderate",5,6,8,7,"5+",167.64,75,"Walk",2,4,5,0,17000,"Not at all","No","",4
"124","2021-10-20 14:20:30",19,"Female","Moderate",6,10,10,7,"5",124.968,36,"Walk, Run, Outdoor Games",4,4,4,1600,4000,"Not at all","No","Necessary survey ",4
"125","2021-10-20 14:21:06",19,"Female","Moderate",9,1,7,8,"5",161.544,43.5,"Walk, Run",1,2,0,2000,45000,"1 Hour","No","",4
"126","2021-10-20 14:21:55",26,"Male","Moderate",7,8,6,8,"4",173.736,65,"Walk",0,2,1,1000,50000,"Not at all","No","",4
"127","2021-10-20 14:22:58",19,"Male","Hardwork",2,4,1,1,"5",179.832,72,"Run, Outdoor Games",1,2,1,1000,20000,"Not at all","No","Thank you,your survey form really helps me to evaluate my daily meals and it also helps me to put emphasis on how I utilize them in exercise and outdoor games...",5
"128","2021-10-20 14:31:29",19,"Male","Hardwork",8,10,10,10,"4",161.544,85,"Outdoor Games",1,5,0,0,70000,"Not at all","Yes ( Cough )","",5
"129","2021-10-20 14:35:18",20,"Female","Moderate",1,3,2,2,"5",155.448,55,"Walk",1,1,1,0,10000,"Not at all","No.","Very good thing that you are doing, carry on. ",5
"130","2021-10-20 14:37:02",26,"Female","Hardwork",3,3,3,4,"3",148,55,"Dance",1,2,2,500,30000,"Not at all","No","",3
"131","2021-10-20 14:37:48",26,"Female","Moderate",6,10,10,7,"5",161.544,73,"Walk",1,3,1,1000,46000,"Not at all","PCOS","",4
"132","2021-10-20 14:39:32",24,"Female","Hardwork",9,8,9,6,"5",161.544,60,"Walk",1,4,2,0,30000,"Not at all","No","Great project idea. Please share with me the end results. ",5
"133","2021-10-20 14:39:44",19,"Male","Hardwork",2,2,3,3,"4",170.688,74,"Walk, Outdoor Games",1,2,0,0,60000,"Not at all","Noo","",4
"134","2021-10-20 14:41:36",23,"Male","Hardwork",3,2,3,2,"5",179.832,63,"Walk",2,4,1,0,30000,"Not at all","Allergy","",4
"135","2021-10-20 14:41:50",23,"Male","Moderate",8,4,2,9,"3",161.544,59,"Others",3,4,1,0,6000,"Not at all","No","-",4
"136","2021-10-20 14:44:52",22,"Male","Hardwork",2,1,1,3,"3",164.592,54,"Walk, Others",0,2,1,1000,20000,"1 Hour","No","This survey is realy very good.",5
"137","2021-10-20 14:48:20",20,"Male","Moderate",2,4,3,3,"5",176.784,52,"Walk, Outdoor Games, Others",1,2,2,1000,60000,"Not at all","Allergy","",4
"138","2021-10-20 14:52:01",20,"Male","Hardwork",5,6,6,10,"5",176.784,85,"Walk, Run, Weight lifting, Others",5,4,5,4000,50000,"2 Hour","Allergy","",3
"139","2021-10-20 14:52:34",26,"Male","Moderate",5,10,10,5,"4",167.64,68,"Walk",1,1,2,2000,300000,"Not at all","No","",5
"140","2021-10-20 14:55:26",19,"Female","Sedentary",4,3,4,4,"5",170.688,55,"Walk, Others",2,3,0,200,7000,"Not at all","Yes, common cold disease ","",5
"141","2021-10-20 14:55:38",23,"Male","Sedentary",7,5,5,7,"4",152.4,52,"Walk, Run, Swim",1,2,1,0,40000,"Not at all","No","That's amazing. ",4
"142","2021-10-20 14:59:25",20,"Female","Moderate",3,3,1,3,"4",161.544,50,"Walk, Dance",2,2,1,0,5000,"Not at all","No","",4
"143","2021-10-20 15:03:55",24,"Male","Hardwork",10,5,9,10,"5",176.784,68,"Run, Weight lifting, Others",3,4,4,0,5000,"1 Hour","N0","",4
"144","2021-10-20 15:11:01",22,"Male","Hardwork",5,6,9,9,"5",155.7528,66,"Walk, Run, Weight lifting",1,0,1,4000,32000,"1 Hour","Yes.Irritable bowel syndrome.(IBS)","You should what do you do for mental health.As that's very important now days.",4
"145","2021-10-20 15:17:51",25,"Male","Moderate",2,1,2,3,"4",179.832,64,"Outdoor Games",1,2,1,1500,15000,"Not at all","No","",4
"146","2021-10-20 15:23:11",21,"Male","Moderate",3,3,4,4,"5",157,55,"Run, Outdoor Games",1,2,2,1500,15000,"Not at all","No","Good",4
"147","2021-10-20 15:23:32",20,"Male","Moderate",4,2,3,4,"5+",179.832,64,"Walk, Run, Dance, Swim, Outdoor Games",2,1,2,0,30000,"Not at all","No","",5
"148","2021-10-20 15:27:02",20,"Male","Moderate",2,2,2,3,"5",170.688,56,"Walk, Run, Dance, Outdoor Games",1,1,1,2000,30000,"1 Hour","No","All the best!👍",5
"149","2021-10-20 15:29:27",25,"Female","Hardwork",1,2,2,3,"5",161.544,45,"Walk",1,1,1,1500,4000,"Not at all","No","It's good",3
"150","2021-10-20 15:34:22",20,"Female","Moderate",2,3,4,4,"5",152.4,45,"Do not interested",0,2,1,0,40000,"Not at all","Common cold ","Very good initiative to grow awareness among people about their health and nutrition. It will be very helpful to our country. ",3
"151","2021-10-20 15:38:42",20,"Male","Moderate",10,3,10,10,"5",179.832,68,"Walk, Run, Outdoor Games",4,5,4,800,45000,"2 Hour","No ","No this is good , And cover all things",5
"152","2021-10-20 15:46:20",20,"Female","Moderate",3,3,3,4,"5",152.4,46,"Walk",2,2,2,1000,20000,"Not at all","No","",4
"153","2021-10-20 15:49:53",21,"Male","Hardwork",8,10,10,10,"4",179.832,100,"Walk",0,2,5,0,3500,"Not at all","No","",4
"154","2021-10-20 16:03:52",11,"Male","Hardwork",2,3,4,1,"1",140.208,59,"Others",3,0,3,2500,10000,"Not at all","Golblader stone","Good",3
"155","2021-10-20 16:09:56",26,"Male","Moderate",10,5,10,10,"5",170.688,56,"Walk",4,5,4,5000,20000,"Not at all","No","",3
"156","2021-10-20 16:14:17",24,"Male","Moderate",9,2,9,9,"5+",155.448,78,"Walk, Run, Outdoor Games",1,4,0,500,50000,"Not at all","No","Best of luck for your survey . Especially thanks for choosing such a beautiful and important topic .",4
"157","2021-10-20 16:17:57",24,"Male","Moderate",6,3,6,5,"4",173.736,76,"Walk, Run, Others",1,2,0,5000,25000,"Not at all","NO","",4
"158","2021-10-20 16:29:20",23,"Female","Moderate",1,3,1,2,"4",158.496,51,"Others",1,1,0,0,40000,"Not at all","No","",4
"159","2021-10-20 16:43:42",24,"Male","Moderate",3,4,4,4,"4",179.832,74,"Walk, Others",1,3,0,0,70000,"Not at all","No","",4
"160","2021-10-20 16:44:39",24,"Female","Moderate",2,2,4,3,"5",152.4,42,"Dance",1,3,0,2500,40000,"Not at all","No","",4
"161","2021-10-20 16:51:57",23,"Male","Moderate",3,2,4,3,"4",170.688,60,"Weight lifting",1,3,0,100,80000,"Not at all","No","",4
"162","2021-10-20 17:17:00",23,"Female","Moderate",3,2,3,3,"5",155.448,52,"Others",1,2,1,500,25000,"Not at all","No","",3
"163","2021-10-20 17:22:02",22,"Male","Moderate",3,1,4,4,"2",155.7528,75,"Run, Outdoor Games",2,3,1,2000,30000,"Not at all","0","",5
"164","2021-10-20 17:26:59",23,"Male","Moderate",2,4,4,4,"4",170.688,73,"Others",0,2,0,400,30000,"1 Hour","No","",5
"165","2021-10-20 17:33:34",24,"Male","Moderate",2,3,3,4,"4",161.544,59,"Run, Others",1,2,2,500,30000,"Not at all","No","",3
"166","2021-10-20 17:49:15",23,"Male","Moderate",4,5,10,8,"5",158.496,48,"Walk",0,2,2,0,40000,"Not at all","Yes, IBS","",4
"167","2021-10-20 17:51:59",25,"Male","Moderate",9,7,8,9,"4",179.832,59,"Outdoor Games, Others",1,2,1,1000,5000,"Not at all","No","",4
"168","2021-10-20 18:19:11",20,"Male","Moderate",2,3,2,3,"5+",173.736,62,"Run",1,1,1,0,80000,"Not at all","No","The level of questions is very good",5
"169","2021-10-20 18:19:23",19,"Male","Moderate",4,10,6,8,"2",167.64,58,"Walk",4,1,1,0,4000,"Not at all","Allergies and cold","Ok",4
"170","2021-10-20 18:36:03",24,"Female","Hardwork",9,5,10,10,"4",161.544,49,"Walk, Dance, Others",3,5,3,2000,15000,"Not at all","No","According to me there is a need of a option for writing.",4
"171","2021-10-20 19:14:24",22,"Male","Moderate",6,5,10,10,"5",167.64,49,"Walk, Run, Swim, Others",1,4,1,400,4000,"Not at all",NA,"",5
"172","2021-10-20 19:18:21",22,"Male","Moderate",10,6,10,10,"5+",173.736,58,"Walk",1,3,1,1000,9000,"1 Hour","No","",5
"173","2021-10-20 19:22:57",23,"Male","Hardwork",8,4,8,6,"4",170.688,60,"Outdoor Games",0,5,1,3000,36000,"Not at all","No","No comments",5
"174","2021-10-20 19:53:38",20,"Female","Hardwork",3,4,3,3,"5+",164.592,51,"Walk, Run, Dance, Swim, Outdoor Games",1,2,2,0,40000,"Not at all","No","",3
"175","2021-10-20 20:21:07",21,"Female","Moderate",8,5,9,10,"5",164.592,46,"Walk, Dance, Outdoor Games, Others",1,3,1,1000,20000,"Not at all","No","Very good",5
"176","2021-10-20 20:53:23",20,"Male","Hardwork",4,2,3,2,"5",182.88,65,"Walk, Swim, Others",1,1,2,0,12000,"1 Hour","No","Plase provide gym in free of cost...",4
"177","2021-10-20 21:39:14",23,"Female","Moderate",4,3,4,4,"5",155.448,48,"Walk, Run, Dance",3,3,1,2000,40000,"Not at all","No","",4
"178","2021-10-20 21:53:49",25,"Male","Moderate",3,4,4,4,"4",155.448,61,"Walk, Others",1,2,0,500,30000,"Not at all","No","",4
"179","2021-10-20 22:16:48",22,"Female","Moderate",5,6,5,5,"5",158.496,46,"Walk, Not interested",0,2,1,0,15000,"Not at all","No","",5
"180","2021-10-20 22:40:52",25,"Male","Moderate",8,3,10,9,"5",161,70,"Walk, Outdoor Games, Others",1,3,0,500,50000,"Not at all","No","",4
"181","2021-10-20 22:43:02",30,"Male","Moderate",3,2,3,4,"4",173.736,71,"Walk, Weight lifting",1,3,0,0,1,"1 Hour","No","",4
"182","2021-10-20 22:48:11",26,"Male","Moderate",7,1,1,10,"4",176.784,64,"Walk",1,2,1,0,25000,"Not at all","No","",5
"183","2021-10-20 23:11:42",18,"Female","Moderate",3,4,3,3,"5",158.496,45,"Not interested",0,3,0,0,100000,"Not at all","No","",4
"184","2021-10-21 01:18:23",19,"Male","Moderate",9,7,8,8,"4",170.688,68,"Walk, Run, Outdoor Games, Others",4,5,3,500,10000,"1 Hour","No","Great",5
"185","2021-10-21 07:40:38",24,"Female","Moderate",8,6,9,9,"5",161.544,52,"Walk, Dance, Others",3,5,1,0,20000,"Not at all","No","",4
"186","2021-10-21 07:53:59",23,"Male","Moderate",1,2,1,1,"4",173.736,77,"Run",0,1,0,1200,6500,"1 Hour","No","",2
"187","2021-10-21 08:09:26",21,"Male","Sedentary",1,3,2,1,"2",170.688,69,"Not interested",0,1,2,0,700000,"Not at all","Tooth decaying","",4
"188","2021-10-21 08:30:38",21,"Male","Moderate",6,8,10,9,"5",160.02,50.3,"Walk",0,2,0,0,9000,"Not at all","No","",5
"189","2021-10-21 08:31:39",26,"Male","Moderate",3,1,2,3,"4",167.64,72,"Weight lifting, Others",1,2,0,200,20000,"2 Hour","Dust Allergy","",4
"190","2021-10-21 08:50:57",22,"Male","Hardwork",9,4,8,10,"4",173.736,75,"Others",3,5,2,1200,30000,"1 Hour","No","Nothing",5
"191","2021-10-21 09:46:34",22,"Male","Moderate",3,4,4,3,"5",173.736,70,"Walk, Run, Weight lifting",1,3,1,800,1500000,"2 Hour","No","",3
"192","2021-10-21 10:02:50",22,"Male","Moderate",6,3,10,8,"3",173.736,75,"Run, Outdoor Games",0,2,0,0,10000,"Not at all","IBS ","",4
"193","2021-10-21 10:07:33",22,"Male","Moderate",1,5,6,1,"5",179.832,64,"Walk, Swim",1,3,0,0,60000,"Not at all","No","Answering height should be on centemetre it's more reliable and accurate. Also in feet it requires to write as 5'9"" which i can't in your form so i had to write 5.9 which doesn't mean 5feet9inches at all.",3
"194","2021-10-21 10:25:13",23,"Male","Moderate",3,2,4,4,"2",155.7528,88,"Walk",2,2,1,200,18000,"Not at all","High blood pressure","",5
"196","2021-10-21 12:27:41",22,"Female","Moderate",2,4,3,3,"5",155.448,66,"Walk",1,3,0,1000,30000,"2 Hour","Headache","",5
"197","2021-10-21 12:35:53",25,"Male","Sedentary",3,8,6,5,"4",188.976,110,"Walk",1,1,1,300,60000,"1 Hour","No","",3
"198","2021-10-21 12:40:30",23,"Male","Hardwork",9,3,7,8,"3",179.832,73,"Walk, Run, Weight lifting, Outdoor Games",2,2,1,0,20000,"Not at all","No","",4
"199","2021-10-21 13:03:39",21,"Male","Hardwork",2,1,2,3,"4",170.688,56,"Walk, Swim",1,2,0,100,15000,"Not at all","No","",4
"200","2021-10-21 13:35:20",22,"Male","Hardwork",2,4,1,1,"5+",170.688,56,"Outdoor Games",3,3,1,200,300000,"Not at all","No","",4
"201","2021-10-21 14:05:12",30,"Female","Moderate",3,3,3,3,"5",152.4,93,"Walk, Dance, Weight lifting",2,2,1,500,2000,"2 Hour","Rice","No",2
"202","2021-10-21 15:37:13",42,"Male","Moderate",9,2,5,9,"5",182.88,99,"Walk, Others",1,3,0,1500,100000,"Not at all","High Cholesterol","",3
"203","2021-10-21 15:43:06",23,"Male","Hardwork",9,7,10,8,"5",155.448,78,"Walk, Outdoor Games, Others",4,4,4,1000,35000,"Not at all","No","Good",4
"204","2021-10-23 20:09:06",19,"Male","Hardwork",10,7,10,10,"5",155.448,72,"Run, Swim, Outdoor Games",1,2,3,0,20000,"Not at all","no","",5
"205","2021-10-24 15:00:24",26,"Male","Moderate",7,5,7,10,"5",155.448,92,"Outdoor Games",2,4,3,1000,15000,"Not at all","No","Field study is a important to any research work...it's a process of data collection in a right way...keep it up...good job...",5
"206","2021-10-25 15:33:06",24,"Female","Moderate",2,3,1,1,"4",170.688,57,"Walk",1,1,1,1500,15000,"Not at all","No","",4
"207","2021-10-29 11:32:22",23,"Male","Moderate",6,5,8,6,"5",192.024,92,"Walk, Run",2,3,2,500,15000,"Not at all","No","",4
"208","2021-10-29 11:33:31",31,"Male","Sedentary",10,5,2,2,"4",164.592,50,"Not interested",0,2,0,0,18000,"Not at all","No","Good",3
"209","2021-10-29 11:34:03",27,"Male","Hardwork",2,1,3,2,"4",182.88,61.5,"Outdoor Games",1,2,0,1000,65000,"Not at all","No","Good question available here",3
"210","2021-10-29 11:45:54",23,"Male","Hardwork",7,2,10,10,"5",164.592,55,"Walk",1,3,0,500,5000,"1 Hour","No","",4
"211","2021-10-29 11:50:48",19,"Female","Moderate",7,5,9,8,"4",158.496,48,"Others",3,5,2,0,10000,"Not at all","No","Ok ",4
"212","2021-10-29 11:57:06",24,"Male","Moderate",3,3,3,3,"4",179.832,63,"Walk",1,2,0,0,35000,"Not at all","No","",4
"213","2021-10-29 11:57:07",20,"Male","Sedentary",3,3,3,2,"5",182,60,"Outdoor Games",0,0,1,2000,600000,"Not at all","No","",3
"214","2021-10-29 11:57:43",20,"Male","Hardwork",4,1,1,1,"4",173.736,70,"Walk, Run, Weight lifting, Others",0,0,0,1000,36000,"1 Hour","Yes in runny nose👃💦","",5
"215","2021-10-29 11:58:12",19,"Male","Moderate",6,6,8,5,"4",173.736,62,"Outdoor Games",1,4,3,0,40000,"Not at all","cold and cough ","",4
"216","2021-10-29 11:58:24",20,"Male","Moderate",1,1,2,1,"4",170.688,85,"Run",1,2,1,2000,25000,"Not at all","Yes.Masturbation problem and general weakness ","",5
"217","2021-10-29 11:58:43",21,"Male","Hardwork",4,2,4,4,"5",176.784,75,"Walk, Run",2,2,2,0,48000,"Not at all","Gas","",4
"218","2021-10-29 12:02:07",18,"Female","Hardwork",4,3,4,4,"4",161.544,50,"Others",3,0,2,0,100000,"Not at all","No","",3
"219","2021-10-29 12:04:10",22,"Male","Moderate",4,4,4,4,"5",173.736,74,"Outdoor Games",1,4,0,0,50000,"Not at all","No","",4
"220","2021-10-29 12:05:30",20,"Female","Hardwork",3,3,4,4,"4",161.544,61,"Walk, Others",1,3,1,0,30000,"Not at all","No","I would really appreciate if you could share your project with us after finishing your analysis and inferences.",5
"221","2021-10-29 12:09:32",20,"Male","Moderate",3,3,4,4,"4",170.688,70,"Walk, Run, Outdoor Games",2,3,2,200,50000,"Not at all","No","Good",5
"222","2021-10-29 12:21:14",18,"Male","Moderate",3,3,3,3,"2",164.592,54,"Walk, Run, Outdoor Games, Others",1,2,1,500,30000,"Not at all","No","",4
"223","2021-10-29 12:27:48",18,"Male","Sedentary",8,10,7,6,"5",167.64,65,"Walk, Run, Outdoor Games, Others",2,3,1,5000,550000,"Not at all","No","",5
"225","2021-10-29 12:31:50",23,"Male","Moderate",6,7,8,9,"5",176.784,80,"Walk",2,3,4,300,30000,"Not at all","Yes, cold","",4
"226","2021-10-29 12:32:20",17,"Female","Moderate",6,6,10,10,"5",161.544,64,"Dance, Others",1,2,0,500,60000,"Not at all","Ovary sist","",4
"227","2021-10-29 12:34:51",17,"Female","Hardwork",7,6,7,10,"5+",158.496,36,"Dance",5,5,5,2500,35500,"Not at all","10","Good",5
"228","2021-10-29 12:36:48",29,"Female","Hardwork",8,6,10,8,"5",158.496,50,"Others",2,5,2,500,30000,"Not at all","No","",4
"229","2021-10-29 12:37:37",21,"Female","Moderate",1,3,1,3,"2",167.64,67,"Walk, Others",1,2,0,0,300000,"Not at all","Painful periods, digestion","",5
"230","2021-10-29 12:49:09",18,"Male","Moderate",7,8,3,5,"4",167.64,57,"Walk",2,2,4,1500,20000,"Not at all","No","",5
"231","2021-10-29 12:53:05",20,"Female","Moderate",6,7,3,8,"3",164.592,39,"Walk",0,2,0,0,50000,"Not at all","No","",5
"232","2021-10-29 12:53:22",17,"Female","Hardwork",10,10,10,10,"5",152.4,40,"Walk",5,5,5,1000,21000,"1 Hour","No","",5
"233","2021-10-29 12:53:23",24,"Male","Moderate",3,2,3,2,"4",167.64,64,"Walk",0,2,0,0,5600,"Not at all","No","",5
"234","2021-10-29 13:00:06",18,"Male","Moderate",7,1,9,10,"5",179.832,58,"Weight lifting, Outdoor Games",1,2,0,4000,70000,"1 Hour","No","It's awesome",5
"235","2021-10-29 13:13:19",20,"Female","Hardwork",7,9,9,10,"4",170.688,58,"Walk",2,3,0,1000,1560000,"1 Hour","Headache","",4
"236","2021-10-29 13:21:48",27,"Male","Moderate",6,5,7,3,"5",173.736,65,"Walk, Others",0,4,3,0,240000,"Not at all","Allergy ","",4
"237","2021-10-29 13:39:18",17,"Female","Hardwork",5,10,9,10,"4",167.64,59,"Walk, Dance",1,2,1,1000,100000,"Not at all","No","",5
"238","2021-10-29 14:01:19",21,"Male","Hardwork",4,2,4,3,"4",176.784,67,"Walk, Run, Outdoor Games, Others",1,3,3,100,40000,"Not at all","No","",3
"239","2021-10-29 14:02:08",30,"Male","Moderate",3,2,3,4,"5",176.784,82,"Run, Outdoor Games, Others",1,2,1,1200,50000,"Not at all","Cold, dust allergic ","Very good ",4
"240","2021-10-29 17:22:09",19,"Female","Hardwork",4,4,4,4,"4",161.544,55,"Run",3,3,3,4000,80000,"2 Hour","No","Nothing",5
"241","2021-10-29 19:40:14",19,"Male","Moderate",3,2,4,3,"4",155.448,66,"Walk, Run, Others",2,2,1,1000,50000,"Not at all","No","",3
"242","2021-10-29 21:18:38",20,"Male","Moderate",6,10,7,6,"5",177,64,"Walk, Run, Outdoor Games, Others",0,1,1,0,21000,"Not at all","Nope","",5
"243","2021-10-29 21:28:43",22,"Male","Moderate",3,1,3,3,"5+",155.448,64,"Others",1,1,1,450,60000,"Not at all","No","",4
"244","2021-10-29 21:30:40",19,"Female","Hardwork",10,5,10,10,"5+",152.4,50,"Others",2,5,1,3000,30000,"Not at all","No","Thank you",5
"245","2021-10-29 22:06:39",20,"Female","Moderate",3,4,4,4,"5+",149.352,45,"Dance",1,2,1,700,30000,"2 Hour","No","",4
"246","2021-10-29 22:42:33",18,"Male","Hardwork",3,3,3,3,"5",167.64,49,"Walk, Run, Outdoor Games",2,2,2,1000,200000,"Not at all","No","Ok",3
"247","2021-10-29 23:06:06",19,"Female","Hardwork",1,2,4,2,"5+",161.544,41,"Walk",2,2,1,0,600000,"Not at all","Anxiety, Depression","Try asking something about time of meal, will it help in your survey?!",5
"248","2021-10-29 23:48:05",20,"Male","Moderate",9,6,10,8,"5",170.688,52,"Walk",1,3,0,1500,60000,"Not at all","No",".",4
"249","2021-10-30 06:36:45",55,"Female","Sedentary",2,2,3,3,"4",167.64,64,"Walk, Others",1,2,0,500,15000,"Not at all","No","",3
"250","2021-10-30 15:51:53",23,"Male","Hardwork",7,1,2,4,"4",182.88,78,"Walk, Run, Dance, Others",0,2,0,4000,30000,"Not at all","No","Work preference question is not clear to me. I consider it how hard working are you and answered.",4
"251","2021-10-31 15:59:53",23,"Male","Moderate",4,3,3,4,"1",173.736,54,"Walk, Run, Swim",2,2,1,3000,20000,"1 Hour","No","",4
"252","2021-10-31 16:12:34",23,"Male","Moderate",4,4,4,4,"4",164.592,65,"Others",2,1,2,0,12000,"Not at all","No","Nice project 👍",5
"253","2021-11-01 10:52:26",30,"Male","Moderate",2,2,2,2,"3",176.784,70,"Walk",0,1,0,0,30000,"Not at all","Allergies","Ok",2
"254","2021-11-02 07:36:14",19,"Male","Moderate",6,1,7,7,"3",167.64,78,"Run, Others",1,2,2,0,30000,"Not at all","No","",5
"255","2021-11-06 10:46:54",21,"Female","Hardwork",2,1,2,2,"5+",167.64,56,"Walk",1,1,1,20,35500,"Not at all","asidity","",5
"256","2021-11-06 14:14:40",21,"Female","Hardwork",1,1,1,1,"5+",167.64,56,"Walk",2,2,0,20,1000,"1 Hour","asidity","good",2
"257","2021-11-06 15:07:09",21,"Female","Hardwork",2,1,3,2,"4",152.4,56,"Others",1,1,2,2000,15000,"Not at all","No","",5
"258","2021-11-06 17:37:15",23,"Female","Moderate",3,2,3,2,"5",155.448,43,"Not interested",2,1,1,2000,50000,"Not at all","Yes, Hypothyroidism","",3
218 219 2021-10-29 12:04:10 22 Male Moderate 4 4 4 4 5 173.736 74 Outdoor Games 1 4 0 0 50000 Not at all No 4
219 220 2021-10-29 12:05:30 20 Female Hardwork 3 3 4 4 4 161.544 61 Walk, Others 1 3 1 0 30000 Not at all No I would really appreciate if you could share your project with us after finishing your analysis and inferences. 5
220 221 2021-10-29 12:09:32 20 Male Moderate 3 3 4 4 4 170.688 70 Walk, Run, Outdoor Games 2 3 2 200 50000 Not at all No Good 5
221 222 2021-10-29 12:21:14 18 Male Moderate 3 3 3 3 2 164.592 54 Walk, Run, Outdoor Games, Others 1 2 1 500 30000 Not at all No 4
222 223 2021-10-29 12:27:48 18 Male Sedentary 8 10 7 6 5 167.64 65 Walk, Run, Outdoor Games, Others 2 3 1 5000 550000 Not at all No 5
223 225 2021-10-29 12:31:50 23 Male Moderate 6 7 8 9 5 176.784 80 Walk 2 3 4 300 30000 Not at all Yes, cold 4
224 226 2021-10-29 12:32:20 17 Female Moderate 6 6 10 10 5 161.544 64 Dance, Others 1 2 0 500 60000 Not at all Ovary sist 4
225 227 2021-10-29 12:34:51 17 Female Hardwork 7 6 7 10 5+ 158.496 36 Dance 5 5 5 2500 35500 Not at all 10 Good 5
226 228 2021-10-29 12:36:48 29 Female Hardwork 8 6 10 8 5 158.496 50 Others 2 5 2 500 30000 Not at all No 4
227 229 2021-10-29 12:37:37 21 Female Moderate 1 3 1 3 2 167.64 67 Walk, Others 1 2 0 0 300000 Not at all Painful periods, digestion 5
228 230 2021-10-29 12:49:09 18 Male Moderate 7 8 3 5 4 167.64 57 Walk 2 2 4 1500 20000 Not at all No 5
229 231 2021-10-29 12:53:05 20 Female Moderate 6 7 3 8 3 164.592 39 Walk 0 2 0 0 50000 Not at all No 5
230 232 2021-10-29 12:53:22 17 Female Hardwork 10 10 10 10 5 152.4 40 Walk 5 5 5 1000 21000 1 Hour No 5
231 233 2021-10-29 12:53:23 24 Male Moderate 3 2 3 2 4 167.64 64 Walk 0 2 0 0 5600 Not at all No 5
232 234 2021-10-29 13:00:06 18 Male Moderate 7 1 9 10 5 179.832 58 Weight lifting, Outdoor Games 1 2 0 4000 70000 1 Hour No It's awesome 5
233 235 2021-10-29 13:13:19 20 Female Hardwork 7 9 9 10 4 170.688 58 Walk 2 3 0 1000 1560000 1 Hour Headache 4
234 236 2021-10-29 13:21:48 27 Male Moderate 6 5 7 3 5 173.736 65 Walk, Others 0 4 3 0 240000 Not at all Allergy 4
235 237 2021-10-29 13:39:18 17 Female Hardwork 5 10 9 10 4 167.64 59 Walk, Dance 1 2 1 1000 100000 Not at all No 5
236 238 2021-10-29 14:01:19 21 Male Hardwork 4 2 4 3 4 176.784 67 Walk, Run, Outdoor Games, Others 1 3 3 100 40000 Not at all No 3
237 239 2021-10-29 14:02:08 30 Male Moderate 3 2 3 4 5 176.784 82 Run, Outdoor Games, Others 1 2 1 1200 50000 Not at all Cold, dust allergic Very good 4
238 240 2021-10-29 17:22:09 19 Female Hardwork 4 4 4 4 4 161.544 55 Run 3 3 3 4000 80000 2 Hour No Nothing 5
239 241 2021-10-29 19:40:14 19 Male Moderate 3 2 4 3 4 155.448 66 Walk, Run, Others 2 2 1 1000 50000 Not at all No 3
240 242 2021-10-29 21:18:38 20 Male Moderate 6 10 7 6 5 177 64 Walk, Run, Outdoor Games, Others 0 1 1 0 21000 Not at all Nope 5
241 243 2021-10-29 21:28:43 22 Male Moderate 3 1 3 3 5+ 155.448 64 Others 1 1 1 450 60000 Not at all No 4
242 244 2021-10-29 21:30:40 19 Female Hardwork 10 5 10 10 5+ 152.4 50 Others 2 5 1 3000 30000 Not at all No Thank you 5
243 245 2021-10-29 22:06:39 20 Female Moderate 3 4 4 4 5+ 149.352 45 Dance 1 2 1 700 30000 2 Hour No 4
244 246 2021-10-29 22:42:33 18 Male Hardwork 3 3 3 3 5 167.64 49 Walk, Run, Outdoor Games 2 2 2 1000 200000 Not at all No Ok 3
245 247 2021-10-29 23:06:06 19 Female Hardwork 1 2 4 2 5+ 161.544 41 Walk 2 2 1 0 600000 Not at all Anxiety, Depression Try asking something about time of meal, will it help in your survey?! 5
246 248 2021-10-29 23:48:05 20 Male Moderate 9 6 10 8 5 170.688 52 Walk 1 3 0 1500 60000 Not at all No . 4
247 249 2021-10-30 06:36:45 55 Female Sedentary 2 2 3 3 4 167.64 64 Walk, Others 1 2 0 500 15000 Not at all No 3
248 250 2021-10-30 15:51:53 23 Male Hardwork 7 1 2 4 4 182.88 78 Walk, Run, Dance, Others 0 2 0 4000 30000 Not at all No Work preference question is not clear to me. I consider it how hard working are you and answered. 4
249 251 2021-10-31 15:59:53 23 Male Moderate 4 3 3 4 1 173.736 54 Walk, Run, Swim 2 2 1 3000 20000 1 Hour No 4
250 252 2021-10-31 16:12:34 23 Male Moderate 4 4 4 4 4 164.592 65 Others 2 1 2 0 12000 Not at all No Nice project 👍 5
251 253 2021-11-01 10:52:26 30 Male Moderate 2 2 2 2 3 176.784 70 Walk 0 1 0 0 30000 Not at all Allergies Ok 2
252 254 2021-11-02 07:36:14 19 Male Moderate 6 1 7 7 3 167.64 78 Run, Others 1 2 2 0 30000 Not at all No 5
253 255 2021-11-06 10:46:54 21 Female Hardwork 2 1 2 2 5+ 167.64 56 Walk 1 1 1 20 35500 Not at all asidity 5
254 256 2021-11-06 14:14:40 21 Female Hardwork 1 1 1 1 5+ 167.64 56 Walk 2 2 0 20 1000 1 Hour asidity good 2
255 257 2021-11-06 15:07:09 21 Female Hardwork 2 1 3 2 4 152.4 56 Others 1 1 2 2000 15000 Not at all No 5
256 258 2021-11-06 17:37:15 23 Female Moderate 3 2 3 2 5 155.448 43 Not interested 2 1 1 2000 50000 Not at all Yes, Hypothyroidism 3

View File

@@ -0,0 +1,47 @@
import pandas
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Data
data = pandas.read_csv('clean_data.csv')
# Encode string values as numbers
# work
factorized_data_work, unique_values_work = pandas.factorize(data['work'])
data['work'] = factorized_data_work
# gymtime
factorized_data_gymtime, unique_values_gymtime = pandas.factorize(data['gymtime'])
data['gymtime'] = factorized_data_gymtime
# Create the exercise_reg column: 1 if the person exercises regularly (phy_ex >= 7), else 0
data['exercise_reg'] = np.where(data['phy_ex'] >= 7, 1, 0)
# Select the feature columns
x = data[['age', 'weight', 'work', 'phy_health', 'gymtime']]
# Define the target variable
y = data['exercise_reg']
# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
# Create and train the model
model = DecisionTreeClassifier()
model.fit(x_train, y_train)
# Predict on the test set
y_pred = model.predict(x_test)
# Evaluate model performance
accuracy = accuracy_score(y_test, y_pred)
# Feature importances
importances_clf = model.feature_importances_
# Print the results
print(x.head(15))
print(importances_clf)
print("DecisionTreeClassifier model score: ", model.score(x_test, y_test))
print("Classification accuracy:", accuracy)

View File

@@ -0,0 +1,59 @@
# Laboratory work No. 1
## PIbd-42 Mashkova Margarita (Variant 19)
## Task
Generate the specified type of data and compare 3 models on it according to the variant.
Plot the results, report the quality of the models, and explain the outcome.
### Data:
> make_moons (noise=0.3, random_state=rs)
### Models:
> - Linear regression
> - Polynomial regression (degree 5)
> - Ridge polynomial regression (degree 5, alpha=1.0)
## Running the program
To run the program, execute the file main.py
## Technologies used
> **Programming language:** python
>
> **Libraries:**
> - `matplotlib` - a package for data visualization.
> - `sklearn` - provides a wide range of tools for machine learning, statistics, and data analysis.
## How the program works
First, synthetic data X and y for the experiments are generated with the `make_moons` function and the given parameters.
The `train_test_split` function splits the data so that the test set makes up 40% of the original dataset.
The split is random (i.e. elements are not taken from the original sample sequentially).
Then, for each of the models specified by the variant, the following steps are performed (see the sketch after this section):
1. Create the model with the given parameters.
2. Fit the model on the training data.
3. Predict on the test data.
4. Evaluate the model with 3 metrics:
* **Coefficient of determination**: a metric that measures how well the model fits the data.
A value of 1 means a perfect fit, while values close to 0 indicate that the model explains little of
the variation in the data (for very poor fits it can even be negative). It is computed with the `score` method from scikit-learn.
* **Mean absolute error (MAE):** computed with the `mean_absolute_error` function from scikit-learn.
* **Mean squared error (MSE):** computed with the `mean_squared_error` function from scikit-learn.
The last 2 metrics measure the difference between the model's predictions and the actual values;
lower MAE and MSE values indicate better model performance.
The computed metric values are printed to the console.
Then plots are drawn to show how the models performed. The first plot shows the expected predictions,
the others show the models' predictions. The fewer faded points there are, the better the model performed.
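As a quick illustration, here is a minimal, self-contained sketch of steps 1-4 for a single model. The fixed `random_state=0` is an arbitrary choice for this sketch; main.py draws `rs` with `randrange`.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Generate the data and hold out 40% for testing
X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

# Steps 1-4 for one model (linear regression)
model = LinearRegression()
model.fit(X_train, y_train)                  # fit on the training data
y_pred = model.predict(X_test)               # predict on the test data
print("R^2:", model.score(X_test, y_test))   # coefficient of determination
print("MAE:", mean_absolute_error(y_test, y_pred))
print("MSE:", mean_squared_error(y_test, y_pred))
```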
## Tests
![Console output](console.png)
![Plots](plots.png)
**Conclusion:** based on the results, linear regression and ridge polynomial regression show
similarly good performance, while polynomial regression performs the worst.

Binary file not shown.

After

Width:  |  Height:  |  Size: 30 KiB

View File

@@ -0,0 +1,85 @@
from random import randrange
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error
import matplotlib.pyplot as plt
rs = randrange(42)
# Data generation
X, y = make_moons(noise=0.3, random_state=rs)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4, random_state=rs)
# Linear regression
linear = LinearRegression()
# Fit the model
linear.fit(X_train, y_train)
# Predict on the test data
y_pred_linear = linear.predict(X_test)
# Evaluate on the test data (coefficient of determination)
linear_score = linear.score(X_test, y_test)
# Evaluate on the test data (mean absolute error)
linear_mae = mean_absolute_error(y_test, y_pred_linear)
# Evaluate on the test data (mean squared error)
linear_mse = mean_squared_error(y_test, y_pred_linear)
# Polynomial regression
poly = make_pipeline(PolynomialFeatures(degree=5), LinearRegression())
# Fit the model
poly.fit(X_train, y_train)
# Predict on the test data
y_pred_poly = poly.predict(X_test)
# Evaluate on the test data (coefficient of determination)
poly_score = poly.score(X_test, y_test)
# Evaluate on the test data (mean absolute error)
poly_mae = mean_absolute_error(y_test, y_pred_poly)
# Evaluate on the test data (mean squared error)
poly_mse = mean_squared_error(y_test, y_pred_poly)
# Ridge polynomial regression
ridge = make_pipeline(PolynomialFeatures(degree=5), Ridge(alpha=1.0))
# Fit the model
ridge.fit(X_train, y_train)
# Predict on the test data
y_pred_ridge = ridge.predict(X_test)
# Evaluate on the test data (coefficient of determination)
ridge_score = ridge.score(X_test, y_test)
# Evaluate on the test data (mean absolute error)
ridge_mae = mean_absolute_error(y_test, y_pred_ridge)
# Evaluate on the test data (mean squared error)
ridge_mse = mean_squared_error(y_test, y_pred_ridge)
# Print the model quality scores to the console
print("Prediction quality of the models on the test data:\n")
print("Linear regression:")
print("Coefficient of determination: %f" % linear_score)
print("Mean absolute error: %f" % linear_mae)
print("Mean squared error: %f\n" % linear_mse)
print("Polynomial regression:")
print("Coefficient of determination: %f" % poly_score)
print("Mean absolute error: %f" % poly_mae)
print("Mean squared error: %f\n" % poly_mse)
print("Ridge polynomial regression:")
print("Coefficient of determination: %f" % ridge_score)
print("Mean absolute error: %f" % ridge_mae)
print("Mean squared error: %f\n" % ridge_mse)
# Draw the plots
fig, axs = plt.subplots(1, 4, figsize=(15, 5))
axs[0].set_title("Original test data")
axs[0].scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap="bwr")
axs[1].set_title("Linear regression")
axs[1].scatter(X_test[:, 0], X_test[:, 1], c=y_pred_linear, cmap="bwr")
axs[2].set_title("Polynomial regression")
axs[2].scatter(X_test[:, 0], X_test[:, 1], c=y_pred_poly, cmap="bwr")
axs[3].set_title("Ridge polynomial regression")
axs[3].scatter(X_test[:, 0], X_test[:, 1], c=y_pred_ridge, cmap="bwr")
plt.savefig('plots.png')
plt.show()

Binary file not shown.

After

Width:  |  Height:  |  Size: 48 KiB

View File

@@ -0,0 +1,63 @@
# Laboratory work No. 2
## PIbd-42 Mashkova Margarita (Variant 19)
## Task
Rank the features using the models specified by the variant.
Display the resulting values/scores of each feature for each method/model as well as the average score.
Analyze the results: which four features turned out to be the most important by average score?
### Models:
> - Linear regression (LinearRegression)
> - Ridge regression (Ridge)
> - Lasso (Lasso)
> - Randomized Lasso (RandomizedLasso)
> **Note**
>
> The `RandomizedLasso` model was deprecated in scikit-learn 0.19 and removed in 0.21.
The random forest regressor `RandomForestRegressor` is used in its place.
## Running the program
To run the program, execute the file main.py
## Technologies used
> **Programming language:** python
>
> **Libraries:**
> - `numpy` - used for working with arrays.
> - `sklearn` - provides a wide range of tools for machine learning, statistics, and data analysis.
## How the program works
First the input data (X) is generated: 750 observation rows and 14 feature columns.
Then the output function (Y) is defined: the Friedman regression problem, in which the models receive 14 factors as input
but the output is computed by a formula that uses only five of them, while factors 11-14 depend on factors 1-4.
Accordingly, a dependency on x1, x2, x3, x4 is then added to the features (factors) x11, x12, x13, x14.
Next, the models specified by the variant are created and fitted.
All of the models' feature scores are then collected into a single 4x14 array (number_of_models x number_of_features).
The average scores are computed and the result is printed as a list of pairs `{feature_number average_score}`
sorted in descending order. The feature scores are taken from the `coef_` attribute of the LinearRegression, Ridge, and Lasso
models, and from the `feature_importances_` attribute of the RandomForestRegressor model (see the sketch after this section).
For convenient display, the scores are placed into a structure of the form:
`[model_name : [{feature_name : score},{feature_name : score}...]]`.
The result is a dictionary with 4 entries of fourteen pairs each, keyed by model name.
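For concreteness, a minimal sketch of this scoring step on toy 3-feature data (the toy data and seed are assumptions for illustration, not the Friedman setup): absolute coefficients are min-max scaled to [0, 1] and zipped with feature names, just as `rank_to_dict` does in main.py.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import MinMaxScaler

# Toy data: 3 features, of which only the first two matter
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (100, 3))
y = X[:, 0] + 2 * X[:, 1]

model = Ridge(alpha=7).fit(X, y)
# Absolute coefficients scaled to [0, 1], as rank_to_dict does
scores = MinMaxScaler().fit_transform(np.abs(model.coef_).reshape(-1, 1)).ravel()
names = ["x%s" % i for i in range(1, 4)]
print(dict(zip(names, np.round(scores, 2))))  # x2 gets 1.0, the unused x3 gets 0.0
```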
## Tests
### Feature importance scores by model
![Feature importance scores by model](ranks.png)
### Feature importance scores by model, sorted in descending order
![Feature importance scores by model, sorted in descending order](ranks_sorted.png)
### Average feature importance scores
![Average feature importance scores](means.png)
**Conclusion:** based on the average scores, the four most important features turned out to be
`x4 (0.86), x1 (0.8), x2 (0.73), x14 (0.51)`.
All models rated x1, x2, and x4 as the most important, and as the fourth important feature they picked one of the dependent ones:
LinearRegression - x11, Ridge - x14, RandomForestRegressor - x14. The Lasso model also included an independent feature - x5.

View File

@@ -0,0 +1,100 @@
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import MinMaxScaler
import numpy as np
# Generate the input data: 750 observation rows and 14 feature columns
np.random.seed(0)
size = 750
X = np.random.uniform(0, 1, (size, 14))
# Define the output function: the Friedman regression problem
# (noise is drawn per observation; the original scalar np.random.normal(0, 1)
# would have added the same constant to every sample)
Y = (10 * np.sin(np.pi * X[:, 0] * X[:, 1]) + 20 * (X[:, 2] - .5) ** 2 + 10 * X[:, 3] + 5 * X[:, 4] ** 5
     + np.random.normal(0, 1, size))
# Make features x11-x14 depend on x1-x4
X[:, 10:] = X[:, :4] + np.random.normal(0, .025, (size, 4))
# Create and fit the models
# Linear model
lr = LinearRegression()
lr.fit(X, Y)
# Ridge model
ridge = Ridge(alpha=7)
ridge.fit(X, Y)
# Lasso
lasso = Lasso(alpha=.05)
lasso.fit(X, Y)
# Random forest regressor
rfr = RandomForestRegressor()
rfr.fit(X, Y)
# List of feature names
names = ["x%s" % i for i in range(1, 15)]
# Build one entry of the feature importance dictionary
def rank_to_dict(ranks):
    ranks = np.abs(ranks)
    minmax = MinMaxScaler()
    ranks = minmax.fit_transform(np.array(ranks).reshape(14, 1)).ravel()
    ranks = map(lambda x: round(x, 2), ranks)
    return dict(zip(names, ranks))
# Dictionary of feature importance scores
ranks_dict = dict()
# Add the entries to the dictionary
ranks_dict["Linear regression"] = rank_to_dict(lr.coef_)
ranks_dict["Ridge"] = rank_to_dict(ridge.coef_)
ranks_dict["Lasso"] = rank_to_dict(lasso.coef_)
ranks_dict["Random Forest Regressor"] = rank_to_dict(rfr.feature_importances_)
def print_ranks():
    for key, value in ranks_dict.items():
        print(key)
        print(value)
def print_ranks_sorted():
    for key, value in ranks_dict.items():
        print(key)
        value_sorted = sorted(value.items(), key=lambda x: x[1], reverse=True)
        print(value_sorted)
def get_means():
    # Accumulator for the average scores
    mean = {}
    for key, value in ranks_dict.items():
        # Walk over the name:score pairs of each model's ranks
        for item in value.items():
            # The feature name is the key of mean;
            # if there is no entry with this key yet, add one
            if item[0] not in mean:
                mean[item[0]] = 0
            # Sum the scores for each feature name
            mean[item[0]] += item[1]
    # Compute the average for each feature
    for key, value in mean.items():
        res = value / len(ranks_dict)
        mean[key] = round(res, 2)
    # Sort by score, descending
    mean_sorted = sorted(mean.items(), key=lambda x: x[1], reverse=True)
    return mean_sorted
def print_means():
    for item in get_means():
        print(item)
print("Scores of each feature by each model:")
print_ranks()
print("\nScores of each feature by each model, sorted in descending order:")
print_ranks_sorted()
print("\nAverage feature scores:")
print_means()

Binary file not shown.

After

Width:  |  Height:  |  Size: 9.8 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 26 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 32 KiB

View File

@@ -0,0 +1,88 @@
# Laboratory work No. 3
## PIbd-42 Mashkova Margarita (Variant 19)
## Task
Using a library implementation of a decision tree on 99% of the data, solve the task:
identify how the price of a phone depends on its other attributes.
Check the model on the remaining one percent and draw a conclusion.
### Data:
> A dataset of mobile phone specifications and prices
>
> Link to the dataset on kaggle: [Mobile Phone Specifications and Prices](https://www.kaggle.com/datasets/pratikgarai/mobile-phone-specifications-and-prices/data)
### Models:
> - DecisionTreeClassifier
## Running the program
To run the program, execute the file main.py
## Technologies used
> **Programming language:** python
>
> **Libraries:**
> - `pandas` - provides functionality for processing and analyzing the dataset.
> - `sklearn` - provides a wide range of tools for machine learning, statistics, and data analysis.
## How the program works
### Description of the dataset
This dataset contains the specifications of various phones, including their price.
The dataset's column names and descriptions:
- **Id** - row identifier (int)
- **Name** - name of the phone (string)
- **Brand** - brand of the phone (string)
- **Model** - model of the phone (string)
- **Battery capacity (mAh)** - battery capacity in mAh (int)
- **Screen size (inches)** - screen size in inches, measured between opposite corners (float)
- **Touchscreen** - whether the phone has a touchscreen (string - Yes/No)
- **Resolution x** - screen resolution along the width (int)
- **Resolution y** - screen resolution along the height (int)
- **Processor** - number of processor cores (int)
- **RAM (MB)** - available RAM of the phone in MB (int)
- **Internal storage (GB)** - internal storage of the phone in GB (float)
- **Rear camera** - rear camera resolution in MP (0 if not available) (float)
- **Front camera** - front camera resolution in MP (0 if not available) (float)
- **Operating system** - OS used by the phone (string)
- **Wi-Fi** - whether the phone has Wi-Fi (string - Yes/No)
- **Bluetooth** - whether the phone has Bluetooth (string - Yes/No)
- **GPS** - whether the phone has GPS (string - Yes/No)
- **Number of SIMs** - number of SIM card slots in the phone (int)
- **3G** - whether the phone supports 3G (string - Yes/No)
- **4G/ LTE** - whether the phone supports 4G/LTE (string - Yes/No)
- **Price** - price of the phone in Indian rupees (int)
### Data processing
Print information about the data with the DataFrame function `data.info()`:
![Data info](data_info.png)
The data contains no empty rows. All column values must be converted to numbers.
Fields containing Yes/No values are numerically encoded with `LabelEncoder`:
the value "Yes" becomes 1, the value "No" becomes 0.
The remaining string fields are converted with a TF-IDF vectorizer, `TfidfVectorizer`, summing each row of the
resulting matrix into a single number (a minimal sketch of this encoding step is shown at the end of this section).
Data after processing:
![Processed data](data_processed.png)
Then Y is created - the array of target values (the price).
The classification task is solved twice: first on all features, then on the four important ones identified.
First X receives all the features, the sample is split into test (1%) and training (99%) data,
the decision tree model is fitted, and the feature importance list (in descending order) and the model score are displayed.
Then the first 4 features are taken from the list and the task is solved again.
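Here is a minimal sketch of the encoding step on a toy two-column frame (the column values below are assumptions for illustration; main.py applies the same idea to the full dataset and additionally skips the first TF-IDF column when summing):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy frame standing in for the real dataset
df = pd.DataFrame({"Touchscreen": ["Yes", "No", "Yes"],
                   "Brand": ["Nokia", "Apple", "Nokia"]})
# Yes/No -> 1/0 ("No" sorts before "Yes", so it is encoded as 0)
df["Touchscreen"] = LabelEncoder().fit_transform(df["Touchscreen"])
# String column -> one number per row by summing its TF-IDF weights
tfidf = TfidfVectorizer().fit_transform(df["Brand"]).toarray()
df["Brand"] = tfidf.sum(axis=1)
print(df)
```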
## Tests
### Result of solving the classification task on all features:
![Solution on all features](all_features.png)
### Result of solving the classification task on the four identified important features:
![Solution on the important features](important_features.png)
**Conclusion:** based on the results, the model's mean accuracy on all features is 7%,
i.e. the model performs very poorly on this data. When only the important features are used, the mean accuracy drops to 0.
The large mean squared error confirms that the model is of low quality.

Binary file not shown.

After

Width:  |  Height:  |  Size: 50 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 62 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 61 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 25 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 22 KiB

View File

@@ -0,0 +1,68 @@
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import mean_squared_error
filename = "mobiles.csv"
# Read the data from the file into a DataFrame
data = pd.read_csv(filename, sep=',')
# Drop the identifier column
data.pop("Id")
# print(data)
# data.info()
# Convert string feature values to numbers with a TF-IDF vectorizer, summing each row
FEATURE_COLUMNS_TO_PROC = ['Name', 'Brand', 'Model', 'Operating system']
for column_name in FEATURE_COLUMNS_TO_PROC:
    vectorizer = TfidfVectorizer()
    train_text_feature_matrix = vectorizer.fit_transform(data[column_name]).toarray()
    a = pd.DataFrame(train_text_feature_matrix)
    data[column_name] = a[a.columns[1:]].apply(lambda x: sum(x.dropna().astype(float)), axis=1)
# Convert Yes/No string values to numbers with LabelEncoder
le = LabelEncoder()
data['Touchscreen'] = le.fit_transform(data['Touchscreen'])
data['Wi-Fi'] = le.fit_transform(data['Wi-Fi'])
data['Bluetooth'] = le.fit_transform(data['Bluetooth'])
data['GPS'] = le.fit_transform(data['GPS'])
data['3G'] = le.fit_transform(data['3G'])
data['4G/ LTE'] = le.fit_transform(data['4G/ LTE'])
# Split the data into training and test sets
# Y holds the target attribute - the price
Y = data['Price']
def predict(X):
    # Training set size - 99%, test set - 1%
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.01, random_state=42)
    # Create and fit the DecisionTreeClassifier model
    dtc = DecisionTreeClassifier(max_depth=5, random_state=241)
    dtc.fit(X_train, y_train)
    # Build a DataFrame with feature names and their importances
    feature_importance_df = pd.DataFrame({'Feature': X_train.columns, 'Importance': dtc.feature_importances_}) \
        .sort_values(by='Importance', ascending=False)
    print("Feature importances:")
    print(feature_importance_df)
    mean_accuracy = dtc.score(X_test, y_test)
    y_pred = dtc.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"Mean squared error: {mse}")
    print(f"Mean accuracy: {mean_accuracy}")
    return feature_importance_df
print("\n***Solving the classification task using all features***")
# X holds all features except the price
X = data.drop(columns='Price')
importance_df = predict(X)
print("\n***Solving the classification task using only the 4 important features***")
# Keep only the 4 most important features
important_features = importance_df.iloc[:4, 0].tolist()
X_important = data[important_features]
predict(X_important)

File diff suppressed because it is too large

View File

@@ -0,0 +1,27 @@
### Task:
Data: make_moons (noise=0.3, random_state=rs)
Models:
- Linear regression
- Polynomial regression (degree 5)
- Ridge polynomial regression (degree 5, alpha=1.0)
### How to run the laboratory work:
The laboratory work is started from the file `main.py` via Run; a window with the plots should open, with the computations printed to the console.
### Technologies:
The following libraries are used to solve the tasks:
matplotlib for plotting,
numpy for mathematical operations,
sklearn for training the models and obtaining the results.
### What the laboratory work does:
Running the code prints the quality of each model to the console and displays the decision plots for each model.
### Example output:
Console:
![console output](console.png)
Plots:
![img.png](gr.png)

Binary file not shown.

After

Width:  |  Height:  |  Size: 23 KiB

Some files were not shown because too many files have changed in this diff