753 lines
130 KiB
Plaintext
753 lines
130 KiB
Plaintext
|
{
|
|||
|
"cells": [
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"# Лабораторная работа 3\n",
|
|||
|
"\n",
|
|||
|
"Датасет - **Цены на золото**\thttps://www.kaggle.com/datasets/sid321axn/gold-price-prediction-dataset\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"**Бизнес-цели**: \n",
|
|||
|
"1. Прогнозирование цены золота на момент закрытия для поддержки принятия решений по инвестициям.\n",
|
|||
|
"2. Оценка волатильности цен золота для долгосрочных стратегий инвестирования."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"**Цели технического проекта**: \n",
|
|||
|
"1. Создание модели машинного обучения для прогнозирования цены закрытия акций на золото на основе исторических данных (дат, цен открытия, максимальных и минимальных цен, объёма торгов).\n",
|
|||
|
"2. Разработка системы, которая вычисляет и анализирует волатильность на основе исторической ценовой информации и объёмов торгов."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 13,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
" Date Open High Low Close Adj Close \\\n",
|
|||
|
"0 2011-12-15 154.740005 154.949997 151.710007 152.330002 152.330002 \n",
|
|||
|
"1 2011-12-16 154.309998 155.369995 153.899994 155.229996 155.229996 \n",
|
|||
|
"2 2011-12-19 155.479996 155.860001 154.360001 154.869995 154.869995 \n",
|
|||
|
"3 2011-12-20 156.820007 157.429993 156.580002 156.979996 156.979996 \n",
|
|||
|
"4 2011-12-21 156.979996 157.529999 156.130005 157.160004 157.160004 \n",
|
|||
|
"... ... ... ... ... ... ... \n",
|
|||
|
"1713 2018-12-24 119.570000 120.139999 119.570000 120.019997 120.019997 \n",
|
|||
|
"1714 2018-12-26 120.620003 121.000000 119.570000 119.660004 119.660004 \n",
|
|||
|
"1715 2018-12-27 120.570000 120.900002 120.139999 120.570000 120.570000 \n",
|
|||
|
"1716 2018-12-28 120.800003 121.080002 120.720001 121.059998 121.059998 \n",
|
|||
|
"1717 2018-12-31 120.980003 121.260002 120.830002 121.250000 121.250000 \n",
|
|||
|
"\n",
|
|||
|
" Volume SP_open SP_high SP_low ... GDX_Low GDX_Close \\\n",
|
|||
|
"0 21521900 123.029999 123.199997 121.989998 ... 51.570000 51.680000 \n",
|
|||
|
"1 18124300 122.230003 122.949997 121.300003 ... 52.040001 52.680000 \n",
|
|||
|
"2 12547200 122.059998 122.320000 120.029999 ... 51.029999 51.169998 \n",
|
|||
|
"3 9136300 122.180000 124.139999 120.370003 ... 52.369999 52.990002 \n",
|
|||
|
"4 11996100 123.930000 124.360001 122.750000 ... 52.419998 52.959999 \n",
|
|||
|
"... ... ... ... ... ... ... ... \n",
|
|||
|
"1713 9736400 239.039993 240.839996 234.270004 ... 20.650000 21.090000 \n",
|
|||
|
"1714 14293500 235.970001 246.179993 233.759995 ... 20.530001 20.620001 \n",
|
|||
|
"1715 11874400 242.570007 248.289993 238.960007 ... 20.700001 20.969999 \n",
|
|||
|
"1716 6864700 249.580002 251.399994 246.449997 ... 20.570000 20.600000 \n",
|
|||
|
"1717 8449400 249.559998 250.190002 247.470001 ... 20.559999 21.090000 \n",
|
|||
|
"\n",
|
|||
|
" GDX_Adj Close GDX_Volume USO_Open USO_High USO_Low USO_Close \\\n",
|
|||
|
"0 48.973877 20605600 36.900002 36.939999 36.049999 36.130001 \n",
|
|||
|
"1 49.921513 16285400 36.180000 36.500000 35.730000 36.270000 \n",
|
|||
|
"2 48.490578 15120200 36.389999 36.450001 35.930000 36.200001 \n",
|
|||
|
"3 50.215282 11644900 37.299999 37.610001 37.220001 37.560001 \n",
|
|||
|
"4 50.186852 8724300 37.669998 38.240002 37.520000 38.110001 \n",
|
|||
|
"... ... ... ... ... ... ... \n",
|
|||
|
"1713 21.090000 60507000 9.490000 9.520000 9.280000 9.290000 \n",
|
|||
|
"1714 20.620001 76365200 9.250000 9.920000 9.230000 9.900000 \n",
|
|||
|
"1715 20.969999 52393000 9.590000 9.650000 9.370000 9.620000 \n",
|
|||
|
"1716 20.600000 49835000 9.540000 9.650000 9.380000 9.530000 \n",
|
|||
|
"1717 21.090000 53866600 9.630000 9.710000 9.440000 9.660000 \n",
|
|||
|
"\n",
|
|||
|
" USO_Adj Close USO_Volume \n",
|
|||
|
"0 36.130001 12616700 \n",
|
|||
|
"1 36.270000 12578800 \n",
|
|||
|
"2 36.200001 7418200 \n",
|
|||
|
"3 37.560001 10041600 \n",
|
|||
|
"4 38.110001 10728000 \n",
|
|||
|
"... ... ... \n",
|
|||
|
"1713 9.290000 21598200 \n",
|
|||
|
"1714 9.900000 40978800 \n",
|
|||
|
"1715 9.620000 36578700 \n",
|
|||
|
"1716 9.530000 22803400 \n",
|
|||
|
"1717 9.660000 28417400 \n",
|
|||
|
"\n",
|
|||
|
"[1718 rows x 81 columns]\n",
|
|||
|
"0 15323\n",
|
|||
|
"1 15324\n",
|
|||
|
"2 15327\n",
|
|||
|
"3 15328\n",
|
|||
|
"4 15329\n",
|
|||
|
" ... \n",
|
|||
|
"1713 17889\n",
|
|||
|
"1714 17891\n",
|
|||
|
"1715 17892\n",
|
|||
|
"1716 17893\n",
|
|||
|
"1717 17896\n",
|
|||
|
"Name: Date_numeric, Length: 1718, dtype: int64\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"import pandas as pd\n",
|
|||
|
"from sklearn.model_selection import train_test_split\n",
|
|||
|
"from imblearn.under_sampling import RandomUnderSampler\n",
|
|||
|
"\n",
|
|||
|
"df = pd.read_csv(\"data/Gold.csv\")\n",
|
|||
|
"print(df)\n",
|
|||
|
"\n",
|
|||
|
"# Преобразование даты продажи в числовой формат (кол-во дней с 01.01.1970)\n",
|
|||
|
"df['Date'] = pd.to_datetime(df['Date'])\n",
|
|||
|
"df['Date_numeric'] = (df['Date'] - pd.Timestamp('1970-01-01')).dt.days\n",
|
|||
|
"print(df['Date_numeric'])"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 14,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Index(['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume',\n",
|
|||
|
" 'SP_open', 'SP_high', 'SP_low', 'SP_close', 'SP_Ajclose', 'SP_volume',\n",
|
|||
|
" 'DJ_open', 'DJ_high', 'DJ_low', 'DJ_close', 'DJ_Ajclose', 'DJ_volume',\n",
|
|||
|
" 'EG_open', 'EG_high', 'EG_low', 'EG_close', 'EG_Ajclose', 'EG_volume',\n",
|
|||
|
" 'EU_Price', 'EU_open', 'EU_high', 'EU_low', 'EU_Trend', 'OF_Price',\n",
|
|||
|
" 'OF_Open', 'OF_High', 'OF_Low', 'OF_Volume', 'OF_Trend', 'OS_Price',\n",
|
|||
|
" 'OS_Open', 'OS_High', 'OS_Low', 'OS_Trend', 'SF_Price', 'SF_Open',\n",
|
|||
|
" 'SF_High', 'SF_Low', 'SF_Volume', 'SF_Trend', 'USB_Price', 'USB_Open',\n",
|
|||
|
" 'USB_High', 'USB_Low', 'USB_Trend', 'PLT_Price', 'PLT_Open', 'PLT_High',\n",
|
|||
|
" 'PLT_Low', 'PLT_Trend', 'PLD_Price', 'PLD_Open', 'PLD_High', 'PLD_Low',\n",
|
|||
|
" 'PLD_Trend', 'RHO_PRICE', 'USDI_Price', 'USDI_Open', 'USDI_High',\n",
|
|||
|
" 'USDI_Low', 'USDI_Volume', 'USDI_Trend', 'GDX_Open', 'GDX_High',\n",
|
|||
|
" 'GDX_Low', 'GDX_Close', 'GDX_Adj Close', 'GDX_Volume', 'USO_Open',\n",
|
|||
|
" 'USO_High', 'USO_Low', 'USO_Close', 'USO_Adj Close', 'USO_Volume',\n",
|
|||
|
" 'Date_numeric', 'Close_binned'],\n",
|
|||
|
" dtype='object')\n",
|
|||
|
"Обучающая выборка: (1030, 83)\n",
|
|||
|
"Close\n",
|
|||
|
"124.589996 4\n",
|
|||
|
"126.180000 3\n",
|
|||
|
"116.330002 3\n",
|
|||
|
"126.449997 3\n",
|
|||
|
"121.309998 3\n",
|
|||
|
" ..\n",
|
|||
|
"115.489998 1\n",
|
|||
|
"131.759995 1\n",
|
|||
|
"121.169998 1\n",
|
|||
|
"118.989998 1\n",
|
|||
|
"124.500000 1\n",
|
|||
|
"Name: count, Length: 900, dtype: int64\n",
|
|||
|
"Контрольная выборка: (343, 83)\n",
|
|||
|
"Close\n",
|
|||
|
"113.019997 2\n",
|
|||
|
"112.570000 2\n",
|
|||
|
"118.360001 2\n",
|
|||
|
"151.619995 2\n",
|
|||
|
"126.300003 2\n",
|
|||
|
" ..\n",
|
|||
|
"170.770004 1\n",
|
|||
|
"117.550003 1\n",
|
|||
|
"124.279999 1\n",
|
|||
|
"157.429993 1\n",
|
|||
|
"121.339996 1\n",
|
|||
|
"Name: count, Length: 329, dtype: int64\n",
|
|||
|
"Тестовая выборка: (344, 83)\n",
|
|||
|
"Close\n",
|
|||
|
"114.769997 3\n",
|
|||
|
"117.120003 2\n",
|
|||
|
"107.790001 2\n",
|
|||
|
"123.209999 2\n",
|
|||
|
"155.550003 2\n",
|
|||
|
" ..\n",
|
|||
|
"160.440002 1\n",
|
|||
|
"117.599998 1\n",
|
|||
|
"113.419998 1\n",
|
|||
|
"119.750000 1\n",
|
|||
|
"114.830002 1\n",
|
|||
|
"Name: count, Length: 331, dtype: int64\n",
|
|||
|
"Обучающая выборка: (1030, 83)\n",
|
|||
|
"Close_binned\n",
|
|||
|
"High 350\n",
|
|||
|
"Low 340\n",
|
|||
|
"Medium 340\n",
|
|||
|
"Name: count, dtype: int64\n",
|
|||
|
"Контрольная выборка: (343, 83)\n",
|
|||
|
"Close_binned\n",
|
|||
|
"High 117\n",
|
|||
|
"Low 113\n",
|
|||
|
"Medium 113\n",
|
|||
|
"Name: count, dtype: int64\n",
|
|||
|
"Тестовая выборка: (344, 83)\n",
|
|||
|
"Close_binned\n",
|
|||
|
"High 117\n",
|
|||
|
"Medium 114\n",
|
|||
|
"Low 113\n",
|
|||
|
"Name: count, dtype: int64\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Функция для разбиения на обучающую, валидационную, тестовую выборки\n",
|
|||
|
"def split_stratified_into_train_val_test(\n",
|
|||
|
" df_input,\n",
|
|||
|
" stratify_colname=\"y\",\n",
|
|||
|
" frac_train=0.6,\n",
|
|||
|
" frac_val=0.15,\n",
|
|||
|
" frac_test=0.25,\n",
|
|||
|
" random_state=None,\n",
|
|||
|
"):\n",
|
|||
|
" #проверка, что сумма долей выборок равна 1\n",
|
|||
|
" if frac_train + frac_val + frac_test != 1.0:\n",
|
|||
|
" raise ValueError(\n",
|
|||
|
" \"fractions %f, %f, %f do not add up to 1.0\"\n",
|
|||
|
" % (frac_train, frac_val, frac_test)\n",
|
|||
|
" )\n",
|
|||
|
"\n",
|
|||
|
" #проверка наличия указанного столбца для стратификации\n",
|
|||
|
" if stratify_colname not in df_input.columns:\n",
|
|||
|
" raise ValueError(\"%s is not a column in the dataframe\" % (stratify_colname))\n",
|
|||
|
"\n",
|
|||
|
" #разделение на признаки х и целевую переменную у\n",
|
|||
|
" X = df_input \n",
|
|||
|
" y = df_input[\n",
|
|||
|
" [stratify_colname]\n",
|
|||
|
" ] \n",
|
|||
|
"\n",
|
|||
|
" #разделение данных на обучающую и временную выборку с учетом стратификации\n",
|
|||
|
" df_train, df_temp, y_train, y_temp = train_test_split(\n",
|
|||
|
" X, y, stratify=y, test_size=(1.0 - frac_train), random_state=random_state\n",
|
|||
|
" )\n",
|
|||
|
"\n",
|
|||
|
" #вычисление относительной доли тестовой выборки по отношению к временной\n",
|
|||
|
" relative_frac_test = frac_test / (frac_val + frac_test)\n",
|
|||
|
" #разделение временной выборки на валидационную и тестовую\n",
|
|||
|
" df_val, df_test, y_val, y_test = train_test_split(\n",
|
|||
|
" df_temp,\n",
|
|||
|
" y_temp,\n",
|
|||
|
" stratify=y_temp,\n",
|
|||
|
" test_size=relative_frac_test,\n",
|
|||
|
" random_state=random_state,\n",
|
|||
|
" )\n",
|
|||
|
" #проверка что общее кол-во данных равно сумме трех выборок\n",
|
|||
|
" assert len(df_input) == len(df_train) + len(df_val) + len(df_test)\n",
|
|||
|
"\n",
|
|||
|
" return df_train, df_val, df_test\n",
|
|||
|
"\n",
|
|||
|
"#создаем бины (интервалы) для столбца Close и присваиваем метки в новый столбец\n",
|
|||
|
"bins = [df['Close'].min(), df['Close'].quantile(0.33), df['Close'].quantile(0.66), df['Close'].max()]\n",
|
|||
|
"labels = ['Low', 'Medium', 'High']\n",
|
|||
|
"df['Close_binned'] = pd.cut(df['Close'], bins=bins, labels=labels)\n",
|
|||
|
"#удаляем строки с пропущенными значениями\n",
|
|||
|
"df = df.dropna()\n",
|
|||
|
"# вызываем ф-ию для разделения данных на выборки с стратификацией по новому столбцу Close_binned\n",
|
|||
|
"df_train, df_val, df_test = split_stratified_into_train_val_test(\n",
|
|||
|
" df, stratify_colname=\"Close_binned\", frac_train=0.60, frac_val=0.20, frac_test=0.20\n",
|
|||
|
")\n",
|
|||
|
"\n",
|
|||
|
"print(df_train.columns) \n",
|
|||
|
" \n",
|
|||
|
"print(\"Обучающая выборка: \", df_train.shape)\n",
|
|||
|
"print(df_train.Close.value_counts()) \n",
|
|||
|
"\n",
|
|||
|
"print(\"Контрольная выборка: \", df_val.shape)\n",
|
|||
|
"print(df_val.Close.value_counts())\n",
|
|||
|
"\n",
|
|||
|
"print(\"Тестовая выборка: \", df_test.shape)\n",
|
|||
|
"print(df_test.Close.value_counts())\n",
|
|||
|
"\n",
|
|||
|
"print(\"Обучающая выборка: \", df_train.shape)\n",
|
|||
|
"print(df_train['Close_binned'].value_counts())\n",
|
|||
|
"\n",
|
|||
|
"print(\"Контрольная выборка: \", df_val.shape)\n",
|
|||
|
"print(df_val['Close_binned'].value_counts())\n",
|
|||
|
"\n",
|
|||
|
"print(\"Тестовая выборка: \", df_test.shape)\n",
|
|||
|
"print(df_test['Close_binned'].value_counts())\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 15,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Обучающая выборка после undersampling: (1020, 83)\n",
|
|||
|
"Close\n",
|
|||
|
"124.589996 4\n",
|
|||
|
"125.540001 3\n",
|
|||
|
"122.209999 3\n",
|
|||
|
"121.110001 3\n",
|
|||
|
"121.580002 3\n",
|
|||
|
" ..\n",
|
|||
|
"126.989998 1\n",
|
|||
|
"156.279999 1\n",
|
|||
|
"157.929993 1\n",
|
|||
|
"127.150002 1\n",
|
|||
|
"149.460007 1\n",
|
|||
|
"Name: count, Length: 892, dtype: int64\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"#уменьшаем дисбаланс и количество уникальных значений в столбце Close\n",
|
|||
|
"rus = RandomUnderSampler(random_state=42)\n",
|
|||
|
"X_resampled, y_resampled = rus.fit_resample(df_train, df_train[\"Close_binned\"])\n",
|
|||
|
"# Создание датафрейма для результирующей выборки\n",
|
|||
|
"df_train_rus = pd.DataFrame(X_resampled)\n",
|
|||
|
"print(\"Обучающая выборка после undersampling: \", df_train_rus.shape)\n",
|
|||
|
"print(df_train_rus.Close.value_counts())"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 16,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
" Date Open High Low Close Adj Close \\\n",
|
|||
|
"1540 2018-04-16 127.739998 128.050003 127.570000 127.629997 127.629997 \n",
|
|||
|
"568 2014-05-16 124.349998 124.769997 124.290001 124.500000 124.500000 \n",
|
|||
|
"972 2015-12-30 101.470001 101.599998 101.349998 101.419998 101.419998 \n",
|
|||
|
"1087 2016-06-16 125.169998 125.669998 122.230003 122.379997 122.379997 \n",
|
|||
|
"401 2013-08-22 132.559998 133.460007 132.270004 132.809998 132.809998 \n",
|
|||
|
"... ... ... ... ... ... ... \n",
|
|||
|
"524 2014-03-12 131.559998 132.119995 131.389999 131.759995 131.759995 \n",
|
|||
|
"1409 2017-10-04 121.209999 121.269997 120.709999 121.169998 121.169998 \n",
|
|||
|
"672 2014-10-17 119.059998 119.230003 118.419998 118.989998 118.989998 \n",
|
|||
|
"1333 2017-06-14 121.510002 121.879997 119.570000 119.820000 119.820000 \n",
|
|||
|
"1695 2018-11-27 115.550003 115.629997 114.599998 114.949997 114.949997 \n",
|
|||
|
"\n",
|
|||
|
" Volume SP_open SP_high SP_low ... USO_Low USO_Close \\\n",
|
|||
|
"1540 4600000 267.000000 268.200012 266.070007 ... 13.340000 13.380000 \n",
|
|||
|
"568 4052100 187.509995 188.130005 186.720001 ... 37.090000 37.230000 \n",
|
|||
|
"972 3745000 207.110001 207.210007 205.759995 ... 10.840000 10.930000 \n",
|
|||
|
"1087 26635000 207.750000 208.570007 205.589996 ... 11.110000 11.140000 \n",
|
|||
|
"401 5741800 164.899994 166.300003 164.889999 ... 37.090000 37.540001 \n",
|
|||
|
"... ... ... ... ... ... ... ... \n",
|
|||
|
"524 11094200 186.320007 187.350006 185.899994 ... 35.040001 35.349998 \n",
|
|||
|
"1409 6475200 252.690002 253.440002 252.559998 ... 10.060000 10.080000 \n",
|
|||
|
"672 8059000 188.419998 189.750000 187.619995 ... 31.030001 31.250000 \n",
|
|||
|
"1333 21124800 244.860001 244.869995 243.289993 ... 9.200000 9.230000 \n",
|
|||
|
"1695 9671100 266.339996 268.399994 265.660004 ... 10.640000 10.950000 \n",
|
|||
|
"\n",
|
|||
|
" USO_Adj Close USO_Volume Date_numeric Close_binned_Low \\\n",
|
|||
|
"1540 13.380000 14024600 17637 False \n",
|
|||
|
"568 37.230000 1574900 16206 False \n",
|
|||
|
"972 10.930000 24796400 16799 True \n",
|
|||
|
"1087 11.140000 31068700 16968 False \n",
|
|||
|
"401 37.540001 4511400 15939 False \n",
|
|||
|
"... ... ... ... ... \n",
|
|||
|
"524 35.349998 6572600 16141 False \n",
|
|||
|
"1409 10.080000 13078000 17443 False \n",
|
|||
|
"672 31.250000 7509000 16360 False \n",
|
|||
|
"1333 9.230000 60687800 17331 False \n",
|
|||
|
"1695 10.950000 36086100 17862 True \n",
|
|||
|
"\n",
|
|||
|
" Close_binned_Medium Close_binned_High Volume_binned Price_change \n",
|
|||
|
"1540 False True 0 -0.110001 \n",
|
|||
|
"568 True False 0 0.150002 \n",
|
|||
|
"972 False False 0 -0.050003 \n",
|
|||
|
"1087 True False 3 -2.790001 \n",
|
|||
|
"401 False True 1 0.250000 \n",
|
|||
|
"... ... ... ... ... \n",
|
|||
|
"524 False True 3 0.199997 \n",
|
|||
|
"1409 True False 1 -0.040001 \n",
|
|||
|
"672 True False 2 -0.070000 \n",
|
|||
|
"1333 True False 3 -1.690002 \n",
|
|||
|
"1695 False False 2 -0.600006 \n",
|
|||
|
"\n",
|
|||
|
"[1030 rows x 87 columns]\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_train = pd.get_dummies(df_train, columns=['Close_binned'])\n",
|
|||
|
"df_train['Volume_binned'] = pd.qcut(df_train['Volume'], q=4, labels=False)\n",
|
|||
|
"#Создание нового столбца 'Price_change', который показывает изменение цены (разница между закрытием и открытием).\n",
|
|||
|
"df_train['Price_change'] = df_train['Close'] - df_train['Open']\n",
|
|||
|
"print(df_train) "
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 17,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
" Date Open High Low Close Adj Close Volume \\\n",
|
|||
|
"1540 2018-04-16 0.043550 0.030843 0.067156 0.038908 127.629997 -0.713195 \n",
|
|||
|
"568 2014-05-16 -0.154408 -0.159690 -0.126032 -0.143968 124.500000 -0.814939 \n",
|
|||
|
"972 2015-12-30 -1.490481 -1.505617 -1.477167 -1.492459 101.419998 -0.871967 \n",
|
|||
|
"1087 2016-06-16 -0.106524 -0.107409 -0.247363 -0.267833 122.379997 3.378675 \n",
|
|||
|
"401 2013-08-22 0.325013 0.345106 0.343980 0.341559 132.809998 -0.501164 \n",
|
|||
|
"... ... ... ... ... ... ... ... \n",
|
|||
|
"524 2014-03-12 0.266619 0.267266 0.292149 0.280211 131.759995 0.492769 \n",
|
|||
|
"1409 2017-10-04 -0.337768 -0.363002 -0.336889 -0.338529 121.169998 -0.364973 \n",
|
|||
|
"672 2014-10-17 -0.463316 -0.481504 -0.471767 -0.465900 118.989998 -0.070863 \n",
|
|||
|
"1333 2017-06-14 -0.320249 -0.327568 -0.404033 -0.417405 119.820000 2.355439 \n",
|
|||
|
"1695 2018-11-27 -0.668282 -0.690625 -0.696760 -0.701944 114.949997 0.228502 \n",
|
|||
|
"\n",
|
|||
|
" SP_open SP_high SP_low ... USO_Close USO_Adj Close \\\n",
|
|||
|
"1540 267.000000 268.200012 266.070007 ... 13.380000 13.380000 \n",
|
|||
|
"568 187.509995 188.130005 186.720001 ... 37.230000 37.230000 \n",
|
|||
|
"972 207.110001 207.210007 205.759995 ... 10.930000 10.930000 \n",
|
|||
|
"1087 207.750000 208.570007 205.589996 ... 11.140000 11.140000 \n",
|
|||
|
"401 164.899994 166.300003 164.889999 ... 37.540001 37.540001 \n",
|
|||
|
"... ... ... ... ... ... ... \n",
|
|||
|
"524 186.320007 187.350006 185.899994 ... 35.349998 35.349998 \n",
|
|||
|
"1409 252.690002 253.440002 252.559998 ... 10.080000 10.080000 \n",
|
|||
|
"672 188.419998 189.750000 187.619995 ... 31.250000 31.250000 \n",
|
|||
|
"1333 244.860001 244.869995 243.289993 ... 9.230000 9.230000 \n",
|
|||
|
"1695 266.339996 268.399994 265.660004 ... 10.950000 10.950000 \n",
|
|||
|
"\n",
|
|||
|
" USO_Volume Date_numeric Close_binned_Low Close_binned_Medium \\\n",
|
|||
|
"1540 14024600 17637 False False \n",
|
|||
|
"568 1574900 16206 False True \n",
|
|||
|
"972 24796400 16799 True False \n",
|
|||
|
"1087 31068700 16968 False True \n",
|
|||
|
"401 4511400 15939 False False \n",
|
|||
|
"... ... ... ... ... \n",
|
|||
|
"524 6572600 16141 False False \n",
|
|||
|
"1409 13078000 17443 False True \n",
|
|||
|
"672 7509000 16360 False True \n",
|
|||
|
"1333 60687800 17331 False True \n",
|
|||
|
"1695 36086100 17862 True False \n",
|
|||
|
"\n",
|
|||
|
" Close_binned_High Volume_binned Price_change Volatility \n",
|
|||
|
"1540 True 0 -0.110001 -0.036313 \n",
|
|||
|
"568 False 0 0.150002 -0.033658 \n",
|
|||
|
"972 False 0 -0.050003 -0.028451 \n",
|
|||
|
"1087 False 3 -2.790001 0.139953 \n",
|
|||
|
"401 True 1 0.250000 0.001127 \n",
|
|||
|
"... ... ... ... ... \n",
|
|||
|
"524 True 3 0.199997 -0.024883 \n",
|
|||
|
"1409 False 1 -0.040001 -0.026113 \n",
|
|||
|
"672 False 2 -0.070000 -0.009737 \n",
|
|||
|
"1333 False 3 -1.690002 0.076466 \n",
|
|||
|
"1695 False 2 -0.600006 0.006134 \n",
|
|||
|
"\n",
|
|||
|
"[1030 rows x 88 columns]\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"from sklearn.preprocessing import StandardScaler\n",
|
|||
|
"# Нормализация значений для указанных столбцов\n",
|
|||
|
"# чтобы значения разных столбов в среднем были 0, а стандартное отклонение - 1\n",
|
|||
|
"scaler = StandardScaler()\n",
|
|||
|
"df_train[['Open', 'Close', 'High', 'Low', 'Volume']] = scaler.fit_transform(\n",
|
|||
|
" df_train[['Open', 'Close', 'High', 'Low', 'Volume']])\n",
|
|||
|
"#Создание нового столбца 'Volatility', который показывает волатильность (разницу между максимальной и минимальной ценой).\n",
|
|||
|
"df_train['Volatility'] = df_train['High'] - df_train['Low']\n",
|
|||
|
"print(df_train) "
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 18,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"d:\\3 kurs\\МИИ\\1 лаб\\mai-main\\.venv\\Lib\\site-packages\\featuretools\\synthesis\\deep_feature_synthesis.py:169: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"[<Feature: Open>,\n",
|
|||
|
" <Feature: High>,\n",
|
|||
|
" <Feature: Low>,\n",
|
|||
|
" <Feature: Close>,\n",
|
|||
|
" <Feature: Adj Close>,\n",
|
|||
|
" <Feature: Volume>,\n",
|
|||
|
" <Feature: SP_open>,\n",
|
|||
|
" <Feature: SP_high>,\n",
|
|||
|
" <Feature: SP_low>,\n",
|
|||
|
" <Feature: SP_close>,\n",
|
|||
|
" <Feature: SP_Ajclose>,\n",
|
|||
|
" <Feature: SP_volume>,\n",
|
|||
|
" <Feature: DJ_open>,\n",
|
|||
|
" <Feature: DJ_high>,\n",
|
|||
|
" <Feature: DJ_low>,\n",
|
|||
|
" <Feature: DJ_close>,\n",
|
|||
|
" <Feature: DJ_Ajclose>,\n",
|
|||
|
" <Feature: DJ_volume>,\n",
|
|||
|
" <Feature: EG_open>,\n",
|
|||
|
" <Feature: EG_high>,\n",
|
|||
|
" <Feature: EG_low>,\n",
|
|||
|
" <Feature: EG_close>,\n",
|
|||
|
" <Feature: EG_Ajclose>,\n",
|
|||
|
" <Feature: EG_volume>,\n",
|
|||
|
" <Feature: EU_Price>,\n",
|
|||
|
" <Feature: EU_open>,\n",
|
|||
|
" <Feature: EU_high>,\n",
|
|||
|
" <Feature: EU_low>,\n",
|
|||
|
" <Feature: EU_Trend>,\n",
|
|||
|
" <Feature: OF_Price>,\n",
|
|||
|
" <Feature: OF_Open>,\n",
|
|||
|
" <Feature: OF_High>,\n",
|
|||
|
" <Feature: OF_Low>,\n",
|
|||
|
" <Feature: OF_Volume>,\n",
|
|||
|
" <Feature: OF_Trend>,\n",
|
|||
|
" <Feature: OS_Price>,\n",
|
|||
|
" <Feature: OS_Open>,\n",
|
|||
|
" <Feature: OS_High>,\n",
|
|||
|
" <Feature: OS_Low>,\n",
|
|||
|
" <Feature: OS_Trend>,\n",
|
|||
|
" <Feature: SF_Price>,\n",
|
|||
|
" <Feature: SF_Open>,\n",
|
|||
|
" <Feature: SF_High>,\n",
|
|||
|
" <Feature: SF_Low>,\n",
|
|||
|
" <Feature: SF_Volume>,\n",
|
|||
|
" <Feature: SF_Trend>,\n",
|
|||
|
" <Feature: USB_Price>,\n",
|
|||
|
" <Feature: USB_Open>,\n",
|
|||
|
" <Feature: USB_High>,\n",
|
|||
|
" <Feature: USB_Low>,\n",
|
|||
|
" <Feature: USB_Trend>,\n",
|
|||
|
" <Feature: PLT_Price>,\n",
|
|||
|
" <Feature: PLT_Open>,\n",
|
|||
|
" <Feature: PLT_High>,\n",
|
|||
|
" <Feature: PLT_Low>,\n",
|
|||
|
" <Feature: PLT_Trend>,\n",
|
|||
|
" <Feature: PLD_Price>,\n",
|
|||
|
" <Feature: PLD_Open>,\n",
|
|||
|
" <Feature: PLD_High>,\n",
|
|||
|
" <Feature: PLD_Low>,\n",
|
|||
|
" <Feature: PLD_Trend>,\n",
|
|||
|
" <Feature: RHO_PRICE>,\n",
|
|||
|
" <Feature: USDI_Price>,\n",
|
|||
|
" <Feature: USDI_Open>,\n",
|
|||
|
" <Feature: USDI_High>,\n",
|
|||
|
" <Feature: USDI_Low>,\n",
|
|||
|
" <Feature: USDI_Volume>,\n",
|
|||
|
" <Feature: USDI_Trend>,\n",
|
|||
|
" <Feature: GDX_Open>,\n",
|
|||
|
" <Feature: GDX_High>,\n",
|
|||
|
" <Feature: GDX_Low>,\n",
|
|||
|
" <Feature: GDX_Close>,\n",
|
|||
|
" <Feature: GDX_Adj Close>,\n",
|
|||
|
" <Feature: GDX_Volume>,\n",
|
|||
|
" <Feature: USO_Open>,\n",
|
|||
|
" <Feature: USO_High>,\n",
|
|||
|
" <Feature: USO_Low>,\n",
|
|||
|
" <Feature: USO_Close>,\n",
|
|||
|
" <Feature: USO_Adj Close>,\n",
|
|||
|
" <Feature: USO_Volume>,\n",
|
|||
|
" <Feature: Date_numeric>,\n",
|
|||
|
" <Feature: Close_binned_Low>,\n",
|
|||
|
" <Feature: Close_binned_Medium>,\n",
|
|||
|
" <Feature: Close_binned_High>,\n",
|
|||
|
" <Feature: Volume_binned>,\n",
|
|||
|
" <Feature: Price_change>,\n",
|
|||
|
" <Feature: Volatility>,\n",
|
|||
|
" <Feature: DAY(Date)>,\n",
|
|||
|
" <Feature: MONTH(Date)>,\n",
|
|||
|
" <Feature: WEEKDAY(Date)>,\n",
|
|||
|
" <Feature: YEAR(Date)>]"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 18,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"#генерация новых признаков из данных с помощью featuretools\n",
|
|||
|
"import featuretools as ft\n",
|
|||
|
"# Создание EntitySet для объединения разных датасетов для удобного использования\n",
|
|||
|
"es = ft.EntitySet(id=\"stocks\")\n",
|
|||
|
"es = es.add_dataframe(\n",
|
|||
|
" dataframe_name=\"stock_data\", \n",
|
|||
|
" dataframe=df_train, \n",
|
|||
|
" index=\"Date\")\n",
|
|||
|
"# Генерация признаков\n",
|
|||
|
"feature_matrix, feature_defs = ft.dfs(\n",
|
|||
|
" entityset=es, \n",
|
|||
|
" target_dataframe_name=\"stock_data\")\n",
|
|||
|
"\n",
|
|||
|
"feature_defs"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 19,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Open float64\n",
|
|||
|
"High float64\n",
|
|||
|
"Low float64\n",
|
|||
|
"Adj Close float64\n",
|
|||
|
"Volume float64\n",
|
|||
|
" ... \n",
|
|||
|
"Close_binned_Medium bool\n",
|
|||
|
"Close_binned_High bool\n",
|
|||
|
"Volume_binned int64\n",
|
|||
|
"Price_change float64\n",
|
|||
|
"Volatility float64\n",
|
|||
|
"Length: 86, dtype: object\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Оценка предсказательной способности\n",
|
|||
|
"from sklearn.linear_model import LinearRegression\n",
|
|||
|
"from sklearn.metrics import mean_absolute_error, mean_squared_error\n",
|
|||
|
"# Копирование датафрейма для регрессионного анализа\n",
|
|||
|
"df_train_regression = df_train.copy()\n",
|
|||
|
"# Определение признаков и целевой переменной\n",
|
|||
|
"X_train = df_train_regression.drop(['Close', 'Date'], axis=1)\n",
|
|||
|
"y_train = df_train_regression['Close']\n",
|
|||
|
"X_test = df_test.drop(['Close', 'Date'], axis=1)\n",
|
|||
|
"y_test = df_test['Close']\n",
|
|||
|
"# Преобразование категориальных признаков в дамми-переменные\n",
|
|||
|
"# (создание столбцов со значениями 0 или 1, если это булевой столбец)\n",
|
|||
|
"X_train_encoded = pd.get_dummies(X_train, drop_first=True)\n",
|
|||
|
"X_test_encoded = pd.get_dummies(X_test, drop_first=True)\n",
|
|||
|
"# Устранение различий в количестве столбцов между обучающей и тестовой выборками\n",
|
|||
|
"X_test_encoded = X_test_encoded.reindex(columns=X_train_encoded.columns, fill_value=0)\n",
|
|||
|
"# Проверка типов данных\n",
|
|||
|
"print(X_train_encoded.dtypes)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 20,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Средняя абсолютная ошибка: 127.64575343475643\n",
|
|||
|
"Среднеквадратичная ошибка: 16571.79812749788\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Обучение модели линейной регрессии (поиск зависимостей между признаками)\n",
|
|||
|
"model = LinearRegression()\n",
|
|||
|
"model.fit(X_train_encoded, y_train)\n",
|
|||
|
"# Предсказание цены на тестовой выборке\n",
|
|||
|
"predictions = model.predict(X_test_encoded)\n",
|
|||
|
"# Оценка качества модели\n",
|
|||
|
"mae = mean_absolute_error(y_test, predictions)\n",
|
|||
|
"mse = mean_squared_error(y_test, predictions)\n",
|
|||
|
"print(\"Средняя абсолютная ошибка:\", mae)\n",
|
|||
|
"print(\"Среднеквадратичная ошибка:\", mse)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 21,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"время, затраченное на обучение модели: 0.18988895416259766. Время, затраченное на предсказание: 0.002999544143676758\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Оценка скорости вычисления\n",
|
|||
|
"import time\n",
|
|||
|
"start_time = time.time()\n",
|
|||
|
"model.fit(X_train_encoded, y_train)\n",
|
|||
|
"training_time = time.time() - start_time\n",
|
|||
|
"\n",
|
|||
|
"start_time = time.time()\n",
|
|||
|
"predictions = model.predict(X_test_encoded)\n",
|
|||
|
"prediction_time = time.time() - start_time\n",
|
|||
|
"\n",
|
|||
|
"print(f'время, затраченное на обучение модели: {training_time}. Время, затраченное на предсказание: {prediction_time}')"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 22,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAo0AAAIYCAYAAAAIBta9AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOydeVyU1ffHP88MMOy7MuCGiCioiIoirmgolJaWueU3xH1Pcw0zxZU0LdytLNHSskXNpdxIzC3NBVfccE3FHRTUAWbm94c/J+9zH5yHYQZQztvXedW9z7nLs3Lm3nPPFfR6vR4EQRAEQRAE8QIUJd0BgiAIgiAIovRDRiNBEARBEARhFDIaCYIgCIIgCKOQ0UgQBEEQBEEYhYxGgiAIgiAIwihkNBIEQRAEQRBGIaORIAiCIAiCMAoZjQRBEARBEIRRyGgkCIIgCIIgjEJGI0EQBEEQBGEUMhoJgiAIgiAsxF9//YU333wTPj4+EAQB69atM1omJSUF9evXh0qlgr+/P5KSkjidhQsXwtfXF7a2tggLC8OBAwfM33kRZDS+hMTGxkIQBAiCAGtra3h5eaFNmzb49ttvodPpZNeTlJQEV1dXy3WUIAiCIMo4OTk5qFu3LhYuXChL/+LFi2jXrh1atWqF1NRUjBgxAn379sWWLVsMOqtXr8bIkSMxadIkHD58GHXr1kVUVBRu3bplqdMAAAh6vV5v0RYIsxMbG4ubN29i2bJl0Gq1uHnzJjZv3oyEhAQ0b94c69evh5WVldF6kpKSMGLECGRmZlq+0wRBEARRxhEEAWvXrkXHjh0L1Bk3bhw2bdqEEydOGPK6deuGzMxMbN68GQAQFhaGhg0bYsGCBQAAnU6HSpUqYdiwYfjoo48s1n8aaXxJUalUUKvVqFChAurXr4/x48fjt99+wx9//GEYxv78889Rp04dODg4oFKlShg8eDCys7MBPB367tWrF7KysgyjlvHx8QAAjUaD0aNHo0KFCnBwcEBYWBhSUlJK5kQJgiAIopSh0Wjw4MEDRjQajVnq3rdvHyIjI5m8qKgo7Nu3DwCQm5uLQ4cOMToKhQKRkZEGHUthfDiKeGlo3bo16tatizVr1qBv375QKBSYN28eqlatigsXLmDw4MEYO3YsFi1ahCZNmiAxMRETJ07EmTNnAACOjo4AgKFDh+LUqVP48ccf4ePjg7Vr1yI6OhrHjx9H9erVZfcn784FJt2ibh9ORzzQLQhCYU8bADBS58PldTw8waS6OBTsb6sJYfGcytRdlvllp93xI5fnOfAHi7R1qWkVLq/TKeOfCPFUhVLiHs7UOTPpfrobnI4Atpyeq5nX2f2OC6dz+jdbJj1Id4fTkYP4WVwkeHA6curWic5DKfC/1d2U9kz6U6g4nYBuMn7jW4l0NHmcyoU1bH/Cbx03Xq8E4nuxyzOY06kRV1lGRRLnpWfdbI5NvcbXHX6PSdsOjzXeloT7jqC0ZtKpXddyOlod28fRyrucjinfMiuJcZvfO7PPgt+ys0brMZVLo0KZtM3gqZxOzjD+uy2my272ee2Tz78rVYTHTDod9pxOj+vfG22rqIj/LplKwoIVmDx5MpM3adIkw+BLUcjIyICXlxeT5+XlhQcPHuDx48e4f/8+tFqtpM7p06eL3P6LIKPxFaNmzZo4duwYAGDEiBGGfF9fX0ybNg0DBw7EokWLYGNjAxcXFwiCALVabdC7cuUKli1bhitXrsDH56khNnr0aGzevBnLli3DjBkzivV8CIIgCKK0ERcXh5EjRzJ5KhX/Y+9Vg4zGVwy9Xm/4hbt9+3YkJCTg9OnTePDgAfLz8/HkyRM8evQI9vb8rzwAOH78OLRaLQICAph8jUYDDw/+1+Pzx8VD8wqNpky8RARBEMRLgk5rlmpUKpXF/r6p1WrcvHmTybt58yacnZ1hZ2cHpVIJpVIpqfP8IJAlIJ/GV4y0tDRUrVoVly5dQvv27REcHIxff/0Vhw4dMqzcys3NLbB8dnY2lEolDh06hNTUVIOkpaVh7ty5BZZLSEiAi4sLIzPnLjH7+REEQRCEyeh15hELEh4ejuTkZCZv27ZtCA8PBwDY2NigQYMGjI5Op0NycrJBx1LQSOMrxJ9//onjx4/jww8/xKFDh6DT6TBnzhwo/t8n76effmL0bWxsoNWyv7rq1asHrVaLW7duoXnz5rLblhqqVzzk/ZAIgiAIoiyRnZ2N8+fPG9IXL15Eamoq3N3dUblyZcTFxeHatWtYsWIFAGDgwIFYsGABxo4di969e+PPP//ETz/9hE2bNhnqGDlyJHr27InQ0FA0atQIiYmJyMnJQa9evSx6LmQ0vqRoNBpkZGRwIXfat2+PmJgYnDhxAnl5eZg/fz7efPNN7NmzB0uWsCN/vr6+yM7ORnJyMurWrQt7e3sEBASgR48eiImJwZw5c1CvXj3cvn0bycnJCA4ORrt27ST7IzVUHx7YgUn/dfQb816E58jszr8oLUIHm6VusbP6upB8TqdT04/N0paYnnn8Qo+7+xaZp3LRr+Ut0as4lc2fuJulqZMz2B8Q+7982yz1/jj4GJcXZnufSe+aIOOXt0Ji0kW0aCJ96ilOZe+OgkffC4P2n9+Z9D9Dj3I6H/1sfGIoD2yf7SUWG7z7hH1+72wYL6eLRjnYYyuX98P0+xKahae2RN7dNPZ7062reb4vA5T8grBgPGLSfx000zsoQd9wdgHflXnmeVekGDWJXRRy5Et+0YuHaJHW2zo3Tue3VexK3ydzkzid43vKMWl/xSNOp1goRCxjc3Hw4EG0atXKkH42wNKzZ08kJSXhxo0buHLliuF41apVsWnTJnz44YeYO3cuKlasiKVLlyIqKsqg07VrV9y+fRsTJ05ERkYGQkJCsHnzZm5xjLkho/ElZfPmzfD29oaVlRXc3NxQt25dzJs3Dz179oRCoUDdunXx+eefY+bMmYiLi0OLFi2QkJCAmJgYQx1NmjTBwIED0bVrV9y9e9ew8mvZsmWYNm0aRo0ahWvXrsHT0xONGzdG+/btS/CMCYIgCKJo6C08tSxFREQEt7r+eaR2e4mIiMCRI0deWO/QoUMxdOjQonavUJDR+BKSlJQk+ZCJ+fDDD/Hhhx8yee+//z6TXrx4MRYvXszkWVtbY/LkyVw4AYIgCIIgyi5kNBIEQRAEUTYogenpVwkyGgmLUZw7VErGBzZX+zLijUsFoi71SF00U1CILpBOIii38BJen+JE5FMpSDxP4iDhCokH06Q/h1L+nHIQ/fE19WmS6rO4LoWM50d8fZ7WY/zllSonpqw8vaZ+MwVByWZIPAxWAnunlYoSMt5KYHr6VYJC7hAEQRAEQRBGoZFGgiAIgiDKBmYK7l1WoZHGVwRBELBu3bqS7gZBEARBlF5eguDepRkaaXxJyMjIwPTp07Fp0yZcu3YN5cuXR0hICEaMGIHXXnutpLsnybPtDIsDrcQmN+ZqX1yL1PdCkOP4aAIKS/qFmuvDJ+HDSBSSfD72pxg5/nlyRgG4u27BhQEKOY+GxGnJKSd2yZVzfaQQlytpc0BXjL7gYkz9Zur17OidPl/Cv1Tkl6rJLyHzgxbCFAkyGl8CLl26hKZNm8LV1RWfffYZ6tSpg7y8PGzZsgVDhgzB6dOnS7qLBEEQBEG84tD09EvA4MGDIQgCDhw4gE6dOiEgIAC1atXCyJEj8ffff0uWOX78OFq3bg07Ozt4eHigf//+yM7ONhxPSUlBo0aN4ODgAFdXVzRt2hSXL182HP/tt99Qv3592Nraws/PD5MnT0a+jNEQgiAIgiit6PU6s0hZhYzGUs69e/ewefNmDBkyBA4ODtxxV1dXLi8nJwdRUVFwc3PDP//8g59//hnbt283RI7Pz89Hx44d0bJlSxw7dgz79u1D//79DVMTu3btQkxMDIYPH45Tp07hyy+/RFJSEqZPn27RcyUIgiAIi6LTmUfKKDQ9Xco5f/489Ho9atasKbvMqlWr8OTJE6xYscJgaC5YsABvvvkmZs6cCWtra2RlZaF9+/aoVq0aACAwMNBQfvLkyfjoo4/Qs2dPAIC
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 2 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Оценка корреляции\n",
|
|||
|
"import seaborn as sns\n",
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"\n",
|
|||
|
"corr_matrix = df_train_regression.corr()\n",
|
|||
|
"sns.heatmap(corr_matrix, annot=False)\n",
|
|||
|
"plt.show()"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"kernelspec": {
|
|||
|
"display_name": "Python 3",
|
|||
|
"language": "python",
|
|||
|
"name": "python3"
|
|||
|
},
|
|||
|
"language_info": {
|
|||
|
"codemirror_mode": {
|
|||
|
"name": "ipython",
|
|||
|
"version": 3
|
|||
|
},
|
|||
|
"file_extension": ".py",
|
|||
|
"mimetype": "text/x-python",
|
|||
|
"name": "python",
|
|||
|
"nbconvert_exporter": "python",
|
|||
|
"pygments_lexer": "ipython3",
|
|||
|
"version": "3.12.5"
|
|||
|
}
|
|||
|
},
|
|||
|
"nbformat": 4,
|
|||
|
"nbformat_minor": 2
|
|||
|
}
|