{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Laboratory Work 2\n",
"\n",
"Variant 9\n",
"\n",
"Datasets: stores\n",
"\n",
"1. Coffee prices\thttps://www.kaggle.com/datasets/mayankanand2701/starbucks-stock-price-dataset\n",
"2. Stock prices\thttps://www.kaggle.com/datasets/nancyalaswad90/yamana-gold-inc-stock-price\n",
"3. Gold prices\thttps://www.kaggle.com/datasets/sid321axn/gold-price-prediction-dataset\n"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['id', 'carat', 'cut', 'color', 'clarity', 'depth', 'table', 'price',\n",
" 'x', 'y', 'z'],\n",
" dtype='object')\n",
"Noisy columns: []\n",
"Skewness: id 0.000000\n",
"carat 1.116705\n",
"cut -0.717161\n",
"color -0.189454\n",
"clarity 0.551503\n",
"depth -0.082187\n",
"table 0.796836\n",
"price 1.618476\n",
"x 0.378685\n",
"y 2.434233\n",
"z 1.522481\n",
"dtype: float64\n",
"Highly skewed columns: ['carat', 'price', 'y', 'z']\n",
"Data is from 2022 and may be outdated\n",
"Outliers in column 'id':\n",
"Series([], Name: id, dtype: int64)\n",
"\n",
"Outliers in column 'carat':\n",
"12246 2.06\n",
"13002 2.14\n",
"13118 2.15\n",
"13757 2.22\n",
"13991 2.01\n",
" ... \n",
"27741 2.15\n",
"27742 2.04\n",
"27744 2.29\n",
"27746 2.07\n",
"27749 2.29\n",
"Name: carat, Length: 1889, dtype: float64\n",
"\n",
"Outliers in column 'cut':\n",
"Series([], Name: cut, dtype: int64)\n",
"\n",
"Outliers in column 'color':\n",
"Series([], Name: color, dtype: int64)\n",
"\n",
"Outliers in column 'clarity':\n",
"Series([], Name: clarity, dtype: int64)\n",
"\n",
"Outliers in column 'depth':\n",
"2 56.9\n",
"8 65.1\n",
"24 58.1\n",
"35 58.2\n",
"42 65.2\n",
" ... \n",
"53882 65.4\n",
"53886 58.0\n",
"53890 57.9\n",
"53895 57.8\n",
"53927 58.1\n",
"Name: depth, Length: 2545, dtype: float64\n",
"\n",
"Outliers in column 'table':\n",
"2 65.0\n",
"91 69.0\n",
"145 64.0\n",
"219 64.0\n",
"227 67.0\n",
" ... \n",
"53695 65.0\n",
"53697 65.0\n",
"53756 64.0\n",
"53757 64.0\n",
"53785 65.0\n",
"Name: table, Length: 605, dtype: float64\n",
"\n",
"Outliers in column 'price':\n",
"23820 11886\n",
"23821 11886\n",
"23822 11888\n",
"23823 11888\n",
"23824 11888\n",
" ... \n",
"27745 18803\n",
"27746 18804\n",
"27747 18806\n",
"27748 18818\n",
"27749 18823\n",
"Name: price, Length: 3540, dtype: int64\n",
"\n",
"Outliers in column 'x':\n",
"11182 0.00\n",
"11963 0.00\n",
"15951 0.00\n",
"22741 9.54\n",
"22831 9.38\n",
"23644 9.53\n",
"24131 9.44\n",
"24297 9.49\n",
"24328 9.65\n",
"24520 0.00\n",
"24816 9.42\n",
"25460 9.44\n",
"25850 9.32\n",
"25998 10.14\n",
"25999 10.02\n",
"26243 0.00\n",
"26431 9.42\n",
"26444 10.01\n",
"26534 9.86\n",
"26932 9.30\n",
"27130 10.00\n",
"27415 10.74\n",
"27429 0.00\n",
"27514 9.36\n",
"27630 10.23\n",
"27638 9.51\n",
"27649 9.44\n",
"27679 9.66\n",
"27684 9.35\n",
"27685 9.41\n",
"49556 0.00\n",
"49557 0.00\n",
"Name: x, dtype: float64\n",
"\n",
"Outliers in column 'y':\n",
"11963 0.00\n",
"15951 0.00\n",
"22741 9.38\n",
"22831 9.31\n",
"23644 9.48\n",
"24067 58.90\n",
"24131 9.40\n",
"24297 9.42\n",
"24328 9.59\n",
"24520 0.00\n",
"25460 9.37\n",
"25998 10.10\n",
"25999 9.94\n",
"26243 0.00\n",
"26431 9.34\n",
"26444 9.94\n",
"26534 9.81\n",
"27130 9.85\n",
"27415 10.54\n",
"27429 0.00\n",
"27514 9.31\n",
"27630 10.16\n",
"27638 9.46\n",
"27649 9.38\n",
"27679 9.63\n",
"27685 9.32\n",
"49189 31.80\n",
"49556 0.00\n",
"49557 0.00\n",
"Name: y, dtype: float64\n",
"\n",
"Outliers in column 'z':\n",
"2207 0.00\n",
"2314 0.00\n",
"4791 0.00\n",
"5471 0.00\n",
"10167 0.00\n",
"11182 0.00\n",
"11963 0.00\n",
"13601 0.00\n",
"14635 1.07\n",
"15951 0.00\n",
"16283 5.77\n",
"17196 5.76\n",
"19346 5.97\n",
"21758 5.98\n",
"22540 5.91\n",
"23539 5.79\n",
"23644 6.38\n",
"24067 8.06\n",
"24131 5.85\n",
"24297 5.92\n",
"24328 6.03\n",
"24394 0.00\n",
"24520 0.00\n",
"25998 6.17\n",
"25999 6.24\n",
"26100 5.75\n",
"26123 0.00\n",
"26194 6.16\n",
"26243 0.00\n",
"26431 6.27\n",
"26444 6.31\n",
"26534 6.13\n",
"26744 5.86\n",
"27112 0.00\n",
"27130 6.43\n",
"27415 6.98\n",
"27429 0.00\n",
"27503 0.00\n",
"27515 5.90\n",
"27516 5.90\n",
"27517 5.77\n",
"27518 5.77\n",
"27630 6.72\n",
"27679 6.03\n",
"27739 0.00\n",
"48410 31.80\n",
"49556 0.00\n",
"49557 0.00\n",
"51506 0.00\n",
"Name: z, dtype: float64\n",
"\n",
"Data leakage: high correlation (0.92) between columns 'carat' and 'price'\n",
"Data leakage: high correlation (0.98) between columns 'carat' and 'x'\n",
"Data leakage: high correlation (0.95) between columns 'carat' and 'y'\n",
"Data leakage: high correlation (0.95) between columns 'carat' and 'z'\n",
"Data leakage: high correlation (0.92) between columns 'price' and 'carat'\n",
"Data leakage: high correlation (0.98) between columns 'x' and 'carat'\n",
"Data leakage: high correlation (0.97) between columns 'x' and 'y'\n",
"Data leakage: high correlation (0.97) between columns 'x' and 'z'\n",
"Data leakage: high correlation (0.95) between columns 'y' and 'carat'\n",
"Data leakage: high correlation (0.97) between columns 'y' and 'x'\n",
"Data leakage: high correlation (0.95) between columns 'y' and 'z'\n",
"Data leakage: high correlation (0.95) between columns 'z' and 'carat'\n",
"Data leakage: high correlation (0.97) between columns 'z' and 'x'\n",
"Data leakage: high correlation (0.95) between columns 'z' and 'y'\n",
"log_price 8.319386\n",
"id 8.135995\n",
"log_carat 1.963082\n",
"carat 1.961620\n",
"y 1.491815\n",
"x 1.481414\n",
"clarity 0.359837\n",
"color 0.288875\n",
"cut 0.104551\n",
"table 0.057094\n",
"depth 0.037126\n",
"dtype: float64\n",
"Weight data is plausible\n",
"Unique values of 'cut': [4 3 1 2 0]\n",
"Unique values of 'clarity': [1 2 4 3 5 6 0 7]\n",
"carat\n",
"0.30 2604\n",
"0.31 2249\n",
"1.01 2242\n",
"2.00 2154\n",
"0.70 1982\n",
" ... \n",
"1.95 3\n",
"1.85 3\n",
"1.94 3\n",
"1.99 3\n",
"1.92 2\n",
"Name: count, Length: 181, dtype: int64\n",
"Index(['carat', 'price', 'cut'], dtype='object')\n",
"Training set: (32365, 3)\n",
"carat\n",
"0.30 1562\n",
"1.01 1355\n",
"0.31 1338\n",
"2.00 1269\n",
"0.70 1156\n",
" ... \n",
"1.85 2\n",
"1.89 2\n",
"1.97 1\n",
"1.88 1\n",
"1.92 1\n",
"Name: count, Length: 181, dtype: int64\n",
"Validation set: (10789, 3)\n",
"carat\n",
"0.30 500\n",
"0.31 474\n",
"2.00 441\n",
"1.01 435\n",
"0.70 425\n",
" ... \n",
"1.84 1\n",
"0.88 1\n",
"1.83 1\n",
"0.20 1\n",
"1.85 1\n",
"Name: count, Length: 173, dtype: int64\n",
"Test set: (10789, 3)\n",
"carat\n",
"0.30 542\n",
"1.01 452\n",
"2.00 444\n",
"0.31 437\n",
"0.70 401\n",
" ... \n",
"1.68 1\n",
"1.98 1\n",
"1.87 1\n",
"1.48 1\n",
"1.99 1\n",
"Name: count, Length: 175, dtype: int64\n",
"Training set: (32365, 3)\n",
"carat\n",
"0.30 1562\n",
"1.01 1355\n",
"0.31 1338\n",
"2.00 1269\n",
"0.70 1156\n",
" ... \n",
"1.85 2\n",
"1.89 2\n",
"1.97 1\n",
"1.88 1\n",
"1.92 1\n",
"Name: count, Length: 181, dtype: int64\n"
]
},
{
"ename": "ValueError",
"evalue": "Unknown label type: continuous. Maybe you are trying to fit a classifier, which expects discrete classes on a regression target with continuous values.",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[1;32mIn[26], line 157\u001b[0m\n\u001b[0;32m 154\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mОбучающая выборка: \u001b[39m\u001b[38;5;124m\"\u001b[39m, df_train\u001b[38;5;241m.\u001b[39mshape)\n\u001b[0;32m 155\u001b[0m \u001b[38;5;28mprint\u001b[39m(df_train\u001b[38;5;241m.\u001b[39mcarat\u001b[38;5;241m.\u001b[39mvalue_counts())\n\u001b[1;32m--> 157\u001b[0m X_resampled, y_resampled \u001b[38;5;241m=\u001b[39m \u001b[43mada\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfit_resample\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdf_train\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdf_train\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mcarat\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 158\u001b[0m df_train_adasyn \u001b[38;5;241m=\u001b[39m pd\u001b[38;5;241m.\u001b[39mDataFrame(X_resampled)\n\u001b[0;32m 160\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mОбучающая выборка после oversampling: \u001b[39m\u001b[38;5;124m\"\u001b[39m, df_train_adasyn\u001b[38;5;241m.\u001b[39mshape)\n",
"File \u001b[1;32mc:\\Python312\\Lib\\site-packages\\imblearn\\base.py:208\u001b[0m, in \u001b[0;36mBaseSampler.fit_resample\u001b[1;34m(self, X, y)\u001b[0m\n\u001b[0;32m 187\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"Resample the dataset.\u001b[39;00m\n\u001b[0;32m 188\u001b[0m \n\u001b[0;32m 189\u001b[0m \u001b[38;5;124;03mParameters\u001b[39;00m\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 205\u001b[0m \u001b[38;5;124;03m The corresponding label of `X_resampled`.\u001b[39;00m\n\u001b[0;32m 206\u001b[0m \u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[0;32m 207\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_validate_params()\n\u001b[1;32m--> 208\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfit_resample\u001b[49m\u001b[43m(\u001b[49m\u001b[43mX\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[1;32mc:\\Python312\\Lib\\site-packages\\imblearn\\base.py:104\u001b[0m, in \u001b[0;36mSamplerMixin.fit_resample\u001b[1;34m(self, X, y)\u001b[0m\n\u001b[0;32m 83\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mfit_resample\u001b[39m(\u001b[38;5;28mself\u001b[39m, X, y):\n\u001b[0;32m 84\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"Resample the dataset.\u001b[39;00m\n\u001b[0;32m 85\u001b[0m \n\u001b[0;32m 86\u001b[0m \u001b[38;5;124;03m Parameters\u001b[39;00m\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 102\u001b[0m \u001b[38;5;124;03m The corresponding label of `X_resampled`.\u001b[39;00m\n\u001b[0;32m 103\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n\u001b[1;32m--> 104\u001b[0m \u001b[43mcheck_classification_targets\u001b[49m\u001b[43m(\u001b[49m\u001b[43my\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 105\u001b[0m arrays_transformer \u001b[38;5;241m=\u001b[39m ArraysTransformer(X, y)\n\u001b[0;32m 106\u001b[0m X, y, binarize_y \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_check_X_y(X, y)\n",
"File \u001b[1;32mc:\\Python312\\Lib\\site-packages\\sklearn\\utils\\multiclass.py:219\u001b[0m, in \u001b[0;36mcheck_classification_targets\u001b[1;34m(y)\u001b[0m\n\u001b[0;32m 211\u001b[0m y_type \u001b[38;5;241m=\u001b[39m type_of_target(y, input_name\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124my\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m 212\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m y_type \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m [\n\u001b[0;32m 213\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mbinary\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[0;32m 214\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mmulticlass\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 217\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mmultilabel-sequences\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[0;32m 218\u001b[0m ]:\n\u001b[1;32m--> 219\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[0;32m 220\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mUnknown label type: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00my_type\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m. Maybe you are trying to fit a \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m 221\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mclassifier, which expects discrete classes on a \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m 222\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mregression target with continuous values.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m 223\u001b[0m )\n",
"\u001b[1;31mValueError\u001b[0m: Unknown label type: continuous. Maybe you are trying to fit a classifier, which expects discrete classes on a regression target with continuous values."
]
},
{
"data": {
"text/plain": [
"<Figure size 1000x1000 with 12 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.feature_selection import mutual_info_regression\n",
"from sklearn.model_selection import train_test_split\n",
"from imblearn.over_sampling import ADASYN\n",
"\n",
"df = pd.read_csv(\"data/Diamonds.csv\")\n",
"print(df.columns)\n",
"\n",
"# Noise assessment\n",
"noisy_features = []\n",
"for col in df.columns:\n",
"    if df[col].isnull().sum() / len(df) > 0.1:  # more than 10% missing values\n",
"        noisy_features.append(col)\n",
"\n",
"print(f\"Noisy columns: {noisy_features}\")\n",
"\n",
"cut_mapping = {'Fair': 0, 'Good': 1, 'Very Good': 2, 'Premium': 3, 'Ideal': 4}\n",
"df['cut'] = df['cut'].map(cut_mapping)\n",
"\n",
"color_mapping = {'J': 0, 'I': 1, 'H': 2, 'G': 3, 'F': 4, 'E': 5, 'D': 6}\n",
"df['color'] = df['color'].map(color_mapping)\n",
"\n",
"clarity_mapping = {'I1': 0, 'SI2': 1, 'SI1': 2, 'VS2': 3, 'VS1': 4, 'VVS2': 5, 'VVS1': 6, 'IF': 7}\n",
"df['clarity'] = df['clarity'].map(clarity_mapping)\n",
"\n",
"skewness = df.skew()\n",
"print(f\"Skewness: {skewness}\")\n",
"\n",
"skewed_features = skewness[abs(skewness) > 1].index.tolist()\n",
"print(f\"Highly skewed columns: {skewed_features}\")\n",
"\n",
"# Data relevance assessment\n",
"print(\"Data is from 2022 and may be outdated\")\n",
"\n",
"for col in df.select_dtypes(include=['number']).columns:\n",
"    Q1 = df[col].quantile(0.25)\n",
"    Q3 = df[col].quantile(0.75)\n",
"    IQR = Q3 - Q1\n",
"    lower_bound = Q1 - 1.5 * IQR\n",
"    upper_bound = Q3 + 1.5 * IQR\n",
"    outliers = df[col][(df[col] < lower_bound) | (df[col] > upper_bound)]\n",
"    print(f\"Outliers in column '{col}':\\n{outliers}\\n\")\n",
"\n",
"if len(df.columns) >= 2:\n",
"    for col1 in df.columns:\n",
"        for col2 in df.columns:\n",
"            if col1 != col2:\n",
"                correlation = df[col1].corr(df[col2])\n",
"                if abs(correlation) > 0.9:\n",
"                    print(f\"Data leakage: high correlation ({correlation:.2f}) between columns '{col1}' and '{col2}'\")\n",
"\n",
"df.hist(figsize=(10, 10))\n",
"\n",
"# Reduce skewness with a log transform\n",
"df['log_price'] = np.log(df['price'] + 1)\n",
"df['log_carat'] = np.log(df['carat'] + 1)\n",
"\n",
"# Handle outliers: cap 'carat' at the upper IQR bound\n",
"Q1 = df['carat'].quantile(0.25)\n",
"Q3 = df['carat'].quantile(0.75)\n",
"IQR = Q3 - Q1\n",
"lower_bound = Q1 - 1.5 * IQR\n",
"upper_bound = Q3 + 1.5 * IQR\n",
"df['carat'] = np.where(df['carat'] > upper_bound, upper_bound, df['carat'])\n",
"\n",
"df.drop(columns=['z'], inplace=True)  # z is highly correlated with x and y\n",
"\n",
"# Example: estimate feature informativeness for the target variable 'price'\n",
"X = df.drop(columns=['price'])\n",
"y = df['price']\n",
"mi_scores = mutual_info_regression(X, y)\n",
"print(pd.Series(mi_scores, index=X.columns).sort_values(ascending=False))\n",
"\n",
"if df['carat'].max() > 5:\n",
"    print(\"Error: found weight values that do not correspond to real diamonds.\")\n",
"else:\n",
"    print(\"Weight data is plausible\")\n",
"\n",
"print(\"Unique values of 'cut':\", df['cut'].unique())\n",
"print(\"Unique values of 'clarity':\", df['clarity'].unique())\n",
"\n",
"df['cut'] = df['cut'].fillna('unknown')  # no-op here: 'cut' has no missing values\n",
"\n",
"df['carat'] = df['carat'].fillna(df['carat'].mean())\n",
"\n",
"\n",
"print(df.carat.value_counts())\n",
"\n",
"def split_stratified_into_train_val_test(\n",
"    df_input,\n",
"    stratify_colname=\"y\",\n",
"    frac_train=0.6,\n",
"    frac_val=0.15,\n",
"    frac_test=0.25,\n",
"    random_state=None,\n",
"):\n",
"    \"\"\"Split df_input into train, validation and test frames, stratified by stratify_colname.\"\"\"\n",
"\n",
"    # Compare with a tolerance: exact float equality can fail for valid fractions.\n",
"    if not np.isclose(frac_train + frac_val + frac_test, 1.0):\n",
"        raise ValueError(\n",
"            \"fractions %f, %f, %f do not add up to 1.0\"\n",
"            % (frac_train, frac_val, frac_test)\n",
"        )\n",
"\n",
"    if stratify_colname not in df_input.columns:\n",
"        raise ValueError(\"%s is not a column in the dataframe\" % (stratify_colname))\n",
"\n",
"    X = df_input  # Contains all columns.\n",
"    y = df_input[\n",
"        [stratify_colname]\n",
"    ]  # Dataframe of just the column on which to stratify.\n",
"\n",
"    # Split original dataframe into train and temp dataframes.\n",
"    df_train, df_temp, y_train, y_temp = train_test_split(\n",
"        X, y, stratify=y, test_size=(1.0 - frac_train), random_state=random_state\n",
"    )\n",
"\n",
"    # Split the temp dataframe into val and test dataframes.\n",
"    relative_frac_test = frac_test / (frac_val + frac_test)\n",
"    df_val, df_test, y_val, y_test = train_test_split(\n",
"        df_temp,\n",
"        y_temp,\n",
"        stratify=y_temp,\n",
"        test_size=relative_frac_test,\n",
"        random_state=random_state,\n",
"    )\n",
"\n",
"    assert len(df_input) == len(df_train) + len(df_val) + len(df_test)\n",
"\n",
"    return df_train, df_val, df_test\n",
"\n",
"data = df[[\"carat\", \"price\", \"cut\"]].copy()\n",
"\n",
"df_train, df_val, df_test = split_stratified_into_train_val_test(\n",
"    data, stratify_colname=\"cut\", frac_train=0.60, frac_val=0.20, frac_test=0.20\n",
")\n",
"\n",
"\n",
"print(df_train.columns)\n",
"\n",
"\n",
"print(\"Training set: \", df_train.shape)\n",
"print(df_train.carat.value_counts())\n",
"\n",
"print(\"Validation set: \", df_val.shape)\n",
"print(df_val.carat.value_counts())\n",
"\n",
"print(\"Test set: \", df_test.shape)\n",
"print(df_test.carat.value_counts())\n",
"\n",
"ada = ADASYN()\n",
"\n",
"print(\"Training set: \", df_train.shape)\n",
"print(df_train.carat.value_counts())\n",
"\n",
"# ADASYN needs discrete class labels. Passing the continuous 'carat' column raises\n",
"# \"ValueError: Unknown label type: continuous\", so balance on the 'cut' classes instead.\n",
"X_resampled, y_resampled = ada.fit_resample(\n",
"    df_train.drop(columns=[\"cut\"]), df_train[\"cut\"]\n",
")\n",
"df_train_adasyn = pd.DataFrame(X_resampled)\n",
"df_train_adasyn[\"cut\"] = y_resampled\n",
"\n",
"print(\"Training set after oversampling: \", df_train_adasyn.shape)\n",
"print(df_train_adasyn.cut.value_counts())\n",
"\n",
"df_train_adasyn"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Dataset 1. Diamond prices**\n",
"1. **carat**: Weight of the diamond in carats\n",
"2. **cut**: Cut quality\n",
"3. **color**: Diamond color\n",
"4. **clarity**: Diamond clarity\n",
"5. **depth**: Depth of the diamond as a percentage\n",
"6. **table**: Table width of the diamond as a percentage\n",
"7. **price**: Price of the diamond in US dollars\n",
"8. **x**: Length of the diamond in millimeters\n",
"9. **y**: Width of the diamond in millimeters\n",
"10. **z**: Depth of the diamond in millimeters\n",
"\n",
"**Object of observation**: Each object is an individual diamond.\\\n",
"**Relationships between objects**: Within a single object, its characteristics are related to its price. For example, weight, color, clarity, and cut can affect the price.\\\n",
"**Business goal**: Optimize diamond sales and estimate prices based on characteristics.\\\n",
"**Business impact**: More accurate diamond valuation can help jewelers offer competitive prices and maximize profit.\\\n",
"**Technical goal**: Build a machine learning model that predicts a diamond's price from its characteristics.\\\n",
"* **Input**: Diamond characteristics (weight, cut, color, clarity, dimensions).\\\n",
"* **Target feature**: Diamond price.\n",
"\n",
"**Informativeness**: High. The dataset contains the key diamond characteristics that affect price: carat, cut, color, clarity, and dimensions.\\\n",
"**Coverage**: High. The dataset contains 53,940 diamonds, which is a sufficiently large sample for analysis.\\\n",
"**Correspondence to real data**: High. The diamond characteristics in the dataset match the real characteristics that gemologists determine.\\\n",
"**Label consistency**: High. The dataset has no label inconsistencies: all data matches the descriptions in the column headers.\n",
"\n",
"**Dataset 2. Starbucks stock prices**\n",
"1. **Date**: Trading date\n",
"2. **Open**: Opening price\n",
"3. **High**: Highest share price of the day\n",
"4. **Low**: Lowest share price of the day\n",
"5. **Close**: Closing price for the day\n",
"6. **Adj Close**: Adjusted closing price\n",
"7. **Volume**: Trading volume for the day\n",
"\n",
"**Object of observation**: The object of observation is a trading day for Starbucks stock.\\\n",
"**Relationships between objects**: Trading days are linked in time. What matters is how prices and trading volumes change over time.\\\n",
"**Business goal**: Forecast stock prices to manage a stock portfolio.\\\n",
"**Business impact**: Forecasting lets traders make better-informed decisions, optimize investments, and minimize risk.\\\n",
"**Technical goal**: Forecast the closing price of the stock from time series.\\\n",
"* **Input**: Time series of historical opening prices, closing prices, and volumes.\\\n",
"* **Target feature**: Closing price on the next day.\n",
"\n",
"**Dataset 3. Gold prices**\n",
"1. **Date**: Date\n",
"2. **Open**: Opening price\n",
"3. **High**: Highest price of the day\n",
"4. **Low**: Lowest price of the day\n",
"5. **Close**: Closing price\n",
"6. **Adjusted Close**: Adjusted closing price\n",
"7. **Volume**: Trading volume for the day\n",
"\n",
"**Additional columns (factors affecting the gold price):**\n",
"\n",
"8. **SP_open**, **SP_high**, **SP_low**, **SP_close**, **SP_Ajclose**, **SP_volume**: S&P 500 index data.\n",
"9. **DJ_open**, **DJ_high**, **DJ_low**, **DJ_close**, **DJ_Ajclose**, **DJ_volume**: Dow Jones index data.\n",
"10. **EG_open**, **EG_high**, **EG_low**, **EG_close**, **EG_Ajclose**, **EG_volume**: Eldorado Gold Corporation (EGO) data.\n",
"11. **EU_Price**, **EU_open**, **EU_high**, **EU_low**, **EU_Trend**: EUR/USD exchange rate.\n",
"12. **OF_Price**, **OF_Open**, **OF_High**, **OF_Low**, **OF_Volume**, **OF_Trend**: Brent crude oil futures price.\n",
"13. **OS_Price**, **OS_Open**, **OS_High**, **OS_Low**, **OS_Trend**: WTI crude oil price.\n",
"14. **SF_Price**, **SF_Open**, **SF_High**, **SF_Low**, **SF_Volume**, **SF_Trend**: Silver futures price.\n",
"15. **USB_Price**, **USB_Open**, **USB_High**, **USB_Low**, **USB_Trend**: US bond rate.\n",
"16. **PLT_Price**, **PLT_Open**, **PLT_High**, **PLT_Low**, **PLT_Trend**: Platinum price.\n",
"17. **PLD_Price**, **PLD_Open**, **PLD_High**, **PLD_Low**, **PLD_Trend**: Palladium price.\n",
"18. **RHO_PRICE**: Rhodium price.\n",
"19. **USDI_Price**, **USDI_Open**, **USDI_High**, **USDI_Low**, **USDI_Volume**, **USDI_Trend**: US Dollar Index.\n",
"20. **GDX_Open**, **GDX_High**, **GDX_Low**, **GDX_Close**, **GDX_Adj Close**, **GDX_Volume**: Gold miners ETF data.\n",
"21. **USO_Open**, **USO_High**, **USO_Low**, **USO_Close**, **USO_Adj Close**, **USO_Volume**: USO oil ETF data.\n",
"\n",
"**Object of observation**: The object of observation is a trading day for the gold price, together with additional influencing factors.\\\n",
"**Relationships between objects**: Gold price movements are related to other economic indicators and assets (for example, oil and stock indices). Gold often correlates with other assets during periods of instability.\\\n",
"**Business goal**: Manage investments in gold and related assets (oil, indices).\\\n",
"**Business impact**: Accurate forecasts of prices for gold and related assets can help investors protect their capital.\\\n",
"**Technical goal**: Build a model that analyzes the relationship between gold prices and additional factors (oil, stock indices, exchange rates).\n",
"* **Input**: Data on gold prices and additional factors (oil, indices, currencies).\\\n",
"* **Target feature**: Closing price of gold."
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}