2024-12-07 09:11:03 +04:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Лабораторная работа 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b>1. Определить две бизнес-цели</b><br><br> \n",
"Во-первых, у нас есть в датасете есть столбец NumWebPurchases — количество покупок через интернет.<br>Ставим первую бизнес цель: Увеличение продаж через интернет-магазин. А еще у нас имеется столбец Response — отклик на текущую кампанию. Ставим вторую бизнес цель: Анализ отклика на предыдущие кампании для повышения их эффективности."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b>2. Определить цели технического проекта </b><br><br>\n",
"\n",
"<b>Для увеличения интернет-продаж:</b>\n",
"\n",
"Разработать модели сегментации клиентов на основе их характеристик (доход, покупки).\n",
"Создать прогнозные модели для определения вероятности веб-покупок.\n",
"\n",
"<b>Для оптимизации кампаний:</b>\n",
"\n",
"Провести анализ данных об откликах клиентов на прошлые кампании.\n",
"Сформировать рекомендации по улучшению таргетирования на основе анализа успешных кампаний."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b>4. Выполнить разбиение каждого набора данных</b> \n"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"((1329, 16), (443, 16), (444, 16))"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"import pandas as pd\n",
"dataset = pd.read_csv(\".//datasetlab1//marketing_campaign.csv\", sep=\"\\t\")\n",
"\n",
"# Удаление неинформативных столбцов и выбор целевых данных для бизнес-целей\n",
"columns_to_use = [\n",
" \"Income\", \"Kidhome\", \"Teenhome\", \"NumWebPurchases\", \"MntWines\", \n",
" \"MntFruits\", \"MntMeatProducts\", \"MntFishProducts\", \"MntSweetProducts\",\n",
" \"MntGoldProds\", \"AcceptedCmp1\", \"AcceptedCmp2\", \"AcceptedCmp3\", \n",
" \"AcceptedCmp4\", \"AcceptedCmp5\", \"Response\", \"Recency\"\n",
"]\n",
"\n",
"# Очистка данных от пропусков и выбор только необходимых столбцов\n",
"filtered_data = dataset[columns_to_use].dropna()\n",
"\n",
"# Разделение данных на признаки (X) и целевую переменную (y) для оптимизации кампаний\n",
"X = filtered_data.drop(columns=[\"Response\"])\n",
"y = filtered_data[\"Response\"]\n",
"\n",
"# Разбиение на обучающую (60%), контрольную (20%) и тестовую (20%) выборки\n",
"X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42, stratify=y)\n",
"X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp)\n",
"\n",
"# Проверка размера выборок\n",
"X_train.shape, X_val.shape, X_test.shape"
]
},
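{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 60/20/20 split above chains two `train_test_split` calls with `stratify`. As a minimal sketch of why that keeps the class ratio stable, the same recipe can be run on synthetic data. Everything here (`X_demo`, `y_demo`, the 15% positive rate) is an assumption made for illustration, not the marketing dataset.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Toy data: 1000 rows, roughly 15% positives (illustrative only)\n",
"rng = np.random.default_rng(42)\n",
"X_demo = rng.normal(size=(1000, 3))\n",
"y_demo = (rng.random(1000) < 0.15).astype(int)\n",
"\n",
"# 60% train, then the remaining 40% is halved: 20% validation, 20% test\n",
"X_tr, X_tmp, y_tr, y_tmp = train_test_split(\n",
"    X_demo, y_demo, test_size=0.4, random_state=42, stratify=y_demo)\n",
"X_va, X_te, y_va, y_te = train_test_split(\n",
"    X_tmp, y_tmp, test_size=0.5, random_state=42, stratify=y_tmp)\n",
"\n",
"# With stratify, the positive rate is nearly identical in every part\n",
"for name, part in [('train', y_tr), ('val', y_va), ('test', y_te)]:\n",
"    print(name, len(part), round(part.mean(), 3))\n",
"```\n",
"\n",
"Dropping `stratify` from both calls would still give parts of the same sizes, but the positive rate could then drift between them."
]
},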
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b>5. Оценить сбалансированность выборок </b><br><br>\n",
"\n",
"За 0 берем не отклик, за 1 - отклик клиента на рекламу."
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAkQAAAHHCAYAAABeLEexAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABRDElEQVR4nO3deVgVZf8/8PdhP2wHAdkUhRQV3AM0xC1FUVzT9FEpyUx93Mg9Lbc043HHLZceQ01LU1PLklQ0NSU1XHELe0BxAVRkNUA59+8Pf8zX4zkoh4ADzvt1Xee6OPfcM/OZ4SxvZu4ZFEIIASIiIiIZMzJ0AURERESGxkBEREREssdARERERLLHQERERESyx0BEREREssdARERERLLHQERERESyx0BEREREssdARERUBeTm5iI5ORkPHz40dClUxrKzs5GUlITc3FxDlyJrDERERJXU9u3b0bFjR9jY2MDa2hq1atXCggULDF1WlZCTk4PIyEjpeUZGBlatWmW4gp4hhMC6devwxhtvwNLSEra2tvD09MTmzZsNXZqsKfivO15NGzZswJAhQ6Tn5ubmqFWrFjp37owZM2bA2dnZgNUR0ctMnToV8+fPR69evTBgwAA4OjpCoVCgXr16cHd3N3R5lV5hYSFUKhXWrl2Ltm3bYvHixbh69Sqio6MNXRoGDhyIbdu2ISwsDN27d4dKpYJCoUCTJk1QvXp1Q5cnWyaGLoDK15w5c+Dp6Ym8vDz89ttvWL16NX7++WfEx8fD0tLS0OURkQ5HjhzB/PnzERERgalTpxq6nCrJ2NgYn376KQYPHgy1Wg1bW1v89NNPhi4LmzZtwrZt27B582YMGjTI0OXQM3iE6BVVdITo9OnT8PPzk9onTpyIJUuW4JtvvsHAgQMNWCERFadHjx5IT0/H8ePHDV1KlXfr1i0kJyfD29sbdnZ2hi4HjRs3RpMmTbBlyxZDl0LP4RgimenQoQMAIDExEQCQnp6OSZMmoXHjxrC2toatrS26du2K8+fPa82bl5eH2bNno169erCwsICrqyv69OmDv/76CwCQlJQEhUJR7KN9+/bSsn799VcoFAps27YNH3/8MVxcXGBlZYWePXsiOTlZa90nT55Ely5doFKpYGlpiXbt2hX7ZdG+fXud6589e7ZW382bN8PX1xdKpRL29vYYMGCAzvW/aNuepVarERkZiYYNG8LCwgLOzs4YMWKE1kBYDw8PdO/eXWs9Y8aM0VqmrtoXLlyotU8BID8/H7NmzULdunVhbm4Od3d3TJkyBfn5+Tr31bPat2+vtbx58+bByMgI33zzTan2x6JFi9CqVSs4ODhAqVTC19cXO3bs0Ln+zZs3o0WLFrC0tES1atXQtm1b7N+/X6PPvn370K5dO9jY2MDW1hb+/v5atW3fvl36nTo6OuKdd97B7du3Nfq89957GjVXq1YN7du3x7Fjx166n/7JvABw6NAhtGnTBlZWVrCzs0OvXr1w5coVjT6///47GjVqhAEDBsDe3h5KpRL+/v7YvXu31CcnJwdWVlb48MMPtdZx69YtGBsbIyIiQqrZw8NDq9/zr60bN25g1KhRqF+/PpRKJRwcHNCvXz8kJSVpzFf0/v3111+lttOnT6NTp06wsbGBlZWVzn2yYcMGKBQK/PHHH1Lb/fv3db7Gu3fvrrPmknwWzJ49W3ot1qxZEwEBATAxMYGLi4tW3boUzV/0sLGxQYsWLTT2P/D0PdOoUaNil1P0PtmwYQOApwPj4+Pj4e7ujm7dusHW1rbYfQUA//vf/9CvXz/Y29vD0tISb7zxhtZRLn0+S/V5j+vzmfuq4CkzmSkKLw4ODgCevuF2796Nfv36wdPTE6mpqVi7di3atWuHy5cvw83NDcDT8/Hdu3dHTEwMBgwYgA8//BDZ2dk4cOAA4uPjUadOHWkdAwcOREhIiMZ6p02bprOeefPmQaFQ4KOPPkJaWhoiIyMRFBSEc+
fOQalUAnj6BdK1a1f4+vpi1qxZMDIyQlRUFDp06IBjx46hRYsWWsutWbOm9GWQk5ODkSNH6lz3jBkz0L9/f3zwwQe4d+8eVqxYgbZt2+Ls2bM6/5ocPnw42rRpAwD4/vvvsWvXLo3pI0aMkI7OhYeHIzExEStXrsTZs2dx/PhxmJqa6twP+sjIyJC27VlqtRo9e/bEb7/9huHDh8Pb2xsXL17E0qVL8eeff2p9mL9MVFQUpk+fjsWLFxd7aP9l+2PZsmXo2bMnQkNDUVBQgK1bt6Jfv37Yu3cvunXrJvX79NNPMXv2bLRq1Qpz5syBmZkZTp48iUOHDqFz584Ann6Zvv/++2jYsCGmTZsGOzs7nD17FtHR0VJ9Rfve398fERERSE1NxbJly3D8+HGt36mjoyOWLl0K4GmAWLZsGUJCQpCcnPzSIwmlnffgwYPo2rUrXnvtNcyePRt///03VqxYgcDAQJw5c0YKAA8ePMC6detgbW2N8PBwVK9eHZs3b0afPn2wZcsWDBw4ENbW1njrrbewbds2LFmyBMbGxtJ6vv32WwghEBoa+sLteN7p06dx4sQJDBgwADVr1kRSUhJWr16N9u3b4/Lly8WeZr9+/Trat28PS0tLTJ48GZaWlvjyyy8RFBSEAwcOoG3btnrVUZzSfBYUWbx4MVJTU/Va39dffw3gaWj74osv0K9fP8THx6N+/fqlqv/BgwcAgPnz58PFxQWTJ0+GhYWFzn2VmpqKVq1a4dGjRwgPD4eDgwM2btyInj17YseOHXjrrbc0ll2Sz9LnFfce/yf7uUoT9EqKiooSAMTBgwfFvXv3RHJysti6datwcHAQSqVS3Lp1SwghRF5enigsLNSYNzExUZibm4s5c+ZIbV999ZUAIJYsWaK1LrVaLc0HQCxcuFCrT8OGDUW7du2k54cPHxYARI0aNURWVpbU/t133wkAYtmyZdKyvby8RHBwsLQeIYR49OiR8PT0FJ06ddJaV6tWrUSjRo2k5/fu3RMAxKxZs6S2pKQkYWxsLObNm6cx78WLF4WJiYlWe0JCggAgNm7cKLXNmjVLPPsWOnbsmAAgtmzZojFvdHS0Vnvt2rVFt27dtGofPXq0eP5t+XztU6ZMEU5OTsLX11djn3799dfCyMhIHDt2TGP+NWvWCADi+PHjWut7Vrt27aTl/fTTT8LExERMnDhRZ9+S7A8hnv6enlVQUCAaNWokOnTooLEsIyMj8dZbb2m9Fot+5xkZGcLGxka0bNlS/P333zr7FBQUCCcnJ9GoUSONPnv37hUAxMyZM6W2sLAwUbt2bY3lrFu3TgAQp06d0rnNZTFvs2bNhJOTk3jw4IHUdv78eWFkZCQGDx4stQEQAMSvv/4qtT169Eh4e3sLFxcXUVBQIIQQ4pdffhEAxL59+zTW06RJE43XxpAhQ0StWrW06nn+tfX870sIIWJjYwUAsWnTJqmt6P17+PBhIYQQffv2FcbGxiI+Pl7qc//+feHg4CB8fX2ltqLPpdOnT0ttut6fQgjRrVs3jf2sz2fB86/FtLQ0YWNjI7p27apRd3F0vZb3798vAIjvvvtOamvXrp1o2LBhscsp+kyMiorSeG5mZib+/PNPjX3w/L4aN26cAKDxfs7Ozhaenp7Cw8NDeq+U9LO0qN6XvcdL85n7quAps1dcUFAQqlevDnd3dwwYMADW1tbYtWsXatSoAeDp1WdGRk9fBoWFhXjw4AGsra1Rv359nDlzRlrOzp074ejoiLFjx2qt4/nTJPoYPHgwbGxspOdvv/02XF1d8fPPPwMAzp07h4SEBAwaNAgPHjzA/fv3cf/+feTm5qJjx444evQo1Gq1xjLz8vJgYWHxwvV+//33UKvV6N+/v7TM+/fvw8XFBV5eXjh8+LBG/4KCAgBP91dxtm/fDpVKhU6dOmks09fXF9bW1lrLfPz4sUa/+/fvIy8v74V13759GytWrMCMGTNgbW2ttX5vb280aNBAY5
lFp0mfX39xTp06hf79+6Nv375YuHChzj4l2R8ANP4yffjwITIzM9GmTRuN19bu3buhVqsxc+ZM6bVYpOi1deDAAWR
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAkQAAAHHCAYAAABeLEexAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABRfElEQVR4nO3deVgV5f8+8Puw7yAiW6KQooJ7gIZrKYpKLrllUpJ7CpmCmpqKS0buuODWR1HTstTUsiQVS80QzR230HAXEJHVWM/z+8Mv8+N4UDkIHHDu13Wd6/I888zMe+ZstzPPDAohhAARERGRjOlouwAiIiIibWMgIiIiItljICIiIiLZYyAiIiIi2WMgIiIiItljICIiIiLZYyAiIiIi2WMgIiIiItljICIiIqqClEolUlJS8O+//2q7FFlgICIiIln5888/8ccff0jP//jjDxw7dkx7BRWTmJiI8ePHo27dujAwMECtWrXg7u6OjIwMbZf2ymMgquY2btwIhUIhPYyMjNCgQQMEBQUhKSlJ2+UREVU5t2/fxtixY3HhwgVcuHABY8eOxe3bt7VdFq5duwYvLy9s27YNo0ePxt69e3HgwAFER0fD1NRU2+W98vS0XQCVjzlz5sDFxQU5OTn4888/sXr1avz666+Ii4uDiYmJtssjIqoy+vbti/DwcDRr1gwA4O3tjb59+2q5KmD06NEwMDDA8ePH8dprr2m7HNlhIHpFdO/eHZ6engCAESNGoGbNmliyZAn27NmD999/X8vVERFVHYaGhvjrr78QFxcHAGjSpAl0dXW1WtOpU6dw6NAh7N+/n2FIS3jK7BXVqVMnAEBCQgIAIDU1FRMnTkTTpk1hZmYGCwsLdO/eHefOnVObNycnB7NmzUKDBg1gZGQEBwcH9O3bF9evXwcA3LhxQ+U03dOPt956S1rWH3/8AYVCge+//x7Tpk2Dvb09TE1N0atXrxIPUcfGxqJbt26wtLSEiYkJOnbs+Mxz+2+99VaJ6581a5Za3y1btsDDwwPGxsawtrbGoEGDSlz/87atOKVSifDwcDRu3BhGRkaws7PD6NGj8ejRI5V+zs7OeOedd9TWExQUpLbMkmpfuHCh2j4FgNzcXISGhqJ+/fowNDSEk5MTJk+ejNzc3BL3VXFvvfWW2vLmzZsHHR0dfPvtt2XaH4sWLUKbNm1Qs2ZNGBsbw8PDAzt27Chx/Vu2bEGrVq1gYmKCGjVqoEOHDti/f79Kn3379qFjx44wNzeHhYUFvLy81Grbvn279Jra2Njggw8+wN27d1X6fPTRRyo116hRA2+99RaOHj36wv30svM6OzurbbeOjg6++uorlfZDhw6hffv2MDU1hZWVFXr37o3Lly+r9Jk1axYUCgVSUlJU2v/++28oFAps3LixxJpLety4cQPA/39v7t+/Hy1atICRkRHc3d3x448/qm3Pv//+iwEDBsDa2homJiZ488038csvv5Rqv5X0ufzoo49gZmb2wv2oyeenoKAAc+fORb169WBoaAhnZ2dMmzZN7TPh7OyMjz76CLq6umjevDmaN2+OH3/8EQqFQu01e1ZNRduko6MDe3t7vPfee7h165bUp+hzs2jRomcup+g1LXL8+HEYGRnh+vXraNy4MQwNDWFvb4/Ro0cjNTVVbf7Svv/NzMzw77//wtfXF6ampnB0dMScOXMghFCrt+h9BACZmZnw8PCAi4sL7t+/L7WX9ruvOuIRoldUUXipWbMmgCdfaLt378aAAQPg4uKCpKQkrF27Fh07dsSlS5fg6OgIACgsLMQ777yD6OhoDBo0CJ9++ikyMzNx4MABxMXFoV69etI63n//ffTo0UNlvVOnTi2xnnnz5kGhUOCzzz5DcnIywsPD4ePjg7Nnz8LY2BjAkx+G7t27w8PDA6GhodDR0UFkZCQ6deqEo0ePolWrVmrLrV27NsLCwgAAWVlZGDNmTInrnjFjBgYOHIgRI0bgwY
MHWLFiBTp06IAzZ87AyspKbZ5Ro0ahffv2AIAff/wRu3btUpk+evRobNy4EUOHDsW4ceOQkJCAlStX4syZMzh27Bj09fVL3A+aSEtLk7atOKVSiV69euHPP//EqFGj4ObmhgsXLmDp0qX4559/sHv3bo3WExkZienTp2Px4sUYPHhwiX1etD+WLVuGXr16wd/fH3l5edi2bRsGDBiAvXv3ws/PT+o3e/ZszJo1C23atMGcOXNgYGCA2NhYHDp0CF27dgXwZFzcsGHD0LhxY0ydOhVWVlY4c+YMoqKipPqK9r2XlxfCwsKQlJSEZcuW4dixY2qvqY2NDZYuXQoAuHPnDpYtW4YePXrg9u3bJb72xb3MvMXt378fw4YNQ1BQEKZMmSK1Hzx4EN27d8frr7+OWbNm4b///sOKFSvQtm1bnD59ulQ/0MWNHj0aPj4+0vMPP/wQ7777rsrpoFq1akn/jo+Px3vvvYePP/4YAQEBiIyMxIABAxAVFYUuXboAAJKSktCmTRs8fvwY48aNQ82aNbFp0yb06tULO3bswLvvvqtWR/H9VlRHRRsxYgQ2bdqE/v37IyQkBLGxsQgLC8Ply5fV3q/FFRQU4PPPP9doXe3bt8eoUaOgVCoRFxeH8PBw3Lt3r1Rh+VkePnyInJwcjBkzBp06dcLHH3+M69evIyIiArGxsYiNjYWhoSEAzd7/hYWF6NatG958800sWLAAUVFRCA0NRUFBAebMmVNiLfn5+ejXrx9u3bqFY8eOwcHBQZpWGd99WiOoWouMjBQAxMGDB8WDBw/E7du3xbZt20TNmjWFsbGxuHPnjhBCiJycHFFYWKgyb0JCgjA0NBRz5syR2jZs2CAAiCVLlqitS6lUSvMBEAsXLlTr07hxY9GxY0fp+e+//y4AiNdee01kZGRI7T/88IMAIJYtWyYt29XVVfj6+krrEUKIx48fCxcXF9GlSxe1dbVp00Y0adJEev7gwQMBQISGhkptN27cELq6umLevHkq8164cEHo6emptcfHxwsAYtOmTVJbaGioKP5ROXr0qAAgtm7dqjJvVFSUWnvdunWFn5+fWu2BgYHi6Y/f07VPnjxZ2NraCg8PD5V9+s033wgdHR1x9OhRlfnXrFkjAIhjx46pra+4jh07Ssv75ZdfhJ6enggJCSmxb2n2hxBPXqfi8vLyRJMmTUSnTp1UlqWjoyPeffddtfdi0WuelpYmzM3NRevWrcV///1XYp+8vDxha2srmjRpotJn7969AoCYOXOm1BYQECDq1q2rspx169YJAOLEiRMlbnN5z/v3338LMzMzMWDAALXtbtGihbC1tRUPHz6U2s6dOyd0dHTEkCFDpLaiff7gwQOV+U+ePCkAiMjIyBLrePo9VVzdunUFALFz506pLT09XTg4OIiWLVtKbePHjxcAVN5vmZmZwsXFRTg7O6ttk7+/v3BxcXluHQEBAcLU1LTEup6usTSfn7NnzwoAYsSIESr9Jk6cKACIQ4cOqSwzICBAer5q1SphaGgo3n77bbXX+1k1FZ9fCCEGDx4sTExMpOfP+44s8vTnqOh5586dRUFBgdRe9B2/YsUKIYTm738A4pNPPpHalEql8PPzEwYGBtL7qajeyMhIoVQqhb+/vzAxMRGxsbEqNWvy3Vcd8ZTZK8LHxwe1atWCk5MTBg0aBDMzM+zatUs6F21oaAgdnScvd2FhIR4+fAgzMzM0bNgQp0+flpazc+dO2NjY4JNPPlFbx9OHqDUxZMgQmJubS8/79+8PBwcH/PrrrwCAs2fPIj4+HoMHD8bDhw+RkpKClJQUZGdno3Pnzjhy5AiUSqXKMnNycmBkZPTc9f74449QKpUYOHCgtMyUlBTY29vD1dUVv//+u0r/vLw8AJD+J1aS7du3w9LSEl26dFFZpoeHB8zMzNSWmZ+fr9IvJSUFOTk5z6377t27WLFiBWbMmKF2amH79u1wc3NDo0aNVJZZdJr06fU/y4kTJz
Bw4ED069cPCxcuLLFPafYHAOkoHwA8evQI6enpaN++vcp7a/fu3VAqlZg5c6b0XixS9N46cOAAMjMzMWXKFLXXtqj
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAkQAAAHHCAYAAABeLEexAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABPOElEQVR4nO3deVgVZf8/8PdhR+QcZIdEIUUFd0EJcUtRFHM3H5WSzNRHIRdcknI3IzU33O1R1LRMM7EseUI0NSM1d9zCHtwFVGQ1Fjn37w+/zI/jQeXQgYPN+3Vd57o499wz85k5C29m7hkUQggBIiIiIhkzMnQBRERERIbGQERERESyx0BEREREssdARERERLLHQERERESyx0BEREREssdARERERLLHQERERESyx0BERESkZ0IIZGRkIDk52dClUDkxEBER0UsjKSkJsbGx0vMzZ87ghx9+MFxBpeTk5GD69Olo2LAhzMzMYGdnhwYNGuDKlSuGLo3KgYGIsGnTJigUCulhYWGBBg0aIDw8HGlpaYYuj4hIkpOTg9GjR+O3335DcnIyxo8fj/Pnzxu6LDx48AD+/v6Ijo7GwIEDsWfPHsTHx+Pnn3+Gu7u7ocujcjAxdAFUfcydOxceHh7Iz8/HL7/8gjVr1uDHH39EUlISatSoYejyiIjg7+8vPQCgQYMGGDlypIGrAqZMmYK7d+8iMTERjRs3NnQ5VAEMRCTp0aMHfH19AQDvvfce7OzssGTJEuzZswdDhgwxcHVERE/Exsbi4sWL+Ouvv9C0aVOYmZkZtJ709HRs3rwZa9euZRh6ifGUGT1T586dAQApKSkAgIyMDEyePBlNmzZFzZo1oVQq0aNHD5w9e1Zr3vz8fMyePRsNGjSAhYUFXFxc0L9/f/z5558AgGvXrmmcpnv60alTJ2lZP//8MxQKBb7++mt8+OGHcHZ2hpWVFXr37o2bN29qrfvYsWPo3r07VCoVatSogY4dO+Lo0aNlbmOnTp3KXP/s2bO1+m7duhU+Pj6wtLSEra0tBg8eXOb6n7dtpanVaixbtgyNGzeGhYUFnJycMHr0aDx8+FCjn7u7O9544w2t9YSHh2sts6zaFy1apLVPAaCgoACzZs1C/fr1YW5uDjc3N0ydOhUFBQVl7qvSOnXqpLW8+fPnw8jICF9++WWF9sdnn32Gtm3bws7ODpaWlvDx8cE333xT5vq3bt2KNm3aoEaNGqhVqxY6dOiAn376SaPPvn370LFjR1hbW0OpVKJ169Zate3cuVN6Te3t7fHWW2/h9u3bGn3eeecdjZpr1aqFTp064ciRIy/cTxWd9+n5ynpcu3ZNY1vbt28PKysrWFtbo2fPnrhw4YLWci9fvoxBgwbBwcEBlpaWaNiwIT766CMAwOzZs1+4zp9//rnS9t3q1avRuHFjmJubw9XVFWFhYcjMzNToU/p95+3tDR8fH5w9e7bM91NZnv6829vbo2fPnkhKStLop1AoEB4e/szllAwzKHkNTpw4AbVajcLCQvj6+sLCwgJ2dnYYMmQIbty4oTX/gQMHpNfLxsYGffr0waVLlzT6lLweJa+ZUqmEnZ0dxo8fj/z8fK16S3/uHz9+jODgYNja2uLixYsafcv7PSZHPEJEz1QSXuzs7AAA//vf/xAbG4s333wTHh4eSEtLw7p169CxY0dcvHgRrq6uAIDi4mK88cYbSEhIwODBgzF+/Hjk5OQgPj4eSUlJqFevnrSOIUOGIDg4WGO9kZGRZdYzf/58KBQKfPDBB0hPT8eyZcsQGBiIM2fOwNLSEsCTL5oePXrAx8cHs2bNgpGREWJiYtC5c2ccOXIEbdq00Vpu7dq1ERUVBQDIzc3FmDFjylz3jBkzMGjQILz33nu4d+8eVqxYgQ4dOuD06dOwsbHRmmfUqFFo3749AODbb7/F7t27NaaPHj0amzZtwvDhwzFu3DikpKRg5cqVOH36NI4ePQpTU9My94MuMj
MzpW0rTa1Wo3fv3vjll18watQoeHl54fz581i6dCn++OMPjUGr5RETE4Pp06dj8eLFGDp0aJl9XrQ/li9fjt69eyMkJASFhYXYvn073nzzTezduxc9e/aU+s2ZMwezZ89G27ZtMXfuXJiZmeHYsWM4cOAAunXrBuDJL6x3330XjRs3RmRkJGxsbHD69GnExcVJ9ZXs+9atWyMqKgppaWlYvnw5jh49qvWa2tvbY+nSpQCAW7duYfny5QgODsbNmzfLfO1Lq8i8o0ePRmBgoPT87bffRr9+/dC/f3+pzcHBAQDwxRdfIDQ0FEFBQViwYAEePXqENWvWoF27djh9+rQ0fuXcuXNo3749TE1NMWrUKLi7u+PPP//E999/j/nz56N///6oX7++tPyJEyfCy8sLo0aNktq8vLwqZd/Nnj0bc+bMQWBgIMaMGYMrV65gzZo1OHHixAs/Cx988MFz9//TGjVqhI8++ghCCPz5559YsmQJgoODywwu5fXgwQMAT/5I8fHxwaeffop79+4hOjoav/zyC06fPg17e3sAwP79+9GjRw+8+uqrmD17Nv766y+sWLECAQEBOHXqlNZ4o0GDBsHd3R1RUVH47bffEB0djYcPH2LLli3PrOe9997Dzz//jPj4eHh7e0vtFfkekxVBshcTEyMAiP3794t79+6Jmzdviu3btws7OzthaWkpbt26JYQQIj8/XxQXF2vMm5KSIszNzcXcuXOlto0bNwoAYsmSJVrrUqvV0nwAxKJFi7T6NG7cWHTs2FF6fvDgQQFAvPLKKyI7O1tq37FjhwAgli9fLi3b09NTBAUFSesRQohHjx4JDw8P0bVrV611tW3bVjRp0kR6fu/ePQFAzJo1S2q7du2aMDY2FvPnz9eY9/z588LExESrPTk5WQAQmzdvltpmzZolSn/cjhw5IgCIbdu2acwbFxen1V63bl3Rs2dPrdrDwsLE0x/hp2ufOnWqcHR0FD4+Phr79IsvvhBGRkbiyJEjGvOvXbtWABBHjx7VWl9pHTt2lJb3ww8/CBMTEzFp0qQy+5Znfwjx5HUqrbCwUDRp0kR07txZY1lGRkaiX79+Wu/Fktc8MzNTWFtbCz8/P/HXX3+V2aewsFA4OjqKJk2aaPTZu3evACBmzpwptYWGhoq6detqLGf9+vUCgDh+/HiZ26yPeUt7+nUtkZOTI2xsbMTIkSM12lNTU4VKpdJo79Chg7C2thbXr1/X6Fv6s1Ja3bp1RWhoqFa7vvddenq6MDMzE926ddN4TVeuXCkAiI0bN0ptpd93Qgjx448/CgCie/fuWu+nsjw9vxBCfPjhhwKASE9Pl9oAiLCwsGcup+Q7MyUlReO5t7e3xvu45Lur9GejRYsWwtHRUTx48EBqO3v2rDAyMhLDhg2T2ko+I71799ZY99ixYwUAcfbsWY16S94fkZGRwtjYWMTGxmrMp+v3mBzxlBlJAgMD4eDgADc3NwwePBg1a9bE7t278corrwAAzM3NYWT05C1TXFyMBw8eoGbNmmjYsCFOnTolLWfXrl2wt7fH+++/r7WO8hzWfpZhw4bB2tpaej5w4EC4uLjgxx9/BPDk8tvk5GQMHToUDx48wP3793H//n3k5eWhS5cuOHz4MNRqtcYy8/PzYWFh8dz1fvvtt1Cr1Rg0aJC0zPv378PZ2Rmenp44ePCgRv/CwkIAT/bXs+zcuRMqlQpdu3bVWKaPjw9q1qyptcyioiKNfvfv39c6bP6027dvY8WKFZgxYwZq1qyptX4vLy80atRIY5klp0mfXv+zHD9+HIMGDcKAAQOwaNGiMvuUZ38AkI7yAcDDhw+RlZWF9u3ba7y3YmNjoVarMXPmTOm9WKLkvRUfH4+cnBxMmzZN67Ut6fP7778jPT0dY8eO1ejTs2dPNGrUSOsybrVaLe2jM2fOYMuWLXBxcZGOmDzP35n3ReLj45GZmYkhQ4ZovI7Gxsbw8/OTXsd79+7h8OHDePfdd1
GnTh2NZej6mdT3vtu/fz8KCwsxYcIEjdd05MiRUCqVz7ykXgiByMhIDBgwAH5+fuWuv+SzdO/ePSQmJmL37t1o1qy
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Распределение классов в обучающей выборке (в процентах):\n",
"Response\n",
"0 84.951091\n",
"1 15.048909\n",
"Name: proportion, dtype: float64\n"
]
}
],
"source": [
"# Импорт необходимых библиотек\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Функция для визуализации распределения классов\n",
"def plot_class_distribution(y, title):\n",
" sns.countplot(x=y_train, color='orange')\n",
" plt.title(title)\n",
" plt.xlabel(\"Response (Целевой признак)\")\n",
" plt.ylabel(\"Количество записей\")\n",
" plt.show()\n",
"\n",
"# Оценка сбалансированности классов в выборках\n",
"plot_class_distribution(y_train, \"Распределение классов в обучающей выборке\")\n",
"plot_class_distribution(y_val, \"Распределение классов в контрольной выборке\")\n",
"plot_class_distribution(y_test, \"Распределение классов в тестовой выборке\")\n",
"\n",
"# Проверка пропорций классов в обучающей выборке\n",
"class_distribution_train = y_train.value_counts(normalize=True) * 100\n",
"print(\"Распределение классов в обучающей выборке (в процентах):\")\n",
"print(class_distribution_train)\n"
]
},
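{
"cell_type": "markdown",
"metadata": {},
"source": [
"The plots above show balance one split at a time. As a hedged sketch, the same check can be collected into one table and a single imbalance ratio. The `splits` dictionary holds made-up labels standing in for `y_train`, `y_val` and `y_test`; `summary` and `imbalance_ratio` are names introduced only for this example.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy label series standing in for y_train / y_val / y_test (made-up values)\n",
"splits = {\n",
"    'train': pd.Series([0, 0, 0, 0, 0, 1]),\n",
"    'val': pd.Series([0, 0, 0, 1]),\n",
"    'test': pd.Series([0, 0, 1, 1]),\n",
"}\n",
"\n",
"# One column per split, one row per class, values in percent\n",
"summary = pd.DataFrame({\n",
"    name: labels.value_counts(normalize=True) * 100\n",
"    for name, labels in splits.items()\n",
"}).round(1)\n",
"print(summary)\n",
"\n",
"# Ratio of the largest class to the smallest class in the training part\n",
"counts = splits['train'].value_counts()\n",
"imbalance_ratio = counts.max() / counts.min()\n",
"print('train imbalance ratio:', imbalance_ratio)\n",
"```\n",
"\n",
"Applied to the real training labels, a ratio well above 1 (about 85% to 15% in the output above) indicates that the minority class may need resampling or class weights."
]
},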
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b>Для второй бизнес цели</b>\n"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Размеры выборок для второй цели:\n",
"Обучающая: (1329, 15), Контрольная: (443, 15), Тестовая: (444, 15)\n"
]
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Целевой признак для второй бизнес-цели\n",
"target_col_2 = 'NumWebPurchases'\n",
"\n",
"# Разделение данных на обучающую, контрольную и тестовую выборки\n",
"X_train_2, X_temp_2, y_train_2, y_temp_2 = train_test_split(\n",
" X.drop(columns=[target_col_2]), # Все признаки, кроме целевого\n",
" X[target_col_2], # Целевой признак\n",
" test_size=0.4, # 40% на контрольную и тестовую выборки\n",
" random_state=42\n",
")\n",
"\n",
"X_val_2, X_test_2, y_val_2, y_test_2 = train_test_split(\n",
" X_temp_2,\n",
" y_temp_2,\n",
" test_size=0.5, # 50% от оставшихся данных для тестовой выборки\n",
" random_state=42\n",
")\n",
"\n",
"# Проверим размеры выборок\n",
"print(\"Размеры выборок для второй цели:\")\n",
"print(f\"Обучающая: {X_train_2.shape}, Контрольная: {X_val_2.shape}, Тестовая: {X_test_2.shape}\")\n"
]
},
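{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unlike the first split, this one does not pass `stratify`. A count target with very rare values generally cannot be stratified on directly, because scikit-learn requires at least two members per class. One possible workaround, sketched below on made-up data, is to stratify on coarse bins of the target. The bin edges and the names `y_demo`, `X_demo` and `bins` are assumptions for illustration, not part of the notebook.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Toy count target with one extreme value (made-up, not the real column)\n",
"rng = np.random.default_rng(0)\n",
"y_demo = pd.Series(rng.poisson(4, size=500))\n",
"y_demo.iloc[0] = 27  # a single outlier, too rare to stratify on directly\n",
"X_demo = pd.DataFrame({'feature': rng.normal(size=500)})\n",
"\n",
"# Group the target into three coarse bins (codes 0, 1, 2) and stratify on them\n",
"bins = pd.cut(y_demo, bins=[-1, 2, 5, np.inf], labels=False)\n",
"\n",
"X_tr, X_te, y_tr, y_te = train_test_split(\n",
"    X_demo, y_demo, test_size=0.2, random_state=42, stratify=bins)\n",
"\n",
"# The share of each bin should be close in both parts\n",
"tr_share = bins.loc[y_tr.index].value_counts(normalize=True)\n",
"te_share = bins.loc[y_te.index].value_counts(normalize=True)\n",
"print(tr_share.round(3))\n",
"print(te_share.round(3))\n",
"```\n",
"\n",
"The model would still be trained on the original target values; the bins are used only to guide the split."
]
},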
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Оценка:"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjsAAAHHCAYAAABZbpmkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABgpklEQVR4nO3dd1gU1/s28HtBKVIFpCkCKip2RUVEBRVR7NFoVKJYYgt2YxK+UbEkwRJ7iMZE0ViiUaMmJhbsJdhQVGIJGrCCnaoCsuf9w5f5ue6iLi4sTO7Pdc11MWfOnPPssDs8nDkzqxBCCBARERHJlIG+AyAiIiIqSkx2iIiISNaY7BAREZGsMdkhIiIiWWOyQ0RERLLGZIeIiIhkjckOERERyRqTHSIiIpI1JjtERCVAVlYWbt68icePH+s7FNKxjIwMJCUlISsrS9+h/Gcx2SEi0pNNmzahbdu2sLCwgLm5OSpXrow5c+boO6xSITMzEwsXLpTWU1NTERkZqb+AXiKEwPLly9GsWTOUK1cOlpaWcHd3x9q1a/Ud2n+Wgl8XUTqtWrUKgwYNktaNjY1RuXJlBAYGYsqUKXBwcNBjdET0Jp9//jlmz56Nbt26oU+fPrCzs4NCoUD16tXh4uKi7/BKvLy8PFhZWeH7779Hq1atMG/ePFy+fBm7du3Sd2jo27cvNm7ciJCQEHTu3BlWVlZQKBSoV68eKlSooO/w/pPK6DsAejczZsyAu7s7nj17hqNHj2Lp0qX4888/ER8fj3Llyuk7PCLS4NChQ5g9ezYiIiLw+eef6zucUsnQ0BDTp0/HgAEDoFQqYWlpiT/++EPfYeGnn37Cxo0bsXbtWvTr10/f4dD/x5GdUip/ZOfUqVNo3LixVD5x4kTMnz8f69evR9++ffUYIREVpEuXLnj06BGOHTum71BKvVu3buHmzZvw9PSEtbW1vsNB3bp1Ua9ePaxbt07fodBLOGdHZtq0aQMASExMBAA8evQIn3zyCerWrQtzc3NYWloiKCgI586dU9v32bNnmDZtGqpXrw4TExM4OTmhR48euHbtGgAgKSkJCoWiwMXf319q6+DBg1AoFNi4cSP+97//wdHREWZmZujatStu3ryp1veJEyfQoUMHWFlZoVy5cvDz8yvwD4G/v7/G/qdNm6ZWd+3atfDy8oKpqSlsbGzQp08fjf2/7rW9TKlUYuHChahduzZMTEzg4OCA4cOHq00qdXNzQ+fOndX6GTVqlFqbmmKfO3eu2jEFgOzsbISHh6NatWowNjaGi4sLPv30U2RnZ2s8Vi/z9/dXa++rr76CgYEB1q9fX6jj8c0336B58+awtbWFqakpvLy8sHnzZo39r127Fk2bNkW5cuVQvnx5tGrVCnv27FGps3PnTvj5+cHCwgKWlpZo0qSJWmybNm2Sfqd2dnb48MMPcfv2bZU6AwcOVIm5fPny8Pf3x5EjR954nN5lXwDYv38/WrZsCTMzM1hbW6Nbt264dOmSSp3jx4+jTp066NOnD2xsbGBqaoomTZpg27ZtUp3MzEyYmZlh7Nixan3cunULhoaGiIiIkGJ2c3NTq/fqe+v69ev4+OOPUaNGDZiamsLW1ha9evVCUlKSyn75n9+DBw9KZadOnUK7du1gYWEBMzMzjcdk1apVUCgUOH36tFT24MEDje/xzp07a4z5bc4F06ZNk96LlSpVgo+PD8qUKQNHR0e1uDXJ3z9/sbCwQNOmTVWOP/DiM1OnTp0C28n/nKxatQrAi0nm8fHxcHFxQadOnWBpaVngsQKAf//9F7169YKNjQ3KlSuHZs2aqY1OaXMu1eYzrs05Vw54GUtm8hMTW1tbAC8+TNu2bUOvXr3g7u6Ou3fv4vvvv4efnx8uXrwIZ2dnAC+uf3fu3Bn79u1Dnz59MHbsWGRkZCA6Ohrx8fGoWrWq1Effvn3RsWNHlX7DwsI0xvPVV19BoVDgs88+w71797Bw4UIEBA
QgLi4OpqamAF78cQgKCoKXlxfCw8NhYGCAqKgotGnTBkeOHEHTpk3V2q1UqZJ0os/MzMTIkSM19j1lyhT07t0bH330Ee7fv48lS5agVatWOHv2rMb/AocNG4aWLVsCAH799Vds3bpVZfvw4cOlUbUxY8YgMTER3377Lc6ePYtjx46hbNmyGo+DNlJTU6XX9jKlUomuXbvi6NGjGDZsGDw9PXHhwgUsWLAA//zzj9qJ+k2ioqIwefJkzJs3r8Dh9jcdj0WLFqFr164IDg5GTk4ONmzYgF69emHHjh3o1KmTVG/69OmYNm0amjdvjhkzZsDIyAgnTpzA/v37ERgYCODFH8rBgwejdu3aCAsLg7W1Nc6ePYtdu3ZJ8eUf+yZNmiAiIgJ3797FokWLcOzYMbXfqZ2dHRYsWADgRXKwaNEidOzYETdv3nzjCEBh9927dy+CgoJQpUoVTJs2DU+fPsWSJUvg6+uLM2fOSH/cHz58iOXLl8Pc3BxjxoxBhQoVsHbtWvTo0QPr1q1D3759YW5ujvfeew8bN27E/PnzYWhoKPXz888/QwiB4ODg176OV506dQp//fUX+vTpg0qVKiEpKQlLly6Fv78/Ll68WOCl76tXr8Lf3x/lypXDpEmTUK5cOfzwww8ICAhAdHQ0WrVqpVUcBSnMuSDfvHnzcPfuXa36W7NmDYAXCdl3332HXr16IT4+HjVq1ChU/A8fPgQAzJ49G46Ojpg0aRJMTEw0Hqu7d++iefPmePLkCcaMGQNbW1usXr0aXbt2xebNm/Hee++ptP0259JXFfQZf5fjXGoJKpWioqIEALF3715x//59cfPmTbFhwwZha2srTE1Nxa1bt4QQQjx79kzk5eWp7JuYmCiMjY3FjBkzpLKVK1cKAGL+/PlqfSmVSmk/AGLu3LlqdWrXri38/Pyk9QMHDggAomLFiiI9PV0q/+WXXwQAsWjRIqltDw8P0b59e6kfIYR48uSJcHd3F+3atVPrq3nz5qJOnTrS+v379wUAER4eLpUlJSUJQ0ND8dVXX6nse+HCBVGmTBm18oSEBAFArF69WioLDw8XL39Ejhw5IgCIdevWqey7a9cutXJXV1fRqVMntdhDQ0PFqx+7V2P/9NNPhb29vfDy8lI5pmvWrBEGBgbiyJEjKvsvW7ZMABDHjh1T6+9lfn5+Unt//PGHKFOmjJg4caLGum9zPIR48Xt6WU5OjqhTp45o06aNSlsGBgbivffeU3sv5v/OU1NThYWFhfD29hZPnz7VWCcnJ0fY29uLOnXqqNTZsWOHACCmTp0qlYWEhAhXV1eVdpYvXy4AiJMnT2p8zbrYt0GDBsLe3l48fPhQKjt37pwwMDAQAwYMkMoACADi4MGDUtmTJ0+Ep6encHR0FDk5OUIIIXbv3i0AiJ07d6r0U69ePZX3xqBBg0TlypXV4nn1vfXq70sIIWJiYgQA8dNPP0ll+Z/fAwcOCCGE6NmzpzA0NBTx8fFSnQcPHghbW1vh5eUlleWfl06dOiWVafp8CiFEp06dVI6zNueCV9+L9+7dExYWFiIoKEgl7oJoei/v2bNHABC//PKLVObn5ydq165dYDv558SoqCiVdSMjI/HPP/+oHINXj9W4ceMEAJXPc0ZGhnB3dxdubm7SZ+Vtz6X58b7pM16Yc64c8DJWKRcQEIAKFSrAxcUFffr0gbm5ObZu3YqKFSsCeHGXloHBi19zXl4eHj58CHNzc9SoUQNnzpyR2tmyZQvs7OwwevRotT5evXShjQEDBsDCwkJaf//99+Hk5IQ///wTABAXF4eEhAT069cPDx8+xIMHD/DgwQNkZWWhbdu2OHz4MJRKpUqbz549g4mJyWv7/fXXX6FUKtG7d2+pzQcPHsDR0REeHh44cOCASv2cnBwAL45XQTZt2gQrKyu0a9dOpU0vLy+Ym5urtZmbm6tS78GDB3j27Nlr4759+zaWLFmCKVOmwNzcXK1/T09P1KxZU6XN/E
uXr/ZfkJMnT6J3797o2bMn5s6dq7HO2xwPACr/UT5+/BhpaWlo2bKlyntr27ZtUCqVmDp1qvRezJf/3oqOjkZGRgY
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjIAAAHHCAYAAACle7JuAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABcLElEQVR4nO3deViN+f8/8OcpbVoVbSMVUtnJlqwV2RmGQTOyM5Mlma0ZJjJkmbGHMUOMsczYZ8xYsxuSCA2TmKhBEVpp0Xn//vDr/jrOiU7K6fg8H9d1rqv7fW/Pc5/T3av3vcmEEAJEREREWkhH0wGIiIiIyoqFDBEREWktFjJERESktVjIEBERkdZiIUNERERai4UMERERaS0WMkRERKS1WMgQERGR1mIhQ0REVM7kcjnS09Px77//ajrKW4+FDBERaY2TJ0/i6NGj0vDRo0dx6tQpzQV6TmpqKoKCguDo6Ah9fX3UqFED9evXR1ZWlqajvdVYyGjQunXrIJPJpJehoSHq1auHCRMmIC0tTdPxiIgqnZSUFHz88ce4fPkyLl++jI8//hgpKSmajoXr16+jZcuW2LJlC8aNG4c9e/bg4MGDiIqKgrGxsabjvdWqaDoAAWFhYXB2dkZeXh5OnjyJlStX4s8//0R8fDyqVq2q6XhERJVG//79sXjxYjRu3BgA4Onpif79+2s4FTBu3Djo6+vjzJkzeOeddzQd538KC5lKoHv37mjRogUAYPTo0bCyssLChQuxe/duDBkyRMPpiIgqDwMDA/z111+Ij48HADRs2BC6uroazRQbG4vDhw/jwIEDLGI0gIeWKiFvb28AQFJSEgDg4cOH+OSTT9CoUSOYmJjAzMwM3bt3x8WLF5XmzcvLw4wZM1CvXj0YGhrCzs4O/fv3x40bNwAAN2/eVDic9eKrU6dO0rKOHj0KmUyGX375BV9++SVsbW1hbGyMPn36qOzKjY6ORrdu3WBubo6qVauiY8eOJR677tSpk8r1z5gxQ2nan3/+GR4eHjAyMoKlpSUGDx6scv0ve2/Pk8vlWLx4MRo0aABDQ0PY2Nhg3LhxePTokcJ0Tk5O6NWrl9J6JkyYoLRMVdkXLFigtE0BID8/H6Ghoahbty4MDAzg4OCAzz77DPn5+Sq31fM6deqktLzZs2dDR0cHmzZtKtP2+Pbbb9G2bVtYWVnByMgIHh4e2LZtm8r1//zzz2jVqhWqVq2KatWqoUOHDjhw4IDCNHv37kXHjh1hamoKMzMztGzZUinb1q1bpc+0evXq+OCDD3D79m2FaYYPH66QuVq1aujUqRNOnDjxyu30uvM6OTkpvW8dHR3MnTtXof3w4cNo3749jI2NYWFhgb59++Lq1asK08yYMQMymQzp6ekK7efOnYNMJsO6detUZlb1unnzJoD/+24eOHAATZs2haGhIerXr48dO3YovZ9///0XAwcOhKWlJapWrYo2bdrgjz/+KNV2U/V7OXz4cJiYmLxyO6rz+/P06VPMmjULderUgYGBAZycnPDll18q/U44OTlh+PDh0NXVRZMmTdCkSRPs2LEDMplM6TMrKVPxe9LR0YGtrS3ef/99JCcnS9MU/958++23JS6n+DMtdubMGRgaGuLGjRto0KABDAwMYGtri3HjxuHhw4dK85f2+29iYoJ///0Xfn5+MDY2hr29PcLCwiCEUMpb/D0CgOzsbHh4eMDZ2Rl3796V2ku779M27JGphIqLDisrKwDPdkS7du3CwIED4ezsjLS0NHz//ffo2LEjrly5Ant7ewBAUVERevXqhaioKAwePBiTJ09GdnY2Dh48iPj4eNSpU0dax5AhQ9CjRw+F9YaEhKjMM3v2bMhkMnz++ee4d+8eFi9eDF9fX8TFxcHIyAjAsx169+7d4eHhgdDQUOjo6CAyMhLe3t44ceIEWrVqpbTcmjVrIjw8HACQk5ODjz76SOW6p0+fjkGDBmH06NG4f/8+li1bhg
4dOuDChQuwsLBQmmfs2LFo3749AGDHjh3YuXOnwvhx48Zh3bp1GDFiBCZNmoSkpCQsX74cFy5cwKlTp6Cnp6dyO6gjIyNDem/Pk8vl6NOnD06ePImxY8fC3d0dly9fxqJFi3Dt2jXs2rVLrfVERkZi2rRp+O677zB06FCV07xqeyxZsgR9+vSBv78/CgoKsGXLFgwcOBB79uxBz549pelmzpyJGTNmoG3btggLC4O+vj6io6Nx+PBhdO3aFcCz875GjhyJBg0aICQkBBYWFrhw4QL27dsn5Sve9i1btkR4eDjS0tKwZMkSnDp1SukzrV69OhYtWgQA+O+//7BkyRL06NEDKSkpKj/7573OvM87cOAARo4ciQkTJuCLL76Q2g8dOoTu3bujdu3amDFjBp48eYJly5bBy8sL58+fL9Uf1ueNGzcOvr6+0vCHH36Id999V+GwSY0aNaSfExMT8f7772P8+PEICAhAZGQkBg4ciH379qFLly4AgLS0NLRt2xaPHz/GpEmTYGVlhfXr16NPnz7Ytm0b3n33XaUcz2+34hwVbfTo0Vi/fj3ee+89TJ06FdHR0QgPD8fVq1eVvq/Pe/r0Kb766iu11tW+fXuMHTsWcrkc8fHxWLx4Me7cuVOqIrckDx48QF5eHj766CN4e3tj/PjxuHHjBiIiIhAdHY3o6GgYGBgAUO/7X1RUhG7duqFNmzaYP38+9u3bh9DQUDx9+hRhYWEqsxQWFmLAgAFITk7GqVOnYGdnJ417E/s+jRCkMZGRkQKAOHTokLh//75ISUkRW7ZsEVZWVsLIyEj8999/Qggh8vLyRFFRkcK8SUlJwsDAQISFhUlta9euFQDEwoULldYll8ul+QCIBQsWKE3ToEED0bFjR2n4yJEjAoB45513RFZWltT+66+/CgBiyZIl0rJdXFyEn5+ftB4hhHj8+LFwdnYWXbp0UVpX27ZtRcOGDaXh+/fvCwAiNDRUart586bQ1dUVs2fPVpj38uXLokqVKkrtiYmJAoBYv3691BYaGiqe/5qfOHFCABAbN25UmHffvn1K7Y6OjqJnz55K2QMDA8WLvzovZv/ss8+EtbW18PDwUNimGzZsEDo6OuLEiRMK869atUoAEKdOnVJa3/M6duwoLe+PP/4QVapUEVOnTlU5bWm2hxDPPqfnFRQUiIYNGwpvb2+FZeno6Ih3331X6btY/JlnZGQIU1NT0bp1a/HkyROV0xQUFAhra2vRsGFDhWn27NkjAIivv/5aagsICBCOjo4Ky1m9erUAIM6ePavyPZf3vOfOnRMmJiZi4MCBSu+7adOmwtraWjx48EBqu3jxotDR0RHDhg2T2oq3+f379xXmj4mJEQBEZGSkyhwvfqee5+joKACI7du3S22ZmZnCzs5ONGvWTGoLCgoSABS+b9nZ2cLZ2Vk4OTkpvSd/f3/h7Oz80hwBAQHC2NhYZa4XM5bm9ycuLk4AEKNHj1aY7pNPPhEAxOHDhxWWGRAQIA2vWLFCGBgYiM6dOyt93iVlen5+IYQYOnSoqFq1qjT8sn1ksRd/j4qHfXx8xNOnT6X24n38smXLhBDqf/8BiIkTJ0ptcrlc9OzZU+jr60vfp+K8kZGRQi6XC39/f1G1alURHR2tkFmdfZ+24aGlSsDX1xc1atSAg4MDBg8eDBMTE+zcuVM61mpgYAAdnWcfVVFRER48eAATExO4urri/Pnz0nK2b9+O6tWrY+LEiUrreLErVx3Dhg2DqampNPzee+/Bzs4Of/75JwAgLi4OiYmJGDp0KB48eID09HSkp6cjNzcXPj4+OH78OORyucIy8/LyYGho+NL17tixA3K5HIMGDZKWmZ6eDltbW7i4uODIkSMK0xcUFACA9J+PKlu3boW5uTm6dOmisEwPDw+YmJgoLbOwsFBhuvT0dOTl5b009+3bt7Fs2TJMnz5dqQt+69atcHd3h5ubm8Iyiw8nvrj+kpw9exaDBg3CgAEDsGDBAp
XTlGZ7AJB61QDg0aNHyMzMRPv27RW+W7t27YJcLsfXX38tfReLFX+3Dh48iOzsbHzxxRdKn23xNOfOncO9e/fw8cc
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjIAAAHHCAYAAACle7JuAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABULklEQVR4nO3dd1QU1/8+8GcBKdJBaRZARUDEhiWIXRRLLDGJUYm9B2ONJiQilihRIyqKGvNRLEk0GqOmWYk9VhSUqIgGlVjARlVA2fv7wx/7ZdylLIEsY57XOXsOe2fmzntnC8/eKasQQggQERERyZCergsgIiIiKisGGSIiIpItBhkiIiKSLQYZIiIiki0GGSIiIpItBhkiIiKSLQYZIiIiki0GGSIiIpItBhkiIqJChBB4/PgxEhMTdV0KlQKDDBER/Svi4+Oxa9cu1f3Y2Fj8+uuvuiuokMzMTMycORPu7u4wNDSEra0t6tevj4SEBF2XRiVgkJG5DRs2QKFQqG7GxsaoX78+JkyYgJSUFF2XR0SkkpmZibFjx+LUqVNITEzEpEmTcOnSJV2XhUePHsHX1xcRERF45513sHv3bhw4cACHDx+Gi4uLrsujEhjougAqH3PnzoWrqytycnJw/PhxrF69Gr/99hvi4+NRtWpVXZdHRARfX1/VDQDq16+P0aNH67gqYPr06bh37x5OnjwJLy8vXZdDWmKQeU10794dzZs3BwCMGjUKtra2CA8Px+7duzFw4EAdV0dE9NKuXbtw+fJlPHv2DN7e3jA0NNRpPampqdi4cSPWrFnDECNT3LX0murUqRMAICkpCQDw+PFjfPTRR/D29oaZmRksLCzQvXt3xMXFqS2bk5OD2bNno379+jA2NoajoyP69euHGzduAABu3rwp2Z316q1Dhw6qvg4fPgyFQoHvv/8en376KRwcHGBqaorevXsjOTlZbd2nT59Gt27dYGlpiapVq6J9+/Y4ceKExsfYoUMHjeufPXu22rzffPMNfHx8YGJiAhsbGwwYMEDj+ot7bIUplUosW7YMXl5eMDY2hr29PcaOHYsnT55I5nNxccGbb76ptp4JEyao9amp9sWLF6ttUwDIzc1FaGgo6tWrByMjI9SqVQszZsxAbm6uxm1VWIcOHdT6mz9/PvT09PDdd9+VaXt8+eWXaN26NWxtbWFiYgIfHx/88MMPGtf/zTffoGXLlqhatSqsra3Rrl077N+/XzLPnj170L59e5ibm8PCwgItWrRQq2379u2q57RatWp4//33cefOHck8w4YNk9RsbW2NDh064NixYyVup7Iu++pymm43b96UPNa2bdvC1NQU5ubm6NmzJ/7880+1fq9evYr+/fujevXqMDExgbu7Oz777DMAwOzZs0tc5+HDhyts261atQpeXl4wMjKCk5MTgoKCkJaWJpmn8OuuQYMG8PHxQVxcnMbXkyavvt+rVauGnj17Ij4+XjKfQqHAhAkTiuynYHd8wXNw9uxZKJVK5OXloXnz5jA2NoatrS0GDhyI27dvqy3/+++/q54vKysr9OnTB1euXJHMU/B8FDxnFhYWsLW1xaRJk5CTk6NWb+H3/YsXL9CjRw/Y2Njg8uXLknlL+zn2X8MRmddUQeiwtbUFAPz111/YtWsX3n33Xbi6uiIlJQVfffUV2rdvj8uXL8PJyQkAkJ+fjzfffBPR0dEYMGAAJk2ahMzMTBw4cADx8fGoW7euah0DBw5Ejx49JOsNDg7WWM/8+fOhUCjw8ccfIzU1FcuWLYO/vz9iY2NhYmIC4OUHRPfu3eHj44PQ0FDo6ekhKioKnTp1wrFjx9CyZUu1fmvWrImwsDAAQFZWFsaPH69x3SEhIejfvz9GjRqFBw8eYMWKFWjXrh0uXLgAKysrtWXGjBmDtm3bAgB+/PFH7Ny5UzJ97Nix2LBhA4YPH46JEyciKSkJK1euxIULF3DixAlUqVJF43bQRlpamuqxFa
ZUKtG7d28cP34cY8aMgaenJy5duoSlS5fi2rVrkoMpSyMqKgozZ87EkiVLMGjQII3zlLQ9li9fjt69eyMwMBB5eXnYunUr3n33Xfzyyy/o2bOnar45c+Zg9uzZaN26NebOnQtDQ0OcPn0av//+O7p27Qrg5T+aESNGwMvLC8HBwbCyssKFCxewd+9eVX0F275FixYICwtDSkoKli9fjhMnTqg9p9WqVcPSpUsBAH///TeWL1+OHj16IDk5WeNzX1hZlh07diz8/f1V9wcPHoy33noL/fr1U7VVr14dALB582YMHToUAQEBWLhwIZ4+fYrVq1ejTZs2uHDhgur4jIsXL6Jt27aoUqUKxowZAxcXF9y4cQM///wz5s+fj379+qFevXqq/qdMmQJPT0+MGTNG1ebp6Vkh22727NmYM2cO/P39MX78eCQkJGD16tU4e/Zsie+Fjz/+uNjt/yoPDw989tlnEELgxo0bCA8PR48ePTQGjtJ69OgRgJdfLnx8fPDFF1/gwYMHiIiIwPHjx3HhwgVUq1YNAHDw4EF0794dderUwezZs/Hs2TOsWLECfn5+OH/+vNrxNP3794eLiwvCwsJw6tQpRERE4MmTJ9i0aVOR9YwaNQqHDx/GgQMH0KBBA1V7WT7H/jMEyVpUVJQAIA4ePCgePHggkpOTxdatW4Wtra0wMTERf//9txBCiJycHJGfny9ZNikpSRgZGYm5c+eq2tavXy8AiPDwcLV1KZVK1XIAxOLFi9Xm8fLyEu3bt1fdP3TokAAgatSoITIyMlTt27ZtEwDE8uXLVX27ubmJgIAA1XqEEOLp06fC1dVVdOnSRW1drVu3Fg0bNlTdf/DggQAgQkNDVW03b94U+vr6Yv78+ZJlL126JAwMDNTaExMTBQCxceNGVVtoaKgo/FY5duyYACC+/fZbybJ79+5Va3d2dhY9e/ZUqz0oKEi8+vZ7tfYZM2YIOzs74ePjI9mmmzdvFnp6euLYsWOS5desWSMAiBMnTqitr7D27dur+vv111+FgYGBmDZtmsZ5S7M9hHj5PBWWl5cnGjZsKDp16iTpS09PT7z11ltqr8WC5zwtLU2Ym5uLVq1aiWfPnmmcJy8vT9jZ2YmGDRtK5vnll18EADFr1ixV29ChQ4Wzs7Okn7Vr1woA4syZMxofc3ksW9irz2uBzMxMYWVlJUaPHi1pv3//vrC0tJS0t2vXTpibm4tbt25J5i38XinM2dlZDB06VK29vLddamqqMDQ0FF27dpU8pytXrhQAxPr161VthV93Qgjx22+/CQCiW7duaq8nTV5dXgghPv30UwFApKamqtoAiKCgoCL7KfjMTEpKktxv0KCB5HVc8NlV+L3RpEkTYWdnJx49eqRqi4uLE3p6emLIkCGqtoL3SO/evSXr/uCDDwQAERcXJ6m34PURHBws9PX1xa5duyTLafs59l/DXUuvCX9/f1SvXh21atXCgAEDYGZmhp07d6JGjRoAACMjI+jpvXy68/Pz8ejRI5iZmcHd3R3nz59X9bNjxw5Uq1YNH374odo6SjP8W5QhQ4bA3Nxcdf+dd96Bo6MjfvvtNwAvT8NMTEzEoEGD8OjRIzx8+BAPHz5EdnY2OnfujKNHj0KpVEr6zMnJgbGxcbHr/fHHH6FUKtG/f39Vnw8fPoSDgwPc3Nxw6NAhyfx5eXkAXm6vomzfvh2Wlpbo0qWLpE8fHx+YmZmp9fn8+XPJfA8fPlQbXn7VnTt3sGLFCoSEhMDMzExt/Z6envDw8JD0WbA78dX1F+XMmTPo378/3n77bSxevFjjPKXZHgBUo2oA8OTJE6Snp6Nt27aS19auXbugVCoxa9Ys1WuxQMFr68CBA8jMzMQnn3yi9twWzHPu3Dmkpqbigw8+kMzTs2dPeHh4qJ3Oq1QqVdsoNjYWmzZtgqOjo2qEojj/ZNmSHDhwAGlpaRg4cKDkedTX10erVq1Uz+ODBw9w9OhRjBgxArVr15b0oe
17sry33cGDB5GXl4fJkydLntPRo0fDwsKiyFOrhRAIDg7G22+/jVatWpW6/oL30oMHD3Dy5Ens3LkTjRo1Uo2YFMj
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Class distribution in the training set (percent):\n",
"NumWebPurchases\n",
"1 15.876599\n",
"2 15.575621\n",
"3 14.672686\n",
"4 13.167795\n",
"5 10.233258\n",
"6 8.577878\n",
"7 6.471031\n",
"8 4.966140\n",
"9 3.837472\n",
"11 2.558315\n",
"0 2.257336\n",
"10 1.655380\n",
"23 0.075245\n",
"27 0.075245\n",
"Name: proportion, dtype: float64\n"
]
}
],
"source": [
"# Helper that plots the class distribution of a target\n",
"def plot_class_distribution(y, title):\n",
"    sns.countplot(x=y, color='orange')\n",
"    plt.title(title)\n",
"    plt.xlabel(\"NumWebPurchases (target)\")\n",
"    plt.ylabel(\"Number of records\")\n",
"    plt.show()\n",
"\n",
"# Check how balanced the classes are in each split\n",
"plot_class_distribution(y_train_2, \"Class distribution in the training set\")\n",
"plot_class_distribution(y_val_2, \"Class distribution in the validation set\")\n",
"plot_class_distribution(y_test_2, \"Class distribution in the test set\")\n",
"\n",
"# Class proportions in the training set\n",
"class_distribution_train_2 = y_train_2.value_counts(normalize=True) * 100\n",
"print(\"Class distribution in the training set (percent):\")\n",
"print(class_distribution_train_2)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b>6.</b><br><br> \n",
"\n",
"The classes are imbalanced. For the second business goal (campaign response) we apply upsampling, that is, we enlarge the rare classes in the training set."
]
},
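{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of what upsampling does, the sketch below balances a toy frame by duplicating rows of the rare class. The column names `feature` and `target` and the helper `upsample_to_majority` are made up for this example and are not part of the lab's dataset. Note that this is plain random oversampling; SMOTE instead synthesizes new points between existing minority samples."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"\n",
"def upsample_to_majority(df, target_col, random_state=42):\n",
"    # Hypothetical helper: grow every class to the size of the largest one\n",
"    max_size = df[target_col].value_counts().max()\n",
"    parts = []\n",
"    for _, group in df.groupby(target_col):\n",
"        # Draw only the missing rows, with replacement, and keep the originals\n",
"        extra = group.sample(n=max_size - len(group), replace=True, random_state=random_state)\n",
"        parts.append(pd.concat([group, extra]))\n",
"    return pd.concat(parts).reset_index(drop=True)\n",
"\n",
"\n",
"# Toy data: 8 rows of class 0 and 2 rows of class 1\n",
"toy = pd.DataFrame({\"feature\": range(10), \"target\": [0] * 8 + [1] * 2})\n",
"balanced = upsample_to_majority(toy, \"target\")\n",
"print(balanced[\"target\"].value_counts())\n"
]
},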
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlMAAAHHCAYAAACbXt0gAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABdUUlEQVR4nO3deXhM1/8H8PcksskyEZGNSIJYQkolaMQuFTulVaTWVKxFKa3Wrmiofa3WTkutbSkVsWsEIXZpkKBIIsiKbHN+f+SX+81kJmTcpEnk/XqeeR5z7pl7P/dOJvPOuedeCiGEABERERG9Eb3iLoCIiIioNGOYIiIiIpKBYYqIiIhIBoYpIiIiIhkYpoiIiIhkYJgiIiIikoFhioiIiEgGhikiIiIiGRimiIioWGRmZiIuLg737t0r7lKoEJXF95VhioiICmzEiBF4//333/j1kZGRGDJkCOzt7WFoaAhbW1t4eXmB/xlH6fY2vK9PnjyBqakp/vzzT51fW+bC1IYNG6BQKKSHsbExatasiVGjRiE2Nra4yyOiEiw6OhqDBg1C9erVYWxsDDs7O7Ro0QLTpk1T69eqVSsoFAq4urpqXU9QUJD0O2jnzp0ay69du4ZPPvkElStXhpGRERwcHODn54dr166p9cv9u+xVj2PHjiE6OvqVfb777rvX7n9UVBR++uknfP311zoctf85c+YMGjdujCNHjuCrr77CX3/9haCgIOzduxcKheKN1knF7215XytWrIhPP/0UU6ZM0fm15YqgnlJh5syZcHFxwcuXL3Hq1CmsWrUKf/75J65evYry5csXd3lEVMLcunULjRo1gomJCQYPHgxnZ2c8evQIFy5cQGBgIGbMmKHW39jYGLdu3cLZs2fRuHFjtWVbt26FsbExXr58qbGd3bt3o0+fPrCysoK/vz9cXFwQHR2NtWvXYufOndi2bRs++OADAMDmzZvVXrtp0yYEBQVptNepUwcvXrwAAPTp0wcdO3bU2O6777772mOwZMkSuLi4oHXr1q/tm1d6ejoGDRqEmjVr4tChQ1AqlTqvg0qet+19HTZsGJYuXYojR46gTZs2BX+hKGPWr18vAIhz586ptY8bN04AED///HMxVUZEJdmIESNEuXLlRHR0tMay2NhYtectW7YUdevWFbVq1RJjx45VW/bixQthYWEhevbsKQCIHTt2SMtu3bolypcvL2rXri3i4uLUXvf48WNRu3ZtYWpqKm7fvq21xpEjR4r8fq1HRUUJAGL+/PkF2t+80tPThbW1tZg8efIbvX7nzp1CoVCIiIiIN3o9lUxv4/tar1490a9fP51eU+ZO8+UnJ4FGRUUBAJ4+fYovvvgC7u7uMDMzg4WFBTp06IBLly5pvPbly5eYPn06atasCWNjY9jb26NHjx64ffs2ALx2eL1Vq1bSuo4dOwaFQoHt27fj66+/hp2dHUxNTdG1a1fcv39fY9uhoaFo3749lEolypcvj5YtW+L06dNa9zHn1EPex/Tp0zX6btmyBR4eHjAxMYGVlRV69+6tdfuv2rfcVCoVFi9ejLp168LY2Bi2trYYOnQonj17ptbP2dkZnTt31tjOqFGjNNaprfb58+drHFMASEtLw7Rp01CjRg0YGRnB0dEREydORFpamtZjlVurVq001jd79mzo6enh559/fqPj8f3336Np06aoWLEiTExM4OHhofV0D5D9XjRu3Bjly5dHhQoV0KJFCxw6dEitz4EDB9CyZUuYm5vDwsICjRo10qhtx44d0ntqbW2NTz75BA8ePFDrM3DgQLWaK1SogFatWuHkyZOvPU55X6vtVJOu9QDAzZs30atXL1SqVAkmJiaoVasWvvnmG41+zs7OBdrugQMH0Lx5c5iamsLc3BydOnXSOH2mze3bt1GlShU4OTlpLLOxsdH6mj59+mD79u1QqVRS2x9//IHnz5+jV69eGv3nz5+P58+fY82aNahUqZLaMm
tra/zwww9ITU3FvHnzXltvYTt16hTi4+Ph4+OjsSwuLg7+/v6wtbWFsbEx6tevj40bN6r1OXPmDFxcXLBr1y5Ur14dhoaGqFq1KiZOnCiNmgHAgAEDYG1tjYyMDI3ttGvXDrVq1QLwv9+Ved/fgQMHwtnZWa2toJ83Z2dnDBw4UHqenJyMUaNGSadbXV1d8d1336m9n0D276JRo0aptXXu3Fmjjp07d2rUrMt3zZ07d/DRRx/BwcEBenp60s94vXr1NPrmlfszoa+vj8qVKyMgIAAJCQlSn5xjmt/vIkDz+Bb0fc2xcuVK1K1bVzp9PXLkSLUagOzfufXq1UNYWBiaNm0KExMTuLi4YPXq1Wr9tP0MPHz4EM7OzvD09ERKSgqA7NGzqVOnwsPDA0qlEqampmjevDmOHj2qdR/ff/99/PHHHzrN9yqzp/nyygk+FStWBJD9Q7t371589NFHcHFxQWxsLH744Qe0bNkS169fh4ODAwAgKysLnTt3RnBwMHr37o0xY8YgOTkZQUFBuHr1KqpXry5tQ9vw+qRJk7TWM3v2bCgUCnz55ZeIi4vD4sWL4ePjg/DwcJiYmAAAjhw5gg4dOsDDwwPTpk2Dnp4e1q9fjzZt2uDkyZMapxYAoEqVKpg7dy4AICUlBcOHD9e67SlTpqBXr1749NNP8fjxYyxbtgwtWrTAxYsXYWlpqfGagIAANG/eHED2aYo9e/aoLR86dCg2bNiAQYMGYfTo0YiKisLy5ctx8eJFnD59GgYGBlqPgy4SEhKkfctNpVKha9euOHXqFAICAlCnTh1cuXIFixYtwj///IO9e/fqtJ3169dj8uTJWLBgAfr27au1z+uOx5IlS9C1a1f4+fkhPT0d27Ztw0cffYR9+/ahU6dOUr8ZM2Zg+vTpaNq0KWbOnAlDQ0OEhobiyJEjaNeuHYDseYCDBw9G3bp1MWnSJFhaWuLixYs4ePCgVF/OsW/UqBHmzp2L2NhYLFmyBKdPn9Z4T62trbFo0SIAwL///oslS5agY8eOuH//vtb3PjcjIyP89NNPam3nzp3D0qVL1doKWs/ly5fRvHlzGBgYICAgAM7Ozrh9+zb++OMPzJ49W2P7zZs3R0BAAADgxo0bmDNnjtryzZs3Y8CAAfD19UVgYCCeP3+OVatWoVmzZrh48aLGl19uTk5OOHz4sE7D/3379sX06dNx7Ngx6TU///wz2rZtqzWA/fHHH3B2dpZ+dvJq0aIFnJ2dsX///gJtX5vnz58jPj5eo93S0hLlyuX/lfD3339DoVBonA588eIFWrVqhVu3bmHUqFFwcXHBjh07MHDgQCQkJGDMmDEAsif33rlzB19//TV69OiB8ePH4/z585g/fz6uXr2K/fv3Q6FQoF+/fti0aRP++usvtT+sYmJicOTIEY35aQVR0M9bXj179kRQUBD69++Pxo0b4+jRo5g0aRKio6M1vtjflC7fNV27dsXdu3cxduxY1KxZEwqFQuvnID8ffPABevTogczMTISEhGDNmjV48eKFxmlhXRT0fQWA6dOnY8aMGfDx8cHw4cMRERGBVatW4dy5cxrfA8+ePUPHjh3Rq1cv9OnTB7/++iuGDx8OQ0NDDB48WGstiYmJ6NChAwwMDPDnn3/CzMwMAJCUlISffvoJffr0wZAhQ5CcnIy1a9fC19cXZ8+eRYMGDdTW4+HhgUWLFuHatWsFCqoAyu5pvsOHD4vHjx+L+/fvi23btomKFSsKExMT8e+//wohhHj58qXIyspSe21UVJQwMjISM2fOlNrWrVsnAIiFCxdqbEulUkmvQz7D63Xr1hUtW7aUnh89elQAEJUrVxZJSUlS+6+//ioAiCVLlkjrdnV1Fb6+vtJ2hBDi+fPnwsXFRbz//vsa22ratKmoV6+e9Pzx48cCgJg2bZrUFh0dLfT19cXs2bPVXnvlyhVRrlw5jfbIyEgBQGzcuFFqmzZtmtqphpMnTwoAYuvWrWqvPXjwoE
a7k5OT6NSpk0bt2k5f5K194sSJwsbGRnh4eKgd082bNws9PT1x8uRJtdevXr1aABCnT5/W2F5uLVu2lNa3f/9+Ua5
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Class distribution after SMOTE (percent):\n",
"Response\n",
"0 50.0\n",
"1 50.0\n",
"Name: proportion, dtype: float64\n"
]
}
],
"source": [
"from imblearn.over_sampling import SMOTE\n",
"\n",
"# Apply SMOTE to the training set only\n",
"smote = SMOTE(random_state=42)\n",
"X_train_balanced, y_train_balanced = smote.fit_resample(X_train, y_train)\n",
"\n",
"# Check the class distribution after augmentation\n",
"plot_class_distribution(y_train_balanced, \"Class distribution after SMOTE (training set)\")\n",
"\n",
"# Check the distribution in percent\n",
"balanced_distribution = y_train_balanced.value_counts(normalize=True) * 100\n",
"print(\"Class distribution after SMOTE (percent):\")\n",
"print(balanced_distribution)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b>For the first business goal:</b>"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sample shape after RandomOverSampler:\n",
"X_train_res: (2954, 15)\n",
"y_train_res: (2954,)\n",
"\n",
"Class distribution after balancing (percent):\n",
"NumWebPurchases\n",
"2 7.142857\n",
"5 7.142857\n",
"1 7.142857\n",
"8 7.142857\n",
"9 7.142857\n",
"3 7.142857\n",
"11 7.142857\n",
"7 7.142857\n",
"6 7.142857\n",
"4 7.142857\n",
"0 7.142857\n",
"10 7.142857\n",
"23 7.142857\n",
"27 7.142857\n",
"Name: proportion, dtype: float64\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjsAAAHHCAYAAABZbpmkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABbS0lEQVR4nO3dd1gU1/s28HtpC1IFqYqAvTdUYm+IokGNRqOSBLtRiAq2kKhYYrCLLWoSRRM11qiJiQWwGyuKSixfMCg27EgxFNnz/uHL/lwXkMXFXcf7c117yc6cPXPP7DI+zJyZlQkhBIiIiIgkykDXAYiIiIhKE4sdIiIikjQWO0RERCRpLHaIiIhI0ljsEBERkaSx2CEiIiJJY7FDREREksZih4iIiCTNSNcBiIhIv2VlZeHx48cwMjKCg4ODruMQaYxHdoiISE10dDS6desGGxsbmJmZoXz58hg9erSuYxGVCIsdiVqzZg1kMpnyYWpqimrVqiEoKAj37t3TdTwi0mPff/89OnXqhKdPn2LRokWIiopCVFQUpk+frutoRCXC01gSN336dHh4eCArKwtHjx7F8uXL8ddffyE+Ph5lypTRdTwi0jMJCQkICQnBsGHD8P3330Mmk+k6EtEbY7Ejcb6+vmjcuDEAYMiQIbCzs8OCBQuwc+dO9OvXT8fpiEjfLF68GE5OTli8eDELHZIMnsZ6z7Rv3x4AkJSUBAB4/Pgxxo0bh7p168LCwgJWVlbw9fXF+fPn1V6blZWFqVOnolq1ajA1NYWzszN69uyJa9euAQCuX7+ucurs1Ufbtm2VfR08eBAymQybNm3C119/DScnJ5ibm6Nbt264efOm2rJPnjyJzp07w9raGmXKlEGbNm1w7NixAtexbdu2BS5/6tSpam3XrVsHT09PmJmZwdbWFn379i1w+UWt28sUCgUiIiJQu3ZtmJqawtHREcOHD8eTJ09U2rm7u+PDDz9UW05QUJBanwVlnzt3rto2BYDs7GyEhYWhSpUqkMvlcHV1xYQJE5CdnV3gtnpZ27Zt1fqbOXMmDAwMsGHDhhJtj3nz5qF58+aws7ODmZkZPD09sXXr1gKXv27dOjRt2hRlypRB2bJl0bp1a+zbt0+lze7du9GmTRtYWlrCysoKTZo0Ucu2ZcsW5Xtarlw5fPrpp7h9+7ZKmwEDBqhkLlu2LNq2bYsjR468dju9+tpXHwcPHtQ4DwBcuXIFffr0gb29PczMzFC9enV88803au3c3d2Ltdzdu3ejVatWMDc3h6WlJbp27Yp//vnntet34sQJeHp6YuTIkXB0dIRcLkedOnXw448/qrXV5P199dR6QfuF4u6P8vcfBS3LwsICAwYMUJmWmpqK4OBguLu7Qy6Xo0KFCvj888/x8OFDlf5e3YZdu3ZV+/2bOnUqZDKZ8v2ysrKCnZ0dRo8ejaysLJXXP3/+HDNmzEDlypUhl8vh7u6Or7/+Wu338eX31MDAAE5OTvjkk0+QnJysbJP/O7dmzRrltPT0dHh6esLDwwN3794ttB0ABAYGQiaTqW2b9wWP7Lxn8gsTOzs7AMC///6LHTt2oHfv3vDw8MC9e/ewcuVKtGnTBpcuXYKLiwsAIC8vDx9++CFiYmLQt29fjB49Gunp6YiKikJ8fDwqV66sXEa/fv3QpUsXleWGhoYWmGfmzJmQyWSYOHEi7t+/j4iICHh7eyMuLg5mZmYAgP3798PX1xeenp4ICwuDgYEBIiMj0b59exw5cgRNmzZV67dChQoIDw8HAGRkZGDEiBEFLnvy5Mno06cPhgwZggcPHmDJkiVo3bo1zp07BxsbG7XXDBs2DK1atQIA/Pbbb9i+fbvK/OHDh2PNmjUYOHAgRo0ahaSkJCxduhTnzp3DsWPHYGxsXOB20ERqaqpy3V6mUCjQrVs3HD16FMOGDUPNmjVx8eJFLFy4EP/73/+wY8cOjZYTGRmJSZMmYf78+ejfv3+BbV
63PRYtWoRu3brB398fOTk52LhxI3r37o1du3aha9euynbTpk3D1KlT0bx5c0yfPh0mJiY4efIk9u/fDx8fHwAv/rMcNGgQateujdDQUNjY2ODcuXPYs2ePMl/+tm/SpAnCw8Nx7949LFq0CMeOHVN7T8uVK4eFCxcCAG7duoVFixahS5cuuHnzZoHv/cvkcjl++uknlWmnT5/G4sWLVaYVN8+FCxfQqlUrGBsbY9iwYXB3d8e1a9fwxx9/YObMmWrLb9WqFYYNGwYAuHz5Mr777juV+b/88gsCAgLQqVMnzJ49G8+ePcPy5cvRsmVLnDt3Du7u7oWu26NHj3DmzBkYGRkhMDAQlStXxo4dOzBs2DA8evQIX331lbJtcd/fly1cuBDlypUDALV1K+7+SBMZGRlo1aoVLl++jEGDBqFRo0Z4+PAhfv/9d9y6dUuZ5VWHDx/GX3/9VWi/ffr0gbu7O8LDw3HixAksXrwYT548wc8//6xsM2TIEKxduxYff/wxxo4di5MnTyI8PByXL19W+13Jf08VCgXi4+MRERGBO3fuFFqA5+bmolevXkhOTsaxY8fg7OxcaNbExMQCi9X3iiBJioyMFABEdHS0ePDggbh586bYuHGjsLOzE2ZmZuLWrVtCCCGysrJEXl6eymuTkpKEXC4X06dPV05bvXq1ACAWLFigtiyFQqF8HQAxd+5ctTa1a9cWbdq0UT4/cOCAACDKly8v0tLSlNM3b94sAIhFixYp+65ataro1KmTcjlCCPHs2TPh4eEhOnbsqLas5s2bizp16iifP3jwQAAQYWFhymnXr18XhoaGYubMmSqvvXjxojAyMlKbnpCQIACItWvXKqeFhYWJl3+Fjhw5IgCI9evXq7x2z549atPd3NxE165d1bIHBgaKV38tX80+YcIE4eDgIDw9PVW26S+//CIMDAzEkSNHVF6/YsUKAUAcO3ZMbXkva9OmjbK/P//8UxgZGYmxY8cW2LY420OIF+/Ty3JyckSdOnVE+/btVfoyMDAQH330kdpnMf89T01NFZaWlsLLy0v8999/BbbJyckRDg4Ook6dOiptdu3aJQCIKVOmKKcFBAQINzc3lX5++OEHAUCcOnWqwHV++bXm5uZq07ds2SIAiAMHDmicp3Xr1sLS0lLcuHGjwHV7Wfny5cXAgQOVz/N/l/KXm56eLmxsbMTQoUNVXpeSkiKsra3Vpr/Kzc1NABBr1qxRTnv+/Lno0KGDkMvl4uHDh8rpxXl/8/34448CgMo6vvyZE6L4+6P8dd6yZYvacszNzUVAQIDy+ZQpUwQA8dtvv6m1zd++r25DIYTw8vISvr6+ar9/+Z/zbt26qfQ1cuRIAUCcP39eCCFEXFycACCGDBmi0m7cuHECgNi/f79ympubm0pmIYTo37+/KFOmjMp2ACAiIyOFQqEQ/v7+okyZMuLkyZNq2yu/Xb4+ffqIOnXqCFdXV7XlvC94GkvivL29YW9vD1dXV/Tt2xcWFhbYvn07ypcvD+DFX6gGBi8+Bnl5eXj06BEsLCxQvXp1nD17VtnPtm3bUK5cOXz55Zdqy3iT8/qff/45LC0tlc8//vhjODs7K/+iiouLQ0JCAvr3749Hjx7h4cOHePjwITIzM9GhQwccPnwYCoVCpc+srCyYmpoWudzffvsNCoUCffr0Ufb58OFDODk5oWrVqjhw4IBK+5ycHAAvtldhtmzZAmtra3Ts2FGlT09PT1hYWKj1mZubq9Lu4cOHaofBX3X79m0sWbIEkydPhoWFhdrya9asiRo1aqj0mX/q8tXlF+bUqVPo06cPevXqhblz5xbYpjjbA4Dy6BwAPHnyBE+fPkWrVq1UPls7duyAQqHAlClTlJ/FfPmfraioKKSnp+Orr75Se2/z25w5cwb379/HyJEjVdp07doVNWrUwJ9//qnyOoVCodxGcXFx+Pnnn+Hs7IyaNWsWuU7FVdw8Dx48wOHDhzFo0CBUrFixwHV7WU
5OTpHbPSoqCqmpqejXr5/K58DQ0BBeXl7F+hw4Ojris88+Uz43NDTEmDFjkJ2djejoaOX04ry/L+cGiv7MFHd/lC8
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from imblearn.over_sampling import RandomOverSampler\n",
"import pandas as pd\n",
"\n",
"# Balance the classes with RandomOverSampler\n",
"ros = RandomOverSampler(random_state=42)\n",
"X_train_res, y_train_res = ros.fit_resample(X_train_2, y_train_2)\n",
"\n",
"# Print the new sample shapes\n",
"print(\"Sample shape after RandomOverSampler:\")\n",
"print(f\"X_train_res: {X_train_res.shape}\")\n",
"print(f\"y_train_res: {y_train_res.shape}\")\n",
"\n",
"# Class distribution in the training set after balancing\n",
"class_distribution_res = pd.Series(y_train_res).value_counts(normalize=True) * 100\n",
"print(\"\\nClass distribution after balancing (percent):\")\n",
"print(class_distribution_res)\n",
"\n",
"# Plot the result\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Helper that plots the class distribution of a target\n",
"# (the target balanced here is NumWebPurchases, so the axis is labelled accordingly)\n",
"def plot_class_distribution(y, title):\n",
"    sns.countplot(x=y, color=\"orange\")\n",
"    plt.title(title)\n",
"    plt.xlabel(\"NumWebPurchases (target)\")\n",
"    plt.ylabel(\"Number of records\")\n",
"    plt.show()\n",
"\n",
"# Plot the class distribution after balancing\n",
"plot_class_distribution(y_train_res, \"Class distribution after balancing\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b>7-8. Feature engineering</b>"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Income Kidhome Teenhome MntWines MntFruits \\\n",
"customer_id \n",
"0 -0.263557 1 1 68 0 \n",
"1 -1.102440 1 0 18 3 \n",
"2 0.633408 0 1 225 162 \n",
"3 1.135917 1 0 739 107 \n",
"4 1.299116 0 0 395 183 \n",
"\n",
" MntMeatProducts MntFishProducts MntSweetProducts MntGoldProds \\\n",
"customer_id \n",
"0 16 0 0 8 \n",
"1 19 3 3 6 \n",
"2 387 106 36 29 \n",
"3 309 140 80 35 \n",
"4 565 166 141 28 \n",
"\n",
" AcceptedCmp1 AcceptedCmp2 AcceptedCmp3 AcceptedCmp4 \\\n",
"customer_id \n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" AcceptedCmp5 Recency Income_binned \n",
"customer_id \n",
"0 0 6 1.0 \n",
"1 0 67 0.0 \n",
"2 0 77 1.0 \n",
"3 0 2 2.0 \n",
"4 0 19 2.0 \n",
"Sample shape after RandomOverSampler:\n",
"X_train_res: (2954, 17)\n",
"y_train_res: (2954,)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"d:\\study\\3_course\\aim\\aimenv\\Lib\\site-packages\\featuretools\\entityset\\entityset.py:1733: UserWarning: index customer_id not found in dataframe, creating new integer column\n",
" warnings.warn(\n",
"d:\\study\\3_course\\aim\\aimenv\\Lib\\site-packages\\featuretools\\synthesis\\deep_feature_synthesis.py:169: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created\n",
" warnings.warn(\n"
]
}
],
"source": [
"import pandas as pd\n",
"from sklearn.preprocessing import KBinsDiscretizer, MinMaxScaler, StandardScaler\n",
"import featuretools as ft\n",
"from imblearn.over_sampling import RandomOverSampler\n",
"\n",
"\n",
"# 1. One-hot encode categorical features\n",
"X_train_2 = pd.get_dummies(X_train_2, drop_first=True)\n",
"X_test_2 = pd.get_dummies(X_test_2, drop_first=True)\n",
"\n",
"# 2. Discretize a numeric feature: split Income into 5 equal-width bins\n",
"discretizer = KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='uniform')\n",
"X_train_2['Income_binned'] = discretizer.fit_transform(X_train_2[['Income']])\n",
"X_test_2['Income_binned'] = discretizer.transform(X_test_2[['Income']])\n",
"\n",
"# 3. Feature scaling: min-max normalization (fit on train, reuse on test)\n",
"scaler_minmax = MinMaxScaler()\n",
"X_train_2[['Income']] = scaler_minmax.fit_transform(X_train_2[['Income']])\n",
"X_test_2[['Income']] = scaler_minmax.transform(X_test_2[['Income']])\n",
"\n",
"# Standardization, applied on top of the min-max result,\n",
"# so the final Income column is the standardized one\n",
"scaler_standard = StandardScaler()\n",
"X_train_2[['Income']] = scaler_standard.fit_transform(X_train_2[['Income']])\n",
"X_test_2[['Income']] = scaler_standard.transform(X_test_2[['Income']])\n",
"\n",
"# 4. Use Featuretools to generate features\n",
"es = ft.EntitySet(id=\"data\")\n",
"\n",
"# Add the data to the EntitySet with add_dataframe\n",
"es = es.add_dataframe(\n",
"    dataframe_name=\"customer_data\",\n",
"    dataframe=X_train_2,\n",
"    index=\"customer_id\"\n",
")\n",
"\n",
"# Run deep feature synthesis\n",
"# Note: the target is passed as `target_dataframe_name` rather than the older `target_entity`\n",
"features, feature_names = ft.dfs(entityset=es, target_dataframe_name=\"customer_data\")\n",
"\n",
"print(features.head())\n",
"\n",
"# 5. Balance the training set with RandomOverSampler\n",
"# Note: the printed shape has 17 columns, which suggests that the customer_id index\n",
"# created by add_dataframe is now part of X_train_2 and is resampled as a feature\n",
"ros = RandomOverSampler(random_state=42)\n",
"X_train_res, y_train_res = ros.fit_resample(X_train_2, y_train_2)\n",
"\n",
"print(\"Sample shape after RandomOverSampler:\")\n",
"print(f\"X_train_res: {X_train_res.shape}\")\n",
"print(f\"y_train_res: {y_train_res.shape}\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Featuretools emits a warning (not an error): `UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created`.\n",
"It means that the EntitySet consists of a single DataFrame, i.e. there is only one entity and no relationships along which deeper features could be built.\n"
]
},
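{
"cell_type": "markdown",
"metadata": {},
"source": [
"With a single dataframe, new features can only come from transform primitives applied to its own columns. The following is a minimal sketch of that setup, assuming a Featuretools 1.x API like the one used in this notebook. The toy columns `income` and `spend`, the names `es_demo` and `customers`, and the choice of the `add_numeric` primitive are illustrative assumptions. Passing `make_index=True` states explicitly that the index column should be created, and `max_depth=1` matches what a single dataframe can support."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import featuretools as ft\n",
"\n",
"# Toy data; the column names are illustrative only\n",
"toy = pd.DataFrame({\n",
"    \"income\": [30000.0, 52000.0, 75000.0],\n",
"    \"spend\": [120.0, 340.0, 910.0],\n",
"})\n",
"\n",
"es_demo = ft.EntitySet(id=\"demo\")\n",
"# make_index=True asks Featuretools to create the integer index column itself\n",
"es_demo = es_demo.add_dataframe(\n",
"    dataframe_name=\"customers\",\n",
"    dataframe=toy,\n",
"    index=\"customer_id\",\n",
"    make_index=True,\n",
")\n",
"\n",
"# One dataframe means no aggregations across relationships,\n",
"# so only transform primitives are requested and the depth is limited to 1\n",
"feature_matrix, feature_defs = ft.dfs(\n",
"    entityset=es_demo,\n",
"    target_dataframe_name=\"customers\",\n",
"    trans_primitives=[\"add_numeric\"],\n",
"    max_depth=1,\n",
")\n",
"print(feature_matrix.columns.tolist())\n"
]
},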
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b>The next business goal</b>"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Income Kidhome Teenhome NumWebPurchases MntWines \\\n",
"customer_id \n",
"0 0.739837 0 1 6 522 \n",
"1 -0.203068 1 1 1 22 \n",
"2 0.160233 0 1 7 479 \n",
"3 1.049812 0 0 4 594 \n",
"4 0.119182 1 2 6 416 \n",
"\n",
" MntFruits MntMeatProducts MntFishProducts MntSweetProducts \\\n",
"customer_id \n",
"0 0 522 227 120 \n",
"1 2 10 6 4 \n",
"2 5 82 7 17 \n",
"3 51 631 72 55 \n",
"4 0 26 0 0 \n",
"\n",
" MntGoldProds AcceptedCmp1 AcceptedCmp2 AcceptedCmp3 \\\n",
"customer_id \n",
"0 134 0 0 0 \n",
"1 34 0 0 0 \n",
"2 171 0 0 1 \n",
"3 32 0 0 0 \n",
"4 4 0 0 0 \n",
"\n",
" AcceptedCmp4 AcceptedCmp5 Recency Income_binned \n",
"customer_id \n",
"0 0 0 28 0.0 \n",
"1 0 0 84 0.0 \n",
"2 0 0 30 0.0 \n",
"3 0 0 42 0.0 \n",
"4 1 0 11 0.0 \n",
"Sample shape after RandomOverSampler for the second business goal:\n",
"X_train_res_1: (2258, 18)\n",
"y_train_res_1: (2258,)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"d:\\study\\3_course\\aim\\aimenv\\Lib\\site-packages\\featuretools\\entityset\\entityset.py:1733: UserWarning: index customer_id not found in dataframe, creating new integer column\n",
" warnings.warn(\n",
"d:\\study\\3_course\\aim\\aimenv\\Lib\\site-packages\\featuretools\\synthesis\\deep_feature_synthesis.py:169: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created\n",
" warnings.warn(\n"
]
}
],
"source": [
"# 1. One-hot encode categorical features\n",
"X_train = pd.get_dummies(X_train, drop_first=True)\n",
"X_test = pd.get_dummies(X_test, drop_first=True)\n",
"\n",
"# 2. Discretize a numeric feature: split Income into 5 equal-width bins\n",
"discretizer = KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='uniform')\n",
"X_train['Income_binned'] = discretizer.fit_transform(X_train[['Income']])\n",
"X_test['Income_binned'] = discretizer.transform(X_test[['Income']])\n",
"\n",
"# 3. Feature scaling: min-max normalization (fit on train, reuse on test)\n",
"scaler_minmax = MinMaxScaler()\n",
"X_train[['Income']] = scaler_minmax.fit_transform(X_train[['Income']])\n",
"X_test[['Income']] = scaler_minmax.transform(X_test[['Income']])\n",
"\n",
"# Standardization, applied on top of the min-max result\n",
"scaler_standard = StandardScaler()\n",
"X_train[['Income']] = scaler_standard.fit_transform(X_train[['Income']])\n",
"X_test[['Income']] = scaler_standard.transform(X_test[['Income']])\n",
"\n",
"# 4. Use Featuretools to generate features\n",
"es = ft.EntitySet(id=\"data\")\n",
"es = es.add_dataframe(dataframe_name=\"customer_data\", dataframe=X_train, index=\"customer_id\")\n",
"\n",
"# Run deep feature synthesis to create new features\n",
"# (max_depth=2 is requested, but with a single dataframe Featuretools falls back to 1)\n",
"features, feature_names = ft.dfs(entityset=es, target_dataframe_name=\"customer_data\", max_depth=2)\n",
"\n",
"\n",
"print(features.head())\n",
"\n",
"# 5. Balance the training set with RandomOverSampler\n",
"ros = RandomOverSampler(random_state=42)\n",
"X_train_res_1, y_train_res_1 = ros.fit_resample(X_train, y_train)\n",
"\n",
"print(\"Sample shape after RandomOverSampler for the second business goal:\")\n",
"print(f\"X_train_res_1: {X_train_res_1.shape}\")\n",
"print(f\"y_train_res_1: {y_train_res_1.shape}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b>9. Evaluate the quality of each feature set</b><br><br>Set up the model"
]
},
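{
"cell_type": "markdown",
"metadata": {},
"source": [
"When the target is imbalanced, accuracy can look high even if the rare class is detected poorly, so it helps to report class-aware metrics next to it. The sketch below shows this on synthetic data from `make_classification`; the names `X_demo`, `y_demo` and `clf`, and the 85/15 class ratio, are assumptions chosen for illustration and do not come from the lab's dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import make_classification\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Synthetic binary problem with roughly 85% / 15% class proportions\n",
"X_demo, y_demo = make_classification(\n",
"    n_samples=2000, n_features=10, weights=[0.85, 0.15], random_state=42\n",
")\n",
"# Stratify so that both splits keep the same class proportions\n",
"X_tr, X_te, y_tr, y_te = train_test_split(\n",
"    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42\n",
")\n",
"\n",
"clf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)\n",
"pred = clf.predict(X_te)\n",
"\n",
"# Accuracy is dominated by the majority class;\n",
"# balanced accuracy and F1 for class 1 expose how the rare class is handled\n",
"print(f\"Accuracy:          {accuracy_score(y_te, pred):.3f}\")\n",
"print(f\"Balanced accuracy: {balanced_accuracy_score(y_te, pred):.3f}\")\n",
"print(f\"F1 (class 1):      {f1_score(y_te, pred):.3f}\")\n"
]
},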
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X_train columns: Index(['Income', 'Kidhome', 'Teenhome', 'NumWebPurchases', 'MntWines',\n",
" 'MntFruits', 'MntMeatProducts', 'MntFishProducts', 'MntSweetProducts',\n",
" 'MntGoldProds', 'AcceptedCmp1', 'AcceptedCmp2', 'AcceptedCmp3',\n",
" 'AcceptedCmp4', 'AcceptedCmp5', 'Recency', 'Income_binned'],\n",
" dtype='object')\n",
"X_test columns: Index(['Income', 'Kidhome', 'Teenhome', 'NumWebPurchases', 'MntWines',\n",
" 'MntFruits', 'MntMeatProducts', 'MntFishProducts', 'MntSweetProducts',\n",
" 'MntGoldProds', 'AcceptedCmp1', 'AcceptedCmp2', 'AcceptedCmp3',\n",
" 'AcceptedCmp4', 'AcceptedCmp5', 'Recency', 'Income_binned'],\n",
" dtype='object')\n"
]
}
],
"source": [
"X_train = X_train.drop(columns=['customer_id'], errors='ignore')\n",
"print(\"X_train columns:\", X_train.columns)\n",
"print(\"X_test columns:\", X_test.columns)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy: 0.8761261261261262\n",
"ROC-AUC: 0.8519339641315966\n",
"Model training time: 0.2374 seconds\n",
"Prediction time: 0.0086 seconds\n",
"Mean cross-validation accuracy: 0.8736\n",
"Feature correlation with the target:\n",
"Response 1.000000\n",
"AcceptedCmp5 0.323374\n",
"AcceptedCmp1 0.297345\n",
"AcceptedCmp3 0.254005\n",
"MntWines 0.246299\n",
"MntMeatProducts 0.237746\n",
"AcceptedCmp4 0.180205\n",
"AcceptedCmp2 0.169294\n",
"NumWebPurchases 0.151431\n",
"MntGoldProds 0.140332\n",
"Income 0.133047\n",
"MntFruits 0.122443\n",
"MntSweetProducts 0.116170\n",
"MntFishProducts 0.108145\n",
"Kidhome -0.077909\n",
"Teenhome -0.153901\n",
"Recency -0.199766\n",
"Name: Response, dtype: float64\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA38AAAMhCAYAAABYMwgIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzddXgURx/A8e9d3N1dIAIEd3fXFiteoFhpKZQCbSFAhbZIKVAoLS4t7u7ubiFACBLi7nr7/nHkkksuQktfaDuf57kHsje797uZvdmdndlZmSRJEoIgCIIgCIIgCMK/mvxNByAIgiAIgiAIgiD8/UTjTxAEQRAEQRAE4T9ANP4EQRAEQRAEQRD+A0TjTxAEQRAEQRAE4T9ANP4EQRAEQRAEQRD+A0TjTxAEQRAEQRAE4T9ANP4EQRAEQRAEQRD+A0TjTxAEQRAEQRAE4T9ANP4EQRAEQRAEQRD+A0TjTxAEQRAEQRAE4T9ANP4EQfhHWbVqFTKZjCtXrhR777fffkMmk9GtWzfy8vLeQHSCIAiCIAhvL9H4EwThX2H79u2MGjWKxo0bs2HDBrS0tN50SIIgCIIgCG8V0fgTBOEf78SJE/Tt2xd/f392796Nvr7+mw5JEARBEAThrSMaf4Ig/KPduHGDrl274uDgwMGDBzEzMyuWZvPmzdSsWRMDAwOsra3p378/L168UEszePBgjI2Nefz4MW3btsXIyAhHR0dmzpyJJEmqdE+ePEEmkzFnzhx+/PFH3NzcMDAwoGnTpty5c6fYZ9+/f593330XS0tL9PX1qVWrFrt27dL4XZo1a4ZMJiv2WrVqlVq6JUuWULlyZQwNDdXSbdmyRW1blStXLvYZc+bMQSaT8eTJE9Wy/KG0hZcpFAoCAgI0fv6xY8do3LgxRkZGmJub07VrV4KCgtTSTJ8+HZlMRmxsrNryK1euFNtmft4XtWXLFmQyGSdOnFAtO336ND179sTV1RU9PT1cXFz45JNPyMjI0Lh+rVq1MDExUcunOXPmFEtbWH5+6OrqEhMTo/be+fPnVdspPPS4PHENHjxYY/kWfuWXgbu7O506deLQoUNUq1YNfX19/P392bZtm8ZYy1N2r5LP2dnZTJs2jZo1a2JmZoaRkRGNGzfm+PHjpeZdPnd391K/Z2EymYwPP/yQ9evX4+Pjg76+PjVr1uTUqVNq6fL3qcJSU1Oxt7cvFv/IkSOpUKEChoaGWFpa0qJFC06fPl0sxk6dOhWL/cMPPyz2OStXrqRFixbY2tqip6eHv78/S5Ys0fi9Bw8erLbsgw8+QF9fXy0+gMWLF1OpUiX09PRwdHRkzJgxJCYmqqUpWidYW1vTsWNHjXWNIAhCeWi/6QAEQRD+rJCQENq1a4eenh4HDx7EwcGhWJpVq1YxZMgQateuzaxZs4iKiuKnn37i7NmzXL9+HXNzc1XavLw82rVrR7169fjhhx84cOAAgYGB5ObmMnPmTLXtrlmzhpSUFMaMGUNmZiY//fQTLVq04Pbt29jZ2QFw9+5dGjZsiJOTE5MnT8bIyIhNmzbRrVs3tm7dSvfu3YvF6+vryxdffAFAbGwsn3zyidr7GzduZPTo0TRr1oyxY8diZGREUFAQ33777V/NTjVr167l9u3bxZYfOXKE9u3b4+npyfTp08nIyGDhwoU0bNiQa9eu4e7u/lrjKGrz5s2kp6czatQorKysuHTpEgsXLiQsLIzNmzer0p0/f55evXpRtWpVvvvuO8zMzDTmZ2m0tLRYt26d2jorV65EX1+fzMzMV45rxIgRtGrVSrXOgAED6N69Oz169FAts7GxUf3/4cOH9O7dm5EjRzJo0CBWrlxJz549OXDgAK1bty4x7pLK7lUkJyezbNky+vbty/Dhw0lJSWH58uW0bduWS5cuUa1atTK3Ua1aNSZMmKC2bM2aNRw+fLhY2pMnT7Jx40Y++ugj9PT0WLx4Me3atePSpUsaL2Lkmz
t3LlFRUcWWZ2dn079/f5ydnYmPj2fp0qW0a9eOoKAgXF1dy86AIpYsWUKlSpXo0qUL2tra7N69m9GjR6NQKBgzZkyJ6wUGBrJ8+XI2btxIs2bNVMunT5/OjBkzaNWqFaNGjSI4OJglS5Zw+fJlzp49i46Ojiptfp0gSRIhISHMmzePDh068OzZs1f+HoIgCEiCIAj/ICtXrpQAac+ePZKXl5cESG3atNGYNjs7W7K1tZUqV64sZWRkqJbv2bNHAqRp06aplg0aNEgCpLFjx6qWKRQKqWPHjpKurq4UExMjSZIkhYaGSoBkYGAghYWFqdJevHhRAqRPPvlEtaxly5ZSlSpVpMzMTLVtNmjQQKpQoUKxeBs2bCg1b95c9Xf+Z61cuVK1rG/fvpK5ubna9zl+/LgESJs3b1Yta9q0qVSpUqVinzF79mwJkEJDQ1XL8vM0f1lmZqbk6uoqtW/fvtjnV6tWTbK1tZXi4uJUy27evCnJ5XJp4MCBqmWBgYESoMq3fJcvXy62zUGDBklGRkbFYt28ebMESMePH1ctS09PL5Zu1qxZkkwmk54+fapaNmXKFAmQIiIiVMvy83P27NnFtlFYfn707dtXqlKlimp5WlqaZGpqKr333nsSIF2+fPmV4yoMkAIDAzW+5+bmJgHS1q1bVcuSkpIkBwcHqXr16sViLU/ZvUo+5+bmSllZWWrpEhISJDs7O+n999/XGHPR+Dt27Fhs+ZgxY6Sipx6ABEhXrlxRLXv69Kmkr68vde/eXbUsf5/KFx0dLZmYmKi+a+H4i7p06ZIESFu2bPlTMWoq37Zt20qenp5qy9zc3KRBgwZJkiRJS5culQBp4cKFammio6MlXV1dqU2bNlJeXp5q+aJFiyRAWrFihWpZ06ZNpaZNm6qt//nnn0uAFB0dXeL3FQRBKIkY9ikIwj/S4MGDef78Oe+99x6HDh1S6/XJd+XKFaKjoxk9erTafYAdO3bE19eXvXv3Flvnww8/VP0/fzhadnY2R44cUUvXrVs3nJycVH/XqVOHunXrsm/fPgDi4+M5duwYvXr1IiUlhdjYWGJjY4mLi6Nt27Y8fPiw2NDT7Oxs9PT0Sv3eKSkpGBoa/q33Nf7888/ExcURGBiotjwiIoIbN24wePBgLC0tVcsDAgJo3bq16rsXFh8fr/rusbGxJCUllfi5hdPFxsaSkpJSLI2BgYHq/2lpacTGxtKgQQMkSeL69euq91JSUpDL5Wo9u69qwIAB3L9/XzW8c+vWrZiZmdGyZcs/HdercHR0VOsdNjU1ZeDAgVy/fp3IyEiN65RUdq9KS0sLXV1dQDmMND4+ntzcXGrVqsW1a9f+0rY1qV+/PjVr1lT97erqSteuXTl48GCJM/d+9dVXmJmZ8dFHH2l8PzMzk9jYWIKCgvjpp58wMDCgVq1aamlycnKK7XdFe3VBvXyTkpKIjY2ladOmPH78WOM+vXPnTkaPHs3EiRPV6hRQ9p5nZ2czbtw45PKC07Dhw4djamparF7KjzEmJobz58+zfft2AgICsLa21vi9BUEQSiMaf4Ig/CPFx8ezbt06Vq9eTbVq1fj444+LnYQ9ffoUAB8fn2Lr+/r6qt7PJ5fL8fT0VFtWsWJFALV7qgAqVKhQbJsVK1ZUpXv06BGSJDF16lRsbGzUXvkn5tHR0WrrJyYmarwnq7D69esTHh7O9OnTefbsWZkNqleVlJTEt99+y/jx41XDV/OVlp9+fn7ExsaSlpamttzHx0ftuxce9lhYWlpasXx6//33i6V79uyZqvFpbGyMjY0NTZs2VcWer379+igUCj7++GNCQkKIjY0lISHhlfLCxsaGjh07smLFCgBWrFjBoEGD1E7YXzWuV+Ht7V3s3rOS9sf8zymp7P6M1atXExAQgL6+PlZWVtjY2LB3797Xur/lK+n3lJ6eXuy+S4DQ0FCWLl3KjBkzSrwQsmrVKmxsbPD39+fo0aMcPnwYNzc3tTSHDh0qtt
8tX7682LbOnj1Lq1atVPe52tjY8PnnnwPFy/fGjRv07duXvLw84uPji22rpN+Rrq4unp6exeqlc+fOYWNjg62tLQ0
"text/plain": [
"<Figure size 1000x800 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 0.89 0.98 0.93 377\n",
" 1 0.70 0.31 0.43 67\n",
"\n",
" accuracy 0.88 444\n",
" macro avg 0.79 0.64 0.68 444\n",
"weighted avg 0.86 0.88 0.86 444\n",
"\n"
]
}
],
"source": [
"import time\n",
"import pandas as pd\n",
"import numpy as np\n",
"from sklearn.model_selection import cross_val_score\n",
"from sklearn.metrics import accuracy_score, roc_auc_score, classification_report\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"# 1. Predictive power (accuracy and ROC-AUC for binary classification)\n",
"model = RandomForestClassifier(random_state=42)\n",
"\n",
"start_time = time.perf_counter()\n",
"model.fit(X_train, y_train)\n",
"end_time = time.perf_counter()\n",
"\n",
"train_time = end_time - start_time\n",
"\n",
"y_pred = model.predict(X_test)\n",
"accuracy = accuracy_score(y_test, y_pred)\n",
"roc_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])\n",
"\n",
"print(f\"Accuracy: {accuracy}\")\n",
"print(f\"ROC-AUC: {roc_auc}\")\n",
"print(f\"Model training time: {train_time:.4f} seconds\")\n",
"\n",
"# 2. Computation speed (prediction time)\n",
"start_time = time.perf_counter()\n",
"y_pred = model.predict(X_test)\n",
"end_time = time.perf_counter()\n",
"predict_time = end_time - start_time\n",
"\n",
"print(f\"Prediction time: {predict_time:.4f} seconds\")\n",
"\n",
"# 3. Model robustness via 5-fold cross-validation on X, y\n",
"cv_scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')\n",
"mean_cv_score = np.mean(cv_scores)\n",
"print(f\"Mean cross-validation accuracy: {mean_cv_score:.4f}\")\n",
"\n",
"# 4. Correlation of the features with the target\n",
"correlation_matrix = pd.concat([X, y], axis=1).corr()\n",
"correlation_with_target = correlation_matrix['Response'].sort_values(ascending=False)\n",
"print(\"Feature correlation with the target:\")\n",
"print(correlation_with_target)\n",
"\n",
"# Plot the correlation matrix\n",
"plt.figure(figsize=(10, 8))\n",
"sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')\n",
"plt.title('Feature correlation matrix')\n",
"plt.show()\n",
"\n",
"# Per-class precision, recall and F1\n",
"print(classification_report(y_test, y_pred))\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "aimenv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}