1001 lines
569 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Lab 3 Malafeev PIbd-31**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Define two business goals. First, the dataset has a NumWebPurchases column, the number of purchases made through the website. It gives the first business goal: increase sales through the online store. The dataset also has a Response column, the customer's response to the current campaign. It gives the second business goal: analyze responses to previous campaigns in order to make campaigns more effective."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. Technical project goals. To increase online sales:\n",
    "\n",
    "Develop customer segmentation models based on customer characteristics (income, purchases).\n",
    "Build predictive models that estimate the probability of web purchases.\n",
    "\n",
    "To optimize campaigns:\n",
    "\n",
    "Analyze the data on customer responses to past campaigns.\n",
    "Produce recommendations for better targeting based on the analysis of successful campaigns."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3. Code\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((1329, 16), (443, 16), (444, 16))"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "import pandas as pd\n",
    "dataset = pd.read_csv(\".//datasetlab1//marketing_campaign.csv\", sep=\"\\t\")\n",
    "\n",
    "# Drop uninformative columns and keep only the data needed for the business goals\n",
    "columns_to_use = [\n",
    "    \"Income\", \"Kidhome\", \"Teenhome\", \"NumWebPurchases\", \"MntWines\",\n",
    "    \"MntFruits\", \"MntMeatProducts\", \"MntFishProducts\", \"MntSweetProducts\",\n",
    "    \"MntGoldProds\", \"AcceptedCmp1\", \"AcceptedCmp2\", \"AcceptedCmp3\",\n",
    "    \"AcceptedCmp4\", \"AcceptedCmp5\", \"Response\", \"Recency\"\n",
    "]\n",
    "\n",
    "# Select the required columns, then drop rows with missing values\n",
    "filtered_data = dataset[columns_to_use].dropna()\n",
    "\n",
    "# Split into features (X) and target (y) for the campaign-optimization goal\n",
    "X = filtered_data.drop(columns=[\"Response\"])\n",
    "y = filtered_data[\"Response\"]\n",
    "\n",
    "# Split into training (60%), validation (20%) and test (20%) sets, stratified by the target\n",
    "X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42, stratify=y)\n",
    "X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp)\n",
    "\n",
    "# Check the split sizes\n",
    "X_train.shape, X_val.shape, X_test.shape"
   ]
  },
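The cell above relies on the `stratify` argument to keep the share of responders roughly equal across the three parts. As a rough illustration of that idea only, here is a minimal pure-Python sketch of a stratified 60/20/20 split. It is not scikit-learn's implementation; the function names and the toy labels (1700 zeros, 300 ones) are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_three_way_split(labels, fractions=(0.6, 0.2, 0.2), seed=42):
    """Return (train, val, test) index lists that keep class shares similar."""
    rng = random.Random(seed)

    # Group row indices by class so each class is split separately.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])  # the remainder goes to test
    return train, val, test


def positive_share(labels, indices):
    """Fraction of rows labelled 1 among the selected indices."""
    return sum(labels[i] for i in indices) / len(indices)


# Hypothetical labels: 15% positives, similar in spirit to Response.
labels = [0] * 1700 + [1] * 300
train_idx, val_idx, test_idx = stratified_three_way_split(labels)

for name, part in (("train", train_idx), ("val", val_idx), ("test", test_idx)):
    print(name, len(part), round(positive_share(labels, part), 3))
```

Splitting each class separately is what keeps the positive share stable; a plain random split would only match it on average.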
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "5. Assess class balance.\n",
    "Note: 0 means the customer did not respond to the campaign, 1 means the customer responded."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Figure size 640x480 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "<Figure size 640x480 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "<Figure size 640x480 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Class distribution in the training set (percent):\n",
      "Response\n",
      "0    84.951091\n",
      "1    15.048909\n",
      "Name: proportion, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "# Import the plotting libraries\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Plot how many records fall into each class\n",
    "def plot_class_distribution(y, title):\n",
    "    sns.countplot(x=y)\n",
    "    plt.title(title)\n",
    "    plt.xlabel(\"Response (target)\")\n",
    "    plt.ylabel(\"Number of records\")\n",
    "    plt.show()\n",
    "\n",
    "# Assess class balance in each split\n",
    "plot_class_distribution(y_train, \"Class distribution in the training set\")\n",
    "plot_class_distribution(y_val, \"Class distribution in the validation set\")\n",
    "plot_class_distribution(y_test, \"Class distribution in the test set\")\n",
    "\n",
    "# Class proportions in the training set\n",
    "class_distribution_train = y_train.value_counts(normalize=True) * 100\n",
    "print(\"Class distribution in the training set (percent):\")\n",
    "print(class_distribution_train)\n"
   ]
  },
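The printed shares can be condensed into a single imbalance figure. The sketch below does this without pandas. The class counts (1129 and 200) are an assumption, back-calculated from the percentages above and the 1329 training rows; `class_shares` and `imbalance_ratio` are hypothetical helper names.

```python
from collections import Counter


def class_shares(labels):
    """Map each class to its percentage of all records."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: 100 * n / total for cls, n in counts.items()}


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Assumed counts: 1129 non-responders and 200 responders out of 1329 rows.
y_train_like = [0] * 1129 + [1] * 200

shares = class_shares(y_train_like)
print({cls: round(pct, 2) for cls, pct in shares.items()})
print("imbalance ratio:", imbalance_ratio(y_train_like))
```

A ratio well above 1 means a model can score high accuracy by always predicting the majority class, which is why the balance check matters before training.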
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now repeat the split for the second business goal, with NumWebPurchases as the target.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Split sizes for the second goal:\n",
      "Training: (1329, 15), Validation: (443, 15), Test: (444, 15)\n"
     ]
    }
   ],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "# Target column for the second business goal\n",
    "target_col_2 = 'NumWebPurchases'\n",
    "\n",
    "# Split the data into training, validation and test sets\n",
    "X_train_2, X_temp_2, y_train_2, y_temp_2 = train_test_split(\n",
    "    X.drop(columns=[target_col_2]),  # all features except the target\n",
    "    X[target_col_2],  # target column\n",
    "    test_size=0.4,  # 40% held out for the validation and test sets\n",
    "    random_state=42\n",
    ")\n",
    "\n",
    "X_val_2, X_test_2, y_val_2, y_test_2 = train_test_split(\n",
    "    X_temp_2,\n",
    "    y_temp_2,\n",
    "    test_size=0.5,  # half of the held-out data goes to the test set\n",
    "    random_state=42\n",
    ")\n",
    "\n",
    "# Check the split sizes\n",
    "print(\"Split sizes for the second goal:\")\n",
    "print(f\"Training: {X_train_2.shape}, Validation: {X_val_2.shape}, Test: {X_test_2.shape}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Assessment:"
   ]
  },
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 10,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjsAAAHHCAYAAABZbpmkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABgmUlEQVR4nO3dd1gUV/s38O+C0puANEVARcWuqIiooCKKPRqNShRLbMFuTMITFUsSLLGHaMyj2KNRo+YxsWAvwYaiEkvQgBXsVBUQzvuHL/Nz3UVcXFiYfD/XNdfFnDlzzr3D7nBz5sysQgghQERERCRTeroOgIiIiKg4MdkhIiIiWWOyQ0RERLLGZIeIiIhkjckOERERyRqTHSIiIpI1JjtEREQka0x2iIiISNaY7BARlQKZmZm4ffs2nj59qutQSMvS09ORmJiIzMxMXYfyr8Vkh4hIR7Zs2YJ27drB3NwcZmZmqFKlCubOnavrsMqEjIwMLFq0SFpPSUlBRESE7gJ6jRACK1asQPPmzWFiYgILCwu4ublh/fr1ug7tX0vBr4som1avXo3BgwdL64aGhqhSpQoCAgIwdepU2Nvb6zA6IirMl19+iTlz5qB79+7o27cvbG1toVAoUKNGDTg7O+s6vFIvNzcXlpaW+PHHH9G6dWvMnz8fV69exZ49e3QdGvr164fNmzcjODgYXbp0gaWlJRQKBerXr4+KFSvqOrx/pXK6DoDez8yZM+Hm5oYXL17g+PHjWLZsGf744w/ExcXBxMRE1+ERkRpHjhzBnDlzEB4eji+//FLX4ZRJ+vr6mDFjBgYOHIi8vDxYWFjg999/13VYWLt2LTZv3oz169ejf//+ug6H/j+O7JRR+SM7Z86cQZMmTaTySZMmYcGCBdi4cSP69eunwwiJqCBdu3bFkydPcOLECV2HUubduXMHt2/fhoeHB6ysrHQdDurVq4f69etjw4YNug6FXsM5OzLTtm1bAEBCQgIA4MmTJ/jss89Qr149mJmZwcLCAoGBgbhw4YLKvi9evMD06dNRo0YNGBkZwdHRET179sSNGzcAAImJiVAoFAUufn5+UluHDx+GQqHA5s2b8Z///AcODg4wNTVFt27dcPv2bZW+T506hY4dO8LS0hImJibw9fUt8A+Bn5+f2v6nT5+uUnf9+vXw9PSEsbExrK2t0bdvX7X9v+21vS4vLw+LFi1CnTp1YGRkBHt7e4wYMUJlUqmrqyu6dOmi0s/o0aNV2lQX+7x581SOKQBkZWUhLCwM1atXh6GhIZydnfH5558jKytL7bF6nZ+fn0p733zzDfT09LBx48YiHY/vvvsOLVq0gI2NDYyNjeHp6YmtW7eq7X/9+vVo1qwZTExMUKFCBbRu3Rr79u1TqrN79274+vrC3NwcFhYWaNq0qUpsW7ZskX6ntra2+Pjjj3H37l2lOoMGDVKKuUKFCvDz88OxY8cKPU7vsy8AHDx4EK1atYKpqSmsrKzQvXt3XLlyRanOyZMnUbduXfTt2xfW1tYwNjZG06ZNsWPHDqlORkYGTE1NMW7cOJU+7ty5A319fYSHh0sxu7q6qtR787118+ZNfPrpp6hZsyaMjY1hY2OD3r17IzExUWm//M/v4cOHpbIzZ86gffv2MDc3h6mpqdpjsnr1aigUCpw9e1Yqe/Tokdr3eJcuXdTG/C7ngunTp0vvxcqVK8Pb2xvlypWDg4ODStzq5O+fv5ibm6NZs2ZKxx949ZmpW7duge3kf05Wr14N4NUk87i4ODg7O6Nz586wsLAo8FgBwD///IPevXvD2toaJiYmaN68ucrolCbnUk0+45qcc+WAl7FkJj8xsbGxAfDqw7Rjxw707t0bbm5uuH//Pn788Uf4+vri8uXLcHJyAvDq+neXLl1w4MAB9O3bF+PGjUN6ejqioqIQFxeHatWqSX3069cPnTp1Uuo3NDRUbTzffPMNFAoFvvjiCzx48ACLFi2Cv78/Ym
NjYWxsDODVH4fAwEB4enoiLCwMenp6iIyMRNu2bXHs2DE0a9ZMpd3KlStLJ/qMjAyMGjVKbd9Tp05Fnz598Mknn+Dhw4dYunQpWrdujfPnz6v9L3D48OFo1aoVAODXX3/F9u3blbaPGDFCGlUbO3YsEhIS8P333+P8+fM4ceIEypcvr/Y4aCIlJUV6ba/Ly8tDt27dcPz4cQwfPhweHh64dOkSFi5ciL///lvlRF2YyMhITJkyBfPnzy9wuL2w47F48WJ069YNQUFByM7OxqZNm9C7d2/s2rULnTt3lurNmDED06dPR4sWLTBz5kwYGBjg1KlTOHjwIAICAgC8+kM5ZMgQ1KlTB6GhobCyssL58+exZ88eKb78Y9+0aVOEh4fj/v37WLx4MU6cOKHyO7W1tcXChQsBvEoOFi9ejE6dOuH27duFjgAUdd/9+/cjMDAQVatWxfTp0/H8+XMsXboUPj4+OHfunPTH/fHjx1ixYgXMzMwwduxYVKxYEevXr0fPnj2xYcMG9OvXD2ZmZvjggw+wefNmLFiwAPr6+lI/P//8M4QQCAoKeuvreNOZM2fw559/om/fvqhcuTISExOxbNky+Pn54fLlywVe+r5+/Tr8/PxgYmKCyZMnw8TEBD/99BP8/f0RFRWF1q1baxRHQYpyLsg3f/583L9/X6P+1q1bB+BVQvbDDz+gd+/eiIuLQ82aNYsU/+PHjwEAc+bMgYODAyZPngwjIyO1x+r+/fto0aIFnj17hrFjx8LGxgZr1qxBt27dsHXrVnzwwQdKbb/LufRNBX3G3+c4l1mCyqTIyEgBQOzfv188fPhQ3L59W2zatEnY2NgIY2NjcefOHSGEEC9evBC5ublK+yYkJAhDQ0Mxc+ZMqWzVqlUCgFiwYIFKX3l5edJ+AMS8efNU6tSpU0f4+vpK64cOHRIARKVKlURaWppU/ssvvwgAYvHixVLb7u7uokOHDlI/Qgjx7Nkz4ebmJtq3b6/SV4sWLUTdunWl9YcPHwoAIiwsTCpLTEwU+vr64ptvvlHa99KlS6JcuXIq5fHx8QKAWLNmjVQWFhYmXv+IHDt2TAAQGzZsUNp3z549KuUuLi6ic+fOKrGHhISINz92b8b++eefCzs7O+Hp6al0TNetWyf09PTEsWPHlPZfvny5ACBOnDih0t/rfH19pfZ+//13Ua5cOTFp0iS1dd/leAjx6vf0uuzsbFG3bl3Rtm1bpbb09PTEBx98oPJezP+dp6SkCHNzc+Hl5SWeP3+utk52draws7MTdevWVaqza9cuAUBMmzZNKgsODhYuLi5K7axYsUIAEKdPn1b7mrWxb8OGDYWdnZ14/PixVHbhwgWhp6cnBg4cKJUBEADE4cOHpbJnz54JDw8P4eDgILKzs4UQQuzdu1cAELt371bqp379+krvjcGDB4sqVaqoxPPme+vN35cQQkRHRwsAYu3atVJZ/uf30KFDQgghevXqJfT19UVcXJxU59GjR8LGxkZ4enpKZfnnpTNnzkhl6j6fQgjRuXNnpeOsybngzffigwcPhLm5uQgMDFSKuyDq3sv79u0TAMQvv/wilfn6+oo6deoU2E7+OTEyMlJp3cDAQPz9999Kx+DNYzV+/HgBQOnznJ6eLtzc3ISrq6v0WXnXc2l+vIV9xotyzpUDXsYq4/z9/VGxYkU4Ozujb9++MDMzw/bt21GpUiUAr+7S0tN79WvOzc3F48ePYWZmhpo1a+LcuXNSO9u2bYOtrS3GjBmj0sebly40MXDgQJibm0vrH374IRwdHfHHH38AAGJjYxEfH4/+/fvj8ePHePToER49eoTMzEy0a9cOR48eRV5enlKbL168gJGR0Vv7/fXXX5GXl4c+ffpIbT569AgODg5wd3fHoUOHlOpnZ2cDeHW8CrJlyxZYWlqiffv2Sm16enrCzMxMpc2cnByleo8ePcKLFy/eGvfdu3exdOlSTJ06FWZmZir9e3h4oFatWkpt5l+6fL
P/gpw+fRp9+vRBr169MG/ePLV13uV4AFD6j/Lp06dITU1Fq1atlN5bO3bsQF5eHqZNmya9F/Plv7eioqKQnp6OL7/
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjIAAAHHCAYAAACle7JuAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABcL0lEQVR4nO3dd1QU1/8+8GdBmlRBaREBFQG7YkPsoNg1GitR7CUYRYxJSDQoRrEkdtSYKJZYYjcxsWI3FkRRiYagQSEqKCpVKbL394c/5uu6i7IILuvneZ2z5zB3ZmeenV2GN3fuzMqEEAJEREREWkhH0wGIiIiISoqFDBEREWktFjJERESktVjIEBERkdZiIUNERERai4UMERERaS0WMkRERKS1WMgQERGR1mIhQ0REVMrkcjlSU1Px77//ajrKe4+FDBERaY3Tp0/j+PHj0vTx48dx5swZzQV6SXJyMgIDA+Ho6Ah9fX1UqVIFtWvXRkZGhqajvddYyGjQunXrIJPJpIehoSFq1aqFCRMmICUlRdPxiIjKnaSkJHzyySe4du0arl27hk8++QRJSUmajoWbN2+iadOm2Lp1K8aOHYt9+/bh8OHDiIyMhLGxsabjvdcqaDoAAaGhoXB2dkZOTg5Onz6NlStX4o8//kBsbCwqVqyo6XhEROVGnz59sHjxYtSvXx8A4OnpiT59+mg4FTB27Fjo6+vj3Llz+OCDDzQd538KC5lyoEuXLmjSpAkAYNSoUbCyssLChQuxd+9eDBo0SMPpiIjKDwMDA/z555+IjY0FANStWxe6uroazRQdHY2jR4/i0KFDLGI0gKeWyqEOHToAABISEgAAjx8/xmeffYZ69erBxMQEZmZm6NKlC65cuaL03JycHMyYMQO1atWCoaEh7Ozs0KdPH9y6dQsAcPv2bYXTWa8+2rVrJ63r+PHjkMlk+OWXX/DVV1/B1tYWxsbG6Nmzp8qu3PPnz6Nz584wNzdHxYoV0bZt2yLPXbdr107l9mfMmKG07M8//wwPDw8YGRnB0tISAwcOVLn91722l8nlcixevBh16tSBoaEhbGxsMHbsWDx58kRhOScnJ3Tv3l1pOxMmTFBap6rsCxYsUNqnAJCbm4uQkBDUrFkTBgYGcHBwwOeff47c3FyV++pl7dq1U1rf7NmzoaOjg82bN5dof3z33Xdo2bIlrKysYGRkBA8PD+zYsUPl9n/++Wc0a9YMFStWRKVKldCmTRscOnRIYZn9+/ejbdu2MDU1hZmZGZo2baqUbfv27dJ7WrlyZXz88ce4e/euwjLDhg1TyFypUiW0a9cOp06deuN+etvnOjk5Kb1uHR0dzJ07V6H96NGjaN26NYyNjWFhYYFevXrhxo0bCsvMmDEDMpkMqampCu0XL16ETCbDunXrVGZW9bh9+zaA//tsHjp0CA0bNoShoSFq166NXbt2Kb2ef//9F/369YOlpSUqVqyIFi1a4Pfffy/WflP1ezls2DCYmJi8cT+q8/vz/PlzzJo1CzVq1ICBgQGcnJzw1VdfKf1OODk5YdiwYdDV1UWDBg3QoEED7Nq1CzKZTOk9KypT4WvS0dGBra0tBgwYgMTERGmZwt+b7777rsj1FL6nhc6dOwdDQ0PcunULderUgYGBAWxtbTF27Fg8fvxY6fnF/fybmJjg33//ha+vL4yNjWFvb4/Q0FAIIZTyFn6OACAzMxMeHh5wdnbG/fv3pfbiHvu0DXtkyqHCosPKygrAiwPRnj170K9fPzg7OyMlJQU//PAD2rZti+vXr8Pe3h4AUFBQgO7duyMyMhIDBw7EpEmTkJmZicOHDyM2NhY1atSQtjFo0CB07dpVYbvBwcEq88yePRsymQxffPEFHjx4gMWLF8PHxwcxMTEwMjIC8OKA3qVLF3h4eCAkJAQ6OjqIiIhAhw4dcOrUKTRr1kxpvVWrVkVYWBgAICsrC+PHj1e57enTp6N///4YNWoUHj58iG
XLlqFNmza4fPkyLCwslJ4zZswYtG7dGgCwa9cu7N69W2H+2LFjsW7dOgwfPhwTJ05EQkICli9fjsuXL+PMmTPQ09NTuR/UkZaWJr22l8nlcvTs2ROnT5/GmDFj4O7ujmvXrmHRokX4559/sGfPHrW2ExERgWnTpuH777/H4MGDVS7zpv2xZMkS9OzZE35+fsjLy8PWrVvRr18/7Nu3D926dZOWmzlzJmbMmIGWLVsiNDQU+vr6OH/+PI4ePYpOnToBeDHua8SIEahTpw6Cg4NhYWGBy5cv48CBA1K+wn3ftGlThIWFISUlBUuWLMGZM2eU3tPKlStj0aJFAID//vsPS5YsQdeuXZGUlKTyvX/Z2zz3ZYcOHcKIESMwYcIEfPnll1L7kSNH0KVLF1SvXh0zZszAs2fPsGzZMnh5eeHSpUvF+sP6srFjx8LHx0eaHjJkCD788EOF0yZVqlSRfo6Pj8eAAQMwbtw4+Pv7IyIiAv369cOBAwfQsWNHAEBKSgpatmyJp0+fYuLEibCyssL69evRs2dP7NixAx9++KFSjpf3W2GOsjZq1CisX78eH330EaZMmYLz588jLCwMN27cUPq8vuz58+f4+uuv1dpW69atMWbMGMjlcsTGxmLx4sW4d+9esYrcojx69Ag5OTkYP348OnTogHHjxuHWrVsIDw/H+fPncf78eRgYGABQ7/NfUFCAzp07o0WLFpg/fz4OHDiAkJAQPH/+HKGhoSqz5Ofno2/fvkhMTMSZM2dgZ2cnzXsXxz6NEKQxERERAoA4cuSIePjwoUhKShJbt24VVlZWwsjISPz3339CCCFycnJEQUGBwnMTEhKEgYGBCA0NldrWrl0rAIiFCxcqbUsul0vPAyAWLFigtEydOnVE27Ztpeljx44JAOKDDz4QGRkZUvu2bdsEALFkyRJp3S4uLsLX11fajhBCPH36VDg7O4uOHTsqbatly5aibt260vTDhw8FABESEiK13b59W+jq6orZs2crPPfatWuiQoUKSu3x8fECgFi/fr3UFhISIl7+mJ86dUoAEJs2bVJ47oEDB5TaHR0dRbdu3ZSyBwQEiFd/dV7N/vnnnwtra2vh4eGhsE83btwodHR0xKlTpxSev2rVKgFAnDlzRml7L2vbtq20vt9//11UqFBBTJkyReWyxdkfQrx4n16Wl5cn6tatKzp06KCwLh0dHfHhhx8qfRYL3/O0tDRhamoqmjdvLp49e6Zymby8PGFtbS3q1q2rsMy+ffsEAPHNN99Ibf7+/sLR0VFhPatXrxYAxIULF1S+5tJ+7sWLF4WJiYno16+f0utu2LChsLa2Fo8ePZLarly5InR0dMTQoUOltsJ9/vDhQ4XnR0VFCQAiIiJCZY5XP1Mvc3R0FADEzp07pbb09HRhZ2cnGjVqJLUFBgYKAAqft8zMTOHs7CycnJyUXpOfn59wdnZ+bQ5/f39hbGysMterGYvz+xMTEyMAiFGjRiks99lnnwkA4ujRowrr9Pf3l6ZXrFghDAwMRPv27ZXe76Iyvfx8IYQYPHiwqFixojT9umNkoVd/jwqnvb29xfPnz6X2wmP8smXLhBDqf/4BiE8//VRqk8vlolu3bkJfX1/6PBXmjYiIEHK5XPj5+YmKFSuK8+fPK2RW59inbXhqqRzw8fFBlSpV4ODggIEDB8LExAS7d++WzrUaGBhAR+fFW1VQUIBHjx7BxMQErq6uuHTpkrSenTt3onLlyvj000+VtvFqV646hg4dClNTU2n6o48+gp2dHf744w8AQExMDOLj4zF48GA8evQIqampSE1NRXZ2Nry9vXHy5EnI5XKFdebk5MDQ0PC12921axfkcjn69+8vrTM1NRW2trZwcXHBsWPHFJbPy8sDAOk/H1W2b98Oc3NzdOzYUWGdHh4eMDExUVpnfn6+wnKpqanIycl5be67d+9i2bJlmD59ulIX/Pbt2+Hu7g43NzeFdRaeTnx1+0W5cOEC+vfvj7
59+2LBggUqlynO/gAg9aoBwJMnT5Ceno7WrVsrfLb27NkDuVyOb775RvosFir8bB0+fBiZmZn48ssvld7bwmUuXry
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjIAAAHHCAYAAACle7JuAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABUIUlEQVR4nO3dd1QU1/8+8GcBKdJBaRZARUDEhiWIXRS7xiRGJfYuxhpNSEQsUaJGVBRbPoolxWiMmmYl9lhRUKIiGlRiARtVAWXv7w9/7JdxF2QJZBnzvM7Zc9g7M3feO1t49k5ZhRBCgIiIiEiG9HRdABEREVFpMcgQERGRbDHIEBERkWwxyBAREZFsMcgQERGRbDHIEBERkWwxyBAREZFsMcgQERGRbDHIEBERFSKEwOPHj5GYmKjrUqgEGGSIiOhfER8fj127dqnux8bG4tdff9VdQYVkZmZi5syZcHd3h6GhIWxtbVG3bl0kJCToujR6DQYZmdu4cSMUCoXqZmxsjLp162LChAlISUnRdXlERCqZmZkYM2YMTp06hcTEREyaNAmXLl3SdVl49OgRfH19ERERgXfffRe7d+/GgQMHcPjwYbi4uOi6PHoNA10XQGVj7ty5cHV1RU5ODo4fP47Vq1fjt99+Q3x8PCpXrqzr8oiI4Ovrq7oBQN26dTFq1CgdVwVMnz4d9+7dw8mTJ+Hl5aXrckhLDDJviK5du6Jp06YAgJEjR8LW1hbh4eHYvXs3BgwYoOPqiIhe2rVrFy5fvoxnz57B29sbhoaGOq0nNTUVmzZtwpo1axhiZIq7lt5QHTp0AAAkJSUBAB4/foyPPvoI3t7eMDMzg4WFBbp27Yq4uDi1ZXNycjB79mzUrVsXxsbGcHR0RN++fXHjxg0AwM2bNyW7s169tWvXTtXX4cOHoVAo8P333+PTTz+Fg4MDTE1N0atXLyQnJ6ut+/Tp0+jSpQssLS1RuXJltG3bFidOnND4GNu1a6dx/bNnz1ab9+uvv4aPjw9MTExgY2OD/v37a1x/cY+tMKVSiWXLlsHLywvGxsawt7fHmDFj8OTJE8l8Li4u6NGjh9p6JkyYoNanptoXL16stk0BIDc3F6GhoahTpw6MjIxQo0YNzJgxA7m5uRq3VWHt2rVT62/+/PnQ09PDt99+W6rt8eWXX6Jly5awtbWFiYkJfHx88MMPP2hc/9dff43mzZujcuXKsLa2Rps2bbB//37JPHv27EHbtm1hbm4OCwsLNGvWTK227du3q57TKlWq4IMPPsCdO3ck8wwdOlRSs7W1Ndq1a4djx469djuVdtlXl9N0u3nzpuSxtm7dGqampjA3N0f37t3x559/qvV79epV9OvXD1WrVoWJiQnc3d3x2WefAQBmz5792nUePny43LbdqlWr4OXlBSMjIzg5OSEoKAhpaWmSeQq/7urVqwcfHx/ExcVpfD1p8ur7vUqVKujevTvi4+Ml8ykUCkyYMKHIfgp2xxc8B2fPnoVSqUReXh6aNm0KY2Nj2NraYsCAAbh9+7ba8r///rvq+bKyskLv3r1x5coVyTwFz0fBc2ZhYQFbW1tMmjQJOTk5avUWft+/ePEC3bp1g42NDS5fviyZt6SfY/81HJF5QxWEDltbWwDAX3/9hV27duG9996Dq6srUlJSsHbtWrRt2xaXL1+Gk5MTACA/Px89evRAdHQ0+vfvj0mTJiEzMxMHDhxAfHw8ateurVrHgAED0K1bN8l6g4ODNdYzf/58KBQKfPzxx0hNTcWyZcvg7++P2NhYmJiYAHj5AdG1a1f4+PggNDQUenp6iIqKQocOHXDs2DE0b95crd/q1asjLCwMAJCVlYVx48ZpXHdISAj69euHkSNH4sGDB1ixYgXatGmDCxcuwMrKSm2Z0aNHo3Xr1gCAH3/8ETt37pRMHzNmDDZu3Ihhw4Zh4sSJSEpKwsqVK3HhwgWcOHEClSpV0rgdtJGWlqZ6bIUplU
r06tULx48fx+jRo+Hp6YlLly5h6dKluHbtmuRgypKIiorCzJkzsWTJEgwcOFDjPK/bHsuXL0evXr0QGBiIvLw8bN26Fe+99x5++eUXdO/eXTXfnDlzMHv2bLRs2RJz586FoaEhTp8+jd9//x2dO3cG8PIfzfDhw+Hl5YXg4GBYWVnhwoUL2Lt3r6q+gm3frFkzhIWFISUlBcuXL8eJEyfUntMqVapg6dKlAIC///4by5cvR7du3ZCcnKzxuS+sNMuOGTMG/v7+qvuDBg3C22+/jb59+6raqlatCgDYsmULhgwZgoCAACxcuBBPnz7F6tWr0apVK1y4cEF1fMbFixfRunVrVKpUCaNHj4aLiwtu3LiBn3/+GfPnz0ffvn1Rp04dVf9TpkyBp6cnRo8erWrz9PQsl203e/ZszJkzB/7+/hg3bhwSEhKwevVqnD179rXvhY8//rjY7f8qDw8PfPbZZxBC4MaNGwgPD0e3bt00Bo6SevToEYCXXy58fHzwxRdf4MGDB4iIiMDx48dx4cIFVKlSBQBw8OBBdO3aFbVq1cLs2bPx7NkzrFixAn5+fjh//rza8TT9+vWDi4sLwsLCcOrUKURERODJkyfYvHlzkfWMHDkShw8fxoEDB1CvXj1Ve2k+x/4zBMlaVFSUACAOHjwoHjx4IJKTk8XWrVuFra2tMDExEX///bcQQoicnByRn58vWTYpKUkYGRmJuXPnqto2bNggAIjw8HC1dSmVStVyAMTixYvV5vHy8hJt27ZV3T906JAAIKpVqyYyMjJU7du2bRMAxPLly1V9u7m5iYCAANV6hBDi6dOnwtXVVXTq1EltXS1bthT169dX3X/w4IEAIEJDQ1VtN2/eFPr6+mL+/PmSZS9duiQMDAzU2hMTEwUAsWnTJlVbaGioKPxWOXbsmAAgvvnmG8mye/fuVWt3dnYW3bt3V6s9KChIvPr2e7X2GTNmCDs7O+Hj4yPZplu2bBF6enri2LFjkuXXrFkjAIgTJ06ora+wtm3bqvr79ddfhYGBgZg2bZrGeUuyPYR4+TwVlpeXJ+rXry86dOgg6UtPT0+8/fbbaq/Fguc8LS1NmJubixYtWohnz55pnCcvL0/Y2dmJ+vXrS+b55ZdfBAAxa9YsVduQIUOEs7OzpJ9169YJAOLMmTMaH3NZLFvYq89rgczMTGFlZSVGjRolab9//76wtLSUtLdp00aYm5uLW7duSeYt/F4pzNnZWQwZMkStvay3XWpqqjA0NBSdO3eWPKcrV64UAMSGDRtUbYVfd0II8dtvvwkAokuXLmqvJ01eXV4IIT799FMBQKSmpqraAIigoKAi+yn4zExKSpLcr1evnuR1XPDZVfi90ahRI2FnZycePXqkaouLixN6enpi8ODBqraC90ivXr0k6x4/frwAIOLi4iT1Frw+goODhb6+vti1a5dkOW0/x/5ruGvpDeHv74+qVauiRo0a6N+/P8zMzLBz505Uq1YNAGBkZAQ9vZdPd35+Ph49egQzMzO4u7vj/Pnzqn527NiBKlWq4MMPP1RbR0mGf4syePBgmJubq+6/++67cHR0xG+//Qbg5WmYiYmJGDhwIB49eoSHDx/i4cOHyM7ORseOHXH06FEolUpJnzk5OTA2Ni52vT/++COUSiX69eun6vPhw4dwcHCAm5sbDh06JJk/Ly8PwMvtVZTt27fD0tISnTp1kvTp4+MDMzMztT6fP38ume/hw4dqw8uvunPnDlasWIGQkBCYmZmprd/T0xMeHh6SPgt2J766/qKcOXMG/fr1wzvvvIPFixdrnKck2wOAalQNAJ48eYL09HS0bt1a8tratWsXlEolZs2apXotFih4bR04cACZmZn45JNP1J7bgnnOnTuH1NRUjB8/XjJP9+7d4eHhoXY6r1KpVG2j2NhYbN68GY6OjqoRiuL8k2Vf58CBA0hLS8OAAQMkz6O+vj5atGiheh4fPHiAo0ePYvjw4ahZs6akD23fk2
W97Q4ePIi8vDxMnjxZ8pyOGjUKFhYWRZ5aLYRAcHAw3nnnHbRo0aLE9Re8lx48eICTJ09i586daNCggWrEpEBOTg4
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Class distribution in the training set (percent):\n",
|
|||
|
"NumWebPurchases\n",
|
|||
|
"1 15.876599\n",
|
|||
|
"2 15.575621\n",
|
|||
|
"3 14.672686\n",
|
|||
|
"4 13.167795\n",
|
|||
|
"5 10.233258\n",
|
|||
|
"6 8.577878\n",
|
|||
|
"7 6.471031\n",
|
|||
|
"8 4.966140\n",
|
|||
|
"9 3.837472\n",
|
|||
|
"11 2.558315\n",
|
|||
|
"0 2.257336\n",
|
|||
|
"10 1.655380\n",
|
|||
|
"23 0.075245\n",
|
|||
|
"27 0.075245\n",
|
|||
|
"Name: proportion, dtype: float64\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Helper that plots the class distribution\n",
|
|||
|
"def plot_class_distribution(y, title):\n",
|
|||
|
" sns.countplot(x=y)\n",
|
|||
|
" plt.title(title)\n",
|
|||
|
" plt.xlabel(\"NumWebPurchases (target)\")\n",
|
|||
|
" plt.ylabel(\"Number of records\")\n",
|
|||
|
" plt.show()\n",
|
|||
|
"\n",
|
|||
|
"# Assess class balance in each split\n",
|
|||
|
"plot_class_distribution(y_train_2, \"Class distribution in the training set\")\n",
|
|||
|
"plot_class_distribution(y_val_2, \"Class distribution in the validation set\")\n",
|
|||
|
"plot_class_distribution(y_test_2, \"Class distribution in the test set\")\n",
|
|||
|
"\n",
|
|||
|
"# Check the class proportions in the training set\n",
|
|||
|
"class_distribution_train_2 = y_train_2.value_counts(normalize=True) * 100\n",
|
|||
|
"print(\"Class distribution in the training set (percent):\")\n",
|
|||
|
"print(class_distribution_train_2)\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"6. The classes are imbalanced, so for the second business goal we apply upsampling (oversampling the rare classes) to increase the number of minority-class records."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 16,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Class distribution after SMOTE (percent):\n",
|
|||
|
"Response\n",
|
|||
|
"0 50.0\n",
|
|||
|
"1 50.0\n",
|
|||
|
"Name: proportion, dtype: float64\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"from imblearn.over_sampling import SMOTE\n",
|
|||
|
"\n",
|
|||
|
"# Apply SMOTE to the training set\n",
|
|||
|
"smote = SMOTE(random_state=42)\n",
|
|||
|
"X_train_balanced, y_train_balanced = smote.fit_resample(X_train, y_train)\n",
|
|||
|
"\n",
|
|||
|
"# Check the distribution after resampling\n",
|
|||
|
"plot_class_distribution(y_train_balanced, \"Class distribution after SMOTE (training set)\")\n",
|
|||
|
"\n",
|
|||
|
"# Check the percentage distribution\n",
|
|||
|
"balanced_distribution = y_train_balanced.value_counts(normalize=True) * 100\n",
|
|||
|
"print(\"Class distribution after SMOTE (percent):\")\n",
|
|||
|
"print(balanced_distribution)\n"
|
|||
|
]
|
|||
|
},
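SMOTE differs from plain duplication: instead of copying minority rows, it creates new points on the line segment between a minority sample and one of its nearest minority-class neighbours. The sketch below shows only that interpolation step in pure Python, under simplifying assumptions (Euclidean distance, a single nearest neighbour, hypothetical helper names). It is not imbalanced-learn's implementation, which considers several nearest neighbours and picks one at random.

```python
import math
import random


def nearest_neighbour(point, others):
    """Return the element of `others` closest to `point` (Euclidean distance)."""
    return min(others, key=lambda other: math.dist(point, other))


def synthetic_sample(point, neighbour, gap):
    """Interpolate between two samples; gap=0 gives `point`, gap=1 gives `neighbour`."""
    return [p + gap * (n - p) for p, n in zip(point, neighbour)]


# Toy minority-class samples (illustrative values, not taken from the dataset)
minority = [[1.0, 1.0], [2.0, 1.0], [5.0, 5.0]]
rng = random.Random(42)

base = minority[0]
neighbour = nearest_neighbour(base, [m for m in minority if m is not base])
new_point = synthetic_sample(base, neighbour, rng.random())
print(new_point)
```

Because the new point always lies between two existing minority samples, SMOTE only makes sense for numeric features. That is one reason to encode categorical columns before resampling.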
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"For the first business goal:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 24,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Shape of the training set after RandomOverSampler:\n",
|
|||
|
"X_train_res: (2954, 15)\n",
|
|||
|
"y_train_res: (2954,)\n",
|
|||
|
"\n",
|
|||
|
"Class distribution after balancing (percent):\n",
|
|||
|
"NumWebPurchases\n",
|
|||
|
"2 7.142857\n",
|
|||
|
"5 7.142857\n",
|
|||
|
"1 7.142857\n",
|
|||
|
"8 7.142857\n",
|
|||
|
"9 7.142857\n",
|
|||
|
"3 7.142857\n",
|
|||
|
"11 7.142857\n",
|
|||
|
"7 7.142857\n",
|
|||
|
"6 7.142857\n",
|
|||
|
"4 7.142857\n",
|
|||
|
"0 7.142857\n",
|
|||
|
"10 7.142857\n",
|
|||
|
"23 7.142857\n",
|
|||
|
"27 7.142857\n",
|
|||
|
"Name: proportion, dtype: float64\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"from imblearn.over_sampling import RandomOverSampler\n",
|
|||
|
"import pandas as pd\n",
|
|||
|
"\n",
|
|||
|
"# Balance the classes with RandomOverSampler\n",
|
|||
|
"ros = RandomOverSampler(random_state=42)\n",
|
|||
|
"X_train_res, y_train_res = ros.fit_resample(X_train_2, y_train_2)\n",
|
|||
|
"\n",
|
|||
|
"# Print the new shapes\n",
|
|||
|
"print(\"Shape of the training set after RandomOverSampler:\")\n",
|
|||
|
"print(f\"X_train_res: {X_train_res.shape}\")\n",
|
|||
|
"print(f\"y_train_res: {y_train_res.shape}\")\n",
|
|||
|
"\n",
|
|||
|
"# Class distribution in the training set after balancing\n",
|
|||
|
"class_distribution_res = pd.Series(y_train_res).value_counts(normalize=True) * 100\n",
|
|||
|
"print(\"\\nClass distribution after balancing (percent):\")\n",
|
|||
|
"print(class_distribution_res)\n",
|
|||
|
"\n",
|
|||
|
"# Plot the result for a visual check\n",
|
|||
|
"import seaborn as sns\n",
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"\n",
|
|||
|
"# Helper that plots the class distribution\n",
|
|||
|
"def plot_class_distribution(y, title):\n",
|
|||
|
" sns.countplot(x=y)\n",
|
|||
|
" plt.title(title)\n",
|
|||
|
" plt.xlabel(\"NumWebPurchases (target)\")\n",
|
|||
|
" plt.ylabel(\"Number of records\")\n",
|
|||
|
" plt.show()\n",
|
|||
|
"\n",
|
|||
|
"# Plot the class distribution\n",
|
|||
|
"plot_class_distribution(y_train_res, \"Class distribution after balancing\")\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"\n"
|
|||
|
]
|
|||
|
},
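The balanced result above (14 classes at 7.142857% each, 2954 rows in total, so 211 rows per class) follows from how random oversampling works: every class is topped up, by drawing from its own rows with replacement, until it matches the largest class. The pure-Python sketch below illustrates the idea on toy labels. It is an illustration only, not imbalanced-learn's code, and the name `random_oversample_indices` is hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample_indices(labels, seed=42):
    """Return row indices that give every class as many rows as the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    target = max(len(rows) for rows in by_class.values())
    selected = []
    for rows in by_class.values():
        selected.extend(rows)  # keep every original row
        # draw the missing rows from the same class, with replacement
        selected.extend(rng.choices(rows, k=target - len(rows)))
    return selected


# Toy labels: class 0 has 6 rows, class 1 has 3, class 2 has 1
labels = [0] * 6 + [1] * 3 + [2] * 1
picked = random_oversample_indices(labels)
balanced_counts = Counter(labels[i] for i in picked)
print(balanced_counts)
```

Classes with only one or two original rows (such as 23 and 27 here) end up as many exact copies of the same records, so a model can easily memorize them. That is worth keeping in mind when reading the later metrics.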
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"7-8. Write the feature-engineering code, install the library, and add comments."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 42,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"1.31.0\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"import featuretools as ft\n",
|
|||
|
"print(ft.__version__)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 46,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
" Income Kidhome Teenhome MntWines MntFruits \\\n",
|
|||
|
"customer_id \n",
|
|||
|
"0 -0.263557 1 1 68 0 \n",
|
|||
|
"1 -1.102440 1 0 18 3 \n",
|
|||
|
"2 0.633408 0 1 225 162 \n",
|
|||
|
"3 1.135917 1 0 739 107 \n",
|
|||
|
"4 1.299116 0 0 395 183 \n",
|
|||
|
"\n",
|
|||
|
" MntMeatProducts MntFishProducts MntSweetProducts MntGoldProds \\\n",
|
|||
|
"customer_id \n",
|
|||
|
"0 16 0 0 8 \n",
|
|||
|
"1 19 3 3 6 \n",
|
|||
|
"2 387 106 36 29 \n",
|
|||
|
"3 309 140 80 35 \n",
|
|||
|
"4 565 166 141 28 \n",
|
|||
|
"\n",
|
|||
|
" AcceptedCmp1 AcceptedCmp2 AcceptedCmp3 AcceptedCmp4 \\\n",
|
|||
|
"customer_id \n",
|
|||
|
"0 0 0 0 0 \n",
|
|||
|
"1 0 0 0 0 \n",
|
|||
|
"2 0 0 0 0 \n",
|
|||
|
"3 0 0 0 0 \n",
|
|||
|
"4 0 0 0 0 \n",
|
|||
|
"\n",
|
|||
|
" AcceptedCmp5 Recency Income_binned \n",
|
|||
|
"customer_id \n",
|
|||
|
"0 0 6 1.0 \n",
|
|||
|
"1 0 67 0.0 \n",
|
|||
|
"2 0 77 1.0 \n",
|
|||
|
"3 0 2 2.0 \n",
|
|||
|
"4 0 19 2.0 \n",
|
|||
|
"Shape of the training set after RandomOverSampler:\n",
|
|||
|
"X_train_res: (2954, 17)\n",
|
|||
|
"y_train_res: (2954,)\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"c:\\Ulstu\\MII\\miivenv\\Lib\\site-packages\\featuretools\\synthesis\\deep_feature_synthesis.py:169: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"import pandas as pd\n",
|
|||
|
"from sklearn.preprocessing import KBinsDiscretizer, MinMaxScaler, StandardScaler\n",
|
|||
|
"import featuretools as ft\n",
|
|||
|
"from imblearn.over_sampling import RandomOverSampler\n",
|
|||
|
"\n",
|
|||
|
"# 1. One-hot encoding of categorical features\n",
|
|||
|
"X_train_2 = pd.get_dummies(X_train_2, drop_first=True)\n",
|
|||
|
"X_test_2 = pd.get_dummies(X_test_2, drop_first=True)\n",
|
|||
|
"\n",
|
|||
|
"# 2. Discretization of numeric features\n",
|
|||
|
"discretizer = KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='uniform')\n",
|
|||
|
"X_train_2['Income_binned'] = discretizer.fit_transform(X_train_2[['Income']])\n",
|
|||
|
"X_test_2['Income_binned'] = discretizer.transform(X_test_2[['Income']])\n",
|
|||
|
"\n",
|
|||
|
"# 3. Feature scaling\n",
|
|||
|
"scaler_minmax = MinMaxScaler()\n",
|
|||
|
"X_train_2[['Income']] = scaler_minmax.fit_transform(X_train_2[['Income']])\n",
|
|||
|
"X_test_2[['Income']] = scaler_minmax.transform(X_test_2[['Income']])\n",
|
|||
|
"\n",
|
|||
|
"# Standardization\n",
|
|||
|
"scaler_standard = StandardScaler()\n",
|
|||
|
"X_train_2[['Income']] = scaler_standard.fit_transform(X_train_2[['Income']])\n",
|
|||
|
"X_test_2[['Income']] = scaler_standard.transform(X_test_2[['Income']])\n",
|
|||
|
"\n",
|
|||
|
"# 4. Feature construction with Featuretools\n",
|
|||
|
"es = ft.EntitySet(id=\"data\")\n",
|
|||
|
"\n",
|
|||
|
"# Add the data to the EntitySet with add_dataframe\n",
|
|||
|
"es = es.add_dataframe(\n",
|
|||
|
" dataframe_name=\"customer_data\",\n",
|
|||
|
" dataframe=X_train_2,\n",
|
|||
|
" index=\"customer_id\" \n",
|
|||
|
")\n",
|
|||
|
"\n",
|
|||
|
"# Run Featuretools to generate features\n",
|
|||
|
"# Note: featuretools 1.x takes `target_dataframe_name` instead of the older `target_entity`\n",
|
|||
|
"features, feature_names = ft.dfs(entityset=es, target_dataframe_name=\"customer_data\")\n",
|
|||
|
"\n",
|
|||
|
"print(features.head())\n",
|
|||
|
"\n",
|
|||
|
"# 5. Balance the training set with RandomOverSampler\n",
|
|||
|
"ros = RandomOverSampler(random_state=42)\n",
|
|||
|
"X_train_res, y_train_res = ros.fit_resample(X_train_2, y_train_2)\n",
|
|||
|
"\n",
|
|||
|
"print(\"Shape of the training set after RandomOverSampler:\")\n",
|
|||
|
"print(f\"X_train_res: {X_train_res.shape}\")\n",
|
|||
|
"print(f\"y_train_res: {y_train_res.shape}\")\n"
|
|||
|
]
|
|||
|
},
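In the preprocessing cell above, `Income` is passed through `MinMaxScaler` and then through `StandardScaler`. Standardization is unaffected by a preceding min-max rescaling, because a positive affine map scales the deviations from the mean and the standard deviation by the same factor. The final values are therefore what `StandardScaler` alone would produce. The pure-Python check below illustrates this with made-up numbers; the helper names are hypothetical.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


incomes = [20000.0, 35000.0, 50000.0, 80000.0, 120000.0]  # illustrative values
direct = standardize(incomes)
chained = standardize(min_max(incomes))

# Both routes give the same z-scores up to floating-point rounding
print(all(abs(a - b) < 1e-9 for a, b in zip(direct, chained)))
```

In practice this means one of the two scalers can be dropped; which one to keep depends on whether the model benefits more from a bounded range or from zero mean and unit variance.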
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"At the bottom of the output there is a warning (not an error): UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created\n",
|
|||
|
" warnings.warn. It means that the EntitySet consists of a single DataFrame, i.e. there is only one entity.\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Now the same steps for the other business goal."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 50,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
" Income Kidhome Teenhome NumWebPurchases MntWines \\\n",
|
|||
|
"customer_id \n",
|
|||
|
"0 0.739837 0 1 6 522 \n",
|
|||
|
"1 -0.203068 1 1 1 22 \n",
|
|||
|
"2 0.160233 0 1 7 479 \n",
|
|||
|
"3 1.049812 0 0 4 594 \n",
|
|||
|
"4 0.119182 1 2 6 416 \n",
|
|||
|
"\n",
|
|||
|
" MntFruits MntMeatProducts MntFishProducts MntSweetProducts \\\n",
|
|||
|
"customer_id \n",
|
|||
|
"0 0 522 227 120 \n",
|
|||
|
"1 2 10 6 4 \n",
|
|||
|
"2 5 82 7 17 \n",
|
|||
|
"3 51 631 72 55 \n",
|
|||
|
"4 0 26 0 0 \n",
|
|||
|
"\n",
|
|||
|
" MntGoldProds AcceptedCmp1 AcceptedCmp2 AcceptedCmp3 \\\n",
|
|||
|
"customer_id \n",
|
|||
|
"0 134 0 0 0 \n",
|
|||
|
"1 34 0 0 0 \n",
|
|||
|
"2 171 0 0 1 \n",
|
|||
|
"3 32 0 0 0 \n",
|
|||
|
"4 4 0 0 0 \n",
|
|||
|
"\n",
|
|||
|
" AcceptedCmp4 AcceptedCmp5 Recency Income_binned \n",
|
|||
|
"customer_id \n",
|
|||
|
"0 0 0 28 0.0 \n",
|
|||
|
"1 0 0 84 0.0 \n",
|
|||
|
"2 0 0 30 0.0 \n",
|
|||
|
"3 0 0 42 0.0 \n",
|
|||
|
"4 1 0 11 0.0 \n",
|
|||
|
"Shape of the training set after RandomOverSampler for the first business goal:\n",
|
|||
|
"X_train_res_1: (2258, 18)\n",
|
|||
|
"y_train_res_1: (2258,)\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"c:\\Ulstu\\MII\\miivenv\\Lib\\site-packages\\featuretools\\synthesis\\deep_feature_synthesis.py:169: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# 1. One-hot encoding of categorical features\n",
|
|||
|
"X_train = pd.get_dummies(X_train, drop_first=True)\n",
|
|||
|
"X_test = pd.get_dummies(X_test, drop_first=True)\n",
|
|||
|
"\n",
|
|||
|
"# 2. Discretization of numeric features\n",
|
|||
|
"discretizer = KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='uniform')\n",
|
|||
|
"X_train['Income_binned'] = discretizer.fit_transform(X_train[['Income']])\n",
|
|||
|
"X_test['Income_binned'] = discretizer.transform(X_test[['Income']])\n",
|
|||
|
"\n",
|
|||
|
"# 3. Feature scaling\n",
|
|||
|
"scaler_minmax = MinMaxScaler()\n",
|
|||
|
"X_train[['Income']] = scaler_minmax.fit_transform(X_train[['Income']])\n",
|
|||
|
"X_test[['Income']] = scaler_minmax.transform(X_test[['Income']])\n",
|
|||
|
"\n",
|
|||
|
"# Standardization\n",
|
|||
|
"scaler_standard = StandardScaler()\n",
|
|||
|
"X_train[['Income']] = scaler_standard.fit_transform(X_train[['Income']])\n",
|
|||
|
"X_test[['Income']] = scaler_standard.transform(X_test[['Income']])\n",
|
|||
|
"\n",
|
|||
|
"# 4. Feature construction with Featuretools\n",
|
|||
|
"es = ft.EntitySet(id=\"data\")\n",
|
|||
|
"es = es.add_dataframe(dataframe_name=\"customer_data\", dataframe=X_train, index=\"customer_id\")\n",
|
|||
|
"\n",
|
|||
|
"# Apply Deep Feature Synthesis to generate new features\n",
|
|||
|
"features, feature_names = ft.dfs(entityset=es, target_dataframe_name=\"customer_data\", max_depth=2)\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"print(features.head())\n",
|
|||
|
"\n",
|
|||
|
"# 5. Balance the training set with RandomOverSampler\n",
|
|||
|
"ros = RandomOverSampler(random_state=42)\n",
|
|||
|
"X_train_res_1, y_train_res_1 = ros.fit_resample(X_train, y_train)\n",
|
|||
|
"\n",
|
|||
|
"print(\"Shape of the training set after RandomOverSampler for the first business goal:\")\n",
|
|||
|
"print(f\"X_train_res_1: {X_train_res_1.shape}\")\n",
|
|||
|
"print(f\"y_train_res_1: {y_train_res_1.shape}\")"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"The same warning again. That completes items 7-8."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"9. Let's get started. First we adjust the data a little so that the model does not raise errors."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 61,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"X_train columns: Index(['customer_id', 'Income', 'Kidhome', 'Teenhome', 'NumWebPurchases',\n",
|
|||
|
" 'MntWines', 'MntFruits', 'MntMeatProducts', 'MntFishProducts',\n",
|
|||
|
" 'MntSweetProducts', 'MntGoldProds', 'AcceptedCmp1', 'AcceptedCmp2',\n",
|
|||
|
" 'AcceptedCmp3', 'AcceptedCmp4', 'AcceptedCmp5', 'Recency',\n",
|
|||
|
" 'Income_binned'],\n",
|
|||
|
" dtype='object')\n",
|
|||
|
"X_test columns: Index(['Income', 'Kidhome', 'Teenhome', 'NumWebPurchases', 'MntWines',\n",
|
|||
|
" 'MntFruits', 'MntMeatProducts', 'MntFishProducts', 'MntSweetProducts',\n",
|
|||
|
" 'MntGoldProds', 'AcceptedCmp1', 'AcceptedCmp2', 'AcceptedCmp3',\n",
|
|||
|
" 'AcceptedCmp4', 'AcceptedCmp5', 'Recency', 'Income_binned'],\n",
|
|||
|
" dtype='object')\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"print(\"X_train columns:\", X_train.columns)\n",
|
|||
|
"print(\"X_test columns:\", X_test.columns)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"The column sets differ: X_train has an extra customer_id column. Let's fix that."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 66,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"X_train columns: Index(['Income', 'Kidhome', 'Teenhome', 'NumWebPurchases', 'MntWines',\n",
|
|||
|
" 'MntFruits', 'MntMeatProducts', 'MntFishProducts', 'MntSweetProducts',\n",
|
|||
|
" 'MntGoldProds', 'AcceptedCmp1', 'AcceptedCmp2', 'AcceptedCmp3',\n",
|
|||
|
" 'AcceptedCmp4', 'AcceptedCmp5', 'Recency', 'Income_binned'],\n",
|
|||
|
" dtype='object')\n",
|
|||
|
"X_test columns: Index(['Income', 'Kidhome', 'Teenhome', 'NumWebPurchases', 'MntWines',\n",
|
|||
|
" 'MntFruits', 'MntMeatProducts', 'MntFishProducts', 'MntSweetProducts',\n",
|
|||
|
" 'MntGoldProds', 'AcceptedCmp1', 'AcceptedCmp2', 'AcceptedCmp3',\n",
|
|||
|
" 'AcceptedCmp4', 'AcceptedCmp5', 'Recency', 'Income_binned'],\n",
|
|||
|
" dtype='object')\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"X_train = X_train.drop(columns=['customer_id'], errors='ignore')\n",
|
|||
|
"print(\"X_train columns:\", X_train.columns)\n",
|
|||
|
"print(\"X_test columns:\", X_test.columns)\n",
|
|||
|
"\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"The columns now match. However, an attempt to train the model then failed with a new error: ValueError: The feature names should match those that were passed during fit. Feature names unseen at fit time: - NumWebPurchases. Let's inspect that column."
|
|||
|
]
|
|||
|
},
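Errors of this kind come from a mismatch between the columns a model saw at fit time and the columns passed to it later. Before fitting, it can help to compare the two column sets explicitly. The helper below is a small sketch with a hypothetical name (`column_diff`); it accepts any sequences of names, for example `X_train.columns` and `X_test.columns`.

```python
def column_diff(fit_columns, new_columns):
    """Return (missing, unseen): names absent from the new data, and names not seen at fit time."""
    fit_set, new_set = set(fit_columns), set(new_columns)
    missing = [c for c in fit_columns if c not in new_set]
    unseen = [c for c in new_columns if c not in fit_set]
    return missing, unseen


# Toy column lists that reproduce the situation from the error message
fit_cols = ["Income", "Kidhome", "Recency"]
new_cols = ["Income", "Kidhome", "NumWebPurchases", "Recency"]

missing, unseen = column_diff(fit_cols, new_cols)
print("missing:", missing)
print("unseen at fit time:", unseen)
```

Once the difference is known, one possible way to align two pandas frames is `X_test = X_test.reindex(columns=X_train.columns, fill_value=0)`, which drops extra columns and fills absent ones with zeros. Whether zero is a sensible fill value depends on the feature.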
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 70,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"NumWebPurchases in X_train:\n",
|
|||
|
"0 6\n",
|
|||
|
"1 1\n",
|
|||
|
"2 7\n",
|
|||
|
"3 4\n",
|
|||
|
"4 6\n",
|
|||
|
"Name: NumWebPurchases, dtype: int64\n",
|
|||
|
"NumWebPurchases in X_test:\n",
|
|||
|
"937 11\n",
|
|||
|
"987 9\n",
|
|||
|
"8 3\n",
|
|||
|
"282 4\n",
|
|||
|
"1341 5\n",
|
|||
|
"Name: NumWebPurchases, dtype: int64\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Print the 'NumWebPurchases' column from X_train and X_test\n",
|
|||
|
"print(\"NumWebPurchases in X_train:\")\n",
|
|||
|
"print(X_train['NumWebPurchases'].head()) # Show the first 5 values\n",
|
|||
|
"\n",
|
|||
|
"print(\"NumWebPurchases in X_test:\")\n",
|
|||
|
"print(X_test['NumWebPurchases'].head()) "
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 72,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')\n",
|
|||
|
"Index([937, 987, 8, 282, 1341, 1879, 286, 1080, 525, 977], dtype='int64')\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"print(X_train.index[:10]) # Check the first 10 index values in X_train\n",
|
|||
|
"print(X_test.index[:10]) # Check the first 10 index values in X_test\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"The indexes no longer line up, so we reset them."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 76,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"X_train = X_train.reset_index(drop=True)\n",
|
|||
|
"X_test = X_test.reset_index(drop=True)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 77,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"NumWebPurchases in X_train: 0 6\n",
|
|||
|
"1 1\n",
|
|||
|
"2 7\n",
|
|||
|
"3 4\n",
|
|||
|
"4 6\n",
|
|||
|
"Name: NumWebPurchases, dtype: int64\n",
|
|||
|
"NumWebPurchases in X_test: 0 11\n",
|
|||
|
"1 9\n",
|
|||
|
"2 3\n",
|
|||
|
"3 4\n",
|
|||
|
"4 5\n",
|
|||
|
"Name: NumWebPurchases, dtype: int64\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"print(\"NumWebPurchases in X_train:\", X_train['NumWebPurchases'].head())\n",
|
|||
|
"print(\"NumWebPurchases in X_test:\", X_test['NumWebPurchases'].head())\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 82,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Accuracy: 0.8761261261261262\n",
|
|||
|
"ROC-AUC: 0.8519339641315966\n",
|
|||
|
"Model training time: 0.2755 seconds\n",
|
|||
|
"Prediction time: 0.0070 seconds\n",
|
|||
|
"Mean cross-validation accuracy: 0.8736\n",
|
|||
|
"Correlation of features with the target variable:\n",
|
|||
|
"Response 1.000000\n",
|
|||
|
"AcceptedCmp5 0.323374\n",
|
|||
|
"AcceptedCmp1 0.297345\n",
|
|||
|
"AcceptedCmp3 0.254005\n",
|
|||
|
"MntWines 0.246299\n",
|
|||
|
"MntMeatProducts 0.237746\n",
|
|||
|
"AcceptedCmp4 0.180205\n",
|
|||
|
"AcceptedCmp2 0.169294\n",
|
|||
|
"NumWebPurchases 0.151431\n",
|
|||
|
"MntGoldProds 0.140332\n",
|
|||
|
"Income 0.133047\n",
|
|||
|
"MntFruits 0.122443\n",
|
|||
|
"MntSweetProducts 0.116170\n",
|
|||
|
"MntFishProducts 0.108145\n",
|
|||
|
"Kidhome -0.077909\n",
|
|||
|
"Teenhome -0.153901\n",
|
|||
|
"Recency -0.199766\n",
|
|||
|
"Name: Response, dtype: float64\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA38AAAMhCAYAAABYMwgIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzddXgURx/A8e9d3N1dIAIEd3fXFiteoFhpKZQCbSFAhbZIKVAoLS4t7u7ubiFACBLi7nr7/nHkkksuQktfaDuf57kHsje797uZvdmdndlZmSRJEoIgCIIgCIIgCMK/mvxNByAIgiAIgiAIgiD8/UTjTxAEQRAEQRAE4T9ANP4EQRAEQRAEQRD+A0TjTxAEQRAEQRAE4T9ANP4EQRAEQRAEQRD+A0TjTxAEQRAEQRAE4T9ANP4EQRAEQRAEQRD+A0TjTxAEQRAEQRAE4T9ANP4EQRAEQRAEQRD+A0TjTxAEQRAEQRAE4T9ANP4EQfhHWbVqFTKZjCtXrhR777fffkMmk9GtWzfy8vLeQHSCIAiCIAhvL9H4EwThX2H79u2MGjWKxo0bs2HDBrS0tN50SIIgCIIgCG8V0fgTBOEf78SJE/Tt2xd/f392796Nvr7+mw5JEARBEAThrSMaf4Ig/KPduHGDrl274uDgwMGDBzEzMyuWZvPmzdSsWRMDAwOsra3p378/L168UEszePBgjI2Nefz4MW3btsXIyAhHR0dmzpyJJEmqdE+ePEEmkzFnzhx+/PFH3NzcMDAwoGnTpty5c6fYZ9+/f593330XS0tL9PX1qVWrFrt27dL4XZo1a4ZMJiv2WrVqlVq6JUuWULlyZQwNDdXSbdmyRW1blStXLvYZc+bMQSaT8eTJE9Wy/KG0hZcpFAoCAgI0fv6xY8do3LgxRkZGmJub07VrV4KCgtTSTJ8+HZlMRmxsrNryK1euFNtmft4XtWXLFmQyGSdOnFAtO336ND179sTV1RU9PT1cXFz45JNPyMjI0Lh+rVq1MDExUcunOXPmFEtbWH5+6OrqEhMTo/be+fPnVdspPPS4PHENHjxYY/kWfuWXgbu7O506deLQoUNUq1YNfX19/P392bZtm8ZYy1N2r5LP2dnZTJs2jZo1a2JmZoaRkRGNGzfm+PHjpeZdPnd391K/Z2EymYwPP/yQ9evX4+Pjg76+PjVr1uTUqVNq6fL3qcJSU1Oxt7cvFv/IkSOpUKEChoaGWFpa0qJFC06fPl0sxk6dOhWL/cMPPyz2OStXrqRFixbY2tqip6eHv78/S5Ys0fi9Bw8erLbsgw8+QF9fXy0+gMWLF1OpUiX09PRwdHRkzJgxJCYmqqUpWidYW1vTsWNHjXWNIAhCeWi/6QAEQRD+rJCQENq1a4eenh4HDx7EwcGhWJpVq1YxZMgQateuzaxZs4iKiuKnn37i7NmzXL9+HXNzc1XavLw82rVrR7169fjhhx84cOAAgYGB5ObmMnPmTLXtrlmzhpSUFMaMGUNmZiY//fQTLVq04Pbt29jZ2QFw9+5dGjZsiJOTE5MnT8bIyIhNmzbRrVs3tm7dSvfu3YvF6+vryxdffAFAbGwsn3zyidr7GzduZPTo0TRr1oyxY8diZGREUFAQ33777V/NTjVr167l9u3bxZYfOXKE9u3b4+npyfTp08nIyGDhwoU0bNiQa9eu4e7u/lrjKGrz5s2kp6czatQorKysuHTpEgsXLiQsLIzNmzer0p0/f55evXpRtWpVvvvuO8zMzDTmZ2m0tLRYt26d2jorV65EX1+fzMzMV45rxIgRtGrVSrXOgAED6N69Oz169FAts7GxUf3/4cOH9O7dm5EjRzJo0CBWrlxJz549OXDgAK1bty4x7pLK7lUkJyezbNky+vbty/Dhw0lJSWH58uW0bduWS5cuUa1atTK3Ua1aNSZMmKC2bM2aNRw+fLhY2pMnT7Jx40Y++ugj9PT0WLx4Me3atePSpUsaL2Lkmz
t3LlFRUcWWZ2dn079/f5ydnYmPj2fp0qW0a9eOoKAgXF1dy86AIpYsWUKlSpXo0qUL2tra7N69m9GjR6NQKBgzZkyJ6wUGBrJ8+XI2btxIs2bNVMunT5/OjBkzaNWqFaNGjSI4OJglS5Zw+fJlzp49i46Ojiptfp0gSRIhISHMmzePDh068OzZs1f+HoIgCEiCIAj/ICtXrpQAac+ePZKXl5cESG3atNGYNjs7W7K1tZUqV64sZWRkqJbv2bNHAqRp06aplg0aNEgCpLFjx6qWKRQKqWPHjpKurq4UExMjSZIkhYaGSoBkYGAghYWFqdJevHhRAqRPPvlEtaxly5ZSlSpVpMzMTLVtNmjQQKpQoUKxeBs2bCg1b95c9Xf+Z61cuVK1rG/fvpK5ubna9zl+/LgESJs3b1Yta9q0qVSpUqVinzF79mwJkEJDQ1XL8vM0f1lmZqbk6uoqtW/fvtjnV6tWTbK1tZXi4uJUy27evCnJ5XJp4MCBqmWBgYESoMq3fJcvXy62zUGDBklGRkbFYt28ebMESMePH1ctS09PL5Zu1qxZkkwmk54+fapaNmXKFAmQIiIiVMvy83P27NnFtlFYfn707dtXqlKlimp5WlqaZGpqKr333nsSIF2+fPmV4yoMkAIDAzW+5+bmJgHS1q1bVcuSkpIkBwcHqXr16sViLU/ZvUo+5+bmSllZWWrpEhISJDs7O+n999/XGHPR+Dt27Fhs+ZgxY6Sipx6ABEhXrlxRLXv69Kmkr68vde/eXbUsf5/KFx0dLZmYmKi+a+H4i7p06ZIESFu2bPlTMWoq37Zt20qenp5qy9zc3KRBgwZJkiRJS5culQBp4cKFammio6MlXV1dqU2bNlJeXp5q+aJFiyRAWrFihWpZ06ZNpaZNm6qt//nnn0uAFB0dXeL3FQRBKIkY9ikIwj/S4MGDef78Oe+99x6HDh1S6/XJd+XKFaKjoxk9erTafYAdO3bE19eXvXv3Flvnww8/VP0/fzhadnY2R44cUUvXrVs3nJycVH/XqVOHunXrsm/fPgDi4+M5duwYvXr1IiUlhdjYWGJjY4mLi6Nt27Y8fPiw2NDT7Oxs9PT0Sv3eKSkpGBoa/q33Nf7888/ExcURGBiotjwiIoIbN24wePBgLC0tVcsDAgJo3bq16rsXFh8fr/rusbGxJCUllfi5hdPFxsaSkpJSLI2BgYHq/2lpacTGxtKgQQMkSeL69euq91JSUpDL5Wo9u69qwIAB3L9/XzW8c+vWrZiZmdGyZcs/HdercHR0VOsdNjU1ZeDAgVy/fp3IyEiN65RUdq9KS0sLXV1dQDmMND4+ntzcXGrVqsW1a9f+0rY1qV+/PjVr1lT97erqSteuXTl48GCJM/d+9dVXmJmZ8dFHH2l8PzMzk9jYWIKCgvjpp58wMDCgVq1aamlycnKK7XdFe3VBvXyTkpKIjY2ladOmPH78WOM+vXPnTkaPHs3EiRPV6hRQ9p5nZ2czbtw45PKC07Dhw4djamparF7KjzEmJobz58+zfft2AgICsLa21vi9BUEQSiMaf4Ig/CPFx8ezbt06Vq9eTbVq1fj444+LnYQ9ffoUAB8fn2Lr+/r6qt7PJ5fL8fT0VFtWsWJFALV7qgAqVKhQbJsVK1ZUpXv06BGSJDF16lRsbGzUXvkn5tHR0WrrJyYmarwnq7D69esTHh7O9OnTefbsWZkNqleVlJTEt99+y/jx41XDV/OVlp9+fn7ExsaSlpamttzHx0ftuxce9lhYWlpasXx6//33i6V79uyZqvFpbGyMjY0NTZs2VcWer379+igUCj7++GNCQkKIjY0lISHhlfLCxsaGjh07smLFCgBWrFjBoEGD1E7YXzWuV+Ht7V3s3rOS9sf8zymp7P6M1atXExAQgL6+PlZWVtjY2LB3797Xur/lK+n3lJ6eXuy+S4DQ0FCWLl3KjBkzSrwQsmrVKmxsbPD39+fo0aMcPnwYNzc3tTSHDh0qtt
8tX7682LbOnj1Lq1atVPe52tjY8PnnnwPFy/fGjRv07duXvLw84uPji22rpN+Rrq4unp6exeqlc+fOYWNjg62tLQ0
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1000x800 with 2 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
" precision recall f1-score support\n",
|
|||
|
"\n",
|
|||
|
" 0 0.89 0.98 0.93 377\n",
|
|||
|
" 1 0.70 0.31 0.43 67\n",
|
|||
|
"\n",
|
|||
|
" accuracy 0.88 444\n",
|
|||
|
" macro avg 0.79 0.64 0.68 444\n",
|
|||
|
"weighted avg 0.86 0.88 0.86 444\n",
|
|||
|
"\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"import time\n",
|
|||
|
"import pandas as pd\n",
|
|||
|
"import numpy as np\n",
|
|||
|
"from sklearn.model_selection import train_test_split, cross_val_score\n",
|
|||
|
"from sklearn.metrics import accuracy_score, roc_auc_score, classification_report\n",
|
|||
|
"from sklearn.ensemble import RandomForestClassifier\n",
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"import seaborn as sns\n",
|
|||
|
"\n",
|
|||
|
"# 1. Оценка предсказательной способности (Accuracy и ROC-AUC для бинарной классификации)\n",
|
|||
|
"model = RandomForestClassifier(random_state=42)\n",
|
|||
|
"\n",
|
|||
|
"start_time = time.perf_counter()\n",
|
|||
|
"model.fit(X_train, y_train)\n",
|
|||
|
"end_time = time.perf_counter()\n",
|
|||
|
"\n",
|
|||
|
"train_time = end_time - start_time\n",
|
|||
|
"\n",
|
|||
|
"y_pred = model.predict(X_test)\n",
|
|||
|
"accuracy = accuracy_score(y_test, y_pred)\n",
|
|||
|
"roc_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])\n",
|
|||
|
"\n",
|
|||
|
"print(f\"Accuracy: {accuracy}\")\n",
|
|||
|
"print(f\"ROC-AUC: {roc_auc}\")\n",
|
|||
|
"print(f\"Время обучения модели: {train_time:.4f} секунд\")\n",
|
|||
|
"\n",
|
|||
|
"# 2. Оценка скорости вычисления (время предсказания)\n",
|
|||
|
"start_time = time.perf_counter()\n",
|
|||
|
"y_pred = model.predict(X_test)\n",
|
|||
|
"end_time = time.perf_counter()\n",
|
|||
|
"predict_time = end_time - start_time\n",
|
|||
|
"\n",
|
|||
|
"print(f\"Время предсказания: {predict_time:.4f} секунд\")\n",
|
|||
|
"\n",
|
|||
|
"# 3. Оценка надежности модели с помощью перекрестной проверки\n",
|
|||
|
"cv_scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')\n",
|
|||
|
"mean_cv_score = np.mean(cv_scores)\n",
|
|||
|
"print(f\"Средняя точность по кросс-валидации: {mean_cv_score:.4f}\")\n",
|
|||
|
"\n",
|
|||
|
"# 4. Оценка корреляции признаков с целевой переменной\n",
|
|||
|
"correlation_matrix = pd.concat([X, y], axis=1).corr()\n",
|
|||
|
"correlation_with_target = correlation_matrix['Response'].sort_values(ascending=False)\n",
|
|||
|
"print(\"Корреляция признаков с целевой переменной:\")\n",
|
|||
|
"print(correlation_with_target)\n",
|
|||
|
"\n",
|
|||
|
"# Визуализация корреляции\n",
|
|||
|
"plt.figure(figsize=(10, 8))\n",
|
|||
|
"sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')\n",
|
|||
|
"plt.title('Корреляционная матрица признаков')\n",
|
|||
|
"plt.show()\n",
|
|||
|
"\n",
|
|||
|
"# Дополнительная информация о модели\n",
|
|||
|
"print(classification_report(y_test, y_pred))\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"ВРОДЕ ВСЁ Я ТАК ДОЛГО ЧИНИЛСЯ"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"kernelspec": {
|
|||
|
"display_name": "miivenv",
|
|||
|
"language": "python",
|
|||
|
"name": "python3"
|
|||
|
},
|
|||
|
"language_info": {
|
|||
|
"codemirror_mode": {
|
|||
|
"name": "ipython",
|
|||
|
"version": 3
|
|||
|
},
|
|||
|
"file_extension": ".py",
|
|||
|
"mimetype": "text/x-python",
|
|||
|
"name": "python",
|
|||
|
"nbconvert_exporter": "python",
|
|||
|
"pygments_lexer": "ipython3",
|
|||
|
"version": "3.12.5"
|
|||
|
}
|
|||
|
},
|
|||
|
"nbformat": 4,
|
|||
|
"nbformat_minor": 2
|
|||
|
}
|