AIM-PIbd-32-Chernyshev-G-Y/lab_2/Lab2.ipynb

2024-10-18 20:05:31 +04:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dataset 1\n",
"https://www.kaggle.com/datasets/antonkozyriev/game-recommendations-on-steam"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Problem domain: video game market analytics (here, on the Steam platform)\n",
"\n",
"Object of observation: games on the Steam platform. The attributes are a game's characteristics (title, release date, price, availability on different platforms (Windows, macOS, Linux, Steam Deck)) and its reception by players (rating, reviews)\n",
"The dataset contains only one entity, but the following relationship can be noted: a game is associated with many reviews\n",
"\n",
"Business goal: determine how a game's core characteristics affect its Steam rating, so that developers and publishers know where to invest more time and money. Business impact: a higher chance of the game succeeding and a lower risk of financial loss\n",
"\n",
"Technical goal: build a machine learning model that predicts the rating a game will receive from players.\n",
"Inputs: the game's release date (to look for possible patterns between release month and a high rating), the game's price, and its availability on Windows, Linux, and Mac. Target feature: rating\n"
]
},
{
"cell_type": "code",
"execution_count": 296,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['app_id', 'title', 'date_release', 'win', 'mac', 'linux', 'rating',\n",
" 'positive_ratio', 'user_reviews', 'price_final', 'price_original',\n",
" 'discount', 'steam_deck'],\n",
" dtype='object')\n"
]
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"df1 = pd.read_csv(\"..//static//csv//games.csv\")\n",
"\n",
"print(df1.columns)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Reviewing all numeric features shows that the dataset contains quite a lot of outliers. \n",
"\n",
"The positive_ratio column includes games with very few positive reviews; however, for games it is also important to know about titles with more negative than positive reviews, so this is useful noise. The data is skewed toward games with a larger share of positive reviews (over 60%). However, this column may determine the string-valued rating column, so from here on it can be treated simply as noise \n",
"\n",
"The user_reviews column has a serious outlier with an extremely large number of reviews, but the column itself can be treated as noise, because here the number of reviews matters less than the game's rating. \n",
"\n",
"The price_final column depends on the price_original and discount columns. Discounts and discounted prices should not be taken into account here, so the price_final and discount columns can be treated as noise.\n",
"\n",
"The price_original column has many outliers above the typical values. A variety of game prices is desirable for the analysis, but games priced above $150 can be removed, since such expensive games are extremely rare and could cause the model to train incorrectly. The data in this column is skewed toward games priced under $25"
]
},
{
"cell_type": "code",
"execution_count": 297,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "(base64-encoded PNG omitted: box plots of the numeric columns)",
"text/plain": [
"<Figure size 1200x800 with 5 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"numeric_cols = df1.select_dtypes(include=['number']).columns\n",
"\n",
"# all numeric columns except app_id\n",
"numeric_cols = [col for col in numeric_cols if col != 'app_id']\n",
"\n",
"plt.figure(figsize=(12, 8))\n",
"\n",
"for i, col in enumerate(numeric_cols, 1):\n",
"    # IQR-based outlier bounds (the box plot's default whiskers use the same 1.5 * IQR rule)\n",
"    Q1 = df1[col].quantile(0.25)\n",
"    Q3 = df1[col].quantile(0.75)\n",
"    IQR = Q3 - Q1\n",
"    lower_bound = Q1 - 1.5 * IQR\n",
"    upper_bound = Q3 + 1.5 * IQR\n",
"    outliers = df1[col][(df1[col] < lower_bound) | (df1[col] > upper_bound)]\n",
"    plt.subplot(len(numeric_cols) // 3 + 1, 3, i)\n",
"    plt.boxplot(x=df1[col])\n",
"    plt.title(f'Boxplot for {col}')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
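{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 150-dollar price cutoff mentioned in the outlier discussion above could be applied roughly as follows. This is a minimal sketch rather than part of the original analysis: the toy frame, the `PRICE_CAP` constant, and the `filtered` name are assumptions made for illustration; only the `price_original` column name and the threshold of 150 come from the notebook.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy stand-in for df1 (hypothetical rows); the real frame is loaded from games.csv\n",
"toy = pd.DataFrame({\n",
"    'title': ['A', 'B', 'C'],\n",
"    'price_original': [9.99, 59.99, 199.99],\n",
"})\n",
"\n",
"# Threshold taken from the text above, in dollars\n",
"PRICE_CAP = 150\n",
"\n",
"# Keep only the rows whose original price does not exceed the cap\n",
"filtered = toy[toy['price_original'] <= PRICE_CAP].copy()\n",
"print(len(toy) - len(filtered), 'row(s) removed')\n",
"```\n",
"\n",
"Filtering into a new frame leaves the source data untouched, so the number of removed rows can be inspected before committing to the cutoff.\n"
]
},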
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To check for data leakage, the game's rating, which the dataset stores as strings, has to be converted to a numeric scale. A 5-point or 10-point scale would be the natural choice, but there are 9 distinct string ratings, which divides evenly into neither 5 nor 10. So, to distribute the string ratings evenly, they were mapped onto a 3-point scale (three string ratings per point). Only the positive_ratio column (the share of positive reviews) correlates strongly with this scale, which makes sense: the rating column depends on it, and the 3-point rating_stars column was derived from rating. However, positive_ratio will not be an input feature, so there will be no data leakage."
]
},
{
"cell_type": "code",
"execution_count": 298,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Very Positive' 'Positive' 'Mixed' 'Mostly Positive'\n",
" 'Overwhelmingly Positive' 'Negative' 'Mostly Negative'\n",
" 'Overwhelmingly Negative' 'Very Negative']\n",
"Data leakage: high correlation (0.82) between columns 'positive_ratio' and 'rating_stars'\n"
]
}
],
"source": [
"# see which game ratings occur in the table\n",
"print(df1['rating'].unique())\n",
"\n",
"# convert the string ratings to numeric scores;\n",
"# the 1-5 and 2-10 variants below were tried and are left commented out\n",
"# rating_mapping = {'Overwhelmingly Positive': 5,\n",
"#                   'Very Positive': 5,\n",
"#                   'Positive': 4,\n",
"#                   'Mostly Positive': 4,\n",
"#                   'Mixed': 3,\n",
"#                   'Mostly Negative': 3,\n",
"#                   'Negative': 2,\n",
"#                   'Very Negative': 2,\n",
"#                   'Overwhelmingly Negative': 1\n",
"#                   }\n",
"# rating_mapping = {'Overwhelmingly Positive': 10,\n",
"#                   'Very Positive': 9,\n",
"#                   'Positive': 8,\n",
"#                   'Mostly Positive': 7,\n",
"#                   'Mixed': 6,\n",
"#                   'Mostly Negative': 5,\n",
"#                   'Negative': 4,\n",
"#                   'Very Negative': 3,\n",
"#                   'Overwhelmingly Negative': 2\n",
"#                   }\n",
"# the 3-point mapping actually used: three string ratings per point\n",
"rating_mapping = {'Overwhelmingly Positive': 3,\n",
"                  'Very Positive': 3,\n",
"                  'Positive': 3,\n",
"                  'Mostly Positive': 2,\n",
"                  'Mixed': 2,\n",
"                  'Mostly Negative': 2,\n",
"                  'Negative': 1,\n",
"                  'Very Negative': 1,\n",
"                  'Overwhelmingly Negative': 1\n",
"                  }\n",
"df1['rating_stars'] = df1['rating'].map(rating_mapping)\n",
"\n",
"\n",
"# correlation check (data leakage)\n",
"main_col = 'rating_stars'\n",
"for col1 in numeric_cols:\n",
"    if col1 != main_col:\n",
"        correlation = df1[col1].corr(df1[main_col])\n",
"        if abs(correlation) > 0.7:\n",
"            print(f\"Data leakage: high correlation ({correlation:.2f}) between columns '{col1}' and '{main_col}'\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This dataset is not fully informative: it has no data on a game's genre or publisher, which may well affect its rating. Still, it does contain reviews and ratings, release dates, prices, and supported platforms, which can also affect a game's rating.\n",
"\n",
"The dataset's coverage is good: it contains 50,872 records for games released from 1997 to 2023, but it lacks data on games from the current year. The data may also be outdated, since a year has passed since the latest release date, and game reviews could have changed in that time. \n",
"\n",
"The labels are consistent, but price_final could be mistaken for a game's final price after release, which is wrong: it actually means the price after the discount is applied"
]
},
{
"cell_type": "code",
"execution_count": 299,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['app_id', 'title', 'date_release', 'win', 'mac', 'linux', 'rating',\n",
" 'positive_ratio', 'user_reviews', 'price_final', 'price_original',\n",
" 'discount', 'steam_deck', 'rating_stars'],\n",
" dtype='object')\n",
"Number of records: 50872\n",
"<DatetimeArray>\n",
"['1997-06-30 00:00:00', '1997-11-14 00:00:00', '1998-11-08 00:00:00',\n",
" '1999-04-01 00:00:00', '1999-09-08 00:00:00', '1999-11-01 00:00:00',\n",
" '2000-11-01 00:00:00', '2001-06-01 00:00:00', '2002-08-28 00:00:00',\n",
" '2003-05-01 00:00:00',\n",
" ...\n",
" '2023-10-12 00:00:00', '2023-10-13 00:00:00', '2023-10-15 00:00:00',\n",
" '2023-10-16 00:00:00', '2023-10-17 00:00:00', '2023-10-18 00:00:00',\n",
" '2023-10-19 00:00:00', '2023-10-20 00:00:00', '2023-10-23 00:00:00',\n",
" '2023-10-24 00:00:00']\n",
"Length: 4292, dtype: datetime64[ns]\n"
]
}
],
"source": [
"print(df1.columns)\n",
"print(f\"Number of records: {df1.shape[0]}\")\n",
"# game release dates\n",
"df1['date_release'] = pd.to_datetime(df1['date_release'])\n",
"df_sorted = df1.sort_values(by='date_release')\n",
"print(df_sorted['date_release'].unique())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"No column has missing data, so there is nothing to fix here"
]
},
{
"cell_type": "code",
"execution_count": 300,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Columns with missing values: []\n"
]
}
],
"source": [
"columns_with_nulls = []\n",
"for col in df1.columns:\n",
"    if df1[col].isnull().sum() > 0:\n",
"        columns_with_nulls.append(col)\n",
"print(f\"Columns with missing values: {columns_with_nulls}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**SPLITTING INTO SAMPLES**\n",
"\n",
"train_data - training set\n",
"\n",
"val_data - validation set\n",
"\n",
"test_data - test set"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is clear that too few low-rated games ended up in the training set. The data for such games needs to be augmented via oversampling "
]
},
{
"cell_type": "code",
"execution_count": 301,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training set size: 40697\n",
"Validation set size: 5087\n",
"Test set size: 5088\n"
]
},
{
"data": {
"image/png": "(base64-encoded PNG omitted: bar chart of rating_stars counts in the training set)",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"rating_stars\n",
"1 296\n",
"2 18144\n",
"3 22257\n",
"Name: count, dtype: int64\n"
]
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"data = df1[['date_release', 'win', 'linux', 'mac', 'price_original', 'rating_stars']].copy()\n",
"# first split the records 80/20, where the 80% is the training set\n",
"train_data, temp_data = train_test_split(data, test_size=0.2, random_state=42)\n",
"\n",
"# then split the remaining 20% evenly into the validation and test sets\n",
"val_data, test_data = train_test_split(temp_data, test_size=0.5, random_state=42)\n",
"\n",
"# check the sample sizes\n",
"print(\"Training set size:\", len(train_data))\n",
"print(\"Validation set size:\", len(val_data))\n",
"print(\"Test set size:\", len(test_data))\n",
"\n",
"\n",
"# bar chart of the rating_stars column (balance of the training set)\n",
"rating_counts = train_data['rating_stars'].value_counts().sort_index()\n",
"\n",
"plt.bar(rating_counts.index, rating_counts.values)\n",
"plt.xlabel('Rating Stars')\n",
"plt.ylabel('Count')\n",
"plt.show()\n",
"\n",
"print(train_data[\"rating_stars\"].value_counts().sort_index())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**DATA AUGMENTATION (oversampling)**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After augmentation, there are far more records for games with negative reviews. The distribution of games is now much more balanced"
]
},
{
"cell_type": "code",
"execution_count": 302,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "(base64-encoded PNG omitted: bar chart of rating_stars counts after ADASYN oversampling)",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"rating_stars\n",
"1 22234\n",
"2 20308\n",
"3 22257\n",
"Name: count, dtype: int64\n"
]
}
],
"source": [
"from imblearn.over_sampling import ADASYN\n",
"ada = ADASYN(n_neighbors=3)\n",
"#ada = ADASYN()\n",
"\n",
"\n",
"# convert non-numeric values to numeric so that oversampling can work with them\n",
"train_data['date_release'] = pd.to_datetime(train_data['date_release']).astype('int64') / 10**9\n",
"train_data['mac'] = train_data[\"mac\"].astype(int)\n",
"train_data['win'] = train_data[\"win\"].astype(int)\n",
"train_data['linux'] = train_data[\"linux\"].astype(int)\n",
"\n",
"X_resampled, y_resampled = ada.fit_resample(train_data, train_data[\"rating_stars\"])\n",
"train_data_adasyn = pd.DataFrame(X_resampled)\n",
"\n",
"\n",
"rating_counts_adasyn = train_data_adasyn['rating_stars'].value_counts().sort_index()\n",
"\n",
"plt.bar(rating_counts_adasyn.index, rating_counts_adasyn.values)\n",
"plt.xlabel('Rating Stars')\n",
"plt.ylabel('Count')\n",
"plt.show()\n",
"\n",
"print(train_data_adasyn[\"rating_stars\"].value_counts().sort_index())"
]
},
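{
"cell_type": "markdown",
"metadata": {},
"source": [
"The oversampling cell above passes the whole training frame, target column included, to ADASYN and leaves the sampler unseeded. A variant that separates the features from the target and fixes the seed might look like the sketch below. It runs on synthetic data and is not the notebook's own code: the toy frame and its values are assumptions made for illustration, while `ADASYN`, `n_neighbors`, `random_state`, and `fit_resample` are part of imbalanced-learn.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"from imblearn.over_sampling import ADASYN\n",
"\n",
"# Synthetic, imbalanced stand-in for train_data (hypothetical values)\n",
"rng = np.random.default_rng(0)\n",
"toy = pd.DataFrame({\n",
"    'price_original': rng.uniform(0, 60, 300),\n",
"    'mac': rng.integers(0, 2, 300),\n",
"    'rating_stars': [1] * 30 + [3] * 270,\n",
"})\n",
"\n",
"# Keep the target out of the feature matrix\n",
"X = toy.drop(columns=['rating_stars'])\n",
"y = toy['rating_stars']\n",
"\n",
"# A fixed random_state makes the synthetic rows reproducible\n",
"ada = ADASYN(n_neighbors=3, random_state=42)\n",
"X_res, y_res = ada.fit_resample(X, y)\n",
"\n",
"print(y_res.value_counts().sort_index())\n",
"```\n",
"\n",
"With the target excluded, the neighbor search is driven only by the input features. Note that ADASYN interpolates every feature, so binary flags such as `mac` can take fractional values in the synthetic rows.\n"
]
},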
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## **DATASET 2**\n",
"\n",
"https://www.kaggle.com/datasets/dewangmoghe/mobile-phone-price-prediction\n",
"\n",
"\n",
"Problem domain: the mobile phone market\n",
"\n",
"Objects of observation: mobile phones\n",
"\n",
"Object attributes:\n",
"* Name: name\n",
"\n",
"* Rating: the phone's rating (from 0 to 5).\n",
"\n",
"* Spec_score: the phone's score based on its main specifications (from 0 to 100)\n",
"\n",
"* No_of_sim: whether the phone supports dual SIM, 3G, 4G, 5G, VoLTE\n",
"\n",
"* Ram: amount of RAM\n",
"\n",
"* Battery: battery specifications\n",
"\n",
"* Display: the phone's screen size\n",
"\n",
"* Camera: specifications of the front and rear cameras\n",
"\n",
"* External_Memory: whether external memory is supported and how much\n",
"\n",
"* Android_version: the phone's Android version\n",
"\n",
"* Price: price\n",
"\n",
"* company: the phone's manufacturer\n",
"\n",
"* Inbuilt_memory: the phone's built-in storage\n",
"\n",
"* fast_charging: whether fast charging is supported and at how many watts.\n",
"\n",
"* Screen_resolution: screen resolution\n",
"\n",
"* Processor: processor description\n",
"\n",
"* Processor_name: processor name\n",
"\n",
"Relationships between objects:\n",
"Between a phone's price and its other specifications (the better the specifications, the more expensive the phone should be)\n",
"\n",
"Business goal: help manufacturers and sellers determine the optimal price for new phones based on competitors' offerings.\n",
"Business impact: improved market competitiveness and a potential increase in profit\n",
"\n",
"Technical goal: create a machine learning model that predicts a mobile phone's price from its specifications.\n",
"Inputs: mobile phone specifications (battery, camera, processor, etc.).\n",
"Target feature: price"
]
},
{
"cell_type": "code",
"execution_count": 303,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['Unnamed: 0', 'Name', 'Rating', 'Spec_score', 'No_of_sim', 'Ram',\n",
" 'Battery', 'Display', 'Camera', 'External_Memory', 'Android_version',\n",
" 'Price', 'company', 'Inbuilt_memory', 'fast_charging',\n",
" 'Screen_resolution', 'Processor', 'Processor_name'],\n",
" dtype='object')\n",
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 1370 entries, 0 to 1369\n",
"Data columns (total 18 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Unnamed: 0 1370 non-null int64 \n",
" 1 Name 1370 non-null object \n",
" 2 Rating 1370 non-null float64\n",
" 3 Spec_score 1370 non-null int64 \n",
" 4 No_of_sim 1370 non-null object \n",
" 5 Ram 1370 non-null object \n",
" 6 Battery 1370 non-null object \n",
" 7 Display 1370 non-null object \n",
" 8 Camera 1370 non-null object \n",
" 9 External_Memory 1370 non-null object \n",
" 10 Android_version 927 non-null object \n",
" 11 Price 1370 non-null object \n",
" 12 company 1370 non-null object \n",
" 13 Inbuilt_memory 1351 non-null object \n",
" 14 fast_charging 1281 non-null object \n",
" 15 Screen_resolution 1368 non-null object \n",
" 16 Processor 1342 non-null object \n",
" 17 Processor_name 1370 non-null object \n",
"dtypes: float64(1), int64(2), object(15)\n",
"memory usage: 192.8+ KB\n",
"None\n"
]
}
],
"source": [
"df2 = pd.read_csv(\"..//static//csv//mobiles.csv\")\n",
"print(df2.columns)\n",
"print(df2.info())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Ram column contains noise in the form of values that clearly are not RAM sizes ('Helio G90T', '128 GB inbuilt', '6000 mAh Battery with 22.5W Fast Charging',\n",
"'256 GB inbuilt', '512 GB inbuilt'). Rows with these values can be removed: their values have shifted over from other columns, which means other columns in the same rows hold wrong values too. \n",
"\n",
"It was also found that not all prices are written correctly: some values contained two commas. To convert the values to numbers, commas were replaced with dots, and in rows that ended up with two dots the first dot was removed.\n",
"\n",
"The timeliness of the data cannot be checked, because the dataset has no smartphone release date\n",
"\n",
"The data coverage is very good: a large number of smartphones from different price segments is represented"
]
},
{
"cell_type": "code",
"execution_count": 304,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 0 1 2 ... 1367 1368 1369]\n",
"['Samsung Galaxy F14 5G' 'Samsung Galaxy A11' 'Samsung Galaxy A13' ...\n",
" 'TCL 50 XE NxtPaper 5G' 'TCL 40 NxtPaper 5G' 'TCL Trifold']\n",
"[4.65 4.2 4.3 4.1 4.4 4.05 4.5 4.25 4.75 4.15 4.35 4.45 4.6 4.\n",
" 4.55 4.7 3.95 3.75 3.9 3.85]\n",
"[68 63 75 73 69 76 71 85 78 72 74 79 80 62 81 82 87 86 88 84 83 89 91 90\n",
" 96 93 92 95 65 59 42 67 60 61 54 66 70 51 64 53 77 94 98 97 58 57 49 46\n",
" 56 55]\n",
"['Dual Sim, 3G, 4G, 5G, VoLTE, ' 'Dual Sim, 3G, 4G, VoLTE, '\n",
" 'Dual Sim, 3G, 4G, 5G, VoLTE, Vo5G, ' 'Single Sim, 3G, 4G, 5G, VoLTE, '\n",
" 'Dual Sim, 3G, 4G, ' 'Single Sim, 3G, 4G, VoLTE, ' 'No Sim Supported, '\n",
" 'Single Sim, 3G, 4G, 5G, VoLTE, Vo5G, ' 'Dual Sim, 3G, VoLTE, ']\n",
"['4 GB RAM' '2 GB RAM' '6 GB RAM' '8 GB RAM' '12 GB RAM' '1 GB RAM'\n",
" '3 GB RAM' '16 GB RAM' 'Helio G90T' '24 GB RAM' '18 GB RAM' '1.5 GB RAM'\n",
" '128 GB inbuilt' '6000 mAh Battery with 22.5W Fast Charging'\n",
" '256 GB inbuilt' '512 GB inbuilt']\n",
"['6000 mAh Battery ' '4000 mAh Battery ' '5000 mAh Battery '\n",
" '6000 mAh Battery' '3500 mAh Battery' '4500 mAh Battery '\n",
" '3400 mAh Battery ' '3300 mAh Battery ' '4050 mAh Battery '\n",
" '3900 mAh Battery ' '4300 mAh Battery ' '4800 mAh Battery '\n",
" '4200 mAh Battery ' '3700 mAh Battery ' '4400 mAh Battery '\n",
" '3500 mAh Battery ' '4320 mAh Battery ' '4030 mAh Battery'\n",
" '1900 mAh Battery' '5000 mAh Battery' '2650 mAh Battery'\n",
" '3000 mAh Battery' '4600 mAh Battery ' '4100 mAh Battery '\n",
" '5500 mAh Battery ' '4830 mAh Battery ' '4700 mAh Battery '\n",
" '4810 mAh Battery ' '5100 mAh Battery ' '5400 mAh Battery '\n",
" '4870 mAh Battery ' '5700 mAh Battery ' '4730 mAh Battery '\n",
" '5100 mAh Battery' '6 GB RAM, 64 GB inbuilt' '5200 mAh Battery '\n",
" '5240 mAh Battery ' '5050 mAh Battery ' '4310 mAh Battery '\n",
" '4350 mAh Battery ' '4880 mAh Battery ' '4520 mAh Battery '\n",
" '4260 mAh Battery ' '4820 mAh Battery ' '4805 mAh Battery '\n",
" '5160 mAh Battery ' '5080 mAh Battery ' '5065 mAh Battery '\n",
" '10500 mAh Battery ' '5200 mAh Battery' '5800 mAh Battery '\n",
" '5300 mAh Battery ' '5450 mAh Battery ' '5600 mAh Battery '\n",
" '3000 mAh Battery ' '2800 mAh Battery ' '4620 mAh Battery '\n",
" '4385 mAh Battery ' '4410 mAh Battery ' '4355 mAh Battery '\n",
" '4492 mAh Battery ' '4575 mAh Battery ' '5003 mAh Battery '\n",
" '4821 mAh Battery ' '4000 mAh Battery' '7000 mAh Battery '\n",
" '3900 mAh Battery' '3760 mAh Battery ' '2600 mAh Battery'\n",
" '4900 mAh Battery ' '4020 mAh Battery ' '4450 mAh Battery '\n",
" '4610 mAh Battery ' '3800 mAh Battery ' '3440 mAh Battery '\n",
" '2510 mAh Battery ' '6100 mAh Battery ' '2100 mAh Battery'\n",
" '4030 mAh Battery ' '5020 mAh Battery ' '4980 mAh Battery '\n",
" '4250 mAh Battery ' '6.75 inches, 720 x 1600 px Display '\n",
" '4460 mAh Battery ' '4815 mAh Battery ' '4750 mAh Battery '\n",
" '5330 mAh Battery ' '5010 mAh Battery ' '4500 mAh Battery']\n",
"['6.6 inches' '6.4 inches' '6.5 inches' '6.1 inches' '6.7 inches'\n",
" '6.21 inches' '6.67 inches' '6.58 inches' '6.71 inches' '6.78 inches'\n",
" '6.8 inches' '6.56 inches' '6.3 inches' '7.45 inches' '6.2 inches'\n",
" '8.2 inches' '7.6 inches' '8 inches' '7.63 inches' '6.22 inches'\n",
" '4.5 inches' '6.51 inches' '6.53 inches' '6.35 inches' '6.55 inches'\n",
" '6.64 inches' '5.2 inches' '5.5 inches' '6.72 inches' '6.44 inches'\n",
" '6.82 inches' '6.68 inches' '7 inches' '6.74 inches' '8.03 inches'\n",
" '8.02 inches' '7.8 inches' '6.52 inches' '6.59 inches' '6.43 inches'\n",
" '4300 mAh Battery with 30W Fast Charging' '6.62 inches' '6.57 inches'\n",
" '6.73 inches' '6.83 inches' '7.1 inches' '7.4 inches' '7.56 inches'\n",
" '7.82 inches' '6.38 inches' '6.79 inches' '6.61 inches' '6.69 inches'\n",
" '12.1 inches' '6.77 inches' '6.75 inches' '6.81 inches' '7.2 inches'\n",
" '7.71 inches' '7.92 inches' '6.76 inches' '7.9 inches' '5.6 inches'\n",
" '5.7 inches' '6.34 inches' '6.14 inches' '6.03 inches' '8.3 inches'\n",
" '5.9 inches' '5.92 inches' '6 inches' '6.26 inches' '6.09 inches'\n",
" '5.99 inches' '6.92 inches' '5 inches' '6.45 inches' '6.9 inches'\n",
" '6.47 inches' '6.28 inches' '6.49 inches' '6.08 inches' '7.85 inches'\n",
" '7.11 inches' '6.95 inches'\n",
" '48 MP + 5 MP + 2 MP Triple Rear &amp; 8 MP Front Camera' '6.94 inches'\n",
" '7.09 inches' '10 inches']\n",
"['50 MP + 2 MP Dual Rear &amp; 13 MP Front Camera'\n",
" '13 MP + 5 MP + 2 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '50 MP Quad Rear &amp; 8 MP Front Camera'\n",
" '48 MP Quad Rear &amp; 13 MP Front Camera'\n",
" '13 MP + 2 MP + 2 MP Triple Rear &amp; 5 MP Front Camera'\n",
" '50 MP + 2 MP Dual Rear &amp; 5 MP Front Camera'\n",
" '48 MP + 8 MP + 5 MP Triple Rear &amp; 20 MP Front Camera'\n",
" '48 MP Quad Rear &amp; 8 MP Front Camera'\n",
" '50 MP + 2 MP + 2 MP Triple Rear &amp; 13 MP Front Camera'\n",
" '50 MP + 5 MP + 2 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '50 MP + 8 MP + 2 MP Triple Rear &amp; 13 MP Front Camera'\n",
" '50 MP + 8 MP + 2 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '48 MP + 8 MP + 5 MP Triple Rear &amp; 25 MP Front Camera'\n",
" '50 MP + 13 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '64 MP Quad Rear &amp; 20 MP Front Camera'\n",
" '64 MP + 8 MP + 5 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '13 MP + 2 MP Dual Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 2 MP Dual Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 5 MP + 2 MP Triple Rear &amp; 13 MP Front Camera'\n",
" '64 MP + 8 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '12 MP + 12 MP + 8 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 8 MP + 5 MP Triple Rear &amp; 13 MP Front Camera'\n",
" '48 MP Quad Rear &amp; 32 MP Front Camera'\n",
" '64 MP Quad Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 8 MP + 2 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '64 MP + 12 MP + 5 MP Triple Rear &amp; 10 MP Front Camera'\n",
" '24 MP + 10 MP + 5 MP Triple Rear &amp; 24 MP Front Camera'\n",
" '50 MP + 12 MP + 5 MP Triple Rear &amp; 32 MP Front Camera'\n",
" 'Foldable Display, Dual Display'\n",
" '108 MP + 8 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 8 MP Dual Rear &amp; 13 MP Front Camera'\n",
" '50 MP + 12 MP + 8 MP Triple Rear &amp; 10 MP Front Camera'\n",
" '108 MP Quad Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 12 MP + 10 MP Triple Rear &amp; 12 MP Front Camera'\n",
" '12 MP Quad Rear &amp; 10 MP Front Camera'\n",
" '64 MP + 12 MP + 12 MP Triple Rear &amp; 10 MP Front Camera'\n",
" '48 MP + 12 MP + 5 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '25 MP + 8 MP Dual Rear &amp; 13 MP Front Camera'\n",
" '50 MP + 12 MP + 12 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 12 MP + 10 MP Triple Rear &amp; 10 MP Front Camera'\n",
" '48 MP + 8 MP + 5 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '108 MP Quad Rear &amp; 12 MP + 8 MP Dual Front Camera'\n",
" '12 MP + 12 MP Dual Rear &amp; 8 MP Front Camera'\n",
" '200 MP Quad Rear &amp; 12 MP Front Camera'\n",
" '108 MP Quad Rear &amp; 40 MP Front Camera'\n",
" '13 MP + 0.08 MP Dual Rear &amp; 5 MP Front Camera'\n",
" '13 MP + 2 MP Dual Rear &amp; 8 MP Front Camera'\n",
" '5 MP Rear &amp; 2 MP Front Camera' '8 MP Rear &amp; 5 MP Front Camera'\n",
" '13 MP Rear &amp; 5 MP Front Camera'\n",
" '50 MP + 0.08 MP Dual Rear &amp; 8 MP Front Camera'\n",
" '13 MP + 2 MP Dual Rear &amp; 5 MP Front Camera'\n",
" '16 MP + 8 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '13 MP + 2 MP Dual Rear &amp; 13 MP Front Camera'\n",
" '50 MP + 2 MP Dual Rear &amp; 8 MP Front Camera'\n",
" '13 MP + 8 MP + 2 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '13 MP + 2 MP + 2 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '50 MP + 2 MP + 2 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '13 MP Rear &amp; 16 MP Front Camera'\n",
" '16 MP + 8 MP + 2 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '64 MP + 2 MP Dual Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 8 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '64 MP + 2 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 8 MP Dual Rear &amp; 50 MP Front Camera'\n",
" '108 MP + 8 MP + 2 MP Triple Rear &amp; 50 MP + 8 MP Dual Front Camera'\n",
" '64 MP + 8 MP + 2 MP Triple Rear &amp; 44 MP Front Camera'\n",
" '50 MP + 13 MP + 2 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '64 MP + 8 MP + 2 MP Triple Rear &amp; 44 MP + 8 MP Dual Front Camera'\n",
" '64 MP + 8 MP + 2 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '50 MP + 8 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 50 MP Dual Rear &amp; 50 MP Front Camera'\n",
" '64 MP + 8 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '64 MP + 2 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 50 MP + 50 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '108 MP + 64 MP + 50 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '50 MP + 13 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '108 MP + 8 MP + 2 MP Triple Rear &amp; 44 MP + 8 MP Dual Front Camera'\n",
" '50 MP + 12 MP + 8 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '64 MP + 50 MP + 50 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '200 MP + 8 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 13 MP + 8 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 13 MP + 13 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '200 MP + 64 MP + 50 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '50 MP + 12 MP + 12 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '64 MP + 50 MP + 50 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '108 MP + 8 MP + 2 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '50 MP Quad Rear &amp; 50 MP Front Camera'\n",
" '50 MP Quad Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 50 MP + 50 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '108 MP + 32 MP + 12 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50.3 MP + 50 MP + 50 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP Quad Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 50 MP + 12 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '200 MP + 50 MP + 50 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '200 MP + 50 MP + 50 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '200 MP Quad Rear &amp; 60 MP Front Camera'\n",
" '50.3 MP Quad Rear &amp; 32 MP Front Camera'\n",
" '12 MP Quad Rear &amp; 13 MP Front Camera'\n",
" '50 MP + Depth Sensor Dual Rear &amp; 8 MP Front Camera'\n",
" '50 MP + 2 MP + 0.3 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '50 MP + 0.08 MP Dual Rear &amp; 5 MP Front Camera'\n",
" '50 MP + 0.3 MP Dual Rear &amp; 5 MP Front Camera'\n",
" '50 MP + 2 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '64 MP + 2 MP Dual Rear &amp; 8 MP Front Camera'\n",
" '48 MP + 2 MP + 2 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '13 MP Quad Rear &amp; 8 MP Front Camera'\n",
" '108 MP + 2 MP Dual Rear &amp; 8 MP Front Camera'\n",
" '50 MP Rear &amp; 8 MP Front Camera'\n",
" '48 MP + 8 MP + 2 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '64 MP Quad Rear &amp; 16 MP Front Camera'\n",
" '108 MP + 2 MP Dual Rear Camera'\n",
" '13 MP + Depth Sensor Dual Rear &amp; 5 MP Front Camera'\n",
" '13 MP + 0.3 MP Dual Rear &amp; 5 MP Front Camera'\n",
" '48 MP + 2 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '13 MP + 8 MP Dual Rear &amp; 5 MP Front Camera'\n",
" '6.5 inches, 1080 x 2400 px, 90 Hz Display with Punch Hole'\n",
" '108 MP + 2 MP Dual Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 2 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '108 MP + 8 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 8 MP Dual Rear &amp; 32 MP Front Camera'\n",
" '108 MP Quad Rear &amp; 16 MP Front Camera'\n",
" '64 MP + 5 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '64 MP Quad Rear &amp; 32 MP + 8 MP Dual Front Camera'\n",
" '50 MP + 50 MP + 32 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 32 MP + 8 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '64 MP + 50 MP + 8 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '64 MP + 13 MP + 13 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 50 MP + 8 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 50 MP + 3 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '200 MP + 64 MP + 32 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '200 MP + 64 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 50 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 2 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 48 MP + 32 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '108 MP + 13 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '100 MP + 2 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '48 MP Quad Rear &amp; 16 MP Front Camera'\n",
" '16 MP Quad Rear &amp; 16 MP Front Camera'\n",
" '64 MP + 32 MP + 8 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 32 MP + 8 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '108 MP + 2 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 50 MP + 32 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '108 MP + 5 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 32 MP + 32 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 2 MP + Ultra Wide Triple Rear &amp; 32 MP Front Camera'\n",
" '48 MP + 13 MP + 12 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 50 MP + 13 MP Triple Rear &amp; 32 MP Front Camera'\n",
" 'Dual Display'\n",
" '48 MP + 48 MP + 13 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '64 MP + 2 MP + 2 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '64 MP + 12 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 8 MP Dual Rear &amp; 16 MP Front Camera'\n",
" '48 MP + 13 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '108 MP + 13 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '64 MP + 8 MP Dual Rear &amp; 32 MP Front Camera'\n",
" '64 MP + 13 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 50 MP Dual Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 13 MP + 13 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '48 MP + 13 MP + 13 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 13 MP + 8 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 50 MP + 12 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '64 MP + 50 MP + 50 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 50 MP + 14.6 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 50 MP + 13 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '108 MP + 50 MP + 32 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 50 MP + 16 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '200 MP + 50 MP + 50 MP Triple Rear &amp; 60 MP Front Camera'\n",
" '8 MP + 0.08 MP Dual Rear &amp; 5 MP Front Camera'\n",
" '50 MP Dual Rear &amp; 5 MP Front Camera'\n",
" '8 MP Dual Rear &amp; 5 MP Front Camera'\n",
" '50 MP Dual Rear &amp; 8 MP Front Camera'\n",
" '13 MP Rear &amp; 8 MP Front Camera'\n",
" '50 MP Dual Rear &amp; 13 MP Front Camera'\n",
" '64 MP + 8 MP + 2 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '48 MP + 8 MP + 2 MP Triple Rear &amp; 13 MP Front Camera'\n",
" '64 MP Quad Rear &amp; 13 MP Front Camera'\n",
" '64 MP Quad Rear &amp; 20 MP + 2 MP Dual Front Camera'\n",
" '64 MP + 8 MP + 2 MP Triple Rear &amp; 20 MP Front Camera'\n",
" '64 MP + 8 MP + 5 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '8 MP Rear &amp; 8 MP Front Camera'\n",
" '50 MP + 8 MP + 5 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '48 MP + 2 MP + Depth Sensor Triple Rear &amp; 8 MP Front Camera'\n",
" '50 MP + 2 MP + 2 MP Triple Rear &amp; 5 MP Front Camera'\n",
" '108 MP + 2 MP + 2 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '108 MP + 5 MP + 2 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '100 MP + 5 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 13 MP Dual Rear &amp; 16 MP Front Camera'\n",
" '108 MP + 5 MP + 2 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '50 MP + 12 MP Dual Rear &amp; 50 MP Front Camera'\n",
" '50 MP + 12 MP Dual Rear &amp; 16 MP Front Camera'\n",
" '64 MP + 5 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '200 MP + 12 MP + 2 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '54 MP + 50 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '160 MP + 8 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 32 MP + 12 MP Triple Rear &amp; 50 MP + 2 MP Dual Front Camera'\n",
" '54 MP + 8 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '108 MP + 5 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '54 MP + 50 MP + 8 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '160 MP + 50 MP + 2 MP Triple Rear &amp; 50 MP + 2 MP Dual Front Camera'\n",
" '108 MP + 32 MP + 12 MP Triple Rear &amp; 50 MP + 2 MP Dual Front Camera'\n",
" '40 MP + 12 MP + 8 MP Triple Rear &amp; 32 MP + 8 MP Dual Front Camera'\n",
" '50 MP + 16 MP + 8 MP Triple Rear &amp; 32 MP + 8 MP Dual Front Camera'\n",
" '200 MP + 50 MP + 8 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '200 MP + 32 MP + 12 MP Triple Rear &amp; 50 MP + 2 MP Dual Front Camera'\n",
" '180 MP + 50 MP + 50 MP Triple Rear &amp; 50 MP Dual Front Camera'\n",
" '50 MP Quad Rear &amp; 12 MP + TOF 3D Dual Front Camera'\n",
" '54 MP Quad Rear &amp; 12 MP Front Camera'\n",
" '50 MP Quad Rear &amp; 12 MP Dual Front Camera'\n",
" '50 MP Penta Rear &amp; 12 MP + Depth Sensor Dual Front Camera'\n",
" '50 MP Quad Rear &amp; 13 MP Dual Front Camera'\n",
" '50 MP + 50 MP Dual Rear &amp; 32 MP Front Camera'\n",
" '64 MP + 50 MP Dual Rear &amp; 32 MP Front Camera'\n",
" '64 MP + 50 MP + 32 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '12.2 MP Rear &amp; 8 MP Front Camera'\n",
" '16 MP + 12.2 MP Dual Rear &amp; 8 MP + TOF 3D Dual Front Camera'\n",
" '16 MP + 12.2 MP Dual Rear &amp; 8 MP Front Camera'\n",
" '64 MP + 13 MP Dual Rear &amp; 13 MP Front Camera'\n",
" '12.2 MP + 12 MP Dual Rear &amp; 8 MP Front Camera'\n",
" '50 MP + 12 MP Dual Rear &amp; 10.8 MP Front Camera'\n",
" '108 MP + 13 MP Dual Rear &amp; 13 MP Front Camera'\n",
" '50 MP + 8 MP Dual Rear &amp; 12 MP Front Camera'\n",
" '50 MP + 48 MP + 12 MP Triple Rear &amp; 10.8 MP Front Camera'\n",
" '50 MP + 12 MP Dual Rear &amp; 10.5 MP Front Camera'\n",
" '16 MP + 16 MP + 12 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '64 MP + 48 MP + 12 MP Triple Rear &amp; 10.8 MP Front Camera'\n",
" '50 MP + 48 MP + 48 MP Triple Rear &amp; 10.5 MP Front Camera'\n",
" '50 MP + 48 MP + 12 MP Triple Rear &amp; 12 MP Front Camera'\n",
" '8 MP + 2 MP + 0.3 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '13 MP Dual Rear &amp; 8 MP Front Camera'\n",
" '50 MP + 12 MP + 5 MP Triple Rear &amp; 12 MP Front Camera'\n",
" '64 MP + 13 MP + 5 MP Triple Rear &amp; 24 MP Front Camera'\n",
" '50 MP + 12 MP Dual Rear &amp; 12 MP Front Camera'\n",
" '50 MP + 13 MP Dual Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 13 MP + 5 MP Triple Rear &amp; 12 MP Front Camera'\n",
" '50 MP + 13 MP + 5 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 32 MP + 13 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '108 MP + 13 MP + 5 MP Triple Rear &amp; 12 MP Front Camera'\n",
" '48 MP + 8 MP + 5 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '13 MP + 5 MP Dual Rear &amp; 8 MP Front Camera'\n",
" '16 MP + 2 MP Dual Rear &amp; 8 MP Front Camera'\n",
" '64 MP + 13 MP Dual Rear &amp; 16 MP Front Camera'\n",
" '64 MP + 16 MP Dual Rear &amp; 44 MP Front Camera'\n",
" '64 MP + 16 MP Dual Rear &amp; 20 MP Front Camera'\n",
" '16 MP Rear &amp; 13 MP Front Camera'\n",
" '13 MP Dual Rear &amp; 5 MP Front Camera'\n",
" '16 MP + 5 MP + 2 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '48 MP + 2 MP Dual Rear &amp; 5 MP Front Camera'\n",
" '64 MP Quad Rear &amp; 50 MP Front Camera'\n",
" '64 MP + 13 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 5 MP + 2 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '48 MP + 8 MP + 5 MP Triple Rear &amp; 13 MP Front Camera'\n",
" '64 MP + 12 MP + 5 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '200 MP + 12 MP + 12 MP Triple Rear &amp; 60 MP Front Camera'\n",
" '108 MP + 12 MP + 12 MP Triple Rear &amp; 12 MP Front Camera'\n",
" '13 MP + 8 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '108 MP + 2 MP Dual Rear &amp; 32 MP Front Camera'\n",
" '64 MP + 8 MP Dual Rear &amp; 50 MP Front Camera'\n",
" '50 MP + 50 MP + 12 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '100 MP + 2 MP Dual Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 50 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '64 MP + 50 MP + 3 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '16 MP Rear &amp; 5 MP Front Camera'\n",
" '16 MP + 5 MP Dual Rear &amp; 12 MP Front Camera'\n",
" '50 MP + Macro Dual Rear &amp; 8 MP Front Camera'\n",
" '48 MP + 2 MP Dual Rear &amp; 8 MP Front Camera'\n",
" '50 MP Rear &amp; 5 MP Front Camera'\n",
" '16 MP + 2 MP + 2 MP Triple Rear &amp; 5 MP Front Camera'\n",
" '16 MP + 2 MP Dual Rear &amp; 5 MP Front Camera'\n",
" '16 MP + 2 MP + 2 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '64 MP + 13 MP Dual Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 5 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '108 MP + 13 MP Dual Rear &amp; 32 MP Front Camera'\n",
" '108 MP + 16 MP + 8 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '48 MP + 8 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 13 MP + 10 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '50 MP + 50 MP + 2 MP Triple Rear &amp; 60 MP Front Camera'\n",
" '50 MP + 50 MP + 12 MP Triple Rear &amp; 60 MP Front Camera'\n",
" '108 MP + 13 MP + 5 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '108 MP + 16 MP + 8 MP Triple Rear &amp; 25 MP Front Camera'\n",
" '50 MP + 13 MP + 2 MP Triple Rear &amp; 32 MP + 16 MP Dual Front Camera'\n",
" '200 MP + 50 MP + 12 MP Triple Rear &amp; 60 MP Front Camera'\n",
" '50 MP + 50 MP + 2 MP Triple Rear &amp; 60 MP + 60 MP Triple Front Camera'\n",
" '64 MP + 16 MP + 2 MP Triple Rear &amp; 16 MP + 8 MP Dual Front Camera'\n",
" '50 MP + 50 MP + 50 MP Triple Rear &amp; 60 MP Front Camera'\n",
" '100 MP + 50 MP + 50 MP Triple Rear &amp; 50 MP Front Camera'\n",
" '200 MP + 50 MP + 2 MP Triple Rear &amp; 60 MP Front Camera'\n",
" '108 MP + 2 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '13 MP + 2 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '48 MP + 16 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '48 MP + 2 MP Dual Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 16 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '48 MP + 16 MP + 12 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 48 MP + 8 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 48 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 48 MP + 32 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '64 MP + 50 MP + 48 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 13 MP + 5 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '108 MP + 50 MP + 48 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '200 MP + 50 MP + 48 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '48 MP + 16 MP + 8 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 2 MP + 0.08 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '8 MP + Depth Sensor Dual Rear &amp; 5 MP Front Camera'\n",
" '50 MP + Depth Sensor Dual Rear &amp; 5 MP Front Camera'\n",
" '13 MP Rear Camera' '50 MP Quad Rear &amp; 13 MP Front Camera'\n",
" '48 MP + 8 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 8 MP Dual Rear &amp; 20 MP Front Camera'\n",
" '200 MP + 8 MP + 2 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '64 MP + 8 MP + 5 MP Triple Rear &amp; 20 MP Front Camera'\n",
" 'Foldable Display' '50 MP + 8 MP Dual Rear &amp; 60 MP Front Camera'\n",
" 'Memory Card (Hybrid)' '50 MP + 2 MP Triple Rear &amp; 5 MP Front Camera'\n",
" '100 MP + 8 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '48 MP + 5 MP + 2 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '50 MP + 8 MP + 2 MP Triple Rear &amp; 60 MP Front Camera'\n",
" '108 MP + 8 MP Dual Rear &amp; 60 MP + 8 MP Dual Front Camera'\n",
" '50 MP + 8 MP Dual Rear &amp; 60 MP + 8 MP Dual Front Camera'\n",
" '50 MP + 8 MP + 2 MP Triple Rear &amp; 60 MP + 8 MP Dual Front Camera'\n",
" '50 MP + 13 MP Dual Rear &amp; 13 MP Front Camera'\n",
" '48 MP + 13 MP + 12 MP Triple Rear &amp; 13 MP Front Camera'\n",
" '50 MP + 13 MP + 12 MP Triple Rear &amp; 13 MP Front Camera'\n",
" '50 MP + 50 MP + 40 MP Triple Rear &amp; 13 MP Front Camera'\n",
" '50 MP + 48 MP + 12.5 MP Triple Rear &amp; 13 MP Front Camera'\n",
" '48 MP + 48 MP + 13 MP Triple Rear &amp; 13 MP Front Camera'\n",
" '50 MP + 50 MP + 50 MP Triple Rear &amp; 16 MP Dual Front Camera'\n",
" '40 MP Quad Rear &amp; 32 MP Dual Front Camera'\n",
" '50 MP + 48 MP + 12 MP Triple Rear &amp; 13 MP Dual Front Camera'\n",
" '64 MP + 50 MP + 13 MP Triple Rear &amp; 13 MP Front Camera'\n",
" '50 MP + 32 MP + 12 MP Triple Rear &amp; 13 MP Front Camera'\n",
" '48 MP + 48 MP + 40 MP Triple Rear &amp; 13 MP Front Camera'\n",
" '50 MP + 20 MP + 12 MP Triple Rear &amp; 13 MP Dual Front Camera'\n",
" '50 MP Penta Rear &amp; 13 MP Dual Front Camera'\n",
" '50 MP Quad Rear &amp; 32 MP Dual Front Camera'\n",
" '108 MP + 13 MP + 12 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + Depth Sensor Triple Rear &amp; 5 MP Front Camera'\n",
" '100 MP + 2 MP Dual Rear &amp; 8 MP Front Camera'\n",
" '108 MP + 0.08 MP Dual Rear &amp; 8 MP Front Camera'\n",
" '50 MP Dual Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 0.08 MP Dual Rear &amp; 32 MP Front Camera'\n",
" '13 MP Triple Rear &amp; 5 MP Front Camera'\n",
" '48 MP + 0.08 MP Dual Rear &amp; 16 MP Front Camera'\n",
" '50 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '13 MP + 2 MP Triple Rear &amp; 8 MP Front Camera'\n",
" '50 MP + 5 MP Dual Rear &amp; 8 MP Front Camera'\n",
" '50 MP + 5 MP + 2 MP Triple Rear &amp; 32 MP Front Camera']\n",
"['Memory Card Supported, upto 1 TB' 'Memory Card Supported, upto 512 GB'\n",
" 'Memory Card Supported' 'Memory Card (Hybrid), upto 1 TB'\n",
" 'Memory Card Not Supported' 'Memory Card (Hybrid)'\n",
" '12 MP + 12 MP Dual Rear &amp; 10 MP Front Camera' 'Android v13'\n",
" 'Android v10' 'Android v12' 'Memory Card (Hybrid), upto 512 GB'\n",
" '50 MP + 12 MP + 5 MP Triple Rear &amp; 10 MP + 4 MP Dual Front Camera'\n",
" '200 MP Quad Rear &amp; 12 MP + 12 MP Dual Front Camera'\n",
" '50 MP + 12 MP + 10 MP Triple Rear &amp; 10 MP + 4 MP Dual Front Camera'\n",
" '50 MP + 12 MP + 10 MP Triple Rear &amp; 12 MP + 12 MP Dual Front Camera'\n",
" 'Memory Card Supported, upto 256 GB' 'Memory Card Supported, upto 128 GB'\n",
" 'Android v11' 'Android v15' 'Android v14'\n",
" '50 MP + 12 MP Dual Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 50 MP + 50 MP Triple Rear &amp; 32 MP + 32 MP Dual Front Camera'\n",
" '200 MP + 12 MP + 12 MP Triple Rear &amp; 32 MP + 32 MP Dual Front Camera'\n",
" '50 MP Quad Rear &amp; 16 MP Front Camera'\n",
" '64 MP + 50 MP + 50 MP Triple Rear &amp; 32 MP + 32 MP Dual Front Camera'\n",
" '50 MP Quad Rear &amp; 16 MP + 16 MP Dual Front Camera'\n",
" '48 MP + 48 MP + 10 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '200 MP + 12 MP Dual Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 12 MP + 12 MP Triple Rear &amp; 16 MP + 16 MP Dual Front Camera'\n",
" '64 MP + 12 MP + 12 MP Triple Rear &amp; 32 MP Front Camera'\n",
" 'Memory Card Supported, upto 2 TB' 'Memory Card (Hybrid), upto 2 TB'\n",
" '48 MP Quad Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 48 MP + 32 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 8 MP Dual Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 48 MP + 32 MP Triple Rear &amp; 32 MP + 32 MP Dual Front Camera'\n",
" '108 MP + 50 MP Dual Rear &amp; 32 MP Front Camera'\n",
" '64 MP + 10 MP + 8 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '64 MP + 16 MP + 12 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '108 MP + 50 MP + 32 MP Triple Rear &amp; 32 MP + 32 MP Dual Front Camera'\n",
" '64 MP + 48 MP + 48 MP Triple Rear &amp; 32 MP + 20 MP Dual Front Camera'\n",
" 'Memory Card (Hybrid), upto 256 GB'\n",
" '50 MP + 12 MP Dual Rear &amp; 8 MP Front Camera'\n",
" '50 MP + 20 MP + 12 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 32 MP + 8 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 32 MP + 10 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '50 MP + 50 MP + 20 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '54 MP + 50 MP + 8 MP Triple Rear &amp; 16 MP Front Camera'\n",
" '108 MP + 8 MP + 5 MP Triple Rear &amp; 16 MP Front Camera'\n",
" 'Android v9.0 (Pie)' '48 MP + 12 MP Dual Rear &amp; 10 MP Front Camera'\n",
" '48 MP + 10.8 MP + 10.8 MP Triple Rear &amp; 9.5 MP + 8 MP Dual Front Camera'\n",
" '50 MP + 10.8 MP + 10.8 MP Triple Rear &amp; 12 MP + 12 MP Dual Front Camera'\n",
" 'Memory Card Supported, upto 32 GB'\n",
" '64 MP + 13 MP + 8 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 13 MP + 5 MP Triple Rear &amp; 12 MP Front Camera'\n",
" '64 MP + 13 MP + 0.3 MP Triple Rear &amp; 10 MP Front Camera'\n",
" '50 MP + 50 MP Dual Rear &amp; 16 MP Front Camera'\n",
" '64 MP + 13 MP Dual Rear &amp; 32 MP Front Camera' 'Android v10.0'\n",
" '64 MP + 13 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '13 MP + 12 MP Dual Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 50 MP Dual Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 13 MP Dual Rear &amp; 32 MP Front Camera'\n",
" '50 MP + 13 MP + 2 MP Triple Rear &amp; 32 MP Front Camera'\n",
" '16 MP Rear &amp; 5 MP Front Camera' 'Android v12.1' 'No FM Radio'\n",
" '50 MP + 50 MP + 13 MP Triple Rear &amp; 32 MP + 16 MP Dual Front Camera'\n",
" '50 MP + 50 MP + 32 MP Triple Rear &amp; 32 MP + 32 MP Dual Front Camera'\n",
" '50 MP Hexa Rear &amp; 32 MP Front Camera' 'Android' 'HarmonyOS v4'\n",
" 'EMUI v14' 'HarmonyOS v3.0' 'HarmonyOS' 'HarmonyOS v4.0' 'HarmonyOS v5.0'\n",
" 'HarmonyOS v2.0'\n",
" '48 MP + 20 MP + 13 MP Triple Rear &amp; 10.7 MP Front Camera'\n",
" 'HarmonyOS v4.2' 'HarmonyOS v5'\n",
" '50 MP Quad Rear &amp; 10.7 MP Front Camera'\n",
" '50 MP + 13 MP Dual Rear &amp; 12 MP Front Camera'\n",
" '50 MP + 48 MP + 8 MP Triple Rear &amp; 32 MP Front Camera']\n",
"['13' '10' '12' '11' '15' '10.0' '9.0 (Pie)' '14' nan '7.1.1 (Nougat)'\n",
" '8.0 (Oreo)' '8.1 (Oreo)' '5.1 (Lollipop)' '6.0 (Marshmallow)' '9 (Pie)'\n",
" '3' '2' '4.0' '3.0 (Honeycomb)' '2.0' '3.0' '3.1' '5.0' '4.1']\n",
"['9,999' '9,990' '11,999' '11,990' '11,599' '12,298' '14,999' '14,990'\n",
" '14,949' '19,999' '19,990' '19,799' '19,499' '18,999' '18,990' '20,999'\n",
" '29,999' '28,990' '30,500' '30,999' '39,999' '39,990' '38,900' '37,999'\n",
" '41,289' '41,790' '42,990' '42,999' '49,999' '49,990' '49,000' '47,990'\n",
" '44,999' '44,990' '51,999' '54,990' '54,999' '59,999' '57,990' '64,999'\n",
" '65,690' '69,990' '69,999' '70,000' '1,99,990' '1,84,999' '1,79,990'\n",
" '1,77,999' '1,64,999' '1,59,999' '1,54,999' '1,39,999' '1,30,376'\n",
" '1,29,999' '6,990' '6,999' '7,499' '7,999' '8,033' '8,199' '8,490'\n",
" '9,499' '10,199' '10,499' '11,899' '11,580' '11,490' '11,390' '10,999'\n",
" '12,350' '12,490' '15,050' '29,990' '29,799' '30,739' '31,398' '31,990'\n",
" '38,990' '38,799' '37,990' '40,990' '49,940' '48,990' '46,990' '45,990'\n",
" '45,210' '50,999' '56,990' '58,990' '62,990' '63,999' '64,990' '65,490'\n",
" '71,990' '74,899' '76,990' '79,999' '80,990' '1,39,990' '1,18,990'\n",
" '1,15,990' '1,13,990' '1,10,990' '1,09,990' '1,07,990' '1,06,990'\n",
" '99,990' '94,999' '89,999' '89,990' '82,990' '6,950' '7,199' '7,450'\n",
" '7,480' '7,790' '7,815' '7,850' '7,919' '7,920' '7,945' '7,950' '7,980'\n",
" '9,893' '9,820' '10,299' '10,390' '11,910' '11,749' '11,499' '12,251'\n",
" '14,844' '14,499' '13,999' '15,299' '15,329' '15,749' '15,990' '19,783'\n",
" '20,499' '20,500' '20,599' '30,049' '29,996' '28,979' '28,339' '31,089'\n",
" '38,999' '36,999' '36,990' '35,999' '34,999' '34,990' '33,999' '33,990'\n",
" '15,499' '20,699' '20,990' '28,900' '30,200' '30,900' '35,990' '45,999'\n",
" '47,999' '49,499' '50,990' '78,990' '79,990' '84,990' '84,999' '94,990'\n",
" '1,34,999' '1,29,990' '1,19,900' '1,14,990' '10,990' '12,899' '12,990'\n",
" '13,499' '13,990' '15,999' '17,990' '17,999' '21,838' '22,486' '22,990'\n",
" '22,999' '28,999' '26,990' '25,999' '25,990' '30,990' '32,990' '43,990'\n",
" '52,652' '52,999' '57,999' '72,990' '76,429' '7,299' '7,580' '7,890'\n",
" '7,972' '7,990' '8,499' '8,689' '8,990' '8,999' '9,799' '9,690' '9,249'\n",
" '10,330' '10,880' '11,539' '12,194' '12,999' '13,267' '13,290' '13,490'\n",
" '14,899' '14,950' '15,590' '16,999' '17,945' '19,490' '21,990' '21,999'\n",
" '24,499' '24,990' '25,890' '26,499' '27,990' '27,999' '27,199' '31,999'\n",
" '55,990' '56,999' '66,499' '67,990' '77,990' '1,02,999' '1,87,990'\n",
" '1,24,999' '1,04,999' '1,03,999' '23,999' '40,299' '40,999' '32,999'\n",
" '43,999' '46,999' '59,990' '62,999' '74,990' '1,01,999' '1,08,999'\n",
" '1,25,990' '1,46,990' '1,59,990' '7,190' '7,309' '7,394' '63,990'\n",
" '70,990' '71,999' '72,999' '74,999' '1,09,900' '82,999' '81,990' '7,124'\n",
" '7,290' '9,099' '7,599' '9,490' '7,899' '8,899' '8,690' '11,110' '11,450'\n",
" '11,000' '10,631' '10,900' '10,490' '12,332' '13,429' '13,599' '14,199'\n",
" '15,982' '16,990' '17,900' '17,499' '24,999' '22,863' '27,899' '26,690'\n",
" '25,171' '21,499' '21,390' '26,899' '22,492' '36,880' '33,779' '32,883'\n",
" '33,499' '35,499' '41,740' '1,24,990' '89,748' '99,999' '81,999'\n",
" '1,05,999' '1,03,000' '8,980' '8,489' '8,660' '12,749' '13,950' '16,499'\n",
" '16,299' '17,995' '15,190' '23,499' '25,299' '21,490' '20,198' '30,799'\n",
" '36,199' '31,899' '45,215' '68,899' '63,490' '8,349' '7,820' '8,890'\n",
" '9,478' '9,764' '9,489' '8,744' '9,800' '11,049' '10,190' '10,466'\n",
" '10,750' '10,899' '12,877' '13,374' '12,499' '12,900' '13,489' '15,323'\n",
" '18,708' '16,485' '18,398' '18,577' '16,400' '16,949' '17,949' '16,998'\n",
" '17,789' '16,500' '21,828' '27,875' '21,477' '23,880' '23,900' '20,615'\n",
" '23,649' '29,004' '22,799' '26,999' '24,150' '33,900' '52,990' '1,04,990'\n",
" '7,998' '7,090' '14,989' '18,928' '23,990' '41,990' '88,990' '1,49,999'\n",
" '20,000' '16,899' '18,879' '16,134' '24,454' '20,065' '22,592' '26,674'\n",
" '22,499' '35,609' '39,888' '42,437' '43,889' '40,108' '47,998' '43,299'\n",
" '58,699' '55,999' '63,359' '7,699' '9,190' '7,900' '7,689' '9,998'\n",
" '11,159' '11,350' '10,269' '11,489' '11,425' '10,949' '12,120' '12,239'\n",
" '12,428' '15,898' '18,377' '20,075' '17,975' '16,890' '18,390' '18,499'\n",
" '22,297' '28,517' '24,329' '20,048' '26,479' '24,890' '24,449' '36,898'\n",
" '44,949' '69,899' '53,990' '83,999' '93,990' '2,14,990' '1,34,990'\n",
" '1,21,999' '1,91,999' '92,990' '25,499' '7,319' '10,749' '10,489' '8,799'\n",
" '8,346' '7,949' '1,19,990']\n",
"['Samsung' 'Vivo' 'Realme' 'OPPO' 'Oppo' 'iQOO' 'IQOO' 'Poco' 'POCO'\n",
" 'Honor' 'Nothing' 'Google' 'itel' 'Itel' 'Asus' 'LG' 'Lenovo' 'Gionee'\n",
" 'Motorola' 'OnePlus' 'Xiaomi' 'Tecno' 'Huawei' 'Lava' 'Coolpad' 'TCL']\n",
"[' 128 GB inbuilt' ' 32 GB inbuilt' ' 64 GB inbuilt' ' 256 GB inbuilt'\n",
" ' 1 TB inbuilt' ' 512 GB inbuilt' ' 16 GB inbuilt' ' Octa Core'\n",
" ' 258 GB inbuilt' ' 8 GB inbuilt' nan]\n",
"[' 25W Fast Charging' ' 15W Fast Charging' nan ' 18W Fast Charging'\n",
" ' 30W Fast Charging' ' Fast Charging' ' 45W Fast Charging'\n",
" ' 33W Fast Charging' ' 67W Fast Charging' ' 80W Fast Charging'\n",
" ' 10W Fast Charging' ' 44W Fast Charging' ' 66W Fast Charging'\n",
" ' 100W Fast Charging' ' 120W Fast Charging' ' 150W Fast Charging'\n",
" ' 55W Fast Charging' ' 200W Fast Charging' ' 65W Fast Charging'\n",
" ' 60W Fast Charging' ' 20W Fast Charging' ' 50W Fast Charging'\n",
" ' 57W Fast Charging' ' 240W Fast Charging' ' 125W Fast Charging'\n",
" ' 68W Fast Charging' ' 250W Fast Charging' ' 27W Fast Charging'\n",
" ' 35W Fast Charging' ' 22.5W Fast Charging' ' 40W Fast Charging'\n",
" ' 90W Fast Charging' ' 08W Fast Charging' ' 68.2W Fast Charging'\n",
" ' 135W Fast Charging' ' 70W Fast Charging' ' Water Drop Notch'\n",
" ' 88W Fast Charging' ' 7.5W Fast Charging']\n",
"[' 2408 x 1080 px Display with Water Drop Notch'\n",
" ' 720 x 1560 px Display with Punch Hole'\n",
" ' 1080 x 2408 px Display with Water Drop Notch' ' 720 x 1600 px'\n",
" ' 720 x 1600 px Display with Water Drop Notch'\n",
" ' 1080 x 2340 px Display with Water Drop Notch'\n",
" ' 720 x 1560 px Display with Water Drop Notch' ' 1080 x 2408 px'\n",
" ' 1080 x 2400 px Display with Water Drop Notch' ' 1080 x 2340 px'\n",
" ' 1080 x 2400 px' ' 720 x 1520 px Display with Water Drop Notch'\n",
" ' 1080 x 2400 px Display with Punch Hole' ' 1440 x 3200 px'\n",
" ' 1080 x 2340 px Display with Punch Hole' ' 1080 x 2640 px'\n",
" ' 1080 x 2412 px' ' 1440 x 3040 px Display with Punch Hole'\n",
" ' 1080 x 2400 px Display' ' 1080 x 2460 px Display with Punch Hole'\n",
" ' 1440 x 3040 px Display' ' 1440 x 2960 px Display' ' 1812 x 2176 px'\n",
" ' 1440 x 3120 px' ' 1440 x 3080 px' ' 720 x 1612 px'\n",
" ' 480 x 854 px Display' ' 720 x 1544 px Display with Water Drop Notch'\n",
" ' 720 x 1612 px Display with Water Drop Notch' ' 1600 x 720 px'\n",
" ' 1080 x 2388 px Display with Water Drop Notch' ' 720 x 1280 px Display'\n",
" ' 1612 x 720 px' ' 1080 x 2376 px' ' 1800 x 3200 px'\n",
" ' 1080 x 2400 px Display with Small Notch' ' 1080 x 2388 px'\n",
" ' 1260 x 2800 px' ' 1260 x 2712 px' ' 1080 x 2256 px Display'\n",
" ' 1080 x 2520 px' ' 2200 x 2480 px' ' 1916 x 2160 px' ' 1768 x 2208 px'\n",
" ' 1600 x 720 px Display with Water Drop Notch' ' 720 x 1604 px'\n",
" ' 1080 x 2460 px' ' 720 x 1600 px Display with Punch Hole' nan\n",
" ' 1264 x 2780 px' ' 1240 x 2772 px'\n",
" ' 1440 x 3200 px Display with Punch Hole' ' 2400 x 1080 px'\n",
" ' 1864 x 3820 px' ' 1440 x 3216 px' ' 1080 x 2732 px' ' 1440 x 3168 px'\n",
" ' 1200 x 2400 px' ' 1792 x 1920 px' ' 1800 x 3400 px'\n",
" ' 1440 x 3200 px Display' ' 2268 x 2440 px'\n",
" ' 1080 x 2388 px Display with Punch Hole' ' 1800 x 3440 px'\n",
" ' 720 x 1650 px' ' 720 x 1650 px Display with Water Drop Notch'\n",
" ' 720 x 1680 px Display with Water Drop Notch' ' 720 x 1680 px'\n",
" ' 1220 x 2712 px' ' 1600 x 2560 px' ' 1080 x 2404 px' ' 1220 x 3200 px'\n",
" ' 1200 x 2400 px Display with Water Drop Notch' ' 1220 x 2652 px'\n",
" ' 1080 x 2412 px Display with Small Notch' ' 1200 x 2664 px'\n",
" ' 1224 x 2700 px' ' 1200 x 2652 px'\n",
" ' 1080 x 2400 px Display with Dual Punch Hole' ' 1264 x 2800 px'\n",
" ' 1280 x 2800 px' ' 2016 x 2348 px' ' 1312 x 2848 px' ' 2156 x 2344 px'\n",
" ' 1224 x 2688 px' ' 1344 x 2772 px' ' 1984 x 2272 px'\n",
" ' 2200 x 2480 px Display' ' 1084 x 2412 px' ' 1084 x 2728 px'\n",
" ' 1080 x 2220 px Display' ' 1080 x 2280 px Display' ' 1344 x 2992 px'\n",
" ' 1940 x 3120 px' ' 1840 x 2208 px'\n",
" ' 1600 x 720 px Display with Punch Hole' ' 720 x 1640 px'\n",
" ' 1080 x 2448 px' ' 2340 x 1080 px'\n",
" ' 1080 x 2460 px Display with Water Drop Notch' ' 1080 x 1920 px Display'\n",
" ' 720 x 1440 px Display' ' 720 x 1600 px Display with Large Notch'\n",
" ' 1080 x 2400 px Display with Large Notch' ' 540 x 960 px Display'\n",
" ' 1440 x 3088 px' ' 1080 x 2408 px Display with Punch Hole'\n",
" ' Full HD+ Display with Punch Hole'\n",
" ' 1080 x 2246 px Display with Large Notch' ' 2460 x 1080 px'\n",
" ' 1080 x 1920 px' ' 720 x 1612 px Display with Punch Hole'\n",
" ' 1200 x 2780 px' ' 876 x 2142 px Display with Large Notch'\n",
" ' 1440 x 2780 px' ' 1440 x 3412 px' ' 1440 x 3120 px Display'\n",
" ' 576 x 1440 px Display' ' 720 x 1650 px Display with Punch Hole'\n",
" ' 1080 x 2280 px Display with Water Drop Notch' ' 1080 x 2480 px'\n",
" ' 2000 x 2296 px' ' 1596 x 2296 px Display' ' 1080 x 2160 px'\n",
" ' 1224 x 2776 px' ' 1220 x 2700 px' ' 1260 x 2844 px' ' 1212 x 2616 px'\n",
" ' 1256 x 2760 px' ' 1176 x 2400 px Display with Large Notch'\n",
" ' 1860 x 3220 px' ' 1216 x 2688 px' ' 1260 x 2720 px'\n",
" ' 1344 x 2772 px Display' ' 1200 x 2640 px' ' 1136 x 2690 px'\n",
" ' 1188 x 2790 px' ' 1080 x 2388 px Display'\n",
" ' 1080 x 2412 px Display with Punch Hole' ' 540 x 1092 px Display'\n",
" ' 480 x 960 px Display' ' 720 x 1640 px Display with Water Drop Notch']\n",
"[' Octa Core Processor' ' 1.8 GHz Processor' ' 2 GHz Processor'\n",
" ' Octa Core' nan ' Quad Core' ' Nine-Cores' ' Nine Core' ' Nine Cores'\n",
" ' Deca Core Processor' ' 1.3 GHz Processor' ' 1.6 GHz Processor'\n",
" ' 2.3 GHz Processor' ' Deca Core' ' 128 GB inbuilt']\n",
"['Exynos 1330' 'Octa Core' 'Helio G88' 'Helio P35' 'Dimensity 700'\n",
" 'Exynos 9611' 'Exynos 850' 'Exynos 1280' 'Snapdragon 695' 'Exynos 850'\n",
" 'Helio P65' 'Octa Core Processor' 'Snapdragon 680' 'Helio G80'\n",
" 'Samsung Exynos 7884' 'Dimensity 6100 Plus' 'Dimensity 700 5G'\n",
" 'Snapdragon 680' 'Snapdragon 888' 'Exynos 1380' 'Snapdragon 865'\n",
" 'Exynos 980' 'Snapdragon 730' 'Snapdragon 675' 'Snapdragon 7 Gen1'\n",
" 'Snapdragon 750G' 'Snapdragon 855+' 'Snapdragon 870' 'Snapdragon 710'\n",
" 'Exynos 1480' 'Snapdragon 720G ' 'Snapdragon 778g' 'Exynos 2200'\n",
" 'Snapdragon 7+ Gen2' 'Snapdragon 8 Gen 2' 'Exynos 9825'\n",
" 'Snapdragon 7s Gen2' 'Exynos 2100' 'Dimensity 1300' 'Snapdragon 778G+'\n",
" 'Snapdragon 778G' 'Exynos 2300' 'Snapdragon 8+ Gen1' 'Snapdragon 8 Gen3'\n",
" 'Snapdragon 8+ Gen1' 'Snapdragon 8 Gen1' 'Exynos 990' 'Snapdragon 855'\n",
" 'Exynos 8895' 'Exynos 2100' 'Exynos 9810' 'Snapdragon 8 Gen2'\n",
" 'Helio G85' 'Helio P22' 'Helio MT6580' 'Snapdragon 439 ' 'Helio'\n",
" 'Snapdragon 675' 'Snapdragon 450' 'Dimensity 6020' 'Helio P22'\n",
" 'Helio G70' 'Snapdragon 680 ' 'Snapdragon 460' 'Snapdragon 430'\n",
" 'Helio P70 ' 'Snapdragon MSM8937' 'Snapdragon 6 Gen1'\n",
" 'Snapdragon 7 Gen2' 'Dimensity 7200' 'Snapdragon 4 Gen2' 'Snapdragon 685'\n",
" 'Helio G99' 'Dimensity 1200' 'Dimensity 800U ' 'Snapdragon'\n",
" 'Snapdragon 765G ' 'Dimensity 8200' 'Snapdragon 7 Gen3' 'Snapdragon 782G'\n",
" 'Dimensity 9300' 'Dimensity 9200' 'Dimensity 1100' 'Dimensity 8200'\n",
" 'Dimensity 9000 Plus' 'Dimensity 8300' 'Dimensity 9300 Plus'\n",
" 'Dimensity 9200 Plus' 'Snapdragon 888+' 'Dimensity 9000'\n",
" 'Dimensity 9400' 'Snapdragon 888 ' 'Snapdragon 8 Gen1' 'Unisoc SC9863A'\n",
" 'Helio G35' 'Tiger T612' 'Unisoc T610' 'SC9863A' 'Unisoc SC9863A'\n",
" 'Snapdragon 665' 'Unisoc T612' 'Tiger T616' 'Tiger T610' 'Helio G96'\n",
" 'Helio G36' 'Snapdragon 662' 'Helio G35' 'Dimensity 6300' 'Helio G85 '\n",
" 'Helio G95' 'Helio G95' 'Dimensity 810 5G' 'Dimensity 810 5G' 'No Wifi'\n",
" 'Dimensity 7025' 'Dimensity 700 5G' 'Snapdragon 712' 'Dimensity 7050'\n",
" 'Snapdragon 720G ' 'Snapdragon 7 Gen1' 'Snapdragon 7+ Gen3'\n",
" 'Snapdragon 695' 'Dimensity 8100' 'Snapdragon 778G' 'Dimensity 1000+'\n",
" 'Snapdragon 7s Gen3' 'Dimensity 6080' 'Snapdragon 888 '\n",
" 'Snapdragon 8s Gen3' 'Snapdragon 8 Gen4' 'Snapdragon 8 Gen1 Plus'\n",
" 'Dimensity 7020' 'Snapdragon 730G' 'Snapdragon 480' 'Snapdragon 662 '\n",
" 'Dimensity 800U' 'Snapdragon 765G ' 'Dimensity 900'\n",
" 'Dimensity 1200 Max' 'Dimensity 8100 Max' 'Dimensity 8100-Max'\n",
" 'Dimensity 9200 Plus' 'Snapdragon 765G' 'Snapdragon 865 '\n",
" 'Dimensity 9000' 'Snapdragon 4 Gen 1' 'Snapdragon 695 '\n",
" 'Snapdragon 480+' 'Snapdragon 6 Gen 1' 'Snapdragon 778G Plus'\n",
" 'Snapdragon 870' 'Helio G85' 'Helio A22' 'Helio G25' 'Helio G37'\n",
" 'Helio G91' 'Snapdragon 720G' 'Snapdragon 665' 'Snapdragon 732G'\n",
" 'Snapdragon 695 ' 'Dimensity 920' 'Snapdragon 7s Gen 2'\n",
" 'Dimensity 8300 Ultra' 'Dimensity 8100' 'Snapdragon 480+'\n",
" 'Dimensity 7030' 'Dimensity 1100'\n",
" 'Snapdragon 7 Gen 1 Accelerated Edition' 'Dimensity 8000' 'Exynos 1080'\n",
" 'Snapdragon 8 Gen 1' 'Dimensity 7200 Pro' 'Snapdragon 778G Plus'\n",
" 'Qualcomm Snapdragon 670' 'Tensor G2' 'Google Tensor' 'Google Tensor G2'\n",
" 'Tensor G3' 'Google Tensor 4' 'Google Tensor G2' 'Google Tensor G4'\n",
" 'Google Tensor 2' 'Quad Core' 'Unisoc T606' 'Unisoc T603' ' Unisoc T606'\n",
" 'Snapdragon 8 Gen1 Plus' 'Snapdragon 865+' 'Snapdragon 765G'\n",
" 'Snapdragon 865' 'Helio P25' 'Qualcomm Snapdragon 450' 'Helio P60'\n",
" 'Tiger T610' 'Tiger T310' 'Unisoc SC9836A' 'Snapdragon 439'\n",
" 'Unisoc T606' 'Helio MT6737T' 'Snapdragon 450 ' 'Exynos 1280 ' 'Exynos'\n",
" 'Snapdragon 750G' 'Exynos 1280' 'Dimensity 1080' 'Exynos 2400'\n",
" 'Snapdragon 480 ' 'Helio P35 ' 'Snapdragon 4 Gen1' 'Dimensity 900'\n",
" 'Tiger T616' 'Tiger T606' 'Snapdragon 636' 'Helio G37' 'Helio G99'\n",
" 'Snapdragon SM4375' 'Dimensity 8020' 'Snapdragon 7+ Gen2'\n",
" 'Snapdragon 778G ' 'Snapdragon 888+ ' 'Snapdragon 750G '\n",
" 'Snapdragon 888+' ' Dimensity 7030' 'Snapdragon 6 Gen 1'\n",
" 'Dimensity 1050' 'Snapdragon 8+ Gen2' 'Dimensity 930' 'Snapdragon (4 nm)'\n",
" 'Snapdragon 460 ' 'Snapdragon 782G' 'Snapdragon 695 5G' 'Snapdragon 690'\n",
" 'Dimensity 1300' 'Snapdragon 855+' 'Dimensity 1200 AI'\n",
" 'Snapdragon 8 Gen4' 'Snapdragon 8 Gen2' 'Helio G25' 'Unisoc SC9832E'\n",
" 'Snapdragon 4 Gen 1' 'Snapdragon 712' 'Dimensity 700 '\n",
" 'Snapdragon 662 ' 'Helio G96' 'Snapdragon 732G' 'Snapdragon 732G '\n",
" 'Snapdragon 678' 'Dimensity 8200 Ultra ' 'Dimensity 7200 Ultra'\n",
" 'Dimensity 920 5G' 'Helio G99 Ultra' 'Dimensity (4 nm)' 'Dimensity 8050'\n",
" 'Kirin 710A' 'Kirin 710A' 'Snapdragon (6 nm)' 'Snapdragon 778G 4G'\n",
" '4 GB RAM' 'Sanpdragon 680' 'Kirin 710F' 'Kirin 830' 'Kirin 9000S'\n",
" 'Kirin' 'Snapdragon 8+ Gen 1' 'Kirin 9010' 'Snapdragon 8+ Gen 1 '\n",
" 'Kirin 990' 'Kirin 9000E' 'Kirin 9000' 'Kirin 990' ' Helio G36'\n",
" 'Snapdragon 888' 'Tiger T616' 'Tiger T616 ' 'Helio A22' 'Helio A25']\n"
]
}
],
"source": [
"import re\n",
"\n",
"# Print the unique values of every column to see which ones need cleaning\n",
"for col in df2.columns:\n",
"    print(df2[col].unique())\n",
"\n",
"# Convert categorical (string) data to numbers\n",
"# Remove the ' GB RAM' substring so that only the number remains\n",
"df2['Ram'] = df2['Ram'].replace(' GB RAM', '', regex=True)\n",
"\n",
"# Drop rows whose Ram value is invalid (keep only rows where it is a number);\n",
"# .copy() avoids SettingWithCopyWarning on the assignments below\n",
"df2 = df2[df2['Ram'].apply(lambda x: bool(re.match(r'^\\d+(\\.\\d+)?$', str(x))))].copy()\n",
"\n",
"# Battery: remove the ' mAh Battery' substring so that only the number remains\n",
"df2['Battery'] = df2['Battery'].replace(' mAh Battery', '', regex=True)\n",
"\n",
"# Screen diagonal: remove the ' inches' substring\n",
"df2['Display'] = df2['Display'].replace(' inches', '', regex=True)\n",
"\n",
"# Built-in memory: convert to a number of gigabytes\n",
"df2['Inbuilt_memory'] = df2['Inbuilt_memory'].replace(' GB inbuilt', '', regex=True)\n",
"# ' 1 TB inbuilt' becomes ' 1 024'; stripping spaces on the next line yields '1024'\n",
"df2['Inbuilt_memory'] = df2['Inbuilt_memory'].replace('TB inbuilt', '024', regex=True)\n",
"df2['Inbuilt_memory'] = df2['Inbuilt_memory'].replace(' ', '', regex=True)"
]
},
{
"cell_type": "code",
"execution_count": 305,
"metadata": {},
"outputs": [],
"source": [
"# Count the commas in each price string\n",
"df2['comma_count'] = df2['Price'].apply(lambda x: x.count(','))\n",
"# Drop rows with more than one comma (prices such as '1,02,999')\n",
"df2 = df2[df2['comma_count'] <= 1]\n",
"# Drop the helper column\n",
"df2 = df2.drop(columns=['comma_count'])\n",
"# Turn the comma into a decimal point: '10,330' becomes 10.330,\n",
"# so comma-separated prices end up expressed in thousands\n",
"df2['Price'] = df2['Price'].replace(',', '.', regex=True)\n",
"\n",
"df2['Price'] = pd.to_numeric(df2['Price'], errors='coerce')"
]
},
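{
"cell_type": "markdown",
"metadata": {},
"source": [
"The boxplots show that the data is skewed toward inexpensive phones priced up to about 40 thousand currency units (presumably rupees rather than dollars, judging by the Indian-style digit grouping of raw prices such as '1,02,999'; the values read as thousands because the comma separator was converted to a decimal point), with screens up to 7 inches and up to 256 GB of built-in memory.\n",
"\n",
"\n",
"Price and screen diagonal have many values outside the main body of the data, but here this is useful noise. For battery capacity, the outliers can be considered harmful noise."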
]
},
{
"cell_type": "code",
"execution_count": 306,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Outliers in column 'Ram':\n",
"1 2.0\n",
"39 12.0\n",
"49 12.0\n",
"54 12.0\n",
"65 12.0\n",
" ... \n",
"1312 12.0\n",
"1344 2.0\n",
"1346 2.0\n",
"1348 2.0\n",
"1351 2.0\n",
"Name: Ram, Length: 267, dtype: float64\n",
"\n",
"Outliers in column 'Battery':\n",
"0 6000\n",
"1 4000\n",
"3 6000\n",
"6 6000\n",
"9 6000\n",
" ... \n",
"1344 3000\n",
"1346 3000\n",
"1349 3000\n",
"1350 3000\n",
"1364 4000\n",
"Name: Battery, Length: 296, dtype: int64\n",
"\n",
"Outliers in column 'Display':\n",
"15 6.10\n",
"21 6.21\n",
"53 6.10\n",
"64 6.10\n",
"65 6.10\n",
"72 7.45\n",
"74 6.20\n",
"75 6.20\n",
"91 4.50\n",
"122 5.20\n",
"125 5.50\n",
"197 8.03\n",
"208 7.80\n",
"391 7.10\n",
"393 7.10\n",
"538 12.10\n",
"571 7.20\n",
"597 7.71\n",
"600 7.92\n",
"606 7.80\n",
"627 7.80\n",
"628 5.60\n",
"629 5.70\n",
"631 6.10\n",
"632 6.14\n",
"635 6.10\n",
"636 6.03\n",
"637 6.10\n",
"639 6.20\n",
"640 6.20\n",
"641 6.20\n",
"643 6.10\n",
"662 5.90\n",
"663 5.90\n",
"665 5.92\n",
"669 6.00\n",
"687 6.09\n",
"688 5.20\n",
"689 6.09\n",
"690 5.99\n",
"701 6.20\n",
"715 5.70\n",
"719 5.00\n",
"779 6.10\n",
"789 6.20\n",
"797 6.20\n",
"923 8.00\n",
"938 6.20\n",
"1142 5.00\n",
"1158 6.08\n",
"1226 7.85\n",
"1227 7.85\n",
"1228 7.90\n",
"1229 7.11\n",
"1316 7.09\n",
"1344 6.00\n",
"1346 6.00\n",
"1349 6.00\n",
"1350 6.10\n",
"Name: Display, dtype: float64\n",
"\n",
"Outliers in column 'Inbuilt_memory':\n",
"178 512\n",
"212 512\n",
"299 512\n",
"315 512\n",
"325 1024\n",
"329 512\n",
"372 512\n",
"448 512\n",
"454 512\n",
"525 512\n",
"532 512\n",
"548 512\n",
"573 512\n",
"598 512\n",
"599 512\n",
"604 512\n",
"605 512\n",
"623 512\n",
"664 512\n",
"670 512\n",
"673 512\n",
"674 512\n",
"675 512\n",
"677 512\n",
"679 512\n",
"699 512\n",
"794 512\n",
"855 512\n",
"1012 512\n",
"1031 512\n",
"1034 512\n",
"1038 512\n",
"1041 512\n",
"1051 512\n",
"1115 512\n",
"1123 512\n",
"1218 512\n",
"1226 512\n",
"1227 512\n",
"1276 512\n",
"Name: Inbuilt_memory, dtype: int64\n",
"\n",
"Outliers in column 'Price':\n",
"196 79.999\n",
"197 80.990\n",
"206 99.990\n",
"207 99.990\n",
"208 99.990\n",
" ... \n",
"1280 79.990\n",
"1281 99.990\n",
"1288 82.990\n",
"1290 92.990\n",
"1291 79.990\n",
"Name: Price, Length: 67, dtype: float64\n",
"\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 1200x800 with 5 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Convert the cleaned string columns to numeric types\n",
"for col in ['Ram', 'Battery', 'Display', 'Inbuilt_memory']:\n",
"    df2[col] = pd.to_numeric(df2[col])\n",
"\n",
"numeric_cols = ['Ram', 'Battery', 'Display', 'Inbuilt_memory', 'Price']\n",
"\n",
"plt.figure(figsize=(12, 8))\n",
"\n",
"for i, col in enumerate(numeric_cols, 1):\n",
"    # Flag values more than 1.5 * IQR beyond the quartiles as outliers\n",
"    Q1 = df2[col].quantile(0.25)\n",
"    Q3 = df2[col].quantile(0.75)\n",
"    IQR = Q3 - Q1\n",
"    lower_bound = Q1 - 1.5 * IQR\n",
"    upper_bound = Q3 + 1.5 * IQR\n",
"    outliers = df2[col][(df2[col] < lower_bound) | (df2[col] > upper_bound)]\n",
"    print(f\"Outliers in column '{col}':\\n{outliers}\\n\")\n",
"    # One boxplot per column on a grid three columns wide\n",
"    plt.subplot(len(numeric_cols) // 3 + 1, 3, i)\n",
"    plt.boxplot(x=df2[col])\n",
"    plt.title(f'Boxplot for {col}')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There is no data leakage: no column correlates with the target feature by more than 0.7 in absolute value."
]
},
{
"cell_type": "code",
"execution_count": 307,
"metadata": {},
"outputs": [],
"source": [
"# Correlation check\n",
"price_col = 'Price'  # Name of the price column\n",
"for col1 in numeric_cols:\n",
"    if col1 != price_col:\n",
"        correlation = df2[col1].corr(df2[price_col])\n",
"        if abs(correlation) > 0.7:\n",
"            print(f\"Data leakage: high correlation ({correlation:.2f}) between columns '{col1}' and '{price_col}'\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Three columns have missing values. For these columns the only option is to fill in a constant value, for example \"Unknown\"."
]
},
{
"cell_type": "code",
"execution_count": 308,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Columns with nulls: ['Android_version', 'fast_charging', 'Processor']\n"
]
}
],
"source": [
"# Check for missing values\n",
"columns_with_nulls = [col for col in df2.columns if df2[col].isnull().any()]\n",
"print(f\"Columns with nulls: {columns_with_nulls}\")\n",
"\n",
"# Replace nulls with \"Unknown\" in the columns that have them.\n",
"# Assigning the result back avoids the chained inplace fillna, which pandas deprecates.\n",
"df2[columns_with_nulls] = df2[columns_with_nulls].fillna(\"Unknown\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**SPLITTING INTO SETS**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The training set is balanced: the curve of sorted prices rises fairly smoothly, and the number of phones is not skewed toward any particular price range. Data augmentation is therefore not required."
]
},
{
"cell_type": "code",
"execution_count": 309,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training set size: 1035\n",
"Validation set size: 129\n",
"Test set size: 130\n"
]
},
{
"data": {
"text/plain": [
"Text(0.5, 1.0, 'Sorted prices in the training set')"
]
},
"execution_count": 309,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 1000x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"data = df2[['Ram', 'Battery', 'Display', 'Price', 'Inbuilt_memory']].copy()\n",
"data['Price'] = pd.to_numeric(data['Price'], errors='coerce')\n",
"# First split the records 80% / 20%, where the 80% is the training set\n",
"train_data, temp_data = train_test_split(data, test_size=0.2, random_state=42)\n",
"\n",
"# Then split the remaining 20% equally into validation and test sets\n",
"val_data, test_data = train_test_split(temp_data, test_size=0.5, random_state=42)\n",
"\n",
"# Check the set sizes\n",
"print(\"Training set size:\", len(train_data))\n",
"print(\"Validation set size:\", len(val_data))\n",
"print(\"Test set size:\", len(test_data))\n",
"\n",
"# Plot the sorted training prices to judge how evenly they are distributed\n",
"sort_train_data = train_data.sort_values(by='Price')['Price'].values\n",
"plt.figure(figsize=(10, 5))\n",
"plt.plot(sort_train_data)\n",
"plt.title('Sorted prices in the training set')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## **DATASET 3**\n",
"\n",
"https://www.kaggle.com/datasets/shivam2503/diamonds\n",
"\n",
"Problem domain: diamond prices\n",
"\n",
"Object of observation: a diamond\n",
"\n",
"Attributes:\n",
"* carat: Weight in carats\n",
"* cut: Cut quality\n",
"* color: Color\n",
"* clarity: Clarity\n",
"* depth: Depth percentage\n",
"* table: Table width percentage\n",
"* price: Price in dollars\n",
"* x: Length in millimeters\n",
"* y: Width in millimeters\n",
"* z: Depth in millimeters\n",
"\n",
"There is only one object, but within it the price is related to all the other characteristics (the better a characteristic, the more expensive the diamond)\n",
"\n",
"Business goal: predict the optimal price of a diamond from its characteristics. Business effect: jewelers will be able to offer competitive prices, which can potentially increase profit.\n",
"\n",
"Technical project goal: build a machine learning model that predicts a diamond's price from its characteristics. Input: the diamond's characteristics (weight, cut, color, clarity, dimensions). Target feature: price"
]
},
{
"cell_type": "code",
"execution_count": 290,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['Unnamed: 0', 'carat', 'cut', 'color', 'clarity', 'depth', 'table',\n",
" 'price', 'x', 'y', 'z'],\n",
" dtype='object')\n"
]
}
],
"source": [
"df3 = pd.read_csv(\"..//static//csv//diamonds.csv\")\n",
"print(df3.columns)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An assessment of all numeric features shows that the dataset contains quite a lot of noise. Most of it is useful, because diamonds can have widely varying characteristic values and these need to be taken into account. However, there are isolated outliers that could cause the model to train incorrectly. These are records with:\n",
"* table greater than 90\n",
"* x around 0\n",
"* y greater than 30 or around 0\n",
"* z greater than 30\n",
"\n",
"It makes sense to remove these outliers.\n",
"\n",
"Most of the data falls within the following ranges:\n",
"* less than 3 carats\n",
"* depth percentage between 50 and 70\n",
"* table percentage between 50 and 60\n",
"* length between 4 and 9 mm\n",
"* width between 5 and 10 mm\n",
"* depth between 2 and 5 mm"
]
},
{
"cell_type": "code",
"execution_count": 291,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABKUAAAMWCAYAAAAgRDUeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAADAF0lEQVR4nOzdeVxWdf7//+cFyCaLS8qSKOQGueQ6Kuoo5kQWjoTWaFraWPp1qXFtpElzS9I0rdzSKbXMajSjsjTNj6YmmmJOMm5okpqATQm4gl6c3x/+OOMVqKjAdV3wuN9u5xbnvF/nfV7X+XzmvL1e1znvYzEMwxAAAAAAAABQhlzsnQAAAAAAAAAqHopSAAAAAAAAKHMUpQAAAAAAAFDmKEoBAAAAAACgzFGUAgAAAAAAQJmjKAUAAAAAAIAyR1EKAAAAAAAAZY6iFAAAAAAAAMocRSkAAAAAAACUOYpSwB2yWCyaOHFimRxr3bp1atasmTw9PWWxWJSVlVUmxwUAFFbRrv8TJ06UxWIp8+MW6Ny5sxo3bmy34wPAjVS0MeFWhIaGKiYm5qZxmzdvlsVi0ebNm0s/KTgMilJwWEuXLpXFYrFZatasqaioKK1du9be6d2x/fv3a+LEiUpLSytW/K+//qrHHntMXl5emjdvnt577z1Vrly5dJN0IKdOndLEiRO1d+9ee6cCoJRx/bdVka7/XOsB/B5jgq2yHhO2b9+uiRMnOnzhC87Lzd4JADczefJkhYWFyTAMZWZmaunSpXrooYf0+eefF6vi7qj279+vSZMmqXPnzgoNDb1p/K5du3T27FlNmTJFXbt2Lf0EHcypU6c0adIkhYaGqlmzZvZOB0AZ4Pp/VUW6/nOtB3A9jAlXlfWYsH37dk2aNEkDBgxQlSpVSv14qHgoSsHhdevWTa1atTLXBw4cqICAAH3wwQdOPQDdqtOnT0tSiQ4G58+ft9uv7ZcuXZK7u7tcXLhhE0DRuP5fVd6u/wBwOxgTrmJMQHnDt0E4nSpVqsjLy0tubrY11fPnz2v06NEKCQmRh4eHGjZsqJkzZ8owDEnSxYsXFR4ervDwcF28eNHc77ffflNQUJAiIyNltVolSQMGDJCPj49+/PFHRUdHq3LlygoODtbkyZPN/m7k+++/V7du3eTn5ycfHx/df//92rFjh9m+dOlSPfroo5KkqKgo81bk6z0/3blzZ/Xv31+S1Lp1a1ksFg0YMMBsX7lypVq2bCkvLy/ddddd6tevn37++WebPgo+09GjR/XQQw/J19dXffv2veHn+PnnnzVw4EAFBwfLw8NDYWFhGjJkiPLy8sxzN2bMGDVp0kQ+Pj7y8/NTt27d9O9//9umn4Lnwz/88EO9+OKLuvvuu+Xt7a2cnJxi9bF582a1bt1akvTUU0+Z52vp0qU3zB9A+cL1v+yu/9u2bVPr1q3l6empunXr6q233rpu7PLly80cqlWrpt69e+vEiROFPkfjxo2VnJysyMhIeXl5KSwsTAsXLjRjinut379/v6KiouTt7a27775bM2bMuOFnAVA+MSaU/pgwceJEjR07VpIUFhZm5lfwqOGSJUvUpUsX1axZUx4eHrr33nu1YMGC656P9evXm3Nh3XvvvVq9evUNz1+BnTt36sEHH5S/v7+8vb3VqVMnffvtt8XaF07AABzUkiVLDEnG119/bfzyyy/G6dOnjZSUFGPw4MGGi4uLsX79ejM2Pz/f6NKli2GxWIynn37amDt3rtG9e3dDkjFixAgzbseOHYarq6sxcuRIc1vv3r0NLy8v49ChQ+a2/v37G56enkb9+vWNJ554wpg7d64RExNjSDLGjx9vk6ck46WXXjLXU1JSjMqVKxtBQUHGlClTjFdeecUICwszPDw8jB07dhiGYRhHjx41nnvuOUOS8cILLxjvvfee8d577xkZGRlFnov169cbgwYNMiQZkydPNt577z1j+/btNuepdevWxuzZs4
1x48YZXl5eRmhoqHHmzBmbz+Th4WHUrVvX6N+/v7Fw4ULj3Xffve75//nnn43g4GDD29vbGDFihLFw4UJj/PjxRkREhNnvrl27jLp16xrjxo0z3nrrLWPy5MnG3Xffbfj7+xs///yz2demTZsMSca9995rNGvWzHjttdeMhIQE4/z588XqIyMjw5g8ebIhyRg0aJB5vo4ePXrd/AE4L67//2OP6/8PP/xgeHl5GbVr1zYSEhKMKVOmGAEBAUbTpk2N3//TcerUqYbFYjH+8pe/GPPnzzcmTZpk3HXXXYVy6NSpkxEcHGzUrFnTGD58uPHGG28YHTp0MCQZb7/9tmEYN7/WF/QREhJi/O1vfzPmz59vdOnSxZBkfPnll9f9PACcG2PC/5T1mPDvf//b6NOnjyHJmD17tpnfuXPnDMMwjNatWxsDBgwwZs+ebbz55pvGAw88YEgy5s6da9NPnTp1jAYNGhhVqlQxxo0bZ7z22mtGkyZNCv3fr+A7w6ZNm8xtGzduNNzd3Y127doZs2bNMmbPnm00bdrUcHd3N3bu3Flk3nAuFKXgsAourL9fPDw8jKVLl9rEJiYmGpKMqVOn2mzv1auXYbFYjCNHjpjb4uPjDRcXF2PLli3GypUrDUnGnDlzbPbr37+/Icl49tlnzW35+fnGww8/bLi7uxu//PKLuf33A1BsbKzh7u5uUzA5deqU4evra/zxj380txUc+9qLbnHOx65du8xteXl5Rs2aNY3GjRsbFy9eNLevWbPGkGRMmDCh0GcaN25csY735JNPGi4uLjbHK5Cfn28YhmFcunTJsFqtNm3Hjh0zPDw8jMmTJ5vbCgaYe+65x7hw4YJNfHH72LVrlyHJWLJkSbHyB+C8uP4XfT7K6vofGxtreHp6Gj/99JO5bf/+/Yarq6tNUSotLc1wdXU1Xn75ZZv99+3bZ7i5udls79SpkyHJmDVrlrktNzfXaNasmVGzZk0jLy/PMIwbX+sL+rj2y1Nubq4RGBho9OzZs1ifDYDzYUwo+nyU1Zjw6quvGpKMY8eOFWr7/b/rDcMwoqOjjXvuucdmW506dQxJxscff2xuy87ONoKCgozmzZub235flMrPzzfq169vREdHm98/Co4bFhZm/OlPfyrWZ4Bj4/E9OLx58+Zpw4YN2rBhg5YvX66oqCg9/fTTNrd7fvnll3J1ddVzzz1ns+/o0aNlGIbNmzkmTpyoRo0aqX///ho6dKg6depUaL8Cw4cPN/+2WCwaPny48vLy9PXXXxcZb7VatX79esXGxuqee+4xtwcFBenxxx/Xtm3blJOTc1vnoSi7d+/W6dOnNXToUHl6eprbH374YYWHh+uLL74otM+QIUNu2m9+fr4SExPVvXt3m2f3CxS8EtzDw8OcE8pqterXX3+Vj4+PGjZsqD179hTar3///vLy8rLZdqt9AKg4uP5fX2ld/61Wq7766ivFxsaqdu3a5vaIiAhFR0fbxK5evVr5+fl67LHH9N///tdcAgMDVb9+fW3atMkm3s3NTYMHDzbX3d3dNXjwYJ0+fVrJycnF+tw+Pj7q16+fTR9/+MMf9OOPPxZrfwDOizHh+kprTLiZa/9dn52drf/+97/q1KmTfvzxR2VnZ9vEBgcH65FHHjHX/fz89OSTT+r7779XRkZGkf3v3btXqampevzxx/Xrr7+a48z58+d1//33a8uWLcrPz7/jzwH7YqJzOLw//OEPNoWRPn36qHnz5ho+fLhiYmLk7u6un376ScHBwfL19bXZNyIiQpL0008/mdvc3d31zjvvmHNlLFmyxCyyXMvFxcVmEJGkBg0aSNJ1X9n6yy+/6MKFC2rYsGGhtoiICOXn5+vEiRNq1KhR8T78TRR8rqKOFx4erm3bttlsc3NzU61atW7a7y+//KKcnBw1btz4hnH5+fl6/fXXNX/+fB07dsx8/l6SqlevXig+LCzsjvsAUHFw/b++0rz+X7x4UfXr1y/U1rBhQ3
355ZfmempqqgzDKDJWkipVqmSzHhwcXGgi3WvPa9u2bW+aX61atQr936xq1ar64YcfbrovAOfGmHB9pTUm3My3336
"text/plain": [
"<Figure size 1200x800 with 7 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"numeric_cols = df3.select_dtypes(include=['number']).columns\n",
"\n",
"#все столбцы, кроме Unnamed (с индексом)\n",
"numeric_cols = [col for col in numeric_cols if 'Unnamed' not in col]\n",
"\n",
"# столбец 'id' также исключен\n",
"numeric_cols = [col for col in numeric_cols if col != 'id']\n",
"\n",
"plt.figure(figsize=(12, 8))\n",
" \n",
"\n",
"for i, col in enumerate(numeric_cols, 1):\n",
" if col == 'id':\n",
" continue\n",
" Q1 = df3[col].quantile(0.25)\n",
" Q3 = df3[col].quantile(0.75)\n",
" IQR = Q3 - Q1\n",
" lower_bound = Q1 - 1.5 * IQR\n",
" upper_bound = Q3 + 1.5 * IQR\n",
" outliers = df3[col][(df3[col] < lower_bound) | (df3[col] > upper_bound)]\n",
" plt.subplot(len(numeric_cols) // 3 + 1, 3, i) \n",
" plt.boxplot(x=df3[col])\n",
" plt.title(f'Boxplot for {col}')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"По числовым данным видно, что цена имеет прямую зависимость от веса и размеров бриллианта. Такая корреляции между столбцами carat, x, y, z и price является естественной и ожидаемой, так как чем больше бриллиант, тем он дороже"
]
},
{
"cell_type": "code",
"execution_count": 292,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Просачивание данных: Высокая корреляция (0.92) между столбцами 'carat' и 'price'\n",
"Просачивание данных: Высокая корреляция (0.88) между столбцами 'x' и 'price'\n",
"Просачивание данных: Высокая корреляция (0.87) между столбцами 'y' и 'price'\n",
"Просачивание данных: Высокая корреляция (0.86) между столбцами 'z' и 'price'\n"
]
}
],
"source": [
"#Проверка кореляции\n",
"\n",
"price_col = 'price' # Имя столбца с ценой\n",
"for col1 in numeric_cols:\n",
" if col1 != price_col:\n",
" correlation = df3[col1].corr(df3[price_col])\n",
" if abs(correlation) > 0.7:\n",
" print(f\"Просачивание данных: Высокая корреляция ({correlation:.2f}) между столбцами '{col1}' и '{price_col}'\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Набор данных информативный, т.к. содержит основные характеристики бриллиантов, которые влияют на их цену\n",
"\n",
"Степень покрытия высокая, т.к. содержатся сведения о более 50000 бриллиантах\n",
"\n",
"Все метки согласованы, но 'depth' и 'x', 'y', 'z' могли быть названы немного подробнее"
]
},
{
"cell_type": "code",
"execution_count": 293,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Количество записей: 53940\n"
]
}
],
"source": [
"print(f\"Количество записей: {df3.shape[0]}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Столбцов со значениями null нет, поэтому решать проблему пропущенных данных не надо"
]
},
{
"cell_type": "code",
"execution_count": 294,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Столбцы с null: []\n"
]
}
],
"source": [
"columns_with_nulls = []\n",
"for col in df3.columns:\n",
" if df3[col].isnull().sum() > 0: \n",
" columns_with_nulls.append(col)\n",
"print(f\"Столбцы с null: {columns_with_nulls}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**РАЗБИЕНИЕ НА ВЫБОРКИ**\n",
"\n",
"train_data - обучающая выборка\n",
"\n",
"val_data - контрольная выборка\n",
"\n",
"test_data - тестовая выборка"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Обучающая выборка сбалансрована, т.к. график идёт достаточно ровно и нет \"перекоса\" количества бриллиантов в каком-то диапазоне цен. Поэтому аугментация данных не требуется "
]
},
{
"cell_type": "code",
"execution_count": 295,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Ideal' 'Premium' 'Good' 'Very Good' 'Fair']\n",
"['E' 'I' 'J' 'H' 'F' 'G' 'D']\n",
"['SI2' 'SI1' 'VS1' 'VS2' 'VVS2' 'VVS1' 'I1' 'IF']\n",
"Размер обучающей выборки: 43152\n",
"Размер контрольной выборки: 5394\n",
"Размер тестовой выборки: 5394\n"
]
},
{
"data": {
"text/plain": [
"Text(0.5, 1.0, 'Отсортированные цены в обучающей выборке')"
]
},
"execution_count": 295,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA1AAAAHDCAYAAAAqdvv1AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABxj0lEQVR4nO3deVxU5f4H8M8MMDNsM+wMKAJuuIG7iOaWJipu7S6llmWWlmnXzOqW1r3X0l/d7Faat9IWy7LFSs3EXRM3FBEX3MANhkWWYWeW5/cHcq4joKDgYfm8X695yZzznHO+53AY+fCc8xyFEEKAiIiIiIiIbkkpdwFEREREREQNBQMUERERERFRNTFAERERERERVRMDFBERERERUTUxQBEREREREVUTAxQREREREVE1MUARERERERFVEwMUERERERFRNTFAERERNQB5eXlITk5GQUGB3KVQLcvJycHZs2dhNpvlLoWIqoEBioiIqB4SQmDFihXo3bs3nJycoNVqERwcjG+++Ubu0hqEy5cvY9WqVdL75ORkrF69Wr6CrmMymbB48WJ07twZarUa7u7uaNOmDbZu3Sp3aURUDQohhJC7CCKqO8ePH8eiRYuwfft2ZGZmwtPTE4MGDcKrr76Kjh07yl0eEVVh/Pjx+P777zF58mSMHDkSOp0OCoUCYWFh8Pb2lru8eu/KlSto27Ytfv75Z4SEhODll1+Gh4cHli9fLmtdJSUlGDp0KPbt24fp06dj8ODBcHJygp2dHbp37w6tVitrfUR0a/ZyF0BEdefnn3/G+PHj4eHhgalTpyI4OBjJycn4/PPP8eOPP2LNmjW4//775S6TiG7w1Vdf4fvvv8c333yDCRMmyF1Og9SsWTM8/fTTGDZsGADAz88PO3bskLcoAO+++y7279+PP//8EwMHDpS7HCK6DeyBImqkzp07h7CwMLRo0QK7du2y+Yt1ZmYm+vXrh0uXLiE+Ph4tW7aUsVIiulFoaCjCwsLqzSVnDdm5c+eQmZmJTp06wdnZWdZazGYzfHx88Oyzz+Kf//ynrLUQ0e3jPVBEjdSSJUtQWFiIFStWVLjcx8vLC59++ikKCgqwePFiAMCCBQugUChu+rr+r7f79+/HiBEj4O7uDmdnZ4SFhWHp0qU229m2bRv69esHZ2dnuLm5YcyYMTh58qRNm/Ltnjp1Co888gi0Wi08PT0xa9YsFBcXS+1uVVv5X3J37NhRoVYAiIqKgkKhwIIFC2q8baDsF5+3334brVq1glqtRlBQEF599VWUlJTYtAsKCpJqUiqV0Ov1ePTRR3Hx4kWbdv/3f/+HPn36wNPTE46OjujevTt+/PHHCt9HhUKBmTNnVpg+cuRIBAUFSe+Tk5OhUCjwf//3fxXadurUyeYv3eXHqLLtlZsyZYrN+gHAarXigw8+QMeOHaHRaODr64tnnnkG2dnZVa7n+vW5uLhUmP7jjz9W+v0qKSnBm2++idatW0OtViMgIAAvv/xyheNdG8enXPn5cCsDBw60Ofe8vLwQFRWFhISEWy4LAGvXrkX37t3h6OgILy8vPPbYY7hy5Yo0v6CgAAkJCQgICEBUVBS0Wi2cnZ0xcOBA7N69W2p3/vx5KBQK/Pvf/66wjb1790KhUOC7776Tar6xt6P8mFx/n1B8fDymTJmCli1bQqPRQK/X48knn8TVq1dtll21ahUUCgWSk5OlaX/++Sf69OkDJycn6HQ6jBw5ssIxKT/GmZmZ0rRDhw5VqAOoeN6W++OPP6TPFVdXV0RFReH48eM2ba4/f1u1aoXw8HBkZWXB0dGxQt2VmTJlis332N3dvcLxB8p+3keOHFnlem78PEpMTER2djZcXV0xYMCAmx4rADhy5AiGDx8OrVYLFxcXDB48GPv27bNpU/692LVrF5555hl4enpCq9Vi0qRJFX42g4KCMGXKFJtp06ZNg0ajqfAzWJ3jTNRU8RI+okbq999/R1BQEP
r161fp/P79+yMoKAgbNmwAADzwwANo3bq1NH/27Nlo3749pk2bJk1r3749ACA6OhojR46En58fZs2aBb1ej5MnT2L9+vWYNWsWAGDLli0YPnw4WrZsiQULFqCoqAj/+c9/0LdvXxw+fLjCL+ePPPIIgoKCsGjRIuzbtw8ffvghsrOz8dVXXwEAvv76a6nt7t27sWLFCvz73/+Gl5cXAMDX17fKY7Fr1y5s3Lixyvm32jYAPPXUU/jyyy/x0EMP4aWXXsL+/fuxaNEinDx5Er/88ovN+vr164dp06bBarUiISEBH3zwAVJSUmx++Vq6dClGjx6NiRMnorS0FGvWrMHDDz+M9evXIyoqqspa5fTMM89g1apVeOKJJ/DCCy8gKSkJH330EY4cOYK//voLDg4OtbIdq9WK0aNHY8+ePZg2bRrat2+PY8eO4d///jdOnz6NdevW1cp27kS7du3w2muvQQiBc+fO4f3338eIESMqBOUblR+/nj17YtGiRUhLS8PSpUvx119/4ciRI3Bzc5PCyrvvvgu9Xo+5c+dCo9Hgv//9L4YMGYLo6Gj0798fLVu2RN++fbF69WrMnj3bZjurV6+Gq6srxowZU6P9io6Oxvnz5/HEE09Ar9fj+PHjWLFiBY4fP459+/ZVGTB3796NESNGIDAwEG+++SZMJhM++eQT9O3bFwcPHkTbtm1rVEdVvv76a0yePBmRkZF49913UVhYiGXLluGee+7BkSNHKnyuXO+NN96o8IeRm/Hy8pLC6eXLl7F06VKMGDECly5dgpub223VX/69nT9/Ptq0aYOFCxeiuLgYH3/8cYVjdfz4cfTr1w9arRYvv/wyHBwc8Omnn2LgwIHYuXMnwsPDbdY9c+ZMuLm5YcGCBUhMTMSyZctw4cIFKcRV5s0338Tnn3+O77//3ias3slxJmoSBBE1Ojk5OQKAGDNmzE3bjR49WgAQRqOxwrzAwEAxefLkCtPNZrMIDg4WgYGBIjs722ae1WqVvu7SpYvw8fERV69elaYdPXpUKJVKMWnSJGnam2++KQCI0aNH26zrueeeEwDE0aNHK9SwcuVKAUAkJSVVmLd9+3YBQGzfvl2aFh4eLoYPHy4AiDfffLPG246LixMAxFNPPWXT7m9/+5sAILZt2yZNq+y4TZgwQTg5OdlMKywstHlfWloqOnXqJO69916b6QDEjBkzKuxnVFSUCAwMlN4nJSUJAGLJkiUV2nbs2FEMGDBAel9+jNauXVuhbbnJkyfbrH/37t0CgFi9erVNu02bNlU6vbL1OTs7V5i+du3aCt+vr7/+WiiVSrF7926btsuXLxcAxF9//SVNq43jU678fLiVAQMG2BxPIYR49dVXBQCRnp5e5XKlpaXCx8dHdOrUSRQVFUnT169fLwCIN954w6ZWlUolTp8+LbXLyMgQnp6eonv37tK0Tz/9VAAQJ0+etNmOl5eXzXk4aNAg0b9/f5t6yrezcuVKadqN56UQQnz33XcCgNi1a5c07cafwe7duwudTicMBoPU5vTp08LBwUE8+OCD0rTyY5yRkSFNO3jwYIU6hKh43ubl5Qk3Nzfx9NNP27QzGAxCp9PZTL/x/E1ISBBKpVL6HKjss+N6Ny4vhBArVqwQAMSBAwekaYGBgSIqKqrK9dz4eVT+3svLS2RmZkrtKjtWY8eOFSqVSpw7d06alpKSIlxdXW2+l+Xfi+7du4vS0lJp+uLFiwUA8euvv9rUW35elJ87//nPf2xqrslxJmqqeAkfUSOUl5cHAHB1db1pu/L5RqOx2us+cuQIkpKS8OKLL1b4K2z5XzlTU1MRFxeHKVOmwMPDQ5ofFhaG++67r9LeoBkzZti8f/755wHgpj1H1fHzzz/j4MGDeOedd6psc6ttl/87Z84cm3YvvfQSAEi9eOVKSkqQmZmJ9PR0REdHY9u2bRg8eLBNG0dHR+nr7Oxs5Obmol+/fjh8+HCF+oqLi5GZmWnzMplMle5LYW
FhhbYWi6XStnl5ecjMzEROTk6l86+3du1a6HQ63HfffTbr7t69O1xcXLB9+/ZbrqO61q5di/bt26Ndu3Y227r33ns
"text/plain": [
"<Figure size 1000x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#приведение категориальных данных в числовые\n",
"print(df3['cut'].unique())\n",
"cut_mapping = {'Fair': 1, \n",
" 'Good': 2, \n",
" 'Very Good': 3, \n",
" 'Premium': 4, \n",
" 'Ideal': 5}\n",
"df3['cut'] = df3['cut'].map(cut_mapping)\n",
"\n",
"print(df3['color'].unique())\n",
"color_mapping = {'D': 1, \n",
" 'E': 2, \n",
" 'F': 3, \n",
" 'G': 4, \n",
" 'H': 5, \n",
" 'I': 6, \n",
" 'J': 7} \n",
"df3['color'] = df3['color'].map(color_mapping)\n",
"\n",
"\n",
"print(df3['clarity'].unique())\n",
"clarity_mapping = {\n",
" 'IF': 1, \n",
" 'VVS1': 2, \n",
" 'VVS2': 3, \n",
" 'VS1': 4, \n",
" 'VS2': 5, \n",
" 'SI1': 6, \n",
" 'SI2': 7, \n",
" 'I1': 8} \n",
"df3['clarity'] = df3['clarity'].map(clarity_mapping)\n",
"\n",
"\n",
"\n",
"data=df3.copy()\n",
"\n",
"\n",
"# сначала разделение записей на 80% и 20%, где 80% - обучающая выборка\n",
"train_data, temp_data = train_test_split(data, test_size=0.2, random_state=42)\n",
"\n",
"# потом разделение остальных 20% поровну на контрольную и тестовую выборки\n",
"val_data, test_data = train_test_split(temp_data, test_size=0.5, random_state=42)\n",
"\n",
"# Проверка размеров выборок\n",
"print(\"Размер обучающей выборки:\", len(train_data))\n",
"print(\"Размер контрольной выборки:\", len(val_data))\n",
"print(\"Размер тестовой выборки:\", len(test_data))\n",
"\n",
"\n",
"sort_train_data=train_data.sort_values(by='price')['price'].values\n",
"plt.figure(figsize=(10, 5))\n",
"plt.plot(sort_train_data)\n",
"plt.title('Отсортированные цены в обучающей выборке')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "aimenv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}