|
{
|
|||
|
"cells": [
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"## Dataset 1\n",
|
|||
|
"https://www.kaggle.com/datasets/antonkozyriev/game-recommendations-on-steam"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"Problem domain: video game market analytics (here, on the Steam platform)\n",
"\n",
"Object of observation: games on Steam. Their attributes are the game's characteristics (title, release date, price, availability on different platforms: Windows, macOS, Linux, Steam Deck) and its evaluation by players (rating, reviews)\n",
"The dataset contains only one object type, but one relationship can be noted: a game is associated with many reviews\n",
"\n",
"Business goal: determine how a game's main characteristics affect its Steam rating, so that developers and publishers know where to invest more time and money. Business effect: a higher chance of the game's success and a lower risk of financial loss\n",
"\n",
"Technical project goal: build a machine learning model that predicts the rating a game will receive from players.\n",
"Input: the game's release date (to look for a possible pattern between the release month and a high rating), its price, and its availability on Windows, Linux and macOS. Target feature: rating\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 296,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Index(['app_id', 'title', 'date_release', 'win', 'mac', 'linux', 'rating',\n",
|
|||
|
" 'positive_ratio', 'user_reviews', 'price_final', 'price_original',\n",
|
|||
|
" 'discount', 'steam_deck'],\n",
|
|||
|
" dtype='object')\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"import pandas as pd\n",
|
|||
|
"import seaborn as sns\n",
|
|||
|
"df1 = pd.read_csv(\"..//static//csv//games.csv\")\n",
|
|||
|
"\n",
|
|||
|
"print(df1.columns)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"An assessment of all numeric features shows that the dataset contains quite a lot of outliers.\n",
"\n",
"In the positive_ratio column there are games with very few positive reviews. For games, however, it is also important to know about titles with more negative reviews than positive ones, so these values are useful rather than noise to be dropped. The data is skewed toward games with a larger share of positive reviews (above 60%). Because this column determines the string-valued rating column, in what follows it is treated as noise and is not used as an input.\n",
"\n",
"The user_reviews column has a severe outlier with an extremely large number of reviews, but the column itself can be treated as noise, because here the number of reviews matters less than the game's rating.\n",
"\n",
"The price_final column depends on the price_original and discount columns. Discounts and post-discount prices are not considered here, so price_final and discount can be treated as noise.\n",
"\n",
"The price_original column has many outliers above the typical values. A variety of prices is desirable for the analysis, but games priced above $150 can be removed: such an expensive game is very unlikely, and these rows could make the model train incorrectly. The data in this column is skewed toward games under $25"
|
|||
|
]
|
|||
|
},
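The outlier reasoning above (IQR-based detection plus a hard cap on implausible prices) can be sketched without pandas. This is a minimal, dependency-free illustration, not the notebook's own code: the `prices` list is made-up sample data, `iqr_bounds` and `split_outliers` are hypothetical helper names, and the $150 cap is the threshold proposed in the text. The `inclusive` quantile method is used because it should agree with the linear interpolation that pandas applies by default.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) Tukey fences for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Split values into (inliers, outliers) using the Tukey fences."""
    lower, upper = iqr_bounds(values, k)
    inliers = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return inliers, outliers


# Made-up prices in dollars, skewed toward cheap games (an assumption for illustration).
prices = [0.0, 4.99, 9.99, 9.99, 14.99, 19.99, 24.99, 59.99, 199.99]

inliers, outliers = split_outliers(prices)

# The hard cap from the text: drop only games priced above $150.
capped = [p for p in prices if p <= 150]

print("IQR outliers:", outliers)
print("Rows kept after the $150 cap:", len(capped))
```

The two filters answer different questions: the IQR fences flag anything unusual relative to the bulk of the data, while the fixed cap removes only prices considered implausible, which keeps moderately expensive games available for training.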
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 297,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
"image/png": "(base64-encoded PNG data omitted)",
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1200x800 with 5 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"numeric_cols = df1.select_dtypes(include=['number']).columns\n",
|
|||
|
"\n",
|
|||
"# all numeric columns except app_id\n",
|
|||
|
"numeric_cols = [col for col in numeric_cols if col != 'app_id']\n",
|
|||
|
"\n",
|
|||
|
"plt.figure(figsize=(12, 8))\n",
|
|||
|
" \n",
|
|||
|
"\n",
|
|||
"for i, col in enumerate(numeric_cols, 1):\n",
"    # Tukey fences: values beyond 1.5 * IQR from the quartiles count as outliers\n",
"    Q1 = df1[col].quantile(0.25)\n",
"    Q3 = df1[col].quantile(0.75)\n",
"    IQR = Q3 - Q1\n",
"    lower_bound = Q1 - 1.5 * IQR\n",
"    upper_bound = Q3 + 1.5 * IQR\n",
"    outliers = df1[col][(df1[col] < lower_bound) | (df1[col] > upper_bound)]\n",
"    plt.subplot(len(numeric_cols) // 3 + 1, 3, i)\n",
"    plt.boxplot(x=df1[col])\n",
"    plt.title(f'Boxplot for {col}')\n",
|
|||
|
"\n",
|
|||
|
"plt.tight_layout()\n",
|
|||
|
"plt.show()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"To check for data leakage, the game rating, which the dataset stores as strings, has to be converted to a numeric scale. A 5-point or 10-point scale would be natural, but there are 9 distinct string ratings, and 9 is not divisible by 5 or 10. So, to distribute the string ratings evenly, they were mapped to a 3-point scale (three string ratings per point). Only the positive_ratio column (the share of positive reviews) correlates strongly with this scale, which is expected: the rating column depends on it, and rating_stars with its 3-point scale was in turn created from rating. However, positive_ratio will not be an input feature, so there will be no data leakage."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 298,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"['Very Positive' 'Positive' 'Mixed' 'Mostly Positive'\n",
|
|||
|
" 'Overwhelmingly Positive' 'Negative' 'Mostly Negative'\n",
|
|||
|
" 'Overwhelmingly Negative' 'Very Negative']\n",
|
|||
"Data leakage: high correlation (0.82) between columns 'positive_ratio' and 'rating_stars'\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
"# see which rating values occur in the table\n",
|
|||
|
"print(df1['rating'].unique())\n",
|
|||
|
"\n",
|
|||
"# map the string ratings to numeric scores (the earlier 1-5 and 2-10 variants are kept commented out; the 1-3 mapping is the one in use)\n",
|
|||
|
"# rating_mapping = {'Overwhelmingly Positive': 5, \n",
|
|||
|
"# 'Very Positive': 5, \n",
|
|||
|
"# 'Positive': 4, \n",
|
|||
|
"# 'Mostly Positive': 4, \n",
|
|||
|
"# 'Mixed': 3, \n",
|
|||
|
"# 'Mostly Negative': 3, \n",
|
|||
|
"# 'Negative': 2, \n",
|
|||
|
"# 'Very Negative': 2,\n",
|
|||
|
"# 'Overwhelmingly Negative': 1\n",
|
|||
|
"# } \n",
|
|||
|
"# rating_mapping = {'Overwhelmingly Positive': 10, \n",
|
|||
|
"# 'Very Positive': 9, \n",
|
|||
|
"# 'Positive': 8, \n",
|
|||
|
"# 'Mostly Positive': 7, \n",
|
|||
|
"# 'Mixed': 6, \n",
|
|||
|
"# 'Mostly Negative': 5, \n",
|
|||
|
"# 'Negative': 4, \n",
|
|||
|
"# 'Very Negative': 3,\n",
|
|||
|
"# 'Overwhelmingly Negative': 2\n",
|
|||
|
"# } \n",
|
|||
|
"rating_mapping = {'Overwhelmingly Positive': 3, \n",
|
|||
|
" 'Very Positive': 3, \n",
|
|||
|
" 'Positive': 3, \n",
|
|||
|
" 'Mostly Positive': 2, \n",
|
|||
|
" 'Mixed': 2, \n",
|
|||
|
" 'Mostly Negative': 2, \n",
|
|||
|
" 'Negative': 1, \n",
|
|||
|
" 'Very Negative': 1,\n",
|
|||
|
" 'Overwhelmingly Negative': 1\n",
|
|||
|
" } \n",
|
|||
|
"df1['rating_stars'] = df1['rating'].map(rating_mapping)\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
"# correlation check (data leakage)\n",
"main_col = 'rating_stars'\n",
"for col1 in numeric_cols:\n",
"    if col1 != main_col:\n",
"        correlation = df1[col1].corr(df1[main_col])\n",
"        if abs(correlation) > 0.7:\n",
"            print(f\"Data leakage: high correlation ({correlation:.2f}) between columns '{col1}' and '{main_col}'\")"
|
|||
|
]
|
|||
|
},
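The leakage check above boils down to two steps: map the nine string ratings to a 3-point scale, then flag any numeric feature whose Pearson correlation with that scale exceeds 0.7 in absolute value. The sketch below restates those steps in plain Python so the logic is visible without pandas. The sample values are invented for illustration, and `pearson` and `leaky_features` are hypothetical helper names, not part of the notebook.

```python
from math import sqrt

# The 3-point mapping used in the notebook: three string ratings per point.
RATING_TO_STARS = {
    "Overwhelmingly Positive": 3, "Very Positive": 3, "Positive": 3,
    "Mostly Positive": 2, "Mixed": 2, "Mostly Negative": 2,
    "Negative": 1, "Very Negative": 1, "Overwhelmingly Negative": 1,
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def leaky_features(columns, target, threshold=0.7):
    """Return {feature name: correlation} for features above the threshold."""
    leaks = {}
    for name, values in columns.items():
        r = pearson(values, target)
        if abs(r) > threshold:
            leaks[name] = round(r, 2)
    return leaks


# Invented sample rows (assumption): one rating string per game.
ratings = [
    "Overwhelmingly Positive", "Very Positive", "Mostly Positive", "Mixed",
    "Mostly Negative", "Negative", "Very Negative", "Positive",
]
stars = [RATING_TO_STARS[r] for r in ratings]

columns = {
    "positive_ratio": [97, 88, 75, 55, 30, 25, 15, 85],
    "price_original": [10, 30, 20, 10, 30, 20, 20, 20],
}

leaks = leaky_features(columns, stars)
print(leaks)
```

A flagged feature is only a problem if it is fed to the model. Here `positive_ratio` is excluded from the inputs, which is why the notebook concludes that no leakage occurs.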
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"This dataset is not fully informative: it has no data on a game's genre or publisher, both of which may well affect the rating. It does, however, include data on reviews and rating, release date, price and supported platforms, which may also affect the rating.\n",
"\n",
"Coverage is good: the dataset holds 50,872 records on games released from 1997 to 2023. It has no data on games from the current year, though, and the existing data may be out of date, because a year has passed since the latest release date and reviews may have changed in that time.\n",
"\n",
"The labels are consistent, although the price_final label could be misread as the game's final price after release, which is wrong: it actually means the price after the discount is applied"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 299,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Index(['app_id', 'title', 'date_release', 'win', 'mac', 'linux', 'rating',\n",
|
|||
|
" 'positive_ratio', 'user_reviews', 'price_final', 'price_original',\n",
|
|||
|
" 'discount', 'steam_deck', 'rating_stars'],\n",
|
|||
|
" dtype='object')\n",
|
|||
"Number of records: 50872\n",
|
|||
|
"<DatetimeArray>\n",
|
|||
|
"['1997-06-30 00:00:00', '1997-11-14 00:00:00', '1998-11-08 00:00:00',\n",
|
|||
|
" '1999-04-01 00:00:00', '1999-09-08 00:00:00', '1999-11-01 00:00:00',\n",
|
|||
|
" '2000-11-01 00:00:00', '2001-06-01 00:00:00', '2002-08-28 00:00:00',\n",
|
|||
|
" '2003-05-01 00:00:00',\n",
|
|||
|
" ...\n",
|
|||
|
" '2023-10-12 00:00:00', '2023-10-13 00:00:00', '2023-10-15 00:00:00',\n",
|
|||
|
" '2023-10-16 00:00:00', '2023-10-17 00:00:00', '2023-10-18 00:00:00',\n",
|
|||
|
" '2023-10-19 00:00:00', '2023-10-20 00:00:00', '2023-10-23 00:00:00',\n",
|
|||
|
" '2023-10-24 00:00:00']\n",
|
|||
|
"Length: 4292, dtype: datetime64[ns]\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"print(df1.columns)\n",
|
|||
"print(f\"Number of records: {df1.shape[0]}\")\n",
"# game release dates\n",
|
|||
|
"df1['date_release'] = pd.to_datetime(df1['date_release'])\n",
|
|||
|
"df_sorted = df1.sort_values(by='date_release')\n",
|
|||
|
"print(df_sorted['date_release'].unique())"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"No column has missing values, so there is nothing to fix here"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 300,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
"Columns with missing values: []\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"columns_with_nulls = []\n",
|
|||
"for col in df1.columns:\n",
"    if df1[col].isnull().sum() > 0:\n",
"        columns_with_nulls.append(col)\n",
"print(f\"Columns with missing values: {columns_with_nulls}\")"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"**SPLITTING INTO SAMPLES**\n",
"\n",
"train_data - training set\n",
"\n",
"val_data - validation set\n",
"\n",
"test_data - test set"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"The training set clearly contains too few low-rated games. The data for such games needs to be augmented through oversampling"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 301,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
"Training set size: 40697\n",
"Validation set size: 5087\n",
"Test set size: 5088\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
"image/png": "(base64-encoded PNG data omitted)",
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"rating_stars\n",
|
|||
|
"1 296\n",
|
|||
|
"2 18144\n",
|
|||
|
"3 22257\n",
|
|||
|
"Name: count, dtype: int64\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"from sklearn.model_selection import train_test_split\n",
|
|||
|
"data=df1[['date_release', 'win', 'linux', 'mac', 'price_original', 'rating_stars']].copy()\n",
|
|||
"# first split the records 80% / 20%, where the 80% is the training set\n",
"train_data, temp_data = train_test_split(data, test_size=0.2, random_state=42)\n",
"\n",
"# then split the remaining 20% equally into validation and test sets\n",
"val_data, test_data = train_test_split(temp_data, test_size=0.5, random_state=42)\n",
"\n",
"# check the sizes of the sets\n",
"print(\"Training set size:\", len(train_data))\n",
"print(\"Validation set size:\", len(val_data))\n",
"print(\"Test set size:\", len(test_data))\n",
"\n",
"\n",
"# bar chart of the rating_stars column (class balance of the training set)\n",
|
|||
|
"rating_counts = train_data['rating_stars'].value_counts().sort_index()\n",
|
|||
|
"\n",
|
|||
|
"plt.bar(rating_counts.index, rating_counts.values)\n",
|
|||
|
"plt.xlabel('Rating Stars')\n",
|
|||
|
"plt.ylabel('Count')\n",
|
|||
|
"plt.show()\n",
|
|||
|
"\n",
|
|||
|
"print(train_data[\"rating_stars\"].value_counts().sort_index())"
|
|||
|
]
|
|||
|
},
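The split above is purely random, so a rare class (here, games with `rating_stars == 1`) can end up unevenly spread across the three sets. One common remedy is a stratified split, which divides each class separately in the same proportions. scikit-learn's `train_test_split` accepts a `stratify` argument for this purpose. The sketch below shows the idea in plain Python under stated assumptions: `stratified_split` is a hypothetical helper, the rows are synthetic, and the 80/10/10 fractions mirror the notebook's split.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, label_of, fractions=(0.8, 0.1, 0.1), seed=42):
    """Split rows into (train, val, test), keeping class proportions in each part."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    train, val, test = [], [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        train.extend(group[:n_train])
        val.extend(group[n_train:n_train + n_val])
        # Whatever remains goes to the test set, so no row is lost.
        test.extend(group[n_train + n_val:])
    return train, val, test


# Synthetic rows (assumption): 10 low-rated, 40 mid-rated and 50 high-rated games.
rows = [
    {"price_original": float(i), "rating_stars": 1 if i < 10 else (2 if i < 50 else 3)}
    for i in range(100)
]

train, val, test = stratified_split(rows, lambda r: r["rating_stars"])

print(len(train), len(val), len(test))
print(Counter(r["rating_stars"] for r in train))
```

Stratification does not remove the imbalance itself; it only guarantees that the validation and test sets reflect the same class mix as the training set, so oversampling of the training set may still be needed.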
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"**DATA AUGMENTATION (oversampling)**"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"After augmentation there are far more records for negatively reviewed games, and the distribution of games is now much more balanced"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 302,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
"image/png": "(base64-encoded PNG data omitted)",
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"rating_stars\n",
|
|||
|
"1 22234\n",
|
|||
|
"2 20308\n",
|
|||
|
"3 22257\n",
|
|||
|
"Name: count, dtype: int64\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"from imblearn.over_sampling import ADASYN\n",
|
|||
|
"ada = ADASYN(n_neighbors=3)\n",
|
|||
|
"#ada = ADASYN()\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
"# convert non-numeric values to numeric ones so that oversampling can work with them\n",
"train_data['date_release'] = pd.to_datetime(train_data['date_release']).astype('int64') / 10**9\n",
"train_data['mac'] = train_data[\"mac\"].astype(int)\n",
"train_data['win'] = train_data[\"win\"].astype(int)\n",
"train_data['linux'] = train_data[\"linux\"].astype(int)\n",
|
|||
|
"train_data['linux'] = train_data[\"linux\"].astype(int)\n",
|
|||
|
"\n",
|
|||
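"# keep the target out of the feature matrix so ADASYN does not interpolate the label itself\n",
"X_resampled, y_resampled = ada.fit_resample(train_data.drop(columns=\"rating_stars\"), train_data[\"rating_stars\"])\n",
"train_data_adasyn = pd.DataFrame(X_resampled)\n",
"train_data_adasyn[\"rating_stars\"] = y_resampled\n",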
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"rating_counts_adasyn = train_data_adasyn['rating_stars'].value_counts().sort_index()\n",
|
|||
|
"\n",
|
|||
|
"plt.bar(rating_counts_adasyn.index, rating_counts_adasyn.values)\n",
|
|||
|
"plt.xlabel('Rating Stars')\n",
|
|||
|
"plt.ylabel('Count')\n",
|
|||
|
"plt.show()\n",
|
|||
|
"\n",
|
|||
|
"print(train_data_adasyn[\"rating_stars\"].value_counts().sort_index())"
|
|||
|
]
|
|||
|
},
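To make the balancing step easier to follow, here is a deliberately simpler stand-in for ADASYN: random oversampling, which duplicates existing minority-class rows until every class matches the largest one. ADASYN instead generates new synthetic points between neighbouring minority samples, so this sketch illustrates only the balancing goal, not the ADASYN algorithm. The rows are synthetic and `random_oversample` is a hypothetical helper name.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, label_of, seed=42):
    """Duplicate rows of smaller classes until all classes have equal size."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    target_size = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw the missing rows with replacement from the same class.
        balanced.extend(rng.choices(group, k=target_size - len(group)))
    return balanced


# Synthetic training rows (assumption): classes 1, 2 and 3 with 2, 5 and 8 rows.
train_rows = (
    [{"price_original": 5.0 + i, "rating_stars": 1} for i in range(2)]
    + [{"price_original": 10.0 + i, "rating_stars": 2} for i in range(5)]
    + [{"price_original": 20.0 + i, "rating_stars": 3} for i in range(8)]
)

balanced = random_oversample(train_rows, lambda r: r["rating_stars"])

print(Counter(r["rating_stars"] for r in train_rows))
print(Counter(r["rating_stars"] for r in balanced))
```

Whichever method is used, resampling is normally applied to the training set only. The validation and test sets keep their original class distribution so that the reported metrics reflect real data.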
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"## **DATASET 2**\n",
|
|||
|
"\n",
|
|||
|
"https://www.kaggle.com/datasets/dewangmoghe/mobile-phone-price-prediction\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
"Problem domain: the mobile phone market\n",
"\n",
"Objects of observation: mobile phones\n",
"\n",
"Object attributes:\n",
"* Name: the phone's name\n",
"\n",
"* Rating: the phone's rating (from 0 to 5).\n",
"\n",
"* Spec_score: a score based on the phone's main specifications (from 0 to 100)\n",
"\n",
"* No_of_sim: whether the phone supports dual SIM, 3G, 4G, 5G, VoLTE\n",
"\n",
"* Ram: amount of RAM\n",
"\n",
"* Battery: battery specifications\n",
"\n",
"* Display: the phone's screen size\n",
"\n",
"* Camera: specifications of the front and rear cameras\n",
"\n",
"* External_Memory: whether external memory is supported and how much\n",
"\n",
"* Android_version: the phone's Android version\n",
"\n",
"* Price: price\n",
"\n",
"* company: the company that makes the phone\n",
"\n",
"* Inbuilt_memory: the phone's built-in storage\n",
"\n",
"* fast_charging: whether fast charging is supported and at how many watts.\n",
"\n",
"* Screen_resolution: screen resolution\n",
"\n",
"* Processor: processor description\n",
"\n",
"* Processor_name: processor name\n",
"\n",
"Relationships between objects:\n",
"Between a phone's price and its other specifications (the better the specifications, the more expensive the phone should be)\n",
"\n",
"Business goal: help manufacturers and sellers set an optimal price for new phones based on competitors' offerings.\n",
"Business effect: better competitiveness on the market and a potential increase in profit\n",
"\n",
"Technical project goal: create a machine learning model that predicts the price of a mobile phone from its specifications.\n",
"Input data: the phone's specifications (battery, camera, processor and so on).\n",
"Target feature: price"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 303,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Index(['Unnamed: 0', 'Name', 'Rating', 'Spec_score', 'No_of_sim', 'Ram',\n",
|
|||
|
" 'Battery', 'Display', 'Camera', 'External_Memory', 'Android_version',\n",
|
|||
|
" 'Price', 'company', 'Inbuilt_memory', 'fast_charging',\n",
|
|||
|
" 'Screen_resolution', 'Processor', 'Processor_name'],\n",
|
|||
|
" dtype='object')\n",
|
|||
|
"<class 'pandas.core.frame.DataFrame'>\n",
|
|||
|
"RangeIndex: 1370 entries, 0 to 1369\n",
|
|||
|
"Data columns (total 18 columns):\n",
|
|||
|
" # Column Non-Null Count Dtype \n",
|
|||
|
"--- ------ -------------- ----- \n",
|
|||
|
" 0 Unnamed: 0 1370 non-null int64 \n",
|
|||
|
" 1 Name 1370 non-null object \n",
|
|||
|
" 2 Rating 1370 non-null float64\n",
|
|||
|
" 3 Spec_score 1370 non-null int64 \n",
|
|||
|
" 4 No_of_sim 1370 non-null object \n",
|
|||
|
" 5 Ram 1370 non-null object \n",
|
|||
|
" 6 Battery 1370 non-null object \n",
|
|||
|
" 7 Display 1370 non-null object \n",
|
|||
|
" 8 Camera 1370 non-null object \n",
|
|||
|
" 9 External_Memory 1370 non-null object \n",
|
|||
|
" 10 Android_version 927 non-null object \n",
|
|||
|
" 11 Price 1370 non-null object \n",
|
|||
|
" 12 company 1370 non-null object \n",
|
|||
|
" 13 Inbuilt_memory 1351 non-null object \n",
|
|||
|
" 14 fast_charging 1281 non-null object \n",
|
|||
|
" 15 Screen_resolution 1368 non-null object \n",
|
|||
|
" 16 Processor 1342 non-null object \n",
|
|||
|
" 17 Processor_name 1370 non-null object \n",
|
|||
|
"dtypes: float64(1), int64(2), object(15)\n",
|
|||
|
"memory usage: 192.8+ KB\n",
|
|||
|
"None\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df2 = pd.read_csv(\"..//static//csv//mobiles.csv\")\n",
|
|||
|
"print(df2.columns)\n",
|
|||
|
"print(df2.info())"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"В столбце Ram есть шум в виде значений, которые явно не относятся к значению оперативной памяти ('Helio G90T', '128 GB inbuilt' '6000 mAh Battery with 22.5W Fast Charging'\n",
|
|||
|
"'256 GB inbuilt' '512 GB inbuilt'). Строки с этими значениями можно удалить, т.к. у них значения съехали с других столбцов, а значит и в другом столбце будет неверное значение. \n",
|
|||
|
"\n",
|
|||
|
"Также было обнаружено, что не все цены указаны верно, т.к. у некоторых значений было 2 запятые. Для преобразования значений в числа запятые были заменены на точки, а в строках, где стало 2 точки, первая точка удалена.\n",
|
|||
|
"\n",
|
|||
|
"Актуальность данных проверить нельзя, т.к. в датасете нет даты релиза смартфона\n",
|
|||
|
"\n",
|
|||
|
"Покрытие данных очень хорошее, т.к. представлено большое количество смартфон разной ценовой категории"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 304,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"[ 0 1 2 ... 1367 1368 1369]\n",
|
|||
|
"['Samsung Galaxy F14 5G' 'Samsung Galaxy A11' 'Samsung Galaxy A13' ...\n",
|
|||
|
" 'TCL 50 XE NxtPaper 5G' 'TCL 40 NxtPaper 5G' 'TCL Trifold']\n",
|
|||
|
"[4.65 4.2 4.3 4.1 4.4 4.05 4.5 4.25 4.75 4.15 4.35 4.45 4.6 4.\n",
|
|||
|
" 4.55 4.7 3.95 3.75 3.9 3.85]\n",
|
|||
|
"[68 63 75 73 69 76 71 85 78 72 74 79 80 62 81 82 87 86 88 84 83 89 91 90\n",
|
|||
|
" 96 93 92 95 65 59 42 67 60 61 54 66 70 51 64 53 77 94 98 97 58 57 49 46\n",
|
|||
|
" 56 55]\n",
|
|||
|
"['Dual Sim, 3G, 4G, 5G, VoLTE, ' 'Dual Sim, 3G, 4G, VoLTE, '\n",
|
|||
|
" 'Dual Sim, 3G, 4G, 5G, VoLTE, Vo5G, ' 'Single Sim, 3G, 4G, 5G, VoLTE, '\n",
|
|||
|
" 'Dual Sim, 3G, 4G, ' 'Single Sim, 3G, 4G, VoLTE, ' 'No Sim Supported, '\n",
|
|||
|
" 'Single Sim, 3G, 4G, 5G, VoLTE, Vo5G, ' 'Dual Sim, 3G, VoLTE, ']\n",
|
|||
|
"['4 GB RAM' '2 GB RAM' '6 GB RAM' '8 GB RAM' '12 GB RAM' '1 GB RAM'\n",
|
|||
|
" '3 GB RAM' '16 GB RAM' 'Helio G90T' '24 GB RAM' '18 GB RAM' '1.5 GB RAM'\n",
|
|||
|
" '128 GB inbuilt' '6000 mAh Battery with 22.5W Fast Charging'\n",
|
|||
|
" '256 GB inbuilt' '512 GB inbuilt']\n",
|
|||
|
"['6000 mAh Battery ' '4000 mAh Battery ' '5000 mAh Battery '\n",
|
|||
|
" '6000 mAh Battery' '3500 mAh Battery' '4500 mAh Battery '\n",
|
|||
|
" '3400 mAh Battery ' '3300 mAh Battery ' '4050 mAh Battery '\n",
|
|||
|
" '3900 mAh Battery ' '4300 mAh Battery ' '4800 mAh Battery '\n",
|
|||
|
" '4200 mAh Battery ' '3700 mAh Battery ' '4400 mAh Battery '\n",
|
|||
|
" '3500 mAh Battery ' '4320 mAh Battery ' '4030 mAh Battery'\n",
|
|||
|
" '1900 mAh Battery' '5000 mAh Battery' '2650 mAh Battery'\n",
|
|||
|
" '3000 mAh Battery' '4600 mAh Battery ' '4100 mAh Battery '\n",
|
|||
|
" '5500 mAh Battery ' '4830 mAh Battery ' '4700 mAh Battery '\n",
|
|||
|
" '4810 mAh Battery ' '5100 mAh Battery ' '5400 mAh Battery '\n",
|
|||
|
" '4870 mAh Battery ' '5700 mAh Battery ' '4730 mAh Battery '\n",
|
|||
|
" '5100 mAh Battery' '6 GB RAM, 64 GB inbuilt' '5200 mAh Battery '\n",
|
|||
|
" '5240 mAh Battery ' '5050 mAh Battery ' '4310 mAh Battery '\n",
|
|||
|
" '4350 mAh Battery ' '4880 mAh Battery ' '4520 mAh Battery '\n",
|
|||
|
" '4260 mAh Battery ' '4820 mAh Battery ' '4805 mAh Battery '\n",
|
|||
|
" '5160 mAh Battery ' '5080 mAh Battery ' '5065 mAh Battery '\n",
|
|||
|
" '10500 mAh Battery ' '5200 mAh Battery' '5800 mAh Battery '\n",
|
|||
|
" '5300 mAh Battery ' '5450 mAh Battery ' '5600 mAh Battery '\n",
|
|||
|
" '3000 mAh Battery ' '2800 mAh Battery ' '4620 mAh Battery '\n",
|
|||
|
" '4385 mAh Battery ' '4410 mAh Battery ' '4355 mAh Battery '\n",
|
|||
|
" '4492 mAh Battery ' '4575 mAh Battery ' '5003 mAh Battery '\n",
|
|||
|
" '4821 mAh Battery ' '4000 mAh Battery' '7000 mAh Battery '\n",
|
|||
|
" '3900 mAh Battery' '3760 mAh Battery ' '2600 mAh Battery'\n",
|
|||
|
" '4900 mAh Battery ' '4020 mAh Battery ' '4450 mAh Battery '\n",
|
|||
|
" '4610 mAh Battery ' '3800 mAh Battery ' '3440 mAh Battery '\n",
|
|||
|
" '2510 mAh Battery ' '6100 mAh Battery ' '2100 mAh Battery'\n",
|
|||
|
" '4030 mAh Battery ' '5020 mAh Battery ' '4980 mAh Battery '\n",
|
|||
|
" '4250 mAh Battery ' '6.75 inches, 720 x 1600 px Display '\n",
|
|||
|
" '4460 mAh Battery ' '4815 mAh Battery ' '4750 mAh Battery '\n",
|
|||
|
" '5330 mAh Battery ' '5010 mAh Battery ' '4500 mAh Battery']\n",
|
|||
|
"['6.6 inches' '6.4 inches' '6.5 inches' '6.1 inches' '6.7 inches'\n",
|
|||
|
" '6.21 inches' '6.67 inches' '6.58 inches' '6.71 inches' '6.78 inches'\n",
|
|||
|
" '6.8 inches' '6.56 inches' '6.3 inches' '7.45 inches' '6.2 inches'\n",
|
|||
|
" '8.2 inches' '7.6 inches' '8 inches' '7.63 inches' '6.22 inches'\n",
|
|||
|
" '4.5 inches' '6.51 inches' '6.53 inches' '6.35 inches' '6.55 inches'\n",
|
|||
|
" '6.64 inches' '5.2 inches' '5.5 inches' '6.72 inches' '6.44 inches'\n",
|
|||
|
" '6.82 inches' '6.68 inches' '7 inches' '6.74 inches' '8.03 inches'\n",
|
|||
|
" '8.02 inches' '7.8 inches' '6.52 inches' '6.59 inches' '6.43 inches'\n",
|
|||
|
" '4300 mAh Battery with 30W Fast Charging' '6.62 inches' '6.57 inches'\n",
|
|||
|
" '6.73 inches' '6.83 inches' '7.1 inches' '7.4 inches' '7.56 inches'\n",
|
|||
|
" '7.82 inches' '6.38 inches' '6.79 inches' '6.61 inches' '6.69 inches'\n",
|
|||
|
" '12.1 inches' '6.77 inches' '6.75 inches' '6.81 inches' '7.2 inches'\n",
|
|||
|
" '7.71 inches' '7.92 inches' '6.76 inches' '7.9 inches' '5.6 inches'\n",
|
|||
|
" '5.7 inches' '6.34 inches' '6.14 inches' '6.03 inches' '8.3 inches'\n",
|
|||
|
" '5.9 inches' '5.92 inches' '6 inches' '6.26 inches' '6.09 inches'\n",
|
|||
|
" '5.99 inches' '6.92 inches' '5 inches' '6.45 inches' '6.9 inches'\n",
|
|||
|
" '6.47 inches' '6.28 inches' '6.49 inches' '6.08 inches' '7.85 inches'\n",
|
|||
|
" '7.11 inches' '6.95 inches'\n",
|
|||
|
" '48 MP + 5 MP + 2 MP Triple Rear & 8 MP Front Camera' '6.94 inches'\n",
|
|||
|
" '7.09 inches' '10 inches']\n",
|
|||
|
"['50 MP + 2 MP Dual Rear & 13 MP Front Camera'\n",
|
|||
|
" '13 MP + 5 MP + 2 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '50 MP Quad Rear & 8 MP Front Camera'\n",
|
|||
|
" '48 MP Quad Rear & 13 MP Front Camera'\n",
|
|||
|
" '13 MP + 2 MP + 2 MP Triple Rear & 5 MP Front Camera'\n",
|
|||
|
" '50 MP + 2 MP Dual Rear & 5 MP Front Camera'\n",
|
|||
|
" '48 MP + 8 MP + 5 MP Triple Rear & 20 MP Front Camera'\n",
|
|||
|
" '48 MP Quad Rear & 8 MP Front Camera'\n",
|
|||
|
" '50 MP + 2 MP + 2 MP Triple Rear & 13 MP Front Camera'\n",
|
|||
|
" '50 MP + 5 MP + 2 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '50 MP + 8 MP + 2 MP Triple Rear & 13 MP Front Camera'\n",
|
|||
|
" '50 MP + 8 MP + 2 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '48 MP + 8 MP + 5 MP Triple Rear & 25 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '64 MP Quad Rear & 20 MP Front Camera'\n",
|
|||
|
" '64 MP + 8 MP + 5 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '13 MP + 2 MP Dual Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 2 MP Dual Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 5 MP + 2 MP Triple Rear & 13 MP Front Camera'\n",
|
|||
|
" '64 MP + 8 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '12 MP + 12 MP + 8 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 8 MP + 5 MP Triple Rear & 13 MP Front Camera'\n",
|
|||
|
" '48 MP Quad Rear & 32 MP Front Camera'\n",
|
|||
|
" '64 MP Quad Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 8 MP + 2 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '64 MP + 12 MP + 5 MP Triple Rear & 10 MP Front Camera'\n",
|
|||
|
" '24 MP + 10 MP + 5 MP Triple Rear & 24 MP Front Camera'\n",
|
|||
|
" '50 MP + 12 MP + 5 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" 'Foldable Display, Dual Display'\n",
|
|||
|
" '108 MP + 8 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 8 MP Dual Rear & 13 MP Front Camera'\n",
|
|||
|
" '50 MP + 12 MP + 8 MP Triple Rear & 10 MP Front Camera'\n",
|
|||
|
" '108 MP Quad Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 12 MP + 10 MP Triple Rear & 12 MP Front Camera'\n",
|
|||
|
" '12 MP Quad Rear & 10 MP Front Camera'\n",
|
|||
|
" '64 MP + 12 MP + 12 MP Triple Rear & 10 MP Front Camera'\n",
|
|||
|
" '48 MP + 12 MP + 5 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '25 MP + 8 MP Dual Rear & 13 MP Front Camera'\n",
|
|||
|
" '50 MP + 12 MP + 12 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 12 MP + 10 MP Triple Rear & 10 MP Front Camera'\n",
|
|||
|
" '48 MP + 8 MP + 5 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '108 MP Quad Rear & 12 MP + 8 MP Dual Front Camera'\n",
|
|||
|
" '12 MP + 12 MP Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '200 MP Quad Rear & 12 MP Front Camera'\n",
|
|||
|
" '108 MP Quad Rear & 40 MP Front Camera'\n",
|
|||
|
" '13 MP + 0.08 MP Dual Rear & 5 MP Front Camera'\n",
|
|||
|
" '13 MP + 2 MP Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '5 MP Rear & 2 MP Front Camera' '8 MP Rear & 5 MP Front Camera'\n",
|
|||
|
" '13 MP Rear & 5 MP Front Camera'\n",
|
|||
|
" '50 MP + 0.08 MP Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '13 MP + 2 MP Dual Rear & 5 MP Front Camera'\n",
|
|||
|
" '16 MP + 8 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '13 MP + 2 MP Dual Rear & 13 MP Front Camera'\n",
|
|||
|
" '50 MP + 2 MP Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '13 MP + 8 MP + 2 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '13 MP + 2 MP + 2 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '50 MP + 2 MP + 2 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '13 MP Rear & 16 MP Front Camera'\n",
|
|||
|
" '16 MP + 8 MP + 2 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '64 MP + 2 MP Dual Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 8 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '64 MP + 2 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 8 MP Dual Rear & 50 MP Front Camera'\n",
|
|||
|
" '108 MP + 8 MP + 2 MP Triple Rear & 50 MP + 8 MP Dual Front Camera'\n",
|
|||
|
" '64 MP + 8 MP + 2 MP Triple Rear & 44 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP + 2 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '64 MP + 8 MP + 2 MP Triple Rear & 44 MP + 8 MP Dual Front Camera'\n",
|
|||
|
" '64 MP + 8 MP + 2 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '50 MP + 8 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP Dual Rear & 50 MP Front Camera'\n",
|
|||
|
" '64 MP + 8 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '64 MP + 2 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 50 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '108 MP + 64 MP + 50 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '108 MP + 8 MP + 2 MP Triple Rear & 44 MP + 8 MP Dual Front Camera'\n",
|
|||
|
" '50 MP + 12 MP + 8 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '64 MP + 50 MP + 50 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '200 MP + 8 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP + 8 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP + 13 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '200 MP + 64 MP + 50 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '50 MP + 12 MP + 12 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '64 MP + 50 MP + 50 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '108 MP + 8 MP + 2 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '50 MP Quad Rear & 50 MP Front Camera'\n",
|
|||
|
" '50 MP Quad Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 50 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '108 MP + 32 MP + 12 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50.3 MP + 50 MP + 50 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP Quad Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 12 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '200 MP + 50 MP + 50 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '200 MP + 50 MP + 50 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '200 MP Quad Rear & 60 MP Front Camera'\n",
|
|||
|
" '50.3 MP Quad Rear & 32 MP Front Camera'\n",
|
|||
|
" '12 MP Quad Rear & 13 MP Front Camera'\n",
|
|||
|
" '50 MP + Depth Sensor Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '50 MP + 2 MP + 0.3 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '50 MP + 0.08 MP Dual Rear & 5 MP Front Camera'\n",
|
|||
|
" '50 MP + 0.3 MP Dual Rear & 5 MP Front Camera'\n",
|
|||
|
" '50 MP + 2 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '64 MP + 2 MP Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '48 MP + 2 MP + 2 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '13 MP Quad Rear & 8 MP Front Camera'\n",
|
|||
|
" '108 MP + 2 MP Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '50 MP Rear & 8 MP Front Camera'\n",
|
|||
|
" '48 MP + 8 MP + 2 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '64 MP Quad Rear & 16 MP Front Camera'\n",
|
|||
|
" '108 MP + 2 MP Dual Rear Camera'\n",
|
|||
|
" '13 MP + Depth Sensor Dual Rear & 5 MP Front Camera'\n",
|
|||
|
" '13 MP + 0.3 MP Dual Rear & 5 MP Front Camera'\n",
|
|||
|
" '48 MP + 2 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '13 MP + 8 MP Dual Rear & 5 MP Front Camera'\n",
|
|||
|
" '6.5 inches, 1080 x 2400 px, 90 Hz Display with Punch Hole'\n",
|
|||
|
" '108 MP + 2 MP Dual Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 2 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '108 MP + 8 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 8 MP Dual Rear & 32 MP Front Camera'\n",
|
|||
|
" '108 MP Quad Rear & 16 MP Front Camera'\n",
|
|||
|
" '64 MP + 5 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '64 MP Quad Rear & 32 MP + 8 MP Dual Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 32 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 32 MP + 8 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '64 MP + 50 MP + 8 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '64 MP + 13 MP + 13 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 8 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 3 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '200 MP + 64 MP + 32 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '200 MP + 64 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 2 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 48 MP + 32 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '108 MP + 13 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '100 MP + 2 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '48 MP Quad Rear & 16 MP Front Camera'\n",
|
|||
|
" '16 MP Quad Rear & 16 MP Front Camera'\n",
|
|||
|
" '64 MP + 32 MP + 8 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 32 MP + 8 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '108 MP + 2 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 32 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '108 MP + 5 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 32 MP + 32 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 2 MP + Ultra Wide Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '48 MP + 13 MP + 12 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 13 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" 'Dual Display'\n",
|
|||
|
" '48 MP + 48 MP + 13 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '64 MP + 2 MP + 2 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '64 MP + 12 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 8 MP Dual Rear & 16 MP Front Camera'\n",
|
|||
|
" '48 MP + 13 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '108 MP + 13 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '64 MP + 8 MP Dual Rear & 32 MP Front Camera'\n",
|
|||
|
" '64 MP + 13 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP Dual Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP + 13 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '48 MP + 13 MP + 13 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP + 8 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 12 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '64 MP + 50 MP + 50 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 14.6 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 13 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '108 MP + 50 MP + 32 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 16 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '200 MP + 50 MP + 50 MP Triple Rear & 60 MP Front Camera'\n",
|
|||
|
" '8 MP + 0.08 MP Dual Rear & 5 MP Front Camera'\n",
|
|||
|
" '50 MP Dual Rear & 5 MP Front Camera'\n",
|
|||
|
" '8 MP Dual Rear & 5 MP Front Camera'\n",
|
|||
|
" '50 MP Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '13 MP Rear & 8 MP Front Camera'\n",
|
|||
|
" '50 MP Dual Rear & 13 MP Front Camera'\n",
|
|||
|
" '64 MP + 8 MP + 2 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '48 MP + 8 MP + 2 MP Triple Rear & 13 MP Front Camera'\n",
|
|||
|
" '64 MP Quad Rear & 13 MP Front Camera'\n",
|
|||
|
" '64 MP Quad Rear & 20 MP + 2 MP Dual Front Camera'\n",
|
|||
|
" '64 MP + 8 MP + 2 MP Triple Rear & 20 MP Front Camera'\n",
|
|||
|
" '64 MP + 8 MP + 5 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '8 MP Rear & 8 MP Front Camera'\n",
|
|||
|
" '50 MP + 8 MP + 5 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '48 MP + 2 MP + Depth Sensor Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '50 MP + 2 MP + 2 MP Triple Rear & 5 MP Front Camera'\n",
|
|||
|
" '108 MP + 2 MP + 2 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '108 MP + 5 MP + 2 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '100 MP + 5 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP Dual Rear & 16 MP Front Camera'\n",
|
|||
|
" '108 MP + 5 MP + 2 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '50 MP + 12 MP Dual Rear & 50 MP Front Camera'\n",
|
|||
|
" '50 MP + 12 MP Dual Rear & 16 MP Front Camera'\n",
|
|||
|
" '64 MP + 5 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '200 MP + 12 MP + 2 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '54 MP + 50 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '160 MP + 8 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 32 MP + 12 MP Triple Rear & 50 MP + 2 MP Dual Front Camera'\n",
|
|||
|
" '54 MP + 8 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '108 MP + 5 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '54 MP + 50 MP + 8 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '160 MP + 50 MP + 2 MP Triple Rear & 50 MP + 2 MP Dual Front Camera'\n",
|
|||
|
" '108 MP + 32 MP + 12 MP Triple Rear & 50 MP + 2 MP Dual Front Camera'\n",
|
|||
|
" '40 MP + 12 MP + 8 MP Triple Rear & 32 MP + 8 MP Dual Front Camera'\n",
|
|||
|
" '50 MP + 16 MP + 8 MP Triple Rear & 32 MP + 8 MP Dual Front Camera'\n",
|
|||
|
" '200 MP + 50 MP + 8 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '200 MP + 32 MP + 12 MP Triple Rear & 50 MP + 2 MP Dual Front Camera'\n",
|
|||
|
" '180 MP + 50 MP + 50 MP Triple Rear & 50 MP Dual Front Camera'\n",
|
|||
|
" '50 MP Quad Rear & 12 MP + TOF 3D Dual Front Camera'\n",
|
|||
|
" '54 MP Quad Rear & 12 MP Front Camera'\n",
|
|||
|
" '50 MP Quad Rear & 12 MP Dual Front Camera'\n",
|
|||
|
" '50 MP Penta Rear & 12 MP + Depth Sensor Dual Front Camera'\n",
|
|||
|
" '50 MP Quad Rear & 13 MP Dual Front Camera'\n",
|
|||
|
" '50 MP + 50 MP Dual Rear & 32 MP Front Camera'\n",
|
|||
|
" '64 MP + 50 MP Dual Rear & 32 MP Front Camera'\n",
|
|||
|
" '64 MP + 50 MP + 32 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '12.2 MP Rear & 8 MP Front Camera'\n",
|
|||
|
" '16 MP + 12.2 MP Dual Rear & 8 MP + TOF 3D Dual Front Camera'\n",
|
|||
|
" '16 MP + 12.2 MP Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '64 MP + 13 MP Dual Rear & 13 MP Front Camera'\n",
|
|||
|
" '12.2 MP + 12 MP Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '50 MP + 12 MP Dual Rear & 10.8 MP Front Camera'\n",
|
|||
|
" '108 MP + 13 MP Dual Rear & 13 MP Front Camera'\n",
|
|||
|
" '50 MP + 8 MP Dual Rear & 12 MP Front Camera'\n",
|
|||
|
" '50 MP + 48 MP + 12 MP Triple Rear & 10.8 MP Front Camera'\n",
|
|||
|
" '50 MP + 12 MP Dual Rear & 10.5 MP Front Camera'\n",
|
|||
|
" '16 MP + 16 MP + 12 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '64 MP + 48 MP + 12 MP Triple Rear & 10.8 MP Front Camera'\n",
|
|||
|
" '50 MP + 48 MP + 48 MP Triple Rear & 10.5 MP Front Camera'\n",
|
|||
|
" '50 MP + 48 MP + 12 MP Triple Rear & 12 MP Front Camera'\n",
|
|||
|
" '8 MP + 2 MP + 0.3 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '13 MP Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '50 MP + 12 MP + 5 MP Triple Rear & 12 MP Front Camera'\n",
|
|||
|
" '64 MP + 13 MP + 5 MP Triple Rear & 24 MP Front Camera'\n",
|
|||
|
" '50 MP + 12 MP Dual Rear & 12 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP Dual Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP + 5 MP Triple Rear & 12 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP + 5 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 32 MP + 13 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '108 MP + 13 MP + 5 MP Triple Rear & 12 MP Front Camera'\n",
|
|||
|
" '48 MP + 8 MP + 5 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '13 MP + 5 MP Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '16 MP + 2 MP Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '64 MP + 13 MP Dual Rear & 16 MP Front Camera'\n",
|
|||
|
" '64 MP + 16 MP Dual Rear & 44 MP Front Camera'\n",
|
|||
|
" '64 MP + 16 MP Dual Rear & 20 MP Front Camera'\n",
|
|||
|
" '16 MP Rear & 13 MP Front Camera'\n",
|
|||
|
" '13 MP Dual Rear & 5 MP Front Camera'\n",
|
|||
|
" '16 MP + 5 MP + 2 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '48 MP + 2 MP Dual Rear & 5 MP Front Camera'\n",
|
|||
|
" '64 MP Quad Rear & 50 MP Front Camera'\n",
|
|||
|
" '64 MP + 13 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 5 MP + 2 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '48 MP + 8 MP + 5 MP Triple Rear & 13 MP Front Camera'\n",
|
|||
|
" '64 MP + 12 MP + 5 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '200 MP + 12 MP + 12 MP Triple Rear & 60 MP Front Camera'\n",
|
|||
|
" '108 MP + 12 MP + 12 MP Triple Rear & 12 MP Front Camera'\n",
|
|||
|
" '13 MP + 8 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '108 MP + 2 MP Dual Rear & 32 MP Front Camera'\n",
|
|||
|
" '64 MP + 8 MP Dual Rear & 50 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 12 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '100 MP + 2 MP Dual Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '64 MP + 50 MP + 3 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '16 MP Rear & 5 MP Front Camera'\n",
|
|||
|
" '16 MP + 5 MP Dual Rear & 12 MP Front Camera'\n",
|
|||
|
" '50 MP + Macro Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '48 MP + 2 MP Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '50 MP Rear & 5 MP Front Camera'\n",
|
|||
|
" '16 MP + 2 MP + 2 MP Triple Rear & 5 MP Front Camera'\n",
|
|||
|
" '16 MP + 2 MP Dual Rear & 5 MP Front Camera'\n",
|
|||
|
" '16 MP + 2 MP + 2 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '64 MP + 13 MP Dual Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 5 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '108 MP + 13 MP Dual Rear & 32 MP Front Camera'\n",
|
|||
|
" '108 MP + 16 MP + 8 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '48 MP + 8 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP + 10 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 2 MP Triple Rear & 60 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 12 MP Triple Rear & 60 MP Front Camera'\n",
|
|||
|
" '108 MP + 13 MP + 5 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '108 MP + 16 MP + 8 MP Triple Rear & 25 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP + 2 MP Triple Rear & 32 MP + 16 MP Dual Front Camera'\n",
|
|||
|
" '200 MP + 50 MP + 12 MP Triple Rear & 60 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 2 MP Triple Rear & 60 MP + 60 MP Triple Front Camera'\n",
|
|||
|
" '64 MP + 16 MP + 2 MP Triple Rear & 16 MP + 8 MP Dual Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 50 MP Triple Rear & 60 MP Front Camera'\n",
|
|||
|
" '100 MP + 50 MP + 50 MP Triple Rear & 50 MP Front Camera'\n",
|
|||
|
" '200 MP + 50 MP + 2 MP Triple Rear & 60 MP Front Camera'\n",
|
|||
|
" '108 MP + 2 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '13 MP + 2 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '48 MP + 16 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '48 MP + 2 MP Dual Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 16 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '48 MP + 16 MP + 12 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 48 MP + 8 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 48 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 48 MP + 32 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '64 MP + 50 MP + 48 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP + 5 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '108 MP + 50 MP + 48 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '200 MP + 50 MP + 48 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '48 MP + 16 MP + 8 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 2 MP + 0.08 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '8 MP + Depth Sensor Dual Rear & 5 MP Front Camera'\n",
|
|||
|
" '50 MP + Depth Sensor Dual Rear & 5 MP Front Camera'\n",
|
|||
|
" '13 MP Rear Camera' '50 MP Quad Rear & 13 MP Front Camera'\n",
|
|||
|
" '48 MP + 8 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 8 MP Dual Rear & 20 MP Front Camera'\n",
|
|||
|
" '200 MP + 8 MP + 2 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '64 MP + 8 MP + 5 MP Triple Rear & 20 MP Front Camera'\n",
|
|||
|
" 'Foldable Display' '50 MP + 8 MP Dual Rear & 60 MP Front Camera'\n",
|
|||
|
" 'Memory Card (Hybrid)' '50 MP + 2 MP Triple Rear & 5 MP Front Camera'\n",
|
|||
|
" '100 MP + 8 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '48 MP + 5 MP + 2 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '50 MP + 8 MP + 2 MP Triple Rear & 60 MP Front Camera'\n",
|
|||
|
" '108 MP + 8 MP Dual Rear & 60 MP + 8 MP Dual Front Camera'\n",
|
|||
|
" '50 MP + 8 MP Dual Rear & 60 MP + 8 MP Dual Front Camera'\n",
|
|||
|
" '50 MP + 8 MP + 2 MP Triple Rear & 60 MP + 8 MP Dual Front Camera'\n",
|
|||
|
" '50 MP + 13 MP Dual Rear & 13 MP Front Camera'\n",
|
|||
|
" '48 MP + 13 MP + 12 MP Triple Rear & 13 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP + 12 MP Triple Rear & 13 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 40 MP Triple Rear & 13 MP Front Camera'\n",
|
|||
|
" '50 MP + 48 MP + 12.5 MP Triple Rear & 13 MP Front Camera'\n",
|
|||
|
" '48 MP + 48 MP + 13 MP Triple Rear & 13 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 50 MP Triple Rear & 16 MP Dual Front Camera'\n",
|
|||
|
" '40 MP Quad Rear & 32 MP Dual Front Camera'\n",
|
|||
|
" '50 MP + 48 MP + 12 MP Triple Rear & 13 MP Dual Front Camera'\n",
|
|||
|
" '64 MP + 50 MP + 13 MP Triple Rear & 13 MP Front Camera'\n",
|
|||
|
" '50 MP + 32 MP + 12 MP Triple Rear & 13 MP Front Camera'\n",
|
|||
|
" '48 MP + 48 MP + 40 MP Triple Rear & 13 MP Front Camera'\n",
|
|||
|
" '50 MP + 20 MP + 12 MP Triple Rear & 13 MP Dual Front Camera'\n",
|
|||
|
" '50 MP Penta Rear & 13 MP Dual Front Camera'\n",
|
|||
|
" '50 MP Quad Rear & 32 MP Dual Front Camera'\n",
|
|||
|
" '108 MP + 13 MP + 12 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + Depth Sensor Triple Rear & 5 MP Front Camera'\n",
|
|||
|
" '100 MP + 2 MP Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '108 MP + 0.08 MP Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '50 MP Dual Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 0.08 MP Dual Rear & 32 MP Front Camera'\n",
|
|||
|
" '13 MP Triple Rear & 5 MP Front Camera'\n",
|
|||
|
" '48 MP + 0.08 MP Dual Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '13 MP + 2 MP Triple Rear & 8 MP Front Camera'\n",
|
|||
|
" '50 MP + 5 MP Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '50 MP + 5 MP + 2 MP Triple Rear & 32 MP Front Camera']\n",
|
|||
|
"['Memory Card Supported, upto 1 TB' 'Memory Card Supported, upto 512 GB'\n",
|
|||
|
" 'Memory Card Supported' 'Memory Card (Hybrid), upto 1 TB'\n",
|
|||
|
" 'Memory Card Not Supported' 'Memory Card (Hybrid)'\n",
|
|||
|
" '12 MP + 12 MP Dual Rear & 10 MP Front Camera' 'Android v13'\n",
|
|||
|
" 'Android v10' 'Android v12' 'Memory Card (Hybrid), upto 512 GB'\n",
|
|||
|
" '50 MP + 12 MP + 5 MP Triple Rear & 10 MP + 4 MP Dual Front Camera'\n",
|
|||
|
" '200 MP Quad Rear & 12 MP + 12 MP Dual Front Camera'\n",
|
|||
|
" '50 MP + 12 MP + 10 MP Triple Rear & 10 MP + 4 MP Dual Front Camera'\n",
|
|||
|
" '50 MP + 12 MP + 10 MP Triple Rear & 12 MP + 12 MP Dual Front Camera'\n",
|
|||
|
" 'Memory Card Supported, upto 256 GB' 'Memory Card Supported, upto 128 GB'\n",
|
|||
|
" 'Android v11' 'Android v15' 'Android v14'\n",
|
|||
|
" '50 MP + 12 MP Dual Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 50 MP Triple Rear & 32 MP + 32 MP Dual Front Camera'\n",
|
|||
|
" '200 MP + 12 MP + 12 MP Triple Rear & 32 MP + 32 MP Dual Front Camera'\n",
|
|||
|
" '50 MP Quad Rear & 16 MP Front Camera'\n",
|
|||
|
" '64 MP + 50 MP + 50 MP Triple Rear & 32 MP + 32 MP Dual Front Camera'\n",
|
|||
|
" '50 MP Quad Rear & 16 MP + 16 MP Dual Front Camera'\n",
|
|||
|
" '48 MP + 48 MP + 10 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '200 MP + 12 MP Dual Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 12 MP + 12 MP Triple Rear & 16 MP + 16 MP Dual Front Camera'\n",
|
|||
|
" '64 MP + 12 MP + 12 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" 'Memory Card Supported, upto 2 TB' 'Memory Card (Hybrid), upto 2 TB'\n",
|
|||
|
" '48 MP Quad Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 48 MP + 32 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 8 MP Dual Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 48 MP + 32 MP Triple Rear & 32 MP + 32 MP Dual Front Camera'\n",
|
|||
|
" '108 MP + 50 MP Dual Rear & 32 MP Front Camera'\n",
|
|||
|
" '64 MP + 10 MP + 8 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '64 MP + 16 MP + 12 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '108 MP + 50 MP + 32 MP Triple Rear & 32 MP + 32 MP Dual Front Camera'\n",
|
|||
|
" '64 MP + 48 MP + 48 MP Triple Rear & 32 MP + 20 MP Dual Front Camera'\n",
|
|||
|
" 'Memory Card (Hybrid), upto 256 GB'\n",
|
|||
|
" '50 MP + 12 MP Dual Rear & 8 MP Front Camera'\n",
|
|||
|
" '50 MP + 20 MP + 12 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 32 MP + 8 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 32 MP + 10 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 20 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '54 MP + 50 MP + 8 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" '108 MP + 8 MP + 5 MP Triple Rear & 16 MP Front Camera'\n",
|
|||
|
" 'Android v9.0 (Pie)' '48 MP + 12 MP Dual Rear & 10 MP Front Camera'\n",
|
|||
|
" '48 MP + 10.8 MP + 10.8 MP Triple Rear & 9.5 MP + 8 MP Dual Front Camera'\n",
|
|||
|
" '50 MP + 10.8 MP + 10.8 MP Triple Rear & 12 MP + 12 MP Dual Front Camera'\n",
|
|||
|
" 'Memory Card Supported, upto 32 GB'\n",
|
|||
|
" '64 MP + 13 MP + 8 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP + 5 MP Triple Rear & 12 MP Front Camera'\n",
|
|||
|
" '64 MP + 13 MP + 0.3 MP Triple Rear & 10 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP Dual Rear & 16 MP Front Camera'\n",
|
|||
|
" '64 MP + 13 MP Dual Rear & 32 MP Front Camera' 'Android v10.0'\n",
|
|||
|
" '64 MP + 13 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '13 MP + 12 MP Dual Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 50 MP Dual Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP Dual Rear & 32 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP + 2 MP Triple Rear & 32 MP Front Camera'\n",
|
|||
|
" '16 MP Rear & 5 MP Front Camera' 'Android v12.1' 'No FM Radio'\n",
|
|||
|
" '50 MP + 50 MP + 13 MP Triple Rear & 32 MP + 16 MP Dual Front Camera'\n",
|
|||
|
" '50 MP + 50 MP + 32 MP Triple Rear & 32 MP + 32 MP Dual Front Camera'\n",
|
|||
|
" '50 MP Hexa Rear & 32 MP Front Camera' 'Android' 'HarmonyOS v4'\n",
|
|||
|
" 'EMUI v14' 'HarmonyOS v3.0' 'HarmonyOS' 'HarmonyOS v4.0' 'HarmonyOS v5.0'\n",
|
|||
|
" 'HarmonyOS v2.0'\n",
|
|||
|
" '48 MP + 20 MP + 13 MP Triple Rear & 10.7 MP Front Camera'\n",
|
|||
|
" 'HarmonyOS v4.2' 'HarmonyOS v5'\n",
|
|||
|
" '50 MP Quad Rear & 10.7 MP Front Camera'\n",
|
|||
|
" '50 MP + 13 MP Dual Rear & 12 MP Front Camera'\n",
|
|||
|
" '50 MP + 48 MP + 8 MP Triple Rear & 32 MP Front Camera']\n",
|
|||
|
"['13' '10' '12' '11' '15' '10.0' '9.0 (Pie)' '14' nan '7.1.1 (Nougat)'\n",
|
|||
|
" '8.0 (Oreo)' '8.1 (Oreo)' '5.1 (Lollipop)' '6.0 (Marshmallow)' '9 (Pie)'\n",
|
|||
|
" '3' '2' '4.0' '3.0 (Honeycomb)' '2.0' '3.0' '3.1' '5.0' '4.1']\n",
|
|||
|
"['9,999' '9,990' '11,999' '11,990' '11,599' '12,298' '14,999' '14,990'\n",
|
|||
|
" '14,949' '19,999' '19,990' '19,799' '19,499' '18,999' '18,990' '20,999'\n",
|
|||
|
" '29,999' '28,990' '30,500' '30,999' '39,999' '39,990' '38,900' '37,999'\n",
|
|||
|
" '41,289' '41,790' '42,990' '42,999' '49,999' '49,990' '49,000' '47,990'\n",
|
|||
|
" '44,999' '44,990' '51,999' '54,990' '54,999' '59,999' '57,990' '64,999'\n",
|
|||
|
" '65,690' '69,990' '69,999' '70,000' '1,99,990' '1,84,999' '1,79,990'\n",
|
|||
|
" '1,77,999' '1,64,999' '1,59,999' '1,54,999' '1,39,999' '1,30,376'\n",
|
|||
|
" '1,29,999' '6,990' '6,999' '7,499' '7,999' '8,033' '8,199' '8,490'\n",
|
|||
|
" '9,499' '10,199' '10,499' '11,899' '11,580' '11,490' '11,390' '10,999'\n",
|
|||
|
" '12,350' '12,490' '15,050' '29,990' '29,799' '30,739' '31,398' '31,990'\n",
|
|||
|
" '38,990' '38,799' '37,990' '40,990' '49,940' '48,990' '46,990' '45,990'\n",
|
|||
|
" '45,210' '50,999' '56,990' '58,990' '62,990' '63,999' '64,990' '65,490'\n",
|
|||
|
" '71,990' '74,899' '76,990' '79,999' '80,990' '1,39,990' '1,18,990'\n",
|
|||
|
" '1,15,990' '1,13,990' '1,10,990' '1,09,990' '1,07,990' '1,06,990'\n",
|
|||
|
" '99,990' '94,999' '89,999' '89,990' '82,990' '6,950' '7,199' '7,450'\n",
|
|||
|
" '7,480' '7,790' '7,815' '7,850' '7,919' '7,920' '7,945' '7,950' '7,980'\n",
|
|||
|
" '9,893' '9,820' '10,299' '10,390' '11,910' '11,749' '11,499' '12,251'\n",
|
|||
|
" '14,844' '14,499' '13,999' '15,299' '15,329' '15,749' '15,990' '19,783'\n",
|
|||
|
" '20,499' '20,500' '20,599' '30,049' '29,996' '28,979' '28,339' '31,089'\n",
|
|||
|
" '38,999' '36,999' '36,990' '35,999' '34,999' '34,990' '33,999' '33,990'\n",
|
|||
|
" '15,499' '20,699' '20,990' '28,900' '30,200' '30,900' '35,990' '45,999'\n",
|
|||
|
" '47,999' '49,499' '50,990' '78,990' '79,990' '84,990' '84,999' '94,990'\n",
|
|||
|
" '1,34,999' '1,29,990' '1,19,900' '1,14,990' '10,990' '12,899' '12,990'\n",
|
|||
|
" '13,499' '13,990' '15,999' '17,990' '17,999' '21,838' '22,486' '22,990'\n",
|
|||
|
" '22,999' '28,999' '26,990' '25,999' '25,990' '30,990' '32,990' '43,990'\n",
|
|||
|
" '52,652' '52,999' '57,999' '72,990' '76,429' '7,299' '7,580' '7,890'\n",
|
|||
|
" '7,972' '7,990' '8,499' '8,689' '8,990' '8,999' '9,799' '9,690' '9,249'\n",
|
|||
|
" '10,330' '10,880' '11,539' '12,194' '12,999' '13,267' '13,290' '13,490'\n",
|
|||
|
" '14,899' '14,950' '15,590' '16,999' '17,945' '19,490' '21,990' '21,999'\n",
|
|||
|
" '24,499' '24,990' '25,890' '26,499' '27,990' '27,999' '27,199' '31,999'\n",
|
|||
|
" '55,990' '56,999' '66,499' '67,990' '77,990' '1,02,999' '1,87,990'\n",
|
|||
|
" '1,24,999' '1,04,999' '1,03,999' '23,999' '40,299' '40,999' '32,999'\n",
|
|||
|
" '43,999' '46,999' '59,990' '62,999' '74,990' '1,01,999' '1,08,999'\n",
|
|||
|
" '1,25,990' '1,46,990' '1,59,990' '7,190' '7,309' '7,394' '63,990'\n",
|
|||
|
" '70,990' '71,999' '72,999' '74,999' '1,09,900' '82,999' '81,990' '7,124'\n",
|
|||
|
" '7,290' '9,099' '7,599' '9,490' '7,899' '8,899' '8,690' '11,110' '11,450'\n",
|
|||
|
" '11,000' '10,631' '10,900' '10,490' '12,332' '13,429' '13,599' '14,199'\n",
|
|||
|
" '15,982' '16,990' '17,900' '17,499' '24,999' '22,863' '27,899' '26,690'\n",
|
|||
|
" '25,171' '21,499' '21,390' '26,899' '22,492' '36,880' '33,779' '32,883'\n",
|
|||
|
" '33,499' '35,499' '41,740' '1,24,990' '89,748' '99,999' '81,999'\n",
|
|||
|
" '1,05,999' '1,03,000' '8,980' '8,489' '8,660' '12,749' '13,950' '16,499'\n",
|
|||
|
" '16,299' '17,995' '15,190' '23,499' '25,299' '21,490' '20,198' '30,799'\n",
|
|||
|
" '36,199' '31,899' '45,215' '68,899' '63,490' '8,349' '7,820' '8,890'\n",
|
|||
|
" '9,478' '9,764' '9,489' '8,744' '9,800' '11,049' '10,190' '10,466'\n",
|
|||
|
" '10,750' '10,899' '12,877' '13,374' '12,499' '12,900' '13,489' '15,323'\n",
|
|||
|
" '18,708' '16,485' '18,398' '18,577' '16,400' '16,949' '17,949' '16,998'\n",
|
|||
|
" '17,789' '16,500' '21,828' '27,875' '21,477' '23,880' '23,900' '20,615'\n",
|
|||
|
" '23,649' '29,004' '22,799' '26,999' '24,150' '33,900' '52,990' '1,04,990'\n",
|
|||
|
" '7,998' '7,090' '14,989' '18,928' '23,990' '41,990' '88,990' '1,49,999'\n",
|
|||
|
" '20,000' '16,899' '18,879' '16,134' '24,454' '20,065' '22,592' '26,674'\n",
|
|||
|
" '22,499' '35,609' '39,888' '42,437' '43,889' '40,108' '47,998' '43,299'\n",
|
|||
|
" '58,699' '55,999' '63,359' '7,699' '9,190' '7,900' '7,689' '9,998'\n",
|
|||
|
" '11,159' '11,350' '10,269' '11,489' '11,425' '10,949' '12,120' '12,239'\n",
|
|||
|
" '12,428' '15,898' '18,377' '20,075' '17,975' '16,890' '18,390' '18,499'\n",
|
|||
|
" '22,297' '28,517' '24,329' '20,048' '26,479' '24,890' '24,449' '36,898'\n",
|
|||
|
" '44,949' '69,899' '53,990' '83,999' '93,990' '2,14,990' '1,34,990'\n",
|
|||
|
" '1,21,999' '1,91,999' '92,990' '25,499' '7,319' '10,749' '10,489' '8,799'\n",
|
|||
|
" '8,346' '7,949' '1,19,990']\n",
|
|||
|
"['Samsung' 'Vivo' 'Realme' 'OPPO' 'Oppo' 'iQOO' 'IQOO' 'Poco' 'POCO'\n",
|
|||
|
" 'Honor' 'Nothing' 'Google' 'itel' 'Itel' 'Asus' 'LG' 'Lenovo' 'Gionee'\n",
|
|||
|
" 'Motorola' 'OnePlus' 'Xiaomi' 'Tecno' 'Huawei' 'Lava' 'Coolpad' 'TCL']\n",
|
|||
|
"[' 128 GB inbuilt' ' 32 GB inbuilt' ' 64 GB inbuilt' ' 256 GB inbuilt'\n",
|
|||
|
" ' 1 TB inbuilt' ' 512 GB inbuilt' ' 16 GB inbuilt' ' Octa Core'\n",
|
|||
|
" ' 258 GB inbuilt' ' 8 GB inbuilt' nan]\n",
|
|||
|
"[' 25W Fast Charging' ' 15W Fast Charging' nan ' 18W Fast Charging'\n",
|
|||
|
" ' 30W Fast Charging' ' Fast Charging' ' 45W Fast Charging'\n",
|
|||
|
" ' 33W Fast Charging' ' 67W Fast Charging' ' 80W Fast Charging'\n",
|
|||
|
" ' 10W Fast Charging' ' 44W Fast Charging' ' 66W Fast Charging'\n",
|
|||
|
" ' 100W Fast Charging' ' 120W Fast Charging' ' 150W Fast Charging'\n",
|
|||
|
" ' 55W Fast Charging' ' 200W Fast Charging' ' 65W Fast Charging'\n",
|
|||
|
" ' 60W Fast Charging' ' 20W Fast Charging' ' 50W Fast Charging'\n",
|
|||
|
" ' 57W Fast Charging' ' 240W Fast Charging' ' 125W Fast Charging'\n",
|
|||
|
" ' 68W Fast Charging' ' 250W Fast Charging' ' 27W Fast Charging'\n",
|
|||
|
" ' 35W Fast Charging' ' 22.5W Fast Charging' ' 40W Fast Charging'\n",
|
|||
|
" ' 90W Fast Charging' ' 08W Fast Charging' ' 68.2W Fast Charging'\n",
|
|||
|
" ' 135W Fast Charging' ' 70W Fast Charging' ' Water Drop Notch'\n",
|
|||
|
" ' 88W Fast Charging' ' 7.5W Fast Charging']\n",
|
|||
|
"[' 2408 x 1080 px Display with Water Drop Notch'\n",
|
|||
|
" ' 720 x 1560 px Display with Punch Hole'\n",
|
|||
|
" ' 1080 x 2408 px Display with Water Drop Notch' ' 720 x 1600 px'\n",
|
|||
|
" ' 720 x 1600 px Display with Water Drop Notch'\n",
|
|||
|
" ' 1080 x 2340 px Display with Water Drop Notch'\n",
|
|||
|
" ' 720 x 1560 px Display with Water Drop Notch' ' 1080 x 2408 px'\n",
|
|||
|
" ' 1080 x 2400 px Display with Water Drop Notch' ' 1080 x 2340 px'\n",
|
|||
|
" ' 1080 x 2400 px' ' 720 x 1520 px Display with Water Drop Notch'\n",
|
|||
|
" ' 1080 x 2400 px Display with Punch Hole' ' 1440 x 3200 px'\n",
|
|||
|
" ' 1080 x 2340 px Display with Punch Hole' ' 1080 x 2640 px'\n",
|
|||
|
" ' 1080 x 2412 px' ' 1440 x 3040 px Display with Punch Hole'\n",
|
|||
|
" ' 1080 x 2400 px Display' ' 1080 x 2460 px Display with Punch Hole'\n",
|
|||
|
" ' 1440 x 3040 px Display' ' 1440 x 2960 px Display' ' 1812 x 2176 px'\n",
|
|||
|
" ' 1440 x 3120 px' ' 1440 x 3080 px' ' 720 x 1612 px'\n",
|
|||
|
" ' 480 x 854 px Display' ' 720 x 1544 px Display with Water Drop Notch'\n",
|
|||
|
" ' 720 x 1612 px Display with Water Drop Notch' ' 1600 x 720 px'\n",
|
|||
|
" ' 1080 x 2388 px Display with Water Drop Notch' ' 720 x 1280 px Display'\n",
|
|||
|
" ' 1612 x 720 px' ' 1080 x 2376 px' ' 1800 x 3200 px'\n",
|
|||
|
" ' 1080 x 2400 px Display with Small Notch' ' 1080 x 2388 px'\n",
|
|||
|
" ' 1260 x 2800 px' ' 1260 x 2712 px' ' 1080 x 2256 px Display'\n",
|
|||
|
" ' 1080 x 2520 px' ' 2200 x 2480 px' ' 1916 x 2160 px' ' 1768 x 2208 px'\n",
|
|||
|
" ' 1600 x 720 px Display with Water Drop Notch' ' 720 x 1604 px'\n",
|
|||
|
" ' 1080 x 2460 px' ' 720 x 1600 px Display with Punch Hole' nan\n",
|
|||
|
" ' 1264 x 2780 px' ' 1240 x 2772 px'\n",
|
|||
|
" ' 1440 x 3200 px Display with Punch Hole' ' 2400 x 1080 px'\n",
|
|||
|
" ' 1864 x 3820 px' ' 1440 x 3216 px' ' 1080 x 2732 px' ' 1440 x 3168 px'\n",
|
|||
|
" ' 1200 x 2400 px' ' 1792 x 1920 px' ' 1800 x 3400 px'\n",
|
|||
|
" ' 1440 x 3200 px Display' ' 2268 x 2440 px'\n",
|
|||
|
" ' 1080 x 2388 px Display with Punch Hole' ' 1800 x 3440 px'\n",
|
|||
|
" ' 720 x 1650 px' ' 720 x 1650 px Display with Water Drop Notch'\n",
|
|||
|
" ' 720 x 1680 px Display with Water Drop Notch' ' 720 x 1680 px'\n",
|
|||
|
" ' 1220 x 2712 px' ' 1600 x 2560 px' ' 1080 x 2404 px' ' 1220 x 3200 px'\n",
|
|||
|
" ' 1200 x 2400 px Display with Water Drop Notch' ' 1220 x 2652 px'\n",
|
|||
|
" ' 1080 x 2412 px Display with Small Notch' ' 1200 x 2664 px'\n",
|
|||
|
" ' 1224 x 2700 px' ' 1200 x 2652 px'\n",
|
|||
|
" ' 1080 x 2400 px Display with Dual Punch Hole' ' 1264 x 2800 px'\n",
|
|||
|
" ' 1280 x 2800 px' ' 2016 x 2348 px' ' 1312 x 2848 px' ' 2156 x 2344 px'\n",
|
|||
|
" ' 1224 x 2688 px' ' 1344 x 2772 px' ' 1984 x 2272 px'\n",
|
|||
|
" ' 2200 x 2480 px Display' ' 1084 x 2412 px' ' 1084 x 2728 px'\n",
|
|||
|
" ' 1080 x 2220 px Display' ' 1080 x 2280 px Display' ' 1344 x 2992 px'\n",
|
|||
|
" ' 1940 x 3120 px' ' 1840 x 2208 px'\n",
|
|||
|
" ' 1600 x 720 px Display with Punch Hole' ' 720 x 1640 px'\n",
|
|||
|
" ' 1080 x 2448 px' ' 2340 x 1080 px'\n",
|
|||
|
" ' 1080 x 2460 px Display with Water Drop Notch' ' 1080 x 1920 px Display'\n",
|
|||
|
" ' 720 x 1440 px Display' ' 720 x 1600 px Display with Large Notch'\n",
|
|||
|
" ' 1080 x 2400 px Display with Large Notch' ' 540 x 960 px Display'\n",
|
|||
|
" ' 1440 x 3088 px' ' 1080 x 2408 px Display with Punch Hole'\n",
|
|||
|
" ' Full HD+ Display with Punch Hole'\n",
|
|||
|
" ' 1080 x 2246 px Display with Large Notch' ' 2460 x 1080 px'\n",
|
|||
|
" ' 1080 x 1920 px' ' 720 x 1612 px Display with Punch Hole'\n",
|
|||
|
" ' 1200 x 2780 px' ' 876 x 2142 px Display with Large Notch'\n",
|
|||
|
" ' 1440 x 2780 px' ' 1440 x 3412 px' ' 1440 x 3120 px Display'\n",
|
|||
|
" ' 576 x 1440 px Display' ' 720 x 1650 px Display with Punch Hole'\n",
|
|||
|
" ' 1080 x 2280 px Display with Water Drop Notch' ' 1080 x 2480 px'\n",
|
|||
|
" ' 2000 x 2296 px' ' 1596 x 2296 px Display' ' 1080 x 2160 px'\n",
|
|||
|
" ' 1224 x 2776 px' ' 1220 x 2700 px' ' 1260 x 2844 px' ' 1212 x 2616 px'\n",
|
|||
|
" ' 1256 x 2760 px' ' 1176 x 2400 px Display with Large Notch'\n",
|
|||
|
" ' 1860 x 3220 px' ' 1216 x 2688 px' ' 1260 x 2720 px'\n",
|
|||
|
" ' 1344 x 2772 px Display' ' 1200 x 2640 px' ' 1136 x 2690 px'\n",
|
|||
|
" ' 1188 x 2790 px' ' 1080 x 2388 px Display'\n",
|
|||
|
" ' 1080 x 2412 px Display with Punch Hole' ' 540 x 1092 px Display'\n",
|
|||
|
" ' 480 x 960 px Display' ' 720 x 1640 px Display with Water Drop Notch']\n",
|
|||
|
"[' Octa Core Processor' ' 1.8 GHz Processor' ' 2 GHz Processor'\n",
|
|||
|
" ' Octa Core' nan ' Quad Core' ' Nine-Cores' ' Nine Core' ' Nine Cores'\n",
|
|||
|
" ' Deca Core Processor' ' 1.3 GHz Processor' ' 1.6 GHz Processor'\n",
|
|||
|
" ' 2.3 GHz Processor' ' Deca Core' ' 128 GB inbuilt']\n",
|
|||
|
"['Exynos 1330' 'Octa Core' 'Helio G88' 'Helio P35' 'Dimensity 700'\n",
|
|||
|
" 'Exynos 9611' 'Exynos 850' 'Exynos 1280' 'Snapdragon 695' 'Exynos 850'\n",
|
|||
|
" 'Helio P65' 'Octa Core Processor' 'Snapdragon 680' 'Helio G80'\n",
|
|||
|
" 'Samsung Exynos 7884' 'Dimensity 6100 Plus' 'Dimensity 700 5G'\n",
|
|||
|
" 'Snapdragon 680' 'Snapdragon 888' 'Exynos 1380' 'Snapdragon 865'\n",
|
|||
|
" 'Exynos 980' 'Snapdragon 730' 'Snapdragon 675' 'Snapdragon 7 Gen1'\n",
|
|||
|
" 'Snapdragon 750G' 'Snapdragon 855+' 'Snapdragon 870' 'Snapdragon 710'\n",
|
|||
|
" 'Exynos 1480' 'Snapdragon 720G ' 'Snapdragon 778g' 'Exynos 2200'\n",
|
|||
|
" 'Snapdragon 7+ Gen2' 'Snapdragon 8 Gen 2' 'Exynos 9825'\n",
|
|||
|
" 'Snapdragon 7s Gen2' 'Exynos 2100' 'Dimensity 1300' 'Snapdragon 778G+'\n",
|
|||
|
" 'Snapdragon 778G' 'Exynos 2300' 'Snapdragon 8+ Gen1' 'Snapdragon 8 Gen3'\n",
|
|||
|
" 'Snapdragon 8+ Gen1' 'Snapdragon 8 Gen1' 'Exynos 990' 'Snapdragon 855'\n",
|
|||
|
" 'Exynos 8895' 'Exynos 2100' 'Exynos 9810' 'Snapdragon 8 Gen2'\n",
|
|||
|
" 'Helio G85' 'Helio P22' 'Helio MT6580' 'Snapdragon 439 ' 'Helio'\n",
|
|||
|
" 'Snapdragon 675' 'Snapdragon 450' 'Dimensity 6020' 'Helio P22'\n",
|
|||
|
" 'Helio G70' 'Snapdragon 680 ' 'Snapdragon 460' 'Snapdragon 430'\n",
|
|||
|
" 'Helio P70 ' 'Snapdragon MSM8937' 'Snapdragon 6 Gen1'\n",
|
|||
|
" 'Snapdragon 7 Gen2' 'Dimensity 7200' 'Snapdragon 4 Gen2' 'Snapdragon 685'\n",
|
|||
|
" 'Helio G99' 'Dimensity 1200' 'Dimensity 800U ' 'Snapdragon'\n",
|
|||
|
" 'Snapdragon 765G ' 'Dimensity 8200' 'Snapdragon 7 Gen3' 'Snapdragon 782G'\n",
|
|||
|
" 'Dimensity 9300' 'Dimensity 9200' 'Dimensity 1100' 'Dimensity 8200'\n",
|
|||
|
" 'Dimensity 9000 Plus' 'Dimensity 8300' 'Dimensity 9300 Plus'\n",
|
|||
|
" 'Dimensity 9200 Plus' 'Snapdragon 888+' 'Dimensity 9000'\n",
|
|||
|
" 'Dimensity 9400' 'Snapdragon 888 ' 'Snapdragon 8 Gen1' 'Unisoc SC9863A'\n",
|
|||
|
" 'Helio G35' 'Tiger T612' 'Unisoc T610' 'SC9863A' 'Unisoc SC9863A'\n",
|
|||
|
" 'Snapdragon 665' 'Unisoc T612' 'Tiger T616' 'Tiger T610' 'Helio G96'\n",
|
|||
|
" 'Helio G36' 'Snapdragon 662' 'Helio G35' 'Dimensity 6300' 'Helio G85 '\n",
|
|||
|
" 'Helio G95' 'Helio G95' 'Dimensity 810 5G' 'Dimensity 810 5G' 'No Wifi'\n",
|
|||
|
" 'Dimensity 7025' 'Dimensity 700 5G' 'Snapdragon 712' 'Dimensity 7050'\n",
|
|||
|
" 'Snapdragon 720G ' 'Snapdragon 7 Gen1' 'Snapdragon 7+ Gen3'\n",
|
|||
|
" 'Snapdragon 695' 'Dimensity 8100' 'Snapdragon 778G' 'Dimensity 1000+'\n",
|
|||
|
" 'Snapdragon 7s Gen3' 'Dimensity 6080' 'Snapdragon 888 '\n",
|
|||
|
" 'Snapdragon 8s Gen3' 'Snapdragon 8 Gen4' 'Snapdragon 8 Gen1 Plus'\n",
|
|||
|
" 'Dimensity 7020' 'Snapdragon 730G' 'Snapdragon 480' 'Snapdragon 662 '\n",
|
|||
|
" 'Dimensity 800U' 'Snapdragon 765G ' 'Dimensity 900'\n",
|
|||
|
" 'Dimensity 1200 Max' 'Dimensity 8100 Max' 'Dimensity 8100-Max'\n",
|
|||
|
" 'Dimensity 9200 Plus' 'Snapdragon 765G' 'Snapdragon 865 '\n",
|
|||
|
" 'Dimensity 9000' 'Snapdragon 4 Gen 1' 'Snapdragon 695 '\n",
|
|||
|
" 'Snapdragon 480+' 'Snapdragon 6 Gen 1' 'Snapdragon 778G Plus'\n",
|
|||
|
" 'Snapdragon 870' 'Helio G85' 'Helio A22' 'Helio G25' 'Helio G37'\n",
|
|||
|
" 'Helio G91' 'Snapdragon 720G' 'Snapdragon 665' 'Snapdragon 732G'\n",
|
|||
|
" 'Snapdragon 695 ' 'Dimensity 920' 'Snapdragon 7s Gen 2'\n",
|
|||
|
" 'Dimensity 8300 Ultra' 'Dimensity 8100' 'Snapdragon 480+'\n",
|
|||
|
" 'Dimensity 7030' 'Dimensity 1100'\n",
|
|||
|
" 'Snapdragon 7 Gen 1 Accelerated Edition' 'Dimensity 8000' 'Exynos 1080'\n",
|
|||
|
" 'Snapdragon 8 Gen 1' 'Dimensity 7200 Pro' 'Snapdragon 778G Plus'\n",
|
|||
|
" 'Qualcomm Snapdragon 670' 'Tensor G2' 'Google Tensor' 'Google Tensor G2'\n",
|
|||
|
" 'Tensor G3' 'Google Tensor 4' 'Google Tensor G2' 'Google Tensor G4'\n",
|
|||
|
" 'Google Tensor 2' 'Quad Core' 'Unisoc T606' 'Unisoc T603' ' Unisoc T606'\n",
|
|||
|
" 'Snapdragon 8 Gen1 Plus' 'Snapdragon 865+' 'Snapdragon 765G'\n",
|
|||
|
" 'Snapdragon 865' 'Helio P25' 'Qualcomm Snapdragon 450' 'Helio P60'\n",
|
|||
|
" 'Tiger T610' 'Tiger T310' 'Unisoc SC9836A' 'Snapdragon 439'\n",
|
|||
|
" 'Unisoc T606' 'Helio MT6737T' 'Snapdragon 450 ' 'Exynos 1280 ' 'Exynos'\n",
|
|||
|
" 'Snapdragon 750G' 'Exynos 1280' 'Dimensity 1080' 'Exynos 2400'\n",
|
|||
|
" 'Snapdragon 480 ' 'Helio P35 ' 'Snapdragon 4 Gen1' 'Dimensity 900'\n",
|
|||
|
" 'Tiger T616' 'Tiger T606' 'Snapdragon 636' 'Helio G37' 'Helio G99'\n",
|
|||
|
" 'Snapdragon SM4375' 'Dimensity 8020' 'Snapdragon 7+ Gen2'\n",
|
|||
|
" 'Snapdragon 778G ' 'Snapdragon 888+ ' 'Snapdragon 750G '\n",
|
|||
|
" 'Snapdragon 888+' ' Dimensity 7030' 'Snapdragon 6 Gen 1'\n",
|
|||
|
" 'Dimensity 1050' 'Snapdragon 8+ Gen2' 'Dimensity 930' 'Snapdragon (4 nm)'\n",
|
|||
|
" 'Snapdragon 460 ' 'Snapdragon 782G' 'Snapdragon 695 5G' 'Snapdragon 690'\n",
|
|||
|
" 'Dimensity 1300' 'Snapdragon 855+' 'Dimensity 1200 AI'\n",
|
|||
|
" 'Snapdragon 8 Gen4' 'Snapdragon 8 Gen2' 'Helio G25' 'Unisoc SC9832E'\n",
|
|||
|
" 'Snapdragon 4 Gen 1' 'Snapdragon 712' 'Dimensity 700 '\n",
|
|||
|
" 'Snapdragon 662 ' 'Helio G96' 'Snapdragon 732G' 'Snapdragon 732G '\n",
|
|||
|
" 'Snapdragon 678' 'Dimensity 8200 Ultra ' 'Dimensity 7200 Ultra'\n",
|
|||
|
" 'Dimensity 920 5G' 'Helio G99 Ultra' 'Dimensity (4 nm)' 'Dimensity 8050'\n",
|
|||
|
" 'Kirin 710A' 'Kirin 710A' 'Snapdragon (6 nm)' 'Snapdragon 778G 4G'\n",
|
|||
|
" '4 GB RAM' 'Sanpdragon 680' 'Kirin 710F' 'Kirin 830' 'Kirin 9000S'\n",
|
|||
|
" 'Kirin' 'Snapdragon 8+ Gen 1' 'Kirin 9010' 'Snapdragon 8+ Gen 1 '\n",
|
|||
|
" 'Kirin 990' 'Kirin 9000E' 'Kirin 9000' 'Kirin 990' ' Helio G36'\n",
|
|||
|
" 'Snapdragon 888' 'Tiger T616' 'Tiger T616 ' 'Helio A22' 'Helio A25']\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
"for col in df2.columns:\n",
"    print(df2[col].unique())\n",
"\n",
"# Convert categorical data to numbers\n",
"# Strip the ' GB RAM' suffix so that only the number remains\n",
"df2['Ram'] = df2['Ram'].replace(' GB RAM', '', regex=True)\n",
"\n",
"import re\n",
"# Drop rows whose Ram value is not a plain number (keep only numeric rows)\n",
"df2 = df2[df2['Ram'].apply(lambda x: bool(re.match(r'^\\d+(\\.\\d+)?$', str(x))))].copy()\n",
"\n",
"# Battery: strip the ' mAh Battery' suffix so that only the number remains\n",
"df2['Battery'] = df2['Battery'].replace(' mAh Battery', '', regex=True)\n",
"\n",
"# Display size: strip the ' inches' suffix\n",
"df2['Display'] = df2['Display'].replace(' inches', '', regex=True)\n",
"\n",
"# Inbuilt memory: convert to a number of gigabytes\n",
"df2['Inbuilt_memory'] = df2['Inbuilt_memory'].replace(' GB inbuilt', '', regex=True)\n",
"# ' 1 TB inbuilt' becomes ' 1 024', which the next line collapses to '1024' (only correct for exactly 1 TB)\n",
"df2['Inbuilt_memory'] = df2['Inbuilt_memory'].replace('TB inbuilt', '024', regex=True)\n",
"df2['Inbuilt_memory'] = df2['Inbuilt_memory'].replace(' ', '', regex=True)"
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 305,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
"# Count the commas in each price string\n",
"df2['comma_count'] = df2['Price'].apply(lambda x: x.count(','))\n",
"# Drop rows with more than one comma (e.g. '1,99,990')\n",
"df2 = df2[df2['comma_count'] <= 1]\n",
"# Drop the helper column\n",
"df2 = df2.drop(columns=['comma_count'])\n",
"# Turn the comma into a decimal point ('9,999' -> '9.999'), so Price is expressed in thousands\n",
"df2['Price'] = df2['Price'].replace(',', '.', regex=True)\n",
"\n",
"\n",
"df2['Price'] = pd.to_numeric(df2['Price'], errors='coerce')"
|||
|
]
|
|||
|
},
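{
"cell_type": "markdown",
"metadata": {},
"source": [
"The price cleaning above can also be sketched without dropping any rows. This is a minimal sketch, not the approach used in this notebook: it assumes that the commas in `Price` are only digit-group separators (as in '1,99,990'), so removing them yields the full amount. The helper name `parse_grouped_price` is hypothetical.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def parse_grouped_price(prices: pd.Series) -> pd.Series:\n",
"    # Assumption: commas are digit-group separators, not decimal points\n",
"    cleaned = prices.astype(str).str.replace(',', '', regex=False)\n",
"    # Anything that still is not a number becomes NaN\n",
"    return pd.to_numeric(cleaned, errors='coerce')\n",
"\n",
"sample = pd.Series(['9,999', '1,99,990', 'n/a'])\n",
"print(parse_grouped_price(sample).tolist())\n",
"```\n",
"\n",
"Under this assumption '1,99,990' parses to 199990 instead of being discarded, at the cost of Price no longer being expressed in thousands."
]
},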
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"The boxplots show that the data is skewed toward inexpensive phones priced up to about 40 thousand (probably Indian rupees rather than dollars, judging by digit grouping such as '1,99,990'; after the conversion above, Price is stored in thousands), with screens up to 7 inches and inbuilt memory up to 256 GB.\n",
"\n",
"\n",
"Price and screen size have many points outside the main mass of the data, but in this case they are useful noise. Outliers in battery capacity can be treated as harmful noise."
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 306,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
"Outliers in column 'Ram':\n",
|
|||
|
"1 2.0\n",
|
|||
|
"39 12.0\n",
|
|||
|
"49 12.0\n",
|
|||
|
"54 12.0\n",
|
|||
|
"65 12.0\n",
|
|||
|
" ... \n",
|
|||
|
"1312 12.0\n",
|
|||
|
"1344 2.0\n",
|
|||
|
"1346 2.0\n",
|
|||
|
"1348 2.0\n",
|
|||
|
"1351 2.0\n",
|
|||
|
"Name: Ram, Length: 267, dtype: float64\n",
|
|||
|
"\n",
|
|||
"Outliers in column 'Battery':\n",
|
|||
|
"0 6000\n",
|
|||
|
"1 4000\n",
|
|||
|
"3 6000\n",
|
|||
|
"6 6000\n",
|
|||
|
"9 6000\n",
|
|||
|
" ... \n",
|
|||
|
"1344 3000\n",
|
|||
|
"1346 3000\n",
|
|||
|
"1349 3000\n",
|
|||
|
"1350 3000\n",
|
|||
|
"1364 4000\n",
|
|||
|
"Name: Battery, Length: 296, dtype: int64\n",
|
|||
|
"\n",
|
|||
"Outliers in column 'Display':\n",
|
|||
|
"15 6.10\n",
|
|||
|
"21 6.21\n",
|
|||
|
"53 6.10\n",
|
|||
|
"64 6.10\n",
|
|||
|
"65 6.10\n",
|
|||
|
"72 7.45\n",
|
|||
|
"74 6.20\n",
|
|||
|
"75 6.20\n",
|
|||
|
"91 4.50\n",
|
|||
|
"122 5.20\n",
|
|||
|
"125 5.50\n",
|
|||
|
"197 8.03\n",
|
|||
|
"208 7.80\n",
|
|||
|
"391 7.10\n",
|
|||
|
"393 7.10\n",
|
|||
|
"538 12.10\n",
|
|||
|
"571 7.20\n",
|
|||
|
"597 7.71\n",
|
|||
|
"600 7.92\n",
|
|||
|
"606 7.80\n",
|
|||
|
"627 7.80\n",
|
|||
|
"628 5.60\n",
|
|||
|
"629 5.70\n",
|
|||
|
"631 6.10\n",
|
|||
|
"632 6.14\n",
|
|||
|
"635 6.10\n",
|
|||
|
"636 6.03\n",
|
|||
|
"637 6.10\n",
|
|||
|
"639 6.20\n",
|
|||
|
"640 6.20\n",
|
|||
|
"641 6.20\n",
|
|||
|
"643 6.10\n",
|
|||
|
"662 5.90\n",
|
|||
|
"663 5.90\n",
|
|||
|
"665 5.92\n",
|
|||
|
"669 6.00\n",
|
|||
|
"687 6.09\n",
|
|||
|
"688 5.20\n",
|
|||
|
"689 6.09\n",
|
|||
|
"690 5.99\n",
|
|||
|
"701 6.20\n",
|
|||
|
"715 5.70\n",
|
|||
|
"719 5.00\n",
|
|||
|
"779 6.10\n",
|
|||
|
"789 6.20\n",
|
|||
|
"797 6.20\n",
|
|||
|
"923 8.00\n",
|
|||
|
"938 6.20\n",
|
|||
|
"1142 5.00\n",
|
|||
|
"1158 6.08\n",
|
|||
|
"1226 7.85\n",
|
|||
|
"1227 7.85\n",
|
|||
|
"1228 7.90\n",
|
|||
|
"1229 7.11\n",
|
|||
|
"1316 7.09\n",
|
|||
|
"1344 6.00\n",
|
|||
|
"1346 6.00\n",
|
|||
|
"1349 6.00\n",
|
|||
|
"1350 6.10\n",
|
|||
|
"Name: Display, dtype: float64\n",
|
|||
|
"\n",
|
|||
"Outliers in column 'Inbuilt_memory':\n",
|
|||
|
"178 512\n",
|
|||
|
"212 512\n",
|
|||
|
"299 512\n",
|
|||
|
"315 512\n",
|
|||
|
"325 1024\n",
|
|||
|
"329 512\n",
|
|||
|
"372 512\n",
|
|||
|
"448 512\n",
|
|||
|
"454 512\n",
|
|||
|
"525 512\n",
|
|||
|
"532 512\n",
|
|||
|
"548 512\n",
|
|||
|
"573 512\n",
|
|||
|
"598 512\n",
|
|||
|
"599 512\n",
|
|||
|
"604 512\n",
|
|||
|
"605 512\n",
|
|||
|
"623 512\n",
|
|||
|
"664 512\n",
|
|||
|
"670 512\n",
|
|||
|
"673 512\n",
|
|||
|
"674 512\n",
|
|||
|
"675 512\n",
|
|||
|
"677 512\n",
|
|||
|
"679 512\n",
|
|||
|
"699 512\n",
|
|||
|
"794 512\n",
|
|||
|
"855 512\n",
|
|||
|
"1012 512\n",
|
|||
|
"1031 512\n",
|
|||
|
"1034 512\n",
|
|||
|
"1038 512\n",
|
|||
|
"1041 512\n",
|
|||
|
"1051 512\n",
|
|||
|
"1115 512\n",
|
|||
|
"1123 512\n",
|
|||
|
"1218 512\n",
|
|||
|
"1226 512\n",
|
|||
|
"1227 512\n",
|
|||
|
"1276 512\n",
|
|||
|
"Name: Inbuilt_memory, dtype: int64\n",
|
|||
|
"\n",
|
|||
"Outliers in column 'Price':\n",
|
|||
|
"196 79.999\n",
|
|||
|
"197 80.990\n",
|
|||
|
"206 99.990\n",
|
|||
|
"207 99.990\n",
|
|||
|
"208 99.990\n",
|
|||
|
" ... \n",
|
|||
|
"1280 79.990\n",
|
|||
|
"1281 99.990\n",
|
|||
|
"1288 82.990\n",
|
|||
|
"1290 92.990\n",
|
|||
|
"1291 79.990\n",
|
|||
|
"Name: Price, Length: 67, dtype: float64\n",
|
|||
|
"\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAABKUAAAMWCAYAAAAgRDUeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAC36UlEQVR4nOzdeXxN1/7/8fdJIqMkaspQQRoqlJaixhCVK8Y2hvaa6aXa4vYaSquDokrN9NbQ3m8vramDRig1z62xXLeiKGmCloQqiRhCkv37wy/7OhIEyTknyev5eJxH7bU+Z+/P3m3POj5n7bUthmEYAgAAAAAAAGzIyd4JAAAAAAAAoOihKAUAAAAAAACboygFAAAAAAAAm6MoBQAAAAAAAJujKAUAAAAAAACboygFAAAAAAAAm6MoBQAAAAAAAJujKAUAAAAAAACboygFAAAAAAAAm6MoBdzEYrFo1KhRNjnW6tWrVbNmTbm7u8tisejChQs2OS4A4PYYBwAA94Jx439GjRoli8WSb/vv3bu3KlasmG/7h31QlIJNzJs3TxaLxepVtmxZNWvWTKtWrbJ3eg/s559/1qhRo5SQkJCr+HPnzun555+Xh4eHZs6cqfnz58vLyyvf8rv1+ru4uOjhhx9W79699fvvv+fbcQEgC+OANXuPA3l1/ceNG6eYmJhs7du3b9eoUaMc7i9MAAoOxg1r9h433N3dFRgYqMjISH344Ye6ePFivh0bRYuLvRNA0TJmzBgFBwfLMAwlJSVp3rx5at26tb799lu1bdvW3undt59//lmjR49WeHh4rqr3e/bs0cWLF/Xee+8pIiIi/xP8/7Ku/9WrV7Vz507NmzdP33//vWJjY+Xu7m6zPAAUXYwDN9h7HMir6z9u3Dh16tRJUVFRVu3bt2/X6NGj1bt3b5UoUSJvkgdQJDFu3GDvceP69etKTEzU5s2bNWjQIE2dOlXLly/X448/bsa+/fbbeuONN2yWGwoHilKwqVatWqlOnTrmdp8+feTn56fFixcX6EHlXp05c0aS8vSL+qVLl+76a8nN179v374qXbq0JkyYoOXLl+v555/Ps1wA4HYYB25whHFAKnjX//Lly/L09LR3GgBsqKB/buUVRxk3RowYoY0bN6pt27Z65plndOjQIXl4eEiSXFxc5OJCiQH3htv3YFclSpSQh4dHtg+vS5cuaejQoQoKCpKbm5uqVKmiyZMnyzAMSdKVK1cUGhqq0NBQXblyxXzfn3/+qYCAADVs2FAZGRmSbtx7XLx4cf3666+KjIyUl5eXAgMDNWbMGHN/d/Kf//xHrVq1ko+Pj4oXL67mzZtr586dZv+8efP03HPPSZKaNWtmTnHdvHlzjvsLDw9Xr169JEl169aVxWJR7969zf6vv/5atWvXloeHh0qXLq3u3btnu8Uu65zi4uLUunVreXt7q1u3bnc9l1uFhYVJkuLi4sy2a9euaeTIkapdu7Z8fX3l5eWlsLAwbdq0yeq9CQkJslgsmjx5smbOnKlHHnlEnp6eatGihU6ePCnDMPTee++pXLly8vDw0LPPPqs///zznnMEULgxDth3HLjd9Z88ebIaNmyoUqVKycPDQ7Vr19aSJUusYiwWiy5duqTPPvvMPOfevXtr1KhRGjZsmCQpODjY7Lv5FpUFCxaY51iyZEl17txZJ0+ezHadqlevrr1796pJkyby9PTUm2++qV69eql06dK6fv16tvNp0aKFqlSpcs/XAUDBwbhh33FDkp5++mm98847On78uBYsWGC257Sm1Lp169S4cWOVKFFCxYsXV5UqVfTmm2+a/Zs3b5bFYtGXX36pN998U/7+/vLy8tIzzzyTbVzISW7Gq6ZNm+qJJ57I8f1VqlRRZGTkvZw+8hhFKdhUcnKy/vjjD509e1YHDx7UK6+8otTUVHXv3t2MMQxDzzzzjKZNm6aWLVtq6tSpqlKlioYNG6YhQ4ZIkjw8PPTZZ5
/p2LFjeuutt8z3DhgwQMnJyZo3b56cnZ3N9oyMDLVs2VJ+fn6aOHGiateurXfffVfvvvvuHfM9ePCgwsLC9N///lfDhw/XO++8o/j4eIWHh2vXrl2SpCZNmujVV1+VJL355puaP3++5s+fr6pVq+a4z7feekv9+vWTdGM67Pz58/XSSy9JujFAPf/883J2dtb48eP14osvKjo6Wo0bN862Lkd6eroiIyNVtmxZTZ48WR07dszNvwIrWX9BeOihh8y2lJQU/d///Z/Cw8M1YcIEjRo1SmfPnlVkZKT279+fbR8LFy7UrFmz9Pe//11Dhw7Vli1b9Pzzz+vtt9/W6tWr9frrr6tfv3769ttv9dprr91zjgAKF8YB+44Dubn+kjRjxgzVqlVLY8aM0bhx4+Ti4qLnnntOK1euNGPmz58vNzc3hYWFmef80ksvqUOHDurSpYskadq0aWZfmTJlJEnvv/++evbsqcqVK2vq1KkaNGiQNmzYoCZNmmQ7x3PnzqlVq1aqWbOmpk+frmbNmqlHjx46d+6c1qxZYxWbmJiojRs3ZjsXAAUb44Zj/f0hS48ePSRJa9euveO1aNu2rdLS0jRmzBhNmTJFzzzzjH744Ydsse+//75Wrlyp119/Xa+++qrWrVuniIgIqwJiTnIzXvXo0UM//fSTYmNjrd67Z88e/fLLL4wb9mYANjB37lxDUraXm5ubMW/ePKvYmJgYQ5IxduxYq/ZOnToZFovFOHbsmNk2YsQIw8nJydi6davx9ddfG5KM6dOnW72vV69ehiTj73//u9mWmZlptGnTxnB1dTXOnj1rtksy3n33XXM7KirKcHV1NeLi4sy2U6dOGd7e3kaTJk3Mtqxjb9q06Z6ux549e8y2a9euGWXLljWqV69uXLlyxWxfsWKFIckYOXJktnN644037ul469evN86ePWucPHnSWLJkiVGmTBnDzc3NOHnypBmbnp5upKWlWb3//Pnzhp+fn/G3v/3NbIuPjzckGWXKlDEuXLhgto8YMcKQZDzxxBPG9evXzfYuXboYrq6uxtWrV3OVM4DChXEg5+th63EgN9ffMAzj8uXLVtvXrl0zqlevbjz99NNW7V5eXkavXr2yvX/SpEmGJCM+Pt6qPSEhwXB2djbef/99q/YDBw4YLi4uVu1NmzY1JBlz5syxis3IyDDKlStn/PWvf7Vqnzp1qmGxWIxff/31ttcBQMHBuJHz9bD1uHHz8W7l6+tr1KpVy9x+9913jZtLDNOmTTMkWV2vW23atMmQZDz88MNGSkqK2f7VV18ZkowZM2ZYnUOFChWs3p+b8erChQuGu7u78frrr1vFvvrqq4aXl5eRmpp62/yQ/5gpBZuaOXOm1q1bp3Xr1mnBggVq1qyZ+vbtq+joaDPmu+++k7Ozs/nrQZahQ4fKMAyrp22MGjVKjz32mHr16qX+/furadOm2d6XZeDAgeafLRaLBg4cqGvXrmn9+vU5xmdkZGjt2rWKiorSI488YrYHBASoa9eu+v7775WSknJf1yEnP/74o86cOaP+/ftbLTrepk0bhYaGWlX7s7zyyiv3dIyIiAiVKVNGQUFB6tSpk7y8vLR8+XKVK1fOjHF2dparq6skKTMzU3/++afS09NVp04d7du3L9s+n3vuOfn6+prb9erVkyR1797dalp1vXr1dO3aNZ72BxRxjAO3Z4txIDfXX5K5PogknT9/XsnJyQoLC8txHLgX0dHRyszM1PPPP68//vjDfPn7+6ty5crZbhV3c3PTCy+8YNXm5OSkbt26afny5VZPf1q4cKEaNmyo4ODgB8oRgGNh3Lg9W4wbd1K8ePE7PoUva/2rZcuWKTMz84776tmzp7y9vc3tTp06KSAgQN99990d35eb8crX11fPPvusFi9ebN5+mZGRoS+//FJRUVH5+hRD3B1FKdjUU089pYiICEVERKhbt25auXKlqlWrZn7AS9Lx48cVGBho9aEkyZzOevz4cb
PN1dVV//73vxUfH6+LFy9q7ty52e5jlm58gb15YJCkRx99VJJu+xjWs2fP6vLlyzmuTVG1alVlZmbm6j7n3Mo6r5y
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1200x800 with 5 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
"df2['Ram'] = pd.to_numeric(df2['Ram'])\n",
"df2['Battery'] = pd.to_numeric(df2['Battery'])\n",
"df2['Display'] = pd.to_numeric(df2['Display'])\n",
"df2['Inbuilt_memory'] = pd.to_numeric(df2['Inbuilt_memory'])\n",
"\n",
"numeric_cols = ['Ram', 'Battery', 'Display', 'Inbuilt_memory', 'Price']\n",
"\n",
"plt.figure(figsize=(12, 8))\n",
"\n",
"for i, col in enumerate(numeric_cols, 1):\n",
"    # Flag values lying more than 1.5 * IQR outside the quartiles\n",
"    Q1 = df2[col].quantile(0.25)\n",
"    Q3 = df2[col].quantile(0.75)\n",
"    IQR = Q3 - Q1\n",
"    lower_bound = Q1 - 1.5 * IQR\n",
"    upper_bound = Q3 + 1.5 * IQR\n",
"    outliers = df2[col][(df2[col] < lower_bound) | (df2[col] > upper_bound)]\n",
"    print(f\"Outliers in column '{col}':\\n{outliers}\\n\")\n",
"    plt.subplot(len(numeric_cols) // 3 + 1, 3, i)\n",
"    plt.boxplot(x=df2[col])\n",
"    plt.title(f'Boxplot for {col}')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
|||
|
]
|
|||
|
},
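{
"cell_type": "markdown",
"metadata": {},
"source": [
"The outlier rule used in the loop above can be isolated into a small helper. This is a minimal sketch under stated assumptions: the name `iqr_outliers` and the parameter `k` are hypothetical and do not appear elsewhere in this notebook; the default `k = 1.5` matches the multiplier used above.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:\n",
"    # Quartiles and interquartile range\n",
"    q1, q3 = values.quantile(0.25), values.quantile(0.75)\n",
"    iqr = q3 - q1\n",
"    # Keep only the values outside [q1 - k * iqr, q3 + k * iqr]\n",
"    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)\n",
"    return values[mask]\n",
"\n",
"sample = pd.Series([4, 6, 6, 8, 8, 8, 12, 64])\n",
"print(iqr_outliers(sample))\n",
"```\n",
"\n",
"Keeping the rule in one function makes it easy to try a wider multiplier (for example `k = 3`) when the default marks too many ordinary phones as outliers."
]
},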
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"There is no data leakage: no column's correlation with the target feature exceeds 0.7 in absolute value."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 307,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
"# Correlation check\n",
"price_col = 'Price'  # name of the price column\n",
"for col1 in numeric_cols:\n",
"    if col1 != price_col:\n",
"        correlation = df2[col1].corr(df2[price_col])\n",
"        if abs(correlation) > 0.7:\n",
"            print(f\"Data leakage: high correlation ({correlation:.2f}) between columns '{col1}' and '{price_col}'\")"
|||
|
]
|
|||
|
},
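{
"cell_type": "markdown",
"metadata": {},
"source": [
"The pairwise check above can also be written with a single correlation matrix. This is a minimal sketch, assuming a pandas version whose `DataFrame.corr` accepts `numeric_only`; the helper name `high_correlations` and the sample values are hypothetical.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def high_correlations(df: pd.DataFrame, target: str, threshold: float = 0.7) -> pd.Series:\n",
"    # Pearson correlation of every numeric column with the target\n",
"    corr = df.corr(numeric_only=True)[target].drop(target)\n",
"    # Keep only the columns whose absolute correlation exceeds the threshold\n",
"    return corr[corr.abs() > threshold]\n",
"\n",
"sample = pd.DataFrame({\n",
"    'Ram': [4, 6, 8, 12],\n",
"    'Battery': [5000, 4500, 5000, 4800],\n",
"    'Price': [10.0, 15.0, 21.0, 40.0],\n",
"})\n",
"print(high_correlations(sample, 'Price'))\n",
"```\n",
"\n",
"An empty result corresponds to the loop above printing nothing."
]
},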
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"Three columns contain missing values. Since they hold text categories, the only sensible option is to fill them with a constant value such as \"Unknown\"."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 308,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
"Columns with null: ['Android_version', 'fast_charging', 'Processor']\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"C:\\Users\\ujijrujijr\\AppData\\Local\\Temp\\ipykernel_10056\\2788500696.py:10: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.\n",
|
|||
|
"The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.\n",
|
|||
|
"\n",
|
|||
|
"For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
" df2[col].fillna(\"Unknown\", inplace=True)\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
"# Check for missing values\n",
"columns_with_nulls = []\n",
"for col in df2.columns:\n",
"    if df2[col].isnull().sum() > 0:\n",
"        columns_with_nulls.append(col)\n",
"print(f\"Columns with null: {columns_with_nulls}\")\n",
"\n",
"# Replace null with \"Unknown\" in the columns that have gaps.\n",
"# Assigning the result back avoids the chained-assignment FutureWarning raised by inplace=True.\n",
"for col in columns_with_nulls:\n",
"    df2[col] = df2[col].fillna(\"Unknown\")"
|
|||
|
]
|
|||
|
},
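{
"cell_type": "markdown",
"metadata": {},
"source": [
"A loop-free variant of the check above, as a minimal sketch. The toy frame and its values are made up for illustration and only stand in for `df2`; `DataFrame.fillna` is called with a per-column dict so that nothing relies on in-place chained assignment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Toy frame standing in for df2 (illustrative values only)\n",
"toy = pd.DataFrame({\n",
"    'Processor': ['Snapdragon', None, 'Dimensity'],\n",
"    'fast_charging': [np.nan, '33W', '67W'],\n",
"    'Ram': [4, 6, 8],\n",
"})\n",
"\n",
"# Columns that contain at least one missing value\n",
"columns_with_nulls = toy.columns[toy.isnull().any()].tolist()\n",
"\n",
"# Fill only those columns with the constant placeholder\n",
"toy = toy.fillna({col: 'Unknown' for col in columns_with_nulls})\n",
"\n",
"print(columns_with_nulls)\n",
"print(toy.isnull().sum().sum())"
]
},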
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"**SPLITTING INTO SAMPLES**"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"The training sample is balanced: the curve rises fairly evenly, and there is no \"skew\" in the number of phones toward any particular price range. Data augmentation is therefore not required"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 309,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
"Training sample size: 1035\n",
"Validation sample size: 129\n",
"Test sample size: 130\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
"Text(0.5, 1.0, 'Sorted prices in the training sample')"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 309,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1000x500 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
"data = df2[['Ram', 'Battery', 'Display', 'Price', 'Inbuilt_memory']].copy()\n",
"data['Price'] = pd.to_numeric(data['Price'], errors='coerce')\n",
"# First split the records 80% / 20%; the 80% part is the training sample\n",
"train_data, temp_data = train_test_split(data, test_size=0.2, random_state=42)\n",
"\n",
"# Then split the remaining 20% evenly into validation and test samples\n",
"val_data, test_data = train_test_split(temp_data, test_size=0.5, random_state=42)\n",
"\n",
"# Check the sample sizes\n",
"print(\"Training sample size:\", len(train_data))\n",
"print(\"Validation sample size:\", len(val_data))\n",
"print(\"Test sample size:\", len(test_data))\n",
"\n",
"\n",
"sort_train_data = train_data.sort_values(by='Price')['Price'].values\n",
"plt.figure(figsize=(10, 5))\n",
"plt.plot(sort_train_data)\n",
"plt.title('Sorted prices in the training sample')"
|
|||
|
]
|
|||
|
},
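{
"cell_type": "markdown",
"metadata": {},
"source": [
"The visual balance check can be backed by numbers. This is a sketch under stated assumptions: quartile bins are taken as the 'price ranges', `bin_shares` is a hypothetical helper, and synthetic uniform prices stand in for `data['Price']`. If the split is balanced, the share of rows per bin in the training sample should stay close to that of the full data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Synthetic prices standing in for data['Price'] (illustrative only)\n",
"rng = np.random.default_rng(42)\n",
"toy = pd.DataFrame({'Price': rng.uniform(5000, 60000, size=1000)})\n",
"\n",
"# Quartile bins computed on the full data, so each bin holds about 25% of the rows\n",
"toy['price_bin'] = pd.qcut(toy['Price'], q=4, labels=False)\n",
"\n",
"train, temp = train_test_split(toy, test_size=0.2, random_state=42)\n",
"\n",
"def bin_shares(df):\n",
"    # Share of rows that fall into each price bin\n",
"    return df['price_bin'].value_counts(normalize=True).sort_index()\n",
"\n",
"print(pd.DataFrame({'all': bin_shares(toy), 'train': bin_shares(train)}))"
]
},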
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"## **DATASET 3**\n",
"\n",
"https://www.kaggle.com/datasets/shivam2503/diamonds\n",
"\n",
"Problem domain: diamond prices\n",
"\n",
"Object of observation: a diamond\n",
"\n",
"Attributes:\n",
"* carat: weight in carats\n",
"* cut: cut quality\n",
"* color: color\n",
"* clarity: clarity\n",
"* depth: depth percentage \n",
"* table: table (top width) percentage\n",
"* price: price in US dollars\n",
"* x: length in millimeters\n",
"* y: width in millimeters\n",
"* z: depth in millimeters\n",
"\n",
"There is only one object, but within it the price is related to all the other characteristics (the better a characteristic, the more expensive the diamond)\n",
"\n",
"Business goal: predict the optimal price of a diamond from its characteristics. Business effect: jewelers will be able to offer competitive prices, which could potentially increase profit. \n",
"\n",
"Technical project goal: build a machine learning model that predicts the price of a diamond from its characteristics. Input: the diamond's characteristics (weight, cut, color, clarity, dimensions). Target feature: price"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 290,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Index(['Unnamed: 0', 'carat', 'cut', 'color', 'clarity', 'depth', 'table',\n",
|
|||
|
" 'price', 'x', 'y', 'z'],\n",
|
|||
|
" dtype='object')\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df3 = pd.read_csv(\"..//static//csv//diamonds.csv\")\n",
|
|||
|
"print(df3.columns)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"An assessment of all the numeric features shows that the dataset contains quite a lot of noise. Most of it is useful, since diamonds can have very different characteristic values and these need to be taken into account. However, there are isolated outliers that could cause the model to train incorrectly. These are records whose value:\n",
"* of table is greater than 90\n",
"* of x is close to 0\n",
"* of y is greater than 30 or close to 0\n",
"* of z is greater than 30\n",
"\n",
"It makes sense to remove these outliers.\n",
"\n",
"Most of the data is concentrated in the following ranges:\n",
"* weight under 3 carats\n",
"* depth percentage between 50 and 70\n",
"* table percentage between 50 and 60\n",
"* length between 4 and 9 mm\n",
"* width between 5 and 10 mm\n",
"* depth between 2 and 5 mm "
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 291,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1200x800 with 7 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
"numeric_cols = df3.select_dtypes(include=['number']).columns\n",
"\n",
"# all columns except Unnamed (the index column)\n",
"numeric_cols = [col for col in numeric_cols if 'Unnamed' not in col]\n",
"\n",
"# the 'id' column is excluded as well\n",
"numeric_cols = [col for col in numeric_cols if col != 'id']\n",
"\n",
"plt.figure(figsize=(12, 8))\n",
"\n",
"for i, col in enumerate(numeric_cols, 1):\n",
"    if col == 'id':\n",
"        continue\n",
"    # Tukey's rule bounds; the outliers are computed for inspection but not printed here\n",
"    Q1 = df3[col].quantile(0.25)\n",
"    Q3 = df3[col].quantile(0.75)\n",
"    IQR = Q3 - Q1\n",
"    lower_bound = Q1 - 1.5 * IQR\n",
"    upper_bound = Q3 + 1.5 * IQR\n",
"    outliers = df3[col][(df3[col] < lower_bound) | (df3[col] > upper_bound)]\n",
"    plt.subplot(len(numeric_cols) // 3 + 1, 3, i)\n",
"    plt.boxplot(x=df3[col])\n",
"    plt.title(f'Boxplot for {col}')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
|
|||
|
]
|
|||
|
},
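{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of how the isolated outliers visible in the boxplots could be dropped. The cut-offs (`table` above 90, `y` or `z` above 30, non-positive `x` or `y`) are assumptions read off the plots, and `drop_isolated_outliers` is a hypothetical helper that is not part of the original analysis; the toy frame only stands in for `df3`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"def drop_isolated_outliers(df):\n",
"    # Assumed cut-offs: keep positive x and y, table <= 90, y <= 30 and z <= 30\n",
"    mask = (\n",
"        (df['table'] <= 90)\n",
"        & (df['x'] > 0)\n",
"        & (df['y'] > 0) & (df['y'] <= 30)\n",
"        & (df['z'] <= 30)\n",
"    )\n",
"    return df[mask]\n",
"\n",
"# Toy frame standing in for df3 (illustrative values only)\n",
"toy = pd.DataFrame({\n",
"    'table': [55.0, 95.0, 57.0, 58.0],\n",
"    'x': [3.95, 4.05, 0.0, 4.20],\n",
"    'y': [3.98, 4.07, 4.10, 58.9],\n",
"    'z': [2.43, 2.31, 2.50, 2.63],\n",
"})\n",
"\n",
"cleaned = drop_isolated_outliers(toy)\n",
"print(f'{len(toy) - len(cleaned)} of {len(toy)} rows removed')"
]
},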
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"The numeric data show that price depends directly on the weight and dimensions of the diamond. This correlation between the carat, x, y, z columns and price is natural and expected, since the larger the diamond, the more expensive it is"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 292,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
"Data leakage: high correlation (0.92) between columns 'carat' and 'price'\n",
"Data leakage: high correlation (0.88) between columns 'x' and 'price'\n",
"Data leakage: high correlation (0.87) between columns 'y' and 'price'\n",
"Data leakage: high correlation (0.86) between columns 'z' and 'price'\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
"# Correlation check\n",
"\n",
"price_col = 'price'  # name of the price column\n",
"for col1 in numeric_cols:\n",
"    if col1 != price_col:\n",
"        correlation = df3[col1].corr(df3[price_col])\n",
"        if abs(correlation) > 0.7:\n",
"            print(f\"Data leakage: high correlation ({correlation:.2f}) between columns '{col1}' and '{price_col}'\")"
|
|||
|
]
|
|||
|
},
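{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above can also be written in vectorized form. A sketch, assuming Pearson correlation and the same 0.7 threshold; `high_corr_with_target` is a hypothetical helper, and the toy frame with made-up values stands in for `df3`. A high value here only flags a column for review: whether it is leakage or a genuine relationship still has to be judged from the meaning of the column."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"def high_corr_with_target(df, target, threshold=0.7):\n",
"    # Pearson correlation of every numeric column with the target column\n",
"    corr = df.corr(numeric_only=True)[target].drop(target)\n",
"    # Keep only the columns whose absolute correlation exceeds the threshold\n",
"    return corr[corr.abs() > threshold]\n",
"\n",
"# Toy frame standing in for df3 (illustrative values only)\n",
"toy = pd.DataFrame({\n",
"    'carat': [0.2, 0.5, 1.0, 1.5, 2.0],\n",
"    'depth': [61.0, 62.5, 60.8, 63.1, 61.9],\n",
"    'price': [350, 1500, 5000, 9500, 15000],\n",
"})\n",
"\n",
"print(high_corr_with_target(toy, 'price'))"
]
},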
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"The dataset is informative, since it contains the main diamond characteristics that affect price\n",
"\n",
"Coverage is high, since it holds records for more than 50,000 diamonds\n",
"\n",
"All labels are consistent, although 'depth' and 'x', 'y', 'z' could have been named a little more descriptively"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 293,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
"Number of records: 53940\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
"print(f\"Number of records: {df3.shape[0]}\")"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"No columns contain null values, so there is no missing-data problem to solve"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 294,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
"Columns with null: []\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
"columns_with_nulls = []\n",
"for col in df3.columns:\n",
"    if df3[col].isnull().sum() > 0:\n",
"        columns_with_nulls.append(col)\n",
"print(f\"Columns with null: {columns_with_nulls}\")"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"**SPLITTING INTO SAMPLES**\n",
"\n",
"train_data - training sample\n",
"\n",
"val_data - validation sample\n",
"\n",
"test_data - test sample"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"The training sample is balanced: the curve rises fairly evenly, and there is no \"skew\" in the number of diamonds toward any particular price range. Data augmentation is therefore not required"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 295,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"['Ideal' 'Premium' 'Good' 'Very Good' 'Fair']\n",
|
|||
|
"['E' 'I' 'J' 'H' 'F' 'G' 'D']\n",
|
|||
|
"['SI2' 'SI1' 'VS1' 'VS2' 'VVS2' 'VVS1' 'I1' 'IF']\n",
"Training sample size: 43152\n",
"Validation sample size: 5394\n",
"Test sample size: 5394\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
"Text(0.5, 1.0, 'Sorted prices in the training sample')"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 295,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1000x500 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
"# convert categorical data to numeric codes\n",
"print(df3['cut'].unique())\n",
"# cut: 1 = lowest grade (Fair) ... 5 = highest grade (Ideal)\n",
"cut_mapping = {\n",
"    'Fair': 1,\n",
"    'Good': 2,\n",
"    'Very Good': 3,\n",
"    'Premium': 4,\n",
"    'Ideal': 5,\n",
"}\n",
"df3['cut'] = df3['cut'].map(cut_mapping)\n",
"\n",
"print(df3['color'].unique())\n",
"# color: 1 = best grade (D) ... 7 = worst grade (J), the opposite direction to cut\n",
"color_mapping = {\n",
"    'D': 1,\n",
"    'E': 2,\n",
"    'F': 3,\n",
"    'G': 4,\n",
"    'H': 5,\n",
"    'I': 6,\n",
"    'J': 7,\n",
"}\n",
"df3['color'] = df3['color'].map(color_mapping)\n",
"\n",
"\n",
"print(df3['clarity'].unique())\n",
"# clarity: 1 = best grade (IF) ... 8 = worst grade (I1)\n",
"clarity_mapping = {\n",
"    'IF': 1,\n",
"    'VVS1': 2,\n",
"    'VVS2': 3,\n",
"    'VS1': 4,\n",
"    'VS2': 5,\n",
"    'SI1': 6,\n",
"    'SI2': 7,\n",
"    'I1': 8,\n",
"}\n",
"df3['clarity'] = df3['clarity'].map(clarity_mapping)\n",
"\n",
"\n",
"data = df3.copy()\n",
"\n",
"\n",
"# First split the records 80% / 20%; the 80% part is the training sample\n",
"train_data, temp_data = train_test_split(data, test_size=0.2, random_state=42)\n",
"\n",
"# Then split the remaining 20% evenly into validation and test samples\n",
"val_data, test_data = train_test_split(temp_data, test_size=0.5, random_state=42)\n",
"\n",
"# Check the sample sizes\n",
"print(\"Training sample size:\", len(train_data))\n",
"print(\"Validation sample size:\", len(val_data))\n",
"print(\"Test sample size:\", len(test_data))\n",
"\n",
"\n",
"sort_train_data = train_data.sort_values(by='price')['price'].values\n",
"plt.figure(figsize=(10, 5))\n",
"plt.plot(sort_train_data)\n",
"plt.title('Sorted prices in the training sample')"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"kernelspec": {
|
|||
|
"display_name": "aimenv",
|
|||
|
"language": "python",
|
|||
|
"name": "python3"
|
|||
|
},
|
|||
|
"language_info": {
|
|||
|
"codemirror_mode": {
|
|||
|
"name": "ipython",
|
|||
|
"version": 3
|
|||
|
},
|
|||
|
"file_extension": ".py",
|
|||
|
"mimetype": "text/x-python",
|
|||
|
"name": "python",
|
|||
|
"nbconvert_exporter": "python",
|
|||
|
"pygments_lexer": "ipython3",
|
|||
|
"version": "3.12.5"
|
|||
|
}
|
|||
|
},
|
|||
|
"nbformat": 4,
|
|||
|
"nbformat_minor": 2
|
|||
|
}
|