{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Business goals:\n", "1. Pricing strategy optimization: analyze the factors that affect property prices to help sellers set competitive prices and increase profit.\n", "2. Better investment decisions: provide analytics that let investors identify the most profitable neighborhoods and property types to invest in." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Technical project goals:\n", "1. Build a machine learning model that predicts property prices from features such as living area, number of bedrooms and bathrooms, location, age of the house, presence of a pool, and other factors.\n", "2. Develop a system that analyzes real estate price volatility (a measure of how much an asset's price varies over a given period) across neighborhoods, taking into account historical sales data, seasonal fluctuations, and demographic changes, in order to identify the most stable and promising areas for investment." ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " id date price bedrooms bathrooms \\\n", "0 7129300520 20141013T000000 221900.0 3 1.00 \n", "1 6414100192 20141209T000000 538000.0 3 2.25 \n", "2 5631500400 20150225T000000 180000.0 2 1.00 \n", "3 2487200875 20141209T000000 604000.0 4 3.00 \n", "4 1954400510 20150218T000000 510000.0 3 2.00 \n", "... ... ... ... ... ... \n", "21608 263000018 20140521T000000 360000.0 3 2.50 \n", "21609 6600060120 20150223T000000 400000.0 4 2.50 \n", "21610 1523300141 20140623T000000 402101.0 2 0.75 \n", "21611 291310100 20150116T000000 400000.0 3 2.50 \n", "21612 1523300157 20141015T000000 325000.0 2 0.75 \n", "\n", " sqft_living sqft_lot floors waterfront view ... grade \\\n", "0 1180 5650 1.0 0 0 ... 7 \n", "1 2570 7242 2.0 0 0 ... 7 \n", "2 770 10000 1.0 0 0 ... 
6 \n", "3 1960 5000 1.0 0 0 ... 7 \n", "4 1680 8080 1.0 0 0 ... 8 \n", "... ... ... ... ... ... ... ... \n", "21608 1530 1131 3.0 0 0 ... 8 \n", "21609 2310 5813 2.0 0 0 ... 8 \n", "21610 1020 1350 2.0 0 0 ... 7 \n", "21611 1600 2388 2.0 0 0 ... 8 \n", "21612 1020 1076 2.0 0 0 ... 7 \n", "\n", " sqft_above sqft_basement yr_built yr_renovated zipcode lat \\\n", "0 1180 0 1955 0 98178 47.5112 \n", "1 2170 400 1951 1991 98125 47.7210 \n", "2 770 0 1933 0 98028 47.7379 \n", "3 1050 910 1965 0 98136 47.5208 \n", "4 1680 0 1987 0 98074 47.6168 \n", "... ... ... ... ... ... ... \n", "21608 1530 0 2009 0 98103 47.6993 \n", "21609 2310 0 2014 0 98146 47.5107 \n", "21610 1020 0 2009 0 98144 47.5944 \n", "21611 1600 0 2004 0 98027 47.5345 \n", "21612 1020 0 2008 0 98144 47.5941 \n", "\n", " long sqft_living15 sqft_lot15 \n", "0 -122.257 1340 5650 \n", "1 -122.319 1690 7639 \n", "2 -122.233 2720 8062 \n", "3 -122.393 1360 5000 \n", "4 -122.045 1800 7503 \n", "... ... ... ... \n", "21608 -122.346 1530 1509 \n", "21609 -122.362 1830 7200 \n", "21610 -122.299 1020 2007 \n", "21611 -122.069 1410 1287 \n", "21612 -122.299 1020 1357 \n", "\n", "[21613 rows x 21 columns]\n", "0 16356\n", "1 16413\n", "2 16491\n", "3 16413\n", "4 16484\n", " ... \n", "21608 16211\n", "21609 16489\n", "21610 16244\n", "21611 16451\n", "21612 16358\n", "Name: date_numeric, Length: 21613, dtype: int64\n" ] } ], "source": [ "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "from imblearn.under_sampling import RandomUnderSampler\n", "\n", "df = pd.read_csv(\"data/kc_house_data.csv\")\n", "print(df)\n", "\n", "# Convert the sale date to a numeric format (days since 1970-01-01)\n", "df['date'] = pd.to_datetime(df['date'])\n", "df['date_numeric'] = (df['date'] - pd.Timestamp('1970-01-01')).dt.days\n", "print(df['date_numeric'])\n" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "| | bathrooms_0.5 | bathrooms_0.75 | bathrooms_1.0 | bathrooms_1.25 | bathrooms_1.5 | bathrooms_1.75 | bathrooms_2.0 | bathrooms_2.25 | bathrooms_2.5 | bathrooms_2.75 | ... | bedrooms_3 | bedrooms_4 | bedrooms_5 | bedrooms_6 | bedrooms_7 | bedrooms_8 | bedrooms_9 | bedrooms_10 | bedrooms_11 | bedrooms_33 |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| 2 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |\n",
"| 21608 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| 21609 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| 21610 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| 21611 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| 21612 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"\n",
"21613 rows × 41 columns\n", "