Ames Housing Price Analysis

This blog post includes EDA, Modeling and how to improve model performance and how Model Assessment and Diagnostics is done. Here, I am not predicting any price, in this blog you just get to know how to do analysis of price of the house with the help of single linear model.

Jaykumar A Patel
2021-05-04

Problem Statement

Univariate Analysis

Exploring the univariate distribution of the response variable Sale price.

Find outlier(s) in your original response variable

Bivariate Analysis

Heat Map to find highly correlated features.

We can see top 3 numerical feature that are directly correlated to the response variable.

  1. Gr_Liv_Area Above Ground Living Area in sqft
  2. Garage_Area Garage Area in sqft
  3. Total_Bsmt_SF Total basement Size in sqft

Now, We look at each feature one by one and see if we can improve the relation?

Relation between Sale Price and Above ground living area size

Relation between Sale Price and Size of garage

Relation between Sale Price and Total Basement size

Conclusion of EDA

Data Modeling

Model 1 (Sale Price and Above ground living area size)

# A tibble: 2 x 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)    5906.   3352.        1.76  0.0782
2 Gr_Liv_Area     117.      2.16     54.1   0     

\(\hat{y}= 5905.812 + 116.874 \times Above-ground-living-area-size\)

Model 2 (Sale Price and Size of garage)

# A tibble: 2 x 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)   46925.   3341.        14.0 2.40e- 43
2 Garage_Area     277.      6.30      44.1 1.35e-321

\(\hat{y}= 46925.28 + 277.37 \times Size-of-garage\)

Model 3 (Sale Price and Total Basement size)

# A tibble: 2 x 5
  term          estimate std.error statistic  p.value
  <chr>            <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)     35394.   3309.        10.7 3.36e-26
2 Total_Bsmt_SF     137.      2.90      47.2 0       

\(\hat{y}= 35393.94 + 137 \times Total-Basement-Size\)

Model Assessment

Above ground living area size Size of garage Total Basement size
(Intercept) 5905.812 46925.278 35393.941
(3352.336) (3340.582) (3309.416)
Gr_Liv_Area 116.874
(2.161)
Garage_Area 277.371
(6.296)
Total_Bsmt_SF 136.994
(2.900)
Num.Obs. 2904 2766 2844
R2 0.502 0.413 0.440
R2 Adj. 0.502 0.412 0.440
AIC 71512.6 68800.1 70599.3
BIC 71530.5 68817.9 70617.1
Log.Lik. -35753.310 -34397.051 -35296.640
F 2924.609 1940.784 2231.943

As we can see in the table,

The reason why we considered Adjusted R squared value to select best model is that,

Model Diagnostics

Now that you have chosen our best model, we need to determine if it is reliable or not:

  1. Linearity

  2. Distribution of Residuals

  3. Constant Variability of residuals

# A tibble: 2,904 x 8
   Sale_Price Gr_Liv_Area .fitted  .resid     .hat .sigma     .cooksd
        <int>       <int>   <dbl>   <dbl>    <dbl>  <dbl>       <dbl>
 1     215000        1656 199450.  15550. 0.000394 53815. 0.0000165  
 2     105000         896 110625.  -5625. 0.000896 53816. 0.00000491 
 3     172000        1329 161232.  10768. 0.000382 53816. 0.00000765 
 4     244000        2110 252510.  -8510. 0.000983 53816. 0.0000123  
 5     189900        1629 196294.  -6394. 0.000380 53816. 0.00000268 
 6     195500        1604 193372.   2128. 0.000369 53816. 0.000000289
 7     213500        1338 162284.  51216. 0.000377 53807. 0.000171   
 8     191500        1280 155505.  35995. 0.000409 53812. 0.0000917  
 9     236500        1616 194775.  41725. 0.000374 53810. 0.000112   
10     189000        1804 216747. -27747. 0.000513 53813. 0.0000683  
# ... with 2,894 more rows, and 1 more variable: .std.resid <dbl>

Checking Linearity

Checking Distribution of Residuals

Checking Constant Variability of residuals

Citation

For attribution, please cite this work as

Patel (2021, May 4). Jaykumar Patel: Ames Housing Price Analysis. Retrieved from https://jaykumar-patel.netlify.app/posts/2021-05-04-ames-housing-price-analysis/

BibTeX citation

@misc{patel2021ames,
  author = {Patel, Jaykumar A},
  title = {Jaykumar Patel: Ames Housing Price Analysis},
  url = {https://jaykumar-patel.netlify.app/posts/2021-05-04-ames-housing-price-analysis/},
  year = {2021}
}