7.3.4 Applications with Examples - IV

Introduction

  • Objective: Explore the application of simple and multiple linear regression to model selling prices of flats based on area in square feet and potentially other variables.
  • Context: Vijay Gowda, a real estate analyst working for Magic Bricks, uses a dataset from Jayalakshmi Puram in Mysuru to determine market rates for flats.

Data Overview

  • Dataset Characteristics: Consists of 30 observations including variables such as selling price, area (square feet), number of bedrooms and bathrooms, and features like premium locality, gated complex, and amenities.
  • Initial Analysis: Begins with a simple linear regression model focusing on area as the independent variable to predict selling prices.

Initial Model Development

  • Scatter Plot Analysis: Reveals a rough linear trend but indicates potential outliers.
  • Trend Line Equation: \(y = 28.37 + 0.0968x\), where \(x\) is the area in square feet.
  • Coefficient of Determination (\(R^2\)): Initially found to be 0.44, meaning area alone explains about 44% of the variation in selling price (moderate explanatory power).
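
As a quick illustration, the trend line can be used to estimate a price for a given area. This is a minimal sketch that assumes prices are expressed in lakhs, as the slope interpretation later in these notes suggests; the function name `predict_price` and the 1,200 sq ft input are hypothetical.

```python
# Coefficients taken from the trend line y = 28.37 + 0.0968x in the notes.
INTERCEPT = 28.37   # assumed unit: lakhs
SLOPE = 0.0968      # assumed unit: lakhs per square foot


def predict_price(area_sqft: float) -> float:
    """Return the fitted selling price (in lakhs) for a flat of the given area."""
    return INTERCEPT + SLOPE * area_sqft


# Hypothetical input, not an observation from the dataset.
estimate = predict_price(1200)
print(f"Estimated price for 1200 sq ft: {estimate:.2f} lakhs")
```

With an \(R^2\) of 0.44, such an estimate is only a rough guide; more than half of the price variation is not captured by area.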

Comprehensive Regression Analysis

  • Tool Used: Excel's Analysis ToolPak.
  • Regression Details:
  • Dependent Variable (Y): Selling price.
  • Independent Variable (X): Area in square feet.
  • Outputs to Generate: Regression results, residuals, standardized residuals, and their plots.
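
For readers without Excel, the same quantities can be sketched in plain Python. This is not the ToolPak's implementation, only the textbook least-squares formulas; the function name `fit_simple_regression` and the toy numbers are made up for illustration and are not the Mysuru data.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, residuals)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope
    b0 = y_bar - b1 * x_bar        # intercept
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, residuals


# Made-up toy values purely to exercise the function; NOT the real dataset.
toy_area = [600, 800, 1000, 1200, 1400]
toy_price = [85, 110, 120, 150, 160]

b0, b1, residuals = fit_simple_regression(toy_area, toy_price)
print(f"b0 = {b0:.2f}, b1 = {b1:.4f}")
print("residuals:", [round(r, 2) for r in residuals])
```

Residuals from a least-squares fit with an intercept always sum to zero, which is a handy sanity check on any implementation.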

Regression Results and Interpretation

  • \(R^2\) Value Confirmation: Confirmed at 0.44 from Excel, aligning with the preliminary analysis.
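  • Adjusted \(R^2\): Approximately 0.42 (0.4245 when computed from the sums of squares in the ANOVA table). It sits close to \(R^2\) because the model has only one predictor, so the penalty for model size is small.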
  • Standard Error: Noted at 70.57, representing an estimate of the standard deviation of the regression errors.
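
The adjusted \(R^2\) and the standard error can both be reproduced from the sums of squares reported in the ANOVA table. The sketch below uses the standard formulas with \(n = 30\) observations and \(k = 1\) predictor; the variable names are illustrative.

```python
import math

n = 30          # observations in the dataset
k = 1           # predictors in the model (area only)
ssr = 111539    # sum of squares due to regression
sse = 139462    # sum of squares due to error
sst = ssr + sse

r_squared = ssr / sst
# Adjusted R^2 penalizes R^2 for the number of predictors used.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
# Standard error of the regression: square root of SSE over error df.
std_error = math.sqrt(sse / (n - k - 1))

print(f"R^2 = {r_squared:.4f}")
print(f"Adjusted R^2 = {adj_r_squared:.4f}")
print(f"Standard error = {std_error:.2f}")
```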

ANOVA Table Insights

  • SSR (Sum of Squares due to Regression): 111539.
  • SSE (Sum of Squares due to Error): 139462.
  • SST (Total Sum of Squares): 251001, with SSR accounting for about 44% of SST.
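
Written out, the decomposition and the resulting coefficient of determination are:

\[
\text{SST} = \text{SSR} + \text{SSE} = 111539 + 139462 = 251001,
\qquad
R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{111539}{251001} \approx 0.444
\]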

Further Statistical Measures

  • MSR (Mean Square Regression): SSR divided by its 1 degree of freedom, so it equals SSR in this single-variable model: 111539.
  • MSE (Mean Square Error): SSE divided by its 28 degrees of freedom (\(n - 2\)): \(139462 / 28 \approx 4981\).
  • F-Statistic: MSR / MSE \(\approx 22.39\), indicating the regression provides a significant fit to the data.
  • P-Value: Close to zero, strongly supporting the rejection of the null hypothesis (\(\beta_1 = 0\)).
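
The F-statistic and its p-value can be checked with the standard library alone. This is a sketch under one stated assumption: with a single numerator degree of freedom, \(F = t^2\), so the F-test p-value equals the two-sided t-test p-value at \(t = \sqrt{F}\). The helpers `t_pdf` and `two_sided_t_pvalue` are hypothetical names, and the tail area is approximated by Simpson's rule rather than taken from a statistics library.

```python
import math

ssr, sse = 111539, 139462
df_reg, df_err = 1, 28

msr = ssr / df_reg
mse = sse / df_err
f_stat = msr / mse


def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    coeff = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coeff * (1 + x * x / nu) ** (-(nu + 1) / 2)


def two_sided_t_pvalue(t, nu, steps=2000):
    """Approximate P(|T| > t) using Simpson's rule on [0, t]; steps must be even."""
    h = t / steps
    total = t_pdf(0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    central_half = total * h / 3   # approximates P(0 < T < t)
    return 1 - 2 * central_half


p_value = two_sided_t_pvalue(math.sqrt(f_stat), df_err)
print(f"MSR = {msr:.0f}, MSE = {mse:.0f}, F = {f_stat:.2f}, p = {p_value:.2e}")
```

The p-value comes out far below 0.001, consistent with the "close to zero" reading above.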

Hypothesis Testing for Slope

  • Coefficient \(b_1\): The slope of 0.0968 indicates a price increase of about 0.097 lakhs (roughly 9,680 rupees, since 1 lakh = 100,000 rupees) for each additional square foot.
  • T-Statistic for \(b_1\): 4.73, with 28 degrees of freedom, suggesting significant impact of area on selling price.
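
A quick unit and consistency check on the slope, assuming prices are in lakhs (1 lakh = 100,000 rupees). The implied standard error of the slope is not reported in these notes; it is backed out here from \(b_1 / t\) and should be read as a derived approximation.

```python
RUPEES_PER_LAKH = 100_000

b1_lakhs = 0.0968    # slope from the trend line, lakhs per sq ft
t_stat = 4.73        # reported t-statistic for the slope
f_stat = 22.39       # reported F-statistic for the regression

# Convert the slope to rupees per square foot.
b1_rupees = b1_lakhs * RUPEES_PER_LAKH

# Since t = b1 / SE(b1), the standard error can be recovered (derived, not reported).
implied_se = b1_lakhs / t_stat

print(f"Slope: {b1_rupees:.0f} rupees per additional sq ft")
print(f"Implied SE of slope: {implied_se:.4f} lakhs per sq ft")
# In simple linear regression, t squared matches F up to rounding.
print(f"t^2 = {t_stat ** 2:.2f} vs F = {f_stat:.2f}")
```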

Residual Analysis

  • Standardized Residuals: Some exceed \(\pm3\) and are flagged as potential outliers.
  • Residual Plots: Show no systematic pattern, supporting the model's assumptions of constant error variance (homoscedasticity) and independence.
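
The \(\pm3\) screening rule can be sketched as follows. This version divides each residual by the regression standard error \(\sqrt{\text{SSE}/(n - k - 1)}\), which is one common convention; tools differ slightly in the divisor they use, so values may not match Excel's output exactly. The function names and the residual values are made up for illustration and are not output from the actual model.

```python
import math


def standardized_residuals(residuals, n_predictors=1):
    """Scale residuals by the regression standard error sqrt(SSE / (n - k - 1))."""
    n = len(residuals)
    sse = sum(r * r for r in residuals)
    s = math.sqrt(sse / (n - n_predictors - 1))
    return [r / s for r in residuals]


def flag_outliers(std_resid, threshold=3.0):
    """Return indices whose standardized residual exceeds the threshold in magnitude."""
    return [i for i, z in enumerate(std_resid) if abs(z) > threshold]


# Made-up residuals (in lakhs) with one deliberately large value at the end.
toy_residuals = [3, -2, 1, -4, 2, -1, 0, 2, -3, 1,
                 -2, 4, -1, 0, -3, 2, 1, -2, 3, 40]

z = standardized_residuals(toy_residuals)
print("Flagged observations:", flag_outliers(z))
```

Flagged observations are candidates for a closer look, not automatic deletions; a flat may be unusual for reasons that other variables would explain.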

Conclusion

  • Model Evaluation: Demonstrates a significant relationship between area and selling price, with room for model improvement by adding more variables.
  • Next Steps: Proposes developing a multiple linear regression model to include additional predictive factors, enhancing explanatory power and accuracy.