7.3.4 Applications with Examples - IV
Introduction
- Objective: Explore the application of simple and multiple linear regression to model selling prices of flats based on area in square feet and potentially other variables.
- Context: Vijay Gowda, a real estate analyst working for Magic Bricks, uses a dataset from Jayalakshmi Puram in Mysuru to determine market rates for flats.
Data Overview
- Dataset Characteristics: Consists of 30 observations including variables such as selling price, area (square feet), number of bedrooms and bathrooms, and features like premium locality, gated complex, and amenities.
- Initial Analysis: Begins with a simple linear regression model focusing on area as the independent variable to predict selling prices.
Initial Model Development
- Scatter Plot Analysis: Reveals a roughly linear trend but indicates potential outliers.
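- Trend Line Equation: \(y = 28.37 + 0.0968x\), where \(y\) is the predicted selling price and \(x\) is the area in square feet.
- Coefficient of Determination (\(R^2\)): Found to be 0.44, meaning area alone explains about 44% of the variation in selling price, a moderate level of explanatory power.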
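To make the trend line concrete, here is a minimal sketch that evaluates it for a single flat. The coefficients are the ones quoted above; the function name `predict_price` and the 1,000 sq ft example area are illustrative assumptions, not part of the dataset.

```python
# Coefficients taken from the trend line y = 28.37 + 0.0968x.
INTERCEPT = 28.37  # b0, in the same units as selling price
SLOPE = 0.0968     # b1, change in price per additional square foot


def predict_price(area_sqft: float) -> float:
    """Return the fitted selling price for a flat of the given area."""
    return INTERCEPT + SLOPE * area_sqft


# Hypothetical area, chosen only to illustrate the calculation.
example_area = 1000
print(round(predict_price(example_area), 2))
```

With \(R^2\) at 0.44, a prediction like this should be read as a rough central estimate rather than a precise valuation.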
Comprehensive Regression Analysis
- Tool Used: Excel's Analysis ToolPak.
- Regression Details:
  - Dependent Variable (Y): Selling price.
  - Independent Variable (X): Area in square feet.
- Outputs to Generate: Regression results, residuals, standardized residuals, and their plots.
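For readers without Excel, the core of a simple regression run can be sketched in a few lines of Python. This is only an illustration of the ordinary least-squares formulas, not a reproduction of the ToolPak; the function name `fit_simple_ols` and the toy data are assumptions made for the example and are not the Mysuru dataset.

```python
def fit_simple_ols(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares.

    Returns the intercept, the slope, and the list of residuals.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = S_xy / S_xx; intercept follows from the means.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, residuals


# Toy data lying exactly on y = 2 + 3x (made up, not from the dataset).
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [5.0, 8.0, 11.0, 14.0]
b0, b1, residuals = fit_simple_ols(toy_x, toy_y)
print(b0, b1, residuals)
```

Passing the 30 area and selling-price values in place of the toy lists would give the intercept, slope, and residuals that the regression output reports.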
Regression Results and Interpretation
- \(R^2\) Value Confirmation: Confirmed at 0.44 from Excel, aligning with the preliminary analysis.
- Adjusted \(R^2\): Approximately 0.43, slightly below \(R^2\) because it penalizes the model for the number of predictors relative to the sample size.
- Standard Error: Noted at 70.57, representing an estimate of the standard deviation of the regression errors.
ANOVA Table Insights
- SSR (Sum of Squares due to Regression): 111539.
- SSE (Sum of Squares due to Error): 139462.
- SST (Total Sum of Squares): 251001, with SSR accounting for about 44% of SST.
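The summary statistics reported earlier can be recovered from these three sums of squares. The sketch below uses the standard textbook formulas with \(n = 30\) observations and one predictor; the variable names are illustrative.

```python
import math

# Sums of squares from the ANOVA table.
ssr = 111539.0  # regression
sse = 139462.0  # error
sst = 251001.0  # total

n = 30  # observations
k = 1   # predictors (area only)

# The decomposition SST = SSR + SSE should hold.
decomposition_ok = abs((ssr + sse) - sst) < 1e-6

r_squared = ssr / sst
adj_r_squared = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
standard_error = math.sqrt(sse / (n - k - 1))

print(decomposition_ok)
print(round(r_squared, 4), round(adj_r_squared, 4), round(standard_error, 2))
```

Computed this way, \(R^2\) comes out near 0.444, adjusted \(R^2\) near 0.42 (close to the reported figure), and the standard error near 70.57.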
Further Statistical Measures
- MSR (Mean Square Regression): 111539, equal to SSR because a single-predictor model has 1 regression degree of freedom.
- MSE (Mean Square Error): 4981, computed as SSE divided by its 28 degrees of freedom (\(n - 2 = 30 - 2\)).
- F-Statistic: 22.39 (MSR divided by MSE), indicating that the regression provides a statistically significant fit to the data.
- P-Value: Close to zero, strongly supporting rejection of the null hypothesis \(H_0: \beta_1 = 0\).
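The mean squares, the F-statistic, and its p-value can be checked with the standard library alone. With one numerator degree of freedom, \(F(1, \nu)\) is the square of a \(t\) variable with \(\nu\) degrees of freedom, so the sketch below obtains the p-value by numerically integrating the \(t\) density with Simpson's rule. The helper names `t_pdf` and `two_sided_t_pvalue` are illustrative assumptions; a statistics library would normally do this step.

```python
import math

ssr, sse = 111539.0, 139462.0
df_reg, df_err = 1, 28

msr = ssr / df_reg
mse = sse / df_err
f_stat = msr / mse


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_t_pvalue(t: float, df: int, steps: int = 20000) -> float:
    """P(|T| > |t|), via Simpson's rule on [0, |t|]; steps must be even."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)


# For a single predictor, P(F > f) equals P(|T| > sqrt(f)).
p_value = two_sided_t_pvalue(math.sqrt(f_stat), df_err)

print(round(msr), round(mse), round(f_stat, 2), p_value)
```

The resulting p-value is far below conventional significance levels such as 0.05 or 0.01, which is what "close to zero" means here.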
Hypothesis Testing for Slope
- Coefficient \(b_1\): The estimate of 0.0968 indicates a price increase of about 0.0968 lakhs (roughly 9,680 rupees) for each additional square foot.
- T-Statistic for \(b_1\): 4.73, with 28 degrees of freedom, indicating that area has a statistically significant effect on selling price.
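A few quick checks tie the slope test back to the other reported figures. The sketch assumes selling prices are recorded in lakhs of rupees (1 lakh = 100,000) and uses the two-sided 5% critical value of \(t\) with 28 degrees of freedom from standard tables (2.048); the variable names are illustrative.

```python
b1 = 0.0968      # slope from the trend line
t_stat = 4.73    # reported t-statistic for b1
f_stat = 22.39   # reported F-statistic
t_crit = 2.048   # two-sided 5% critical value, t with 28 df (standard tables)

# Slope expressed in rupees, assuming prices are in lakhs.
slope_in_rupees = b1 * 100_000

# The t-statistic is b1 / SE(b1), so the standard error can be backed out.
se_b1 = b1 / t_stat

# Reject H0: beta_1 = 0 at the 5% level if |t| exceeds the critical value.
reject_null = abs(t_stat) > t_crit

# In simple regression, t squared should match F up to rounding.
t_squared = t_stat ** 2

print(round(slope_in_rupees), round(se_b1, 4), reject_null, round(t_squared, 2))
```

The squared t-statistic (about 22.37) agrees with the F-statistic of 22.39 up to rounding, as expected when there is only one predictor.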
Residual Analysis
- Standardized Residuals: Some exceed \(\pm3\) in magnitude, flagging the corresponding observations as potential outliers.
- Residual Plots: Show no systematic pattern, supporting the model's assumptions of constant error variance (homoscedasticity) and independence.
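The \(\pm3\) screening rule can be sketched as follows, using one common definition of a standardized residual: the raw residual divided by the regression standard error (70.57 here). Excel's own scaling may differ slightly, and the function name `flag_outliers` and the residual values are hypothetical, chosen only to show the rule at work.

```python
STANDARD_ERROR = 70.57  # regression standard error reported earlier
THRESHOLD = 3.0         # flag residuals beyond +/- 3 standard errors


def flag_outliers(residuals, s=STANDARD_ERROR, threshold=THRESHOLD):
    """Return (index, standardized residual) pairs exceeding the threshold."""
    flagged = []
    for i, e in enumerate(residuals):
        z = e / s
        if abs(z) > threshold:
            flagged.append((i, round(z, 2)))
    return flagged


# Hypothetical raw residuals (not taken from the dataset).
hypothetical_residuals = [-50.0, 12.5, 230.0]
print(flag_outliers(hypothetical_residuals))
```

Under this definition, a flat would need a raw residual larger than about 211.7 in magnitude (3 × 70.57) to be flagged.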
Conclusion
- Model Evaluation: Demonstrates a statistically significant relationship between area and selling price, but area alone leaves about 56% of the variation unexplained, so there is room to improve the model by adding more variables.
- Next Steps: Proposes developing a multiple linear regression model to include additional predictive factors, enhancing explanatory power and accuracy.