7.3.5 Applications with Examples - V¶
Introduction¶
- Objective: This session aims to enhance the predictive power of the regression model by incorporating additional quantitative variables into the analysis.
- Context: Vijay Gowda, a real estate analyst for Magic Bricks, is analyzing property prices in Jayalakshmi Puram, Mysore. He aims to improve his previous model's R-squared value of 0.44 by adding more variables.
Data and Initial Analysis¶
- Data Set: Includes 30 observations with variables like selling price, area in square feet, number of bedrooms, and number of bathrooms.
- Initial Model: Simple linear regression with area as the sole predictor, yielding an R-squared of 0.44, indicating that 44% of the variability in selling prices is explained by area alone.
Enhancing the Model¶
- New Variables: Consideration of additional quantitative variables such as the number of bedrooms and bathrooms to possibly improve the model.
- Initial Thoughts: Bedrooms and bathrooms are initially considered quantitative variables, but are clarified to be qualitative due to their categorical nature (i.e., their numeric values signify categories without intrinsic ordering).
Multiple Linear Regression Development¶
- Variable Selection: Area in square feet and number of bedrooms are selected for the new MLR model.
- Excel Analysis ToolPak Usage: The selling price is set as the dependent variable (Y), with area and number of bedrooms as independent variables (X1 and X2).
Model Outputs and Analysis¶
- Regression Output: The regression analysis is conducted, focusing on computing residuals, standardized residuals, and residual plots.
- R-squared Analysis: The new MLR model yields an R-squared slightly higher than the simple model but still under expectations, prompting further analysis.
Diagnostics and Learning¶
- Adjusted R-squared: Remains close to the R-squared, confirming the modest increase in explanatory power.
- Standard Error: Comparable to the simple linear regression, indicating consistency in the model's predictive error.
- ANOVA Output: Shows an increase in SSR (sum of squares regression) with the addition of bedrooms, although not dramatically.
Hypothesis Testing and Coefficients¶
- F-Statistic: Indicates a significant overall model fit, suggesting that the model variables collectively influence selling prices.
- Coefficient Analysis: Examines the impact of each variable, with area showing a significant relationship with selling price, whereas bedrooms do not show a statistically significant effect, possibly due to multicollinearity.
Conclusion and Next Steps¶
- Model Assessment: The inclusion of bedrooms did not significantly improve the model as expected, highlighting the potential issue of multicollinearity.
- Future Directions: Suggests the inclusion of qualitative variables in future models and plans to cover this approach in the next module.
Ask Hive Chat
Hive Chat
Hi, I'm Hive Chat, an AI assistant created by CollegeHive.
How can I help you today?
How can I help you today?