ESP Journal of Engineering & Technology Advancements
© 2021 by ESP JETA
Volume 1 Issue 1
Year of Publication : 2021
Authors : Madan Mohan Tito Ayyalasomayajula, Santhosh Bussa, Sailaja Ayyalasomayajula
DOI : 10.56472/25832646/ESP-V1I1P114
Madan Mohan Tito Ayyalasomayajula, Santhosh Bussa, Sailaja Ayyalasomayajula, 2021. "Forecasting Home Prices Employing Machine Learning Algorithms: XGBoost, Random Forest, and Linear Regression", ESP Journal of Engineering & Technology Advancements 1(1): 125-133.
Accurate forecasting of home prices is crucial for all stakeholders in the real estate market, including buyers, sellers, and investors. This study examines the efficacy of several machine learning algorithms in predicting house prices by analyzing large datasets that encompass diverse property attributes such as size, location, and bedroom count. Linear Regression serves as the baseline among the models investigated because of its simplicity and interpretability. Random Forest, an ensemble method capable of modeling complex, non-linear relationships between features, provides a robust alternative. XGBoost, a gradient-boosting technique, improves prediction accuracy further and delivers the best performance of the three. The models are implemented in Python, using Scikit-learn for model development and Pandas for data processing. Model performance is evaluated with metrics such as Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). The comparative analysis shows that Linear Regression offers straightforward interpretability, while XGBoost consistently surpasses Random Forest in prediction accuracy. The study emphasizes the role of feature engineering in improving model performance and the importance of selecting an appropriate model for reliable home value forecasting. These insights have practical value for the real estate sector, contributing to more precise and effective predictive models.
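The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the column names (`size_sqft`, `bedrooms`, `location_score`), the price formula, and all hyperparameters are assumptions chosen for illustration, and Scikit-learn's `GradientBoostingRegressor` is swapped in as a stand-in when the `xgboost` package is not installed.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Use XGBoost when available; otherwise fall back to scikit-learn's
# gradient boosting, which is a different implementation of the same idea.
try:
    from xgboost import XGBRegressor
    boosted_name = "XGBoost"
    boosted = XGBRegressor(n_estimators=200, learning_rate=0.1,
                           max_depth=4, random_state=0)
except ImportError:
    boosted_name = "Gradient Boosting (stand-in)"
    boosted = GradientBoostingRegressor(random_state=0)

# Synthetic housing data (hypothetical features, not the study's dataset).
rng = np.random.default_rng(0)
n = 500
homes = pd.DataFrame({
    "size_sqft": rng.uniform(500, 4000, n),
    "bedrooms": rng.integers(1, 6, n),
    "location_score": rng.uniform(0, 10, n),
})
# Assumed price formula with a non-linear location effect plus noise.
price = (50_000
         + 120 * homes["size_sqft"]
         + 8_000 * homes["bedrooms"]
         + 2_000 * homes["location_score"] ** 2
         + rng.normal(0, 15_000, n))

X_train, X_test, y_train, y_test = train_test_split(
    homes, price, test_size=0.2, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
    boosted_name: boosted,
}

# Fit each model and score it on the held-out split with MAE and RMSE.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mae = float(mean_absolute_error(y_test, pred))
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    results[name] = {"MAE": mae, "RMSE": rmse}

print(pd.DataFrame(results).T.round(0))
```

Because the data are synthetic, the printed ranking says nothing about real housing markets; the sketch only shows how the three model families can be fit and scored on the same held-out split with the two metrics named above.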
Keywords : Machine Learning, Real Estate Analytics, Linear Regression, Random Forest, XGBoost, Predictive Modeling, Feature Engineering, Python, Data Preprocessing, Scikit-learn, RMSE (Root Mean Square Error).