| ESP Journal of Engineering & Technology Advancements |
| © 2022 by ESP JETA |
| Volume 2 Issue 1 |
| Year of Publication : 2022 |
| Authors : Santosh Kumar Singu |
| DOI : 10.56472/25832646/ESP-V2I1P110 |
Santosh Kumar Singu, 2022. "ETL Process Automation: Tools and Techniques", ESP Journal of Engineering & Technology Advancements, 2(1): 74-85.
ETL, which stands for Extract, Transform, and Load, is a fundamental process in data warehousing and business intelligence. Data is pulled from different sources, transformed, and then loaded into a data warehouse. Organizations that handle large and complex data sets need to automate this process to keep it efficient, scalable, and accurate. This article examines the tools and technologies used to automate ETL processes, the major ones being Apache NiFi, Talend, Informatica, and others. We address their strengths, the effects of automation, and the considerations involved in ETL automation. The article also presents an example implementation of an automated ETL process and analyzes its outcomes, noting both the advantages and the shortcomings of the approach. Drawing on a literature review and statistical analysis, the article aims to provide a systematic guide for organizations that are considering automating their ETL process.
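The extract, transform, and load stages described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the article: the sample data, the `orders` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read from files, APIs, or databases.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,carol,7.25
"""


def extract(text):
    """Extract: parse rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize customer names and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        customer = row["customer"].strip().lower()
        cleaned.append((int(row["order_id"]), customer, float(amount)))
    return cleaned


def load(rows, conn):
    """Load: write cleaned rows into the target table (SQLite as a stand-in)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    # INSERT OR REPLACE keeps reruns from duplicating rows.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Run all three stages and return the number of rows loaded."""
    rows = transform(extract(text))
    load(rows, conn)
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, connection), "rows loaded")
```

Automation in this sense would mean triggering `run_pipeline` from a scheduler or an orchestration tool rather than by hand. Keying the load on `order_id` is one way to make such repeated runs safe.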
ETL Automation, Data Integration, Data Transformation, Data Warehousing, Apache NiFi, Talend, Informatica.