ISSN : 2583-2646

Real-Time Data Ingestion with Kafka and AWS Tools

ESP Journal of Engineering & Technology Advancements
© 2025 by ESP JETA
Volume 5 Issue 2
Year of Publication : 2025
Authors : Sarvesh Kumar Gupta
DOI : 10.56472/25832646/JETA-V5I2P130

Citation:

Sarvesh Kumar Gupta, 2025. "Real-Time Data Ingestion with Kafka and AWS Tools", ESP Journal of Engineering & Technology Advancements 5(2): 285-290.

Abstract:

Organizations increasingly rely on real-time data ingestion to drive decisions, monitor operations, and deliver personalized user experiences. This review examines the growing role of Apache Kafka and Amazon Web Services (AWS) ingestion tools, including Kinesis, Lambda, Glue, and MSK, in building scalable, fault-tolerant, and low-latency data pipelines. Through a comparative analysis of architectural designs, performance benchmarks, and cost models, the review identifies the strengths and limitations of each approach: Kafka provides high throughput and control, while the AWS tools offer ease of integration and serverless flexibility. The review also proposes a unified ingestion framework and outlines future trends in AI-driven ingestion, energy efficiency, and hybrid orchestration.

Keywords:

Real-Time Data Ingestion, Apache Kafka, AWS Kinesis, Amazon MSK, Streaming Architecture, AWS Lambda, Serverless Processing, Fault Tolerance, Big Data Pipelines, Cloud-Native Analytics.