FlowMind: Mining Real User Telemetry to Power LLM-Driven Autonomous App Testing

Sachin Francis

ESP Journal of Engineering & Technology Advancements

Volume 5 Issue 4

Year of Publication : 2025

Authors : Sachin Francis

DOI: 10.56472/25832646/JETA-V5I4P119

Citation:

Sachin Francis, 2025. "FlowMind: Mining Real User Telemetry to Power LLM-Driven Autonomous App Testing", ESP Journal of Engineering & Technology Advancements 5(4): 128-137.

Abstract:

Automated testing is the cornerstone of software reliability, but authoring and maintaining functional regression tests continues to demand significant manual effort. Traditional approaches such as Espresso and Appium require engineers to script explicit user interactions into the tests, and these scripts rapidly become brittle as product features evolve. At the same time, every modern application already captures extensive telemetry data, including, but not limited to, screen impressions, navigations, and user interactions. This data is a detailed record of real user behavior in the app. FlowMind leverages this untapped data source to enable autonomous regression testing without manual test derivation or scripting. By mining tracking and telemetry logs to identify the most frequent user flows, FlowMind generates a structured schema describing real interaction sequences. A semantic repository links telemetry identifiers to human-understandable UI components, allowing a large language model (LLM)-driven agent to interpret and execute these flows directly within the application. The system autonomously navigates, validates, and adapts to UI changes, achieving realistic and evolving test coverage aligned with production usage patterns. A prototype implementation demonstrates that FlowMind achieves coverage comparable to manually authored tests while reducing test creation and maintenance effort by more than 80%. FlowMind points toward a new paradigm of telemetry-driven, self-evolving testing.
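
The mining step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the event tuples, the identifiers such as `scr_home`, the `SEMANTIC_REPO` mapping, and the helper names `mine_flows` and `to_schema` are all hypothetical, chosen only to show how session logs might be grouped into flows, ranked by frequency, and annotated with human-readable UI labels.

```python
from collections import Counter, defaultdict

# Hypothetical telemetry events: (session_id, timestamp, event_id).
EVENTS = [
    ("s1", 1, "scr_home"), ("s1", 2, "tap_search"), ("s1", 3, "scr_results"),
    ("s2", 1, "scr_home"), ("s2", 2, "tap_search"), ("s2", 3, "scr_results"),
    ("s3", 1, "scr_home"), ("s3", 2, "tap_cart"), ("s3", 3, "scr_cart"),
]

# Hypothetical semantic repository: telemetry identifier -> UI description.
SEMANTIC_REPO = {
    "scr_home": "Home screen",
    "tap_search": "Search button",
    "scr_results": "Search results screen",
    "tap_cart": "Cart icon",
    "scr_cart": "Cart screen",
}


def mine_flows(events, top_k=1):
    """Group events by session in time order and return the most frequent flows."""
    sessions = defaultdict(list)
    for session_id, _ts, event_id in sorted(events, key=lambda e: (e[0], e[1])):
        sessions[session_id].append(event_id)
    # Each complete session sequence counts as one occurrence of that flow.
    counts = Counter(tuple(seq) for seq in sessions.values())
    return counts.most_common(top_k)


def to_schema(flow, count, repo):
    """Describe one flow as a structured record an agent could consume."""
    return {
        "occurrences": count,
        "steps": [
            {"event_id": e, "ui_component": repo.get(e, "unknown")}
            for e in flow
        ],
    }


top_flows = [to_schema(f, c, SEMANTIC_REPO) for f, c in mine_flows(EVENTS)]
```

In this sketch a flow is simply a whole session; a production system would more plausibly mine frequent subsequences and attach validation expectations to each step before handing the schema to an LLM-driven agent.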

Keywords:

Automated Testing; Regression Testing; User Telemetry; Test Generation; Large Language Models; Software Quality Assurance; Autonomous Agents; Mobile Applications; Android Testing.

ISSN : 2583-2646