ISSN : 2583-2646

Rule-Based Sensitive Data Classification & Masking for Hybrid Environments

ESP Journal of Engineering & Technology Advancements
© 2022 by ESP JETA
Volume 2, Issue 1
Year of Publication : 2022
Authors : Narasimha Chaitanya Samineni
DOI : 10.56472/25832646/ESP-V2I1P123

Citation:

Narasimha Chaitanya Samineni, 2022. "Rule-Based Sensitive Data Classification & Masking for Hybrid Environments", ESP Journal of Engineering & Technology Advancements, 2(1): 197-205.

Abstract:

Hybrid environments combining on-premises and cloud platforms introduce complexity in how sensitive data is identified, governed, and protected. Sensitive attributes such as PII, PCI, PHI, and financial identifiers often exist across distributed systems with inconsistent controls, increasing the risk of exposure and regulatory non-compliance [2], [4]. This paper proposes a rule-based sensitive data classification and masking framework designed for hybrid architectures. The framework uses metadata rules, pattern-based detection, and policy-driven masking to ensure consistent protection across databases, cloud warehouses, ETL pipelines, and analytical platforms. Compared to probabilistic or machine-learning approaches, rule-based methods provide deterministic, explainable, and audit-ready results aligned with enterprise governance standards [1], [6]. The study demonstrates the framework’s effectiveness in improving classification accuracy, reducing manual effort, and supporting compliance across hybrid workloads.
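To make the three ingredients named in the abstract concrete (metadata rules, pattern-based detection, and policy-driven masking), the following is a minimal illustrative sketch and not the paper's implementation. The rule set, the category labels (`PII_EMAIL`, `PII_SSN`, `PCI_PAN`), the 0.8 match threshold, and the function names are all hypothetical assumptions chosen for the example.

```python
import re

# Hypothetical rules (not from the paper). Each rule pairs a sensitivity
# category with a column-name pattern (metadata rule) and a value pattern
# (pattern-based detection).
RULES = [
    {"category": "PII_EMAIL",
     "name": re.compile(r"e[-_]?mail", re.I),
     "value": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")},
    {"category": "PII_SSN",
     "name": re.compile(r"ssn|social", re.I),
     "value": re.compile(r"\d{3}-\d{2}-\d{4}")},
    {"category": "PCI_PAN",
     "name": re.compile(r"card|(?:^|_)pan(?:$|_)", re.I),
     "value": re.compile(r"\d{13,19}")},
]


def classify_column(name, samples, threshold=0.8):
    """Return a category for a column, or None if no rule fires."""
    # 1) Metadata rule: the column name alone decides.
    for rule in RULES:
        if rule["name"].search(name):
            return rule["category"]
    # 2) Pattern rule: enough non-empty sample values fully match.
    values = [s for s in samples if s]
    if not values:
        return None
    for rule in RULES:
        hits = sum(1 for v in values if rule["value"].fullmatch(v))
        if hits / len(values) >= threshold:
            return rule["category"]
    return None


def mask_keep_last4(value):
    """Replace all but the last four characters with asterisks."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


def mask_email(value):
    """Keep the first character of the local part and the full domain."""
    local, _, domain = value.partition("@")
    return local[:1] + "***@" + domain


# Hypothetical masking policy: one masking function per category.
POLICY = {
    "PII_EMAIL": mask_email,
    "PII_SSN": mask_keep_last4,
    "PCI_PAN": mask_keep_last4,
}


def mask_rows(rows):
    """Classify every column, then mask values according to POLICY.

    Returns (labels, masked_rows); labels can serve as an audit record
    of which rule category was assigned to each column.
    """
    columns = list(rows[0].keys()) if rows else []
    labels = {c: classify_column(c, [r.get(c, "") for r in rows])
              for c in columns}
    masked = []
    for row in rows:
        masked.append({
            c: POLICY[labels[c]](v) if labels.get(c) in POLICY and v else v
            for c, v in row.items()
        })
    return labels, masked


if __name__ == "__main__":
    sample = [
        {"id": "1", "contact": "alice@example.com",
         "card_number": "4111111111111111"},
        {"id": "2", "contact": "bob@example.org",
         "card_number": "5500000000000004"},
    ]
    labels, masked = mask_rows(sample)
    print(labels)
    print(masked)
```

In this sketch the `card_number` column is labelled by its name alone, while `contact` is labelled only because its sampled values match the email pattern. Because the rules contain no randomness, the same input always produces the same labels and masked output, which is the deterministic, explainable property the abstract contrasts with probabilistic approaches.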

References:

[1] R. Maddali, “Automating Data Quality Assurance Using Machine Learning in ETL Pipelines,” International Journal of Leading Research Publication, vol. 2, no. 6, pp. 1–11, Jun. 2021, doi: 10.5281/zenodo.15107533.

[2] A. Cavoukian, Privacy by Design: The 7 Foundational Principles, Information and Privacy Commissioner of Ontario, 2011.

[3] NIST, Guide to Protecting the Confidentiality of Personally Identifiable Information (PII), NIST Special Publication 800-122, 2010.

[4] European Union, General Data Protection Regulation (GDPR), EU Regulation 2016/679, 2018.

[5] PCI Security Standards Council, PCI DSS: Data Security Standard Requirements and Testing Procedures, v3.2.1, 2018.

[6] ISO/IEC 27018, Protection of Personally Identifiable Information (PII) in Public Clouds Acting as PII Processors, ISO, 2019.

[7] HIPAA, The HIPAA Privacy Rule, U.S. Department of Health & Human Services, 2013.

[8] D. Loshin, Enterprise Knowledge Management: The Data Quality Approach, Morgan Kaufmann, 2010.

[9] A. P. Moore, R. J. Ellison, and R. C. Linger, “Attack Modeling for Information Security and Survivability,” Software Engineering Institute, 2001.

[10] Gartner, Best Practices for Data Masking and Sensitive Data Protection, Gartner Research Report, 2020.

[11] Oracle, Data Masking and Subsetting Guide, Oracle Documentation Library, 2019.

[12] IBM, Sensitive Data Discovery and Classification for Hybrid Cloud, IBM Redbooks, 2020.

[13] McKinsey & Company, Modernizing Data Governance for Hybrid Data Architectures, McKinsey Insights, 2020.

[14] M. Bishop, Computer Security: Art and Science, 2nd ed., Addison-Wesley, 2018.

Keywords:

Sensitive Data Classification, Hybrid Cloud, Rule-Based Detection, Data Masking, Privacy Engineering, Enterprise Governance, Metadata-Driven Security.