Collaborative Data Anonymization Framework With Energy Industry
DOI:
https://doi.org/10.52825/ocp.v9i.3314
Keywords:
Anonymization, Synthetic Data, Data Sharing, Energy Data Management
Abstract
Data anonymization is essential for data privacy. It enables organizations, especially those in the energy sector that handle IoT data, to comply with regulations while still using sensitive information. Although many anonymization methods and tools exist, it remains unclear how to integrate them into a complete process and how to ensure effective collaboration with data providers. To close these gaps, this paper proposes a collaborative data anonymization framework that efficiently chains open-source tools to streamline the process and improve data provider involvement. The findings indicate that, while these tools can detect sensitive information and generate anonymized data, metadata detection still has limitations.
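To make the chained detect-then-anonymize idea concrete, here is a minimal Python sketch that uses only the standard library. It is not the authors' framework: the column-name hints, the regular expressions, the invented `MTR-` meter-ID format, and the helper names `detect_sensitive_columns`, `anonymize`, and `ks_complement` are all hypothetical. A real deployment would plug dedicated open-source detection and synthesis tools into each step; the per-column normal sampler below is a deliberately crude stand-in for a synthetic data generator, and the last step scores utility as 1 minus the two-sample Kolmogorov-Smirnov statistic.

```python
import random
import re
from bisect import bisect_right
from statistics import mean, stdev

# Hypothetical detection rules (illustrative assumptions, not the paper's rule set).
NAME_HINTS = ("name", "email", "address")  # metadata: column-name hints
VALUE_PATTERNS = {
    "EMAIL": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "METER_ID": re.compile(r"^MTR-\d{6}$"),  # invented meter-ID format
}


def detect_sensitive_columns(rows):
    """Step 1: flag columns whose name, or every value, looks sensitive."""
    flagged = {}
    for col in rows[0]:
        if any(hint in col.lower() for hint in NAME_HINTS):
            flagged[col] = "NAME_HINT"
            continue
        values = [str(r[col]) for r in rows]
        for label, pattern in VALUE_PATTERNS.items():
            if all(pattern.match(v) for v in values):
                flagged[col] = label
                break
    return flagged


def anonymize(rows, flagged, numeric_cols, seed=0):
    """Step 2: pseudonymize flagged columns consistently and replace numeric
    columns with independent draws from a normal fitted to each column."""
    rng = random.Random(seed)
    pseudonyms = {}
    fitted = {}
    for col in numeric_cols:
        values = [r[col] for r in rows]
        fitted[col] = (mean(values), stdev(values))
    result = []
    for r in rows:
        new_row = dict(r)
        for col in flagged:
            key = (col, r[col])
            if key not in pseudonyms:  # same input value -> same pseudonym
                pseudonyms[key] = f"{col}_{len(pseudonyms)}"
            new_row[col] = pseudonyms[key]
        for col, (mu, sigma) in fitted.items():
            new_row[col] = rng.gauss(mu, sigma)
        result.append(new_row)
    return result


def ks_complement(real, synthetic):
    """Step 3: utility check, 1 minus the two-sample Kolmogorov-Smirnov statistic."""
    real, synthetic = sorted(real), sorted(synthetic)
    d = 0.0
    for x in set(real) | set(synthetic):  # the sup is attained at a sample point
        cdf_real = bisect_right(real, x) / len(real)
        cdf_syn = bisect_right(synthetic, x) / len(synthetic)
        d = max(d, abs(cdf_real - cdf_syn))
    return 1.0 - d


rows = [
    {"customer_name": "Alice", "meter_id": "MTR-000101", "load_kw": 3.2},
    {"customer_name": "Bob", "meter_id": "MTR-000102", "load_kw": 4.1},
    {"customer_name": "Alice", "meter_id": "MTR-000101", "load_kw": 2.7},
    {"customer_name": "Carol", "meter_id": "MTR-000103", "load_kw": 5.0},
]
flagged = detect_sensitive_columns(rows)
anon = anonymize(rows, flagged, numeric_cols=["load_kw"])
score = ks_complement([r["load_kw"] for r in rows], [r["load_kw"] for r in anon])
print(flagged)  # {'customer_name': 'NAME_HINT', 'meter_id': 'METER_ID'}
print(round(score, 2))
```

Because the numeric column is redrawn independently for each row, per-row links and cross-column correlations are lost, which is why dedicated synthetic data generators typically model the joint distribution instead. Note also that a column named, say, `cust` that holds personal names would slip past both the name hints and the value patterns, loosely echoing the abstract's finding that metadata detection remains limited.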
License
Copyright (c) 2026 Yuting Gao, Zhiyu Pan, Antonello Monti

This work is licensed under a Creative Commons Attribution 4.0 International License.
Funding data
Deutsche Forschungsgemeinschaft (grant number 501865131)