Document Type: Research Article
Authors
1,2 Information Systems and Security Lab. (ISSL), Sharif University of Technology, Tehran, Iran
Abstract
To improve the accuracy of learning models, it is necessary to train them on larger datasets. Unfortunately, access to such data is often restricted because data providers hesitate to share their data due to privacy concerns. It is therefore critical to develop obfuscation techniques that let data providers transform their datasets into new ones that ensure a desired level of privacy. In this paper, we present an approach in which data providers use a neural network based on the autoencoder architecture to protect the sensitive components of their data while preserving the utility of the remaining parts. Specifically, within the autoencoder framework and after the encoding step, a classifier extracts the private feature from the dataset. This feature is then decorrelated from the remaining features and perturbed with noise. The proposed method is flexible: data providers can adjust the desired level of privacy by changing the noise level. Moreover, our approach achieves a better trade-off between utility and privacy than comparable methods while maintaining a simpler structure.
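The pipeline described above (encode, extract the private feature with a classifier head, decorrelate it from the rest of the code, perturb it with noise, then decode) could be sketched in PyTorch roughly as follows. This is a minimal illustrative sketch, not the paper's exact design: the layer sizes, the Gaussian noise model, and the covariance-based decorrelation penalty are all assumptions introduced here for concreteness.

```python
import torch
import torch.nn as nn

class ObfuscatingAutoencoder(nn.Module):
    """Illustrative sketch: an autoencoder whose code is split into a
    general part and a private feature extracted by a classifier head.
    The private feature is perturbed with adjustable noise before
    decoding; a covariance penalty discourages correlation between the
    private feature and the rest of the code."""

    def __init__(self, in_dim=8, latent_dim=4, noise_std=1.0):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, latent_dim), nn.ReLU())
        # classifier head that extracts the (scalar) private feature
        self.private_head = nn.Linear(latent_dim, 1)
        self.decoder = nn.Linear(latent_dim + 1, in_dim)
        self.noise_std = noise_std  # larger value -> stronger privacy

    def decorrelation_loss(self, private, public):
        # penalize empirical covariance between the private feature
        # and each remaining code dimension (batch dimension 0)
        p = private - private.mean(dim=0)
        q = public - public.mean(dim=0)
        return (p * q).mean(dim=0).abs().mean()

    def forward(self, x):
        z = self.encoder(x)                 # general representation
        s = self.private_head(z)            # extracted private feature
        s_noisy = s + self.noise_std * torch.randn_like(s)  # obfuscation
        x_hat = self.decoder(torch.cat([z, s_noisy], dim=1))
        return x_hat, s, z

# usage: obfuscate a small batch and inspect the decorrelation penalty
model = ObfuscatingAutoencoder(in_dim=8, latent_dim=4, noise_std=1.0)
x = torch.randn(5, 8)
x_hat, s, z = model(x)
penalty = model.decorrelation_loss(s, z)
```

In training, the decorrelation penalty would be added to the reconstruction and classification losses, and the data provider tunes `noise_std` to move along the utility-privacy curve.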
Keywords