Document Type: Research Article

Authors

1 Computer Engineering Department, Kırşehir Ahi Evran University, Kırşehir, Turkey

2 Computer Engineering Department, Süleyman Demirel University, Isparta, Turkey

Abstract

The widespread use of web applications and the sensitive data they process have made them one of the most significant targets for cyber attackers. One of the most crucial security measures is to detect and close vulnerabilities in web applications before attackers can exploit them. In this study, a web application vulnerability scanner based on dynamic analysis and artificial intelligence was developed. It tests web applications using the GET and POST methods and includes test classes for 21 different vulnerability types. The scanner was evaluated in a web application test laboratory that was built within the scope of this study and contains 262 different web applications. A data set was created from the results of these tests. As a first stage, web page classification was performed on this data set; the highest success rate, 95.39%, was obtained with the Random Forest algorithm. As a second stage, an association analysis between vulnerabilities was carried out on the same data set. The proposed model reduced scanning time by 21% compared with the standard scanning model. The page classification process was also used when crawling the web applications.
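
The abstract does not state how the association analysis between vulnerabilities was carried out. Purely as an illustration, the sketch below computes support and confidence for pairwise rules of the form "vulnerability A implies vulnerability B" over per-application scan results. The data (`scan_results`), the vulnerability labels, the thresholds, and the function names are all hypothetical and are not taken from the study. Using such rules to decide which tests to run first would be one plausible way to shorten a scan, but that link is an assumption here, not the authors' stated method.

```python
from itertools import permutations

# Hypothetical toy data: each set lists the vulnerability types found in one
# scanned application. Illustrative values only, not the study's data set.
scan_results = [
    {"sqli", "xss"},
    {"sqli", "xss", "csrf"},
    {"sqli"},
    {"xss", "csrf"},
    {"sqli", "xss"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def pairwise_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    Thresholds are arbitrary defaults chosen for this example.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair co-occurs too rarely to be of interest
        # confidence(a -> b) = support(a and b) / support(a)
        confidence = pair_support / support({a}, transactions)
        if confidence >= min_confidence:
            rules.append((a, b, pair_support, confidence))
    return rules


rules = pairwise_rules(scan_results)
for a, b, s, c in rules:
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

A full Apriori implementation would also consider itemsets larger than pairs; this sketch is restricted to pairs to keep the support and confidence definitions visible.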
