Anomaly Detection in Logs: A Comparative Analysis of Unsupervised Algorithms

de Moura, Alysson C. E.; Rocha Filho, Geraldo P.; Caetano, Marcos F.; Gondim, João J. C.; Araujo, Aleteia; Marotta, Marcelo A.; Bondan, Lucas

doi:10.4230/OASIcs.SLATE.2024.12

Abstract

This study explores anomaly detection in banking-system log records using unsupervised machine learning. The diversity of log formats and types poses significant challenges for automating anomaly detection. We propose a workflow that uses Natural Language Processing (NLP) techniques to identify anomalies; further analysis of the detected anomalies can help identify the root causes of failures and vulnerabilities. We evaluate the performance of eight different models on Blue Gene/L log records. The most effective models were then validated on Microsoft Configuration Manager (MCM) logs collected from a financial institution, demonstrating their practical applicability in real-world scenarios. Experimental results highlight the effectiveness of neural network models, specifically Self-Organizing Maps (SOM) and Autoencoders (AE), which achieved F1-scores of 0.86 and 0.80, respectively, on the MCM logs.
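The sketch below illustrates the kind of pipeline the abstract describes. It vectorizes log lines and scores them with a small Self-Organizing Map, the best-performing model on the MCM logs. This is a minimal sketch, not the authors' workflow. TF-IDF over whitespace tokens stands in for the unspecified NLP step. The helper names (`tfidf_vectors`, `train_som`, `quantization_error`, `flag_anomalies`, `f1_score`) are hypothetical, and so are the 1x2 grid, the decay schedules, the toy log lines and the mean-plus-two-standard-deviations threshold. Each line's anomaly score is its quantization error, meaning its distance to the best-matching SOM unit. The flagged lines are then compared with ground-truth labels using the F1-score, the metric reported above.

```python
import math
import random
from collections import Counter


def tfidf_vectors(lines):
    """Assumption: TF-IDF stands in for the paper's (unspecified) NLP step."""
    docs = [line.lower().split() for line in lines]
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1
        vec = [tf[tok] / length * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])  # L2-normalize each line
    return vectors


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def train_som(data, rows=1, cols=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Online Kohonen SOM with a Gaussian neighborhood (hypothetical settings)."""
    rng = random.Random(seed)
    units = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialize unit weights from evenly spaced samples (one common choice).
    stride = len(data) // len(units)
    weights = {u: list(data[i * stride]) for i, u in enumerate(units)}
    total_steps = epochs * len(data)
    step = 0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3  # shrinking neighborhood radius
            bmu = min(units, key=lambda u: sq_dist(weights[u], x))
            for u in units:
                grid_d2 = (u[0] - bmu[0]) ** 2 + (u[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                w = weights[u]
                for j in range(len(w)):
                    w[j] += lr * h * (x[j] - w[j])  # pull unit toward input
            step += 1
    return weights


def quantization_error(weights, x):
    """Anomaly score: distance from x to its best-matching unit."""
    return math.sqrt(min(sq_dist(w, x) for w in weights.values()))


def flag_anomalies(scores, k=2.0):
    """Assumed rule: flag scores more than k std devs above the mean."""
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return [s > mean + k * std for s in scores]


def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy logs: two recurring messages and one rare anomaly.
logs = (["connection opened to server"] * 10
        + ["user login succeeded"] * 10
        + ["kernel panic segfault detected"])
labels = [False] * 20 + [True]
vectors = tfidf_vectors(logs)
som = train_som(vectors)
scores = [quantization_error(som, v) for v in vectors]
flags = flag_anomalies(scores)
print(f1_score(labels, flags))
```

A line that lies far from every SOM unit receives a high score. If the grid has more units than there are recurring log patterns, an isolated anomaly can capture a unit of its own and score low, so grid size matters. The threshold rule shown here is likewise only one simple option.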
