Anomaly Detection in Logs: A Comparative Analysis of Unsupervised Algorithms

Authors: Alysson C. E. de Moura, Geraldo P. Rocha Filho, Marcos F. Caetano, João J. C. Gondim, Aleteia Araujo, Marcelo A. Marotta, Lucas Bondan



Author Details

Alysson C. E. de Moura
  • Department of Computer Science, University of Brasília (UnB), Brazil
Geraldo P. Rocha Filho
  • Department of Computer Science, State University of Southwest Bahia (UESB), Vitória da Conquista, Brazil
Marcos F. Caetano
  • Department of Computer Science, University of Brasília (UnB), Brazil
João J. C. Gondim
  • Department of Computer Science, University of Brasília (UnB), Brazil
Aleteia Araujo
  • Department of Computer Science, University of Brasília (UnB), Brazil
Marcelo A. Marotta
  • Department of Computer Science, University of Brasília (UnB), Brazil
Lucas Bondan
  • Rede Nacional de Ensino e Pesquisa (RNP), Brasília, Brazil
  • Department of Computer Science, University of Brasília (UnB), Brazil

Cite As

Alysson C. E. de Moura, Geraldo P. Rocha Filho, Marcos F. Caetano, João J. C. Gondim, Aleteia Araujo, Marcelo A. Marotta, and Lucas Bondan. Anomaly Detection in Logs: A Comparative Analysis of Unsupervised Algorithms. In 13th Symposium on Languages, Applications and Technologies (SLATE 2024). Open Access Series in Informatics (OASIcs), Volume 120, pp. 12:1-12:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/OASIcs.SLATE.2024.12

Abstract

This study explores anomaly detection through unsupervised Machine Learning applied to banking systems' log records. The diversity in formatting and types of logs poses significant challenges for automating anomaly detection. We propose a workflow using Natural Language Processing (NLP) techniques for anomaly identification, which in further analysis can lead to identifying root causes of failures and vulnerabilities. We evaluate the performance of eight different models using Blue Gene/L log records. The most effective models were selected and subsequently validated with Microsoft Configuration Manager (MCM) logs collected from a financial institution, demonstrating their practical applicability in real-world scenarios. Experimental results highlighted the effectiveness of neural network models, specifically Self-Organizing Maps (SOM) and Autoencoders (AE), with F1-Scores of 0.86 and 0.80, respectively, when applied to MCM logs collected from the financial institution.
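The kind of workflow the abstract describes (turn log lines into features, score them without labels, then evaluate with F1-Score) can be illustrated with a minimal sketch. This is not the authors' pipeline: the log lines are made-up toy data, the rarity score (mean inverse document frequency of a line's tokens) stands in for the eight models the paper evaluates, and the names `tokenize`, `rarity_scores`, and the threshold value are hypothetical choices for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(line):
    """Lowercase a log line and mask digit runs so variable fields share one token."""
    masked = re.sub(r"\d+", "<num>", line.lower())
    return re.findall(r"<num>|[a-z]+", masked)


def rarity_scores(lines):
    """Score each line by the mean inverse document frequency of its distinct tokens."""
    docs = [set(tokenize(line)) for line in lines]
    doc_freq = Counter(tok for doc in docs for tok in doc)
    n = len(docs)
    scores = []
    for doc in docs:
        if not doc:
            scores.append(0.0)  # a line with no tokens carries no evidence
            continue
        scores.append(sum(math.log(n / doc_freq[tok]) for tok in doc) / len(doc))
    return scores


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive (anomaly) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for this sketch): four routine lines and one unusual line.
logs = [
    "session 101 opened for user alice",
    "session 102 opened for user bob",
    "session 103 opened for user carol",
    "session 104 opened for user dave",
    "kernel panic: machine check interrupt",
]
labels = [0, 0, 0, 0, 1]  # 1 marks the line treated as anomalous

scores = rarity_scores(logs)
threshold = 1.0  # hypothetical cut-off; in practice this would need tuning
predictions = [int(score > threshold) for score in scores]

print(predictions)
print(f1_score(labels, predictions))
```

Labels are used only for evaluation here, never for scoring, which is what makes the detection step unsupervised. A real study would replace the rarity score with learned representations such as word embeddings and with models such as SOM or Autoencoders, as the abstract reports.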

Subject Classification

ACM Subject Classification
  • Computing methodologies → Artificial intelligence
  • Computing methodologies → Natural language processing
  • Computing methodologies → Machine learning
Keywords
  • Anomaly Detection
  • Log Analysis
  • Natural Language Processing
  • Unsupervised Learning
  • Word Embeddings
