Machine Learning for Cybersecurity (MLC)

Keynote Speakers

“Data Wrangling” for Building ML Based Software Security Solutions: Lessons and Recommendations

Ali Babar, University of Adelaide

A large part of our research aims to develop and leverage knowledge and tools for improving software security using AI/ML based approaches. It is well known that AI/ML based approaches rely heavily on the quality of data: “garbage in, garbage out”. Hence, “Data Wrangling” is an important, but expensive, phase of using AI/ML for software security. Like any AI/ML based effort, our R&D efforts for leveraging AI/ML for software security have encountered several significant “Data Wrangling” challenges. Our pursuit of reliable solutions to data quality challenges has taught us that the cost and error-proneness of “Data Wrangling” activities can be a barrier to widespread industrial adoption of AI/ML based approaches to software security. We believe that it is important to engage the relevant stakeholders in developing and sharing knowledge and technologies aimed at improving software security data quality. To this end, we are not only systematically identifying and synthesizing the existing empirical literature on improving data quality, but also devising innovative solutions to ease the problems we experienced or observed during “Data Wrangling”. This talk will draw lessons and recommendations from our efforts to systematically review the state of the art and develop solutions for improving data quality while building and using AI/ML based software security solutions such as software vulnerability prediction (SVP) models.

Evaluating Machine Learning Approaches on their Utility in Security Operations

Xinming (Simon) Ou, University of South Florida

Machine learning is a valuable tool in security operations and can be used in triaging intrusion alerts and in malware analysis. The evolving nature of security threats calls for careful consideration when evaluating the effectiveness of ML algorithms for security applications. Evaluation methods commonly used in the machine learning literature, such as k-fold cross validation, may not always be appropriate for evaluating the utility of ML approaches in security operations. The class imbalance between malicious and benign data, and the challenge of accurately labeling samples, must also be taken into account so that the results are not overly optimistic about how much the ML approaches can really help in practice. In this talk I will explain, using a number of examples, the important factors that need to be incorporated when designing experiments to evaluate ML approaches for security. I will then focus on our recent work on evaluating deep learning's advantages (or lack thereof) in the problem of Android malware detection, where these important factors are taken into account.
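As an illustration of the two evaluation concerns named in this abstract, the following minimal sketch splits samples by time instead of shuffling them into folds, and shows how plain accuracy can look good on imbalanced data while recall is zero. It is not taken from the talk; the `Sample` record, the integer day timestamps, the cutoff value and the always-benign baseline are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Sample:
    timestamp: int  # hypothetical unit: days since data collection started
    label: int      # 1 = malicious, 0 = benign


def temporal_split(samples: List[Sample], cutoff: int) -> Tuple[List[Sample], List[Sample]]:
    """Train on samples seen before the cutoff, test on samples seen from the cutoff on."""
    train = [s for s in samples if s.timestamp < cutoff]
    test = [s for s in samples if s.timestamp >= cutoff]
    return train, test


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of malicious samples that were actually flagged."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Synthetic, imbalanced data: one malicious sample every 20 days (5 of 100).
samples = [Sample(timestamp=day, label=1 if day % 20 == 0 else 0) for day in range(100)]

# No sample from the "future" leaks into training, unlike shuffled k-fold.
train, test = temporal_split(samples, cutoff=70)

# A useless baseline that labels everything benign.
y_true = [s.label for s in test]
y_pred = [0 for _ in test]

baseline_accuracy = accuracy(y_true, y_pred)
baseline_recall = recall(y_true, y_pred)
print(f"accuracy={baseline_accuracy:.3f} recall={baseline_recall:.3f}")
```

The baseline detects nothing, yet its accuracy stays high because benign samples dominate the test period, which is why metrics such as recall or precision are usually reported alongside accuracy in this setting.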

Privacy Preservation in Collaborative Learning

Guangdong Bai, University of Queensland

Collaborative learning enables two or more participants, each with their own training dataset, to collaboratively learn a joint model. It is desirable that the collaboration discloses neither the raw dataset of any individual owner nor the local model parameters trained on it. Existing approaches based on differential privacy mechanisms or homomorphic encryption may reduce model accuracy or incur significant computational overhead. In this talk, I will introduce our efforts on privacy preservation in collaborative learning. We address this problem through a lightweight additive secret sharing technique and colony-based gradient descent. We aim to protect local data and local models while ensuring the correctness of the training process.
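For readers unfamiliar with additive secret sharing, the sketch below shows the general idea on a single gradient value: each participant splits its value into random shares that sum to the value modulo a fixed number, so only the aggregate can be recovered. This is a generic textbook construction, not the speaker's protocol, and it does not attempt to model colony-based gradient descent. The modulus, the fixed-point scale and the three example gradients are assumptions chosen for illustration.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # assumed ring size for the example
SCALE = 10**6        # assumed fixed-point scale for encoding floats


def encode(x: float) -> int:
    """Map a float to a ring element using fixed-point encoding."""
    return round(x * SCALE) % MODULUS


def decode(v: int) -> float:
    """Map a ring element back to a float, treating the upper half as negative."""
    if v > MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def share(value: int, n: int) -> List[int]:
    """Split value into n shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: List[int]) -> int:
    return sum(shares) % MODULUS


# Hypothetical local gradients held by three participants.
local_gradients = [0.25, -0.5, 1.0]
n = len(local_gradients)

# Row i holds the shares produced by participant i; share j goes to participant j.
all_shares = [share(encode(g), n) for g in local_gradients]

# Each participant adds up the shares it received (one column each).
partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]

# Combining the partial sums reveals only the aggregated gradient.
aggregate = decode(reconstruct(partial_sums))
print(aggregate)
```

Any proper subset of a participant's shares is uniformly random and carries no information about its gradient, and the only arithmetic involved is modular addition, which is what makes the technique lightweight compared with homomorphic encryption.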

AI for Fuzzing: A Tale of Two Techniques

Yuekang Li, Nanyang Technological University

Fuzzing is a software testing technique for detecting vulnerabilities. It is one of the most practical security analysis techniques for detecting 0-day vulnerabilities and is widely adopted by researchers and practitioners. Meanwhile, AI techniques are widely used in various security analyses to improve effectiveness and efficiency. Both AI techniques and fuzzing techniques benefit from processing large amounts of data, and combining them has great potential for unveiling deeply hidden vulnerabilities. This talk will address three topics in combining AI and fuzzing: 1) the common pitfalls of applying AI techniques to fuzzing, including the cases where AI techniques may not work well and the limitations of AI-based approaches; 2) how to use various AI techniques to aid fuzzing from different aspects; 3) future research directions in this interdisciplinary field.

Call for Papers

In recent decades, cybersecurity threats have been among the most significant challenges for social development, resulting in financial loss, violation of privacy, damage to infrastructure, etc. Organizations, governments, and cyber practitioners tend to leverage state-of-the-art Artificial Intelligence technologies to analyze, prevent, and protect their data and services against cyber threats and attacks. Due to the complexity and heterogeneity of security systems, cybersecurity researchers and practitioners have shown increasing interest in applying data mining methods to mitigate cyber risks in many security areas, such as malware detection and key player identification in underground forums. To protect the cyber world, we need more effective and efficient algorithms and tools capable of automatically and intelligently analyzing and classifying massive amounts of data in complex cybersecurity scenarios. This workshop will focus on empirical findings, methodological papers, and theoretical and conceptual insights related to data mining in the field of cybersecurity.

The workshop aims to bring together researchers from cybersecurity, data mining, and machine learning domains. We encourage a lively exchange of ideas and perceptions through the workshop, focused on cybersecurity and data mining. Topics of interest include, but are not limited to:

We are interested in new applications of data mining and AI for cybersecurity. Submitted papers will be evaluated based on criteria such as technical originality, creativity, and applicability.

Methodological topics of interest include, but are not limited to:

Application areas of interest include, but are not limited to:

Important Dates

Paper Submission

All accepted workshop papers will be published in formal IEEE proceedings, in the IEEE Computer Society Digital Library (CSDL) and IEEE Xplore, and indexed by EI. Paper submissions should be limited to a maximum of 8 pages plus 2 extra pages (for references, appendix, etc.) and follow the IEEE ICDM format. More detailed information is available in the IEEE ICDM 2022 Submission Guidelines. All submissions will be triple-blind reviewed by the Program Committee based on technical quality, relevance to the scope of the conference, originality, significance, and clarity. The following sections give further information for authors. Please submit your papers via the submission link.

Organizers

Steering Chairs (alphabetical order)

Program Chairs

Publicity Chairs

Program Committee