Machine Learning for Cybersecurity (MLC)

Keynote Speakers

Network-level Attacks in Federated Learning

Cristina Nita-Rotaru, Northeastern University

Federated learning is a popular strategy for training models on distributed, sensitive data while preserving data privacy. In both centralized and peer-to-peer architectures, communication between participants (clients and server, or peers) plays a critical role in learning performance. We highlight how communication introduces another vulnerability surface in federated learning and study the impact of network-level adversaries on training federated learning models. In the first part of the talk we focus on centralized architectures and show that attackers dropping the network traffic from carefully selected clients can significantly decrease model accuracy on a target population. We then show the effectiveness of our server-side defense, which mitigates the impact of our attacks by identifying and up-sampling clients likely to contribute positively to target accuracy. In the second part of the talk we focus on peer-to-peer federated learning. We propose new backdoor attacks that leverage structural properties of the communication graph to select the malicious nodes, achieving high attack success while remaining stealthy. We evaluate our attacks under various realistic conditions, including multiple graph topologies, limited adversarial visibility of the network, and clients with non-IID data. Finally, we show the limitations of existing defenses adapted from centralized federated learning and design a new defense that successfully mitigates the backdoor attacks without impacting model accuracy.
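As a rough illustration of the centralized setting, the sketch below simulates federated averaging over synthetic clients and a network-level adversary that drops the traffic of selected clients, so their updates never reach the server. All data and names here are hypothetical; the talk's actual attack (careful client selection against a target population) and its up-sampling defense are far more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, data, labels, lr=0.1):
    """One step of logistic-regression gradient descent on a client's data."""
    preds = 1.0 / (1.0 + np.exp(-data @ weights))
    grad = data.T @ (preds - labels) / len(labels)
    return weights - lr * grad

def fed_avg(updates):
    """Server aggregates the received client models by simple averaging."""
    return np.mean(updates, axis=0)

# Ten synthetic clients, each with 32 samples of 5 features.
clients = [(rng.normal(size=(32, 5)), rng.integers(0, 2, 32).astype(float))
           for _ in range(10)]

def train(dropped=frozenset(), rounds=20):
    """Run federated averaging; updates from `dropped` clients never arrive."""
    w = np.zeros(5)
    for _ in range(rounds):
        updates = [local_update(w, x, y)
                   for i, (x, y) in enumerate(clients) if i not in dropped]
        w = fed_avg(updates)
    return w

w_clean = train()
w_attacked = train(dropped={8, 9})  # adversary drops traffic of two clients
```

Because the dropped clients' data never contributes, the aggregated model drifts away from the one trained on all participants; in the talk's setting the adversary picks clients so that this drift concentrates on a target population.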

Cristina Nita-Rotaru is a Professor of Computer Science at Northeastern University. She has served on the Technical Program Committees of numerous conferences in security, networking, and distributed systems (IEEE S&P, USENIX Security, ACM CCS, NDSS, ACM WiSec, IEEE ICDCS, IEEE/IFIP DSN, ACM SIGCOMM, ACM CoNEXT, IEEE INFOCOM, IEEE ICNP, WWW, EuroSys). She is the Chair of the Steering Committee of ISOC NDSS and the Vice-Chair of the IEEE Technical Committee on Dependable Computing and Fault Tolerance (TCFT).

Reshaping Cybersecurity Education: Navigating the AI-Driven Landscape

Giti Javidi, University of South Florida

This talk offers a comprehensive exploration of the ever-evolving landscape of cybersecurity education, acknowledging its historical journey while focusing on the profound influence of AI within both the cybersecurity field and the educational domain. We will navigate the progression from past practices to modern paradigms and look toward a future of cybersecurity education shaped by AI. A central theme is the transformative role of AI in reshaping the cybersecurity domain and, consequently, the educational landscape. Recognizing the convergence of AI and cybersecurity, the presentation underscores the critical need for flexible, forward-looking education that equips students with the skills required to meet continually evolving cybersecurity challenges.

Giti Javidi is a Professor of Cybersecurity in the School of Information Systems and Management at the University of South Florida. A long-time advocate for diversity and inclusion in the cybersecurity workforce, she has spearheaded a number of national and international projects in this domain throughout her academic career. She directs the Applied Research Collaborative (ARC) lab on the Sarasota-Manatee campus, where cross-disciplinary faculty, students, and industry partners come together to carry out applied cybersecurity research and training.

Path-Sensitive Code Embedding for Software Vulnerability Detection

Yulei Sui, University of New South Wales

Machine learning and its promising branch deep learning have shown success in a wide range of application domains. Recently, much effort has been expended on applying deep learning techniques (e.g., graph neural networks) to static vulnerability detection as an alternative to conventional bug detection methods. To obtain the structural information of code, current learning approaches typically abstract a program in the form of graphs (e.g., data-flow graphs, abstract syntax trees) and then train an underlying classification model on the (sub)graphs of safe and vulnerable code fragments for vulnerability prediction. However, these models are still insufficient for precise bug detection, because their objective is to produce classification results rather than to comprehend the semantics of vulnerabilities, e.g., pinpointing bug-triggering paths, which are essential for static bug detection. In this talk, I will present ContraFlow, a selective yet precise contrastive value-flow embedding approach to statically detect software vulnerabilities. The novelty of ContraFlow lies in selecting and preserving feasible value-flow (aka program dependence) paths through a pretrained path embedding model using self-supervised contrastive learning, thus significantly reducing the amount of labeled data required to train expensive downstream models for path-based vulnerability detection.
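The self-supervised contrastive pretraining step can be illustrated with a standard InfoNCE objective over hypothetical path embeddings: two views of the same value-flow path form a positive pair, while the other paths in the batch act as negatives. This is a generic sketch of contrastive learning, not ContraFlow's actual model or training setup.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE contrastive loss: pull each anchor toward its positive
    view and push it away from the other samples in the batch."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # matching pairs lie on the diagonal

rng = np.random.default_rng(1)
emb = rng.normal(size=(8, 16))        # embeddings of 8 (hypothetical) value-flow paths
# Slightly perturbed copies stand in for augmented views of the same path.
aligned = info_nce(emb, emb + 0.01 * rng.normal(size=(8, 16)))
random_pairs = info_nce(emb, rng.normal(size=(8, 16)))
```

A lower loss for the aligned pairs than for random pairings is what the pretraining optimizes for; once path embeddings cluster this way, far fewer labeled examples are needed to train a downstream path-based vulnerability classifier.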

Yulei Sui is a Scientia Associate Professor at the School of Computer Science and Engineering, University of New South Wales (UNSW). He is broadly interested in Program Analysis, Secure Software Engineering, and Machine Learning. In particular, his research focuses on building open-source frameworks for static analysis and verification techniques to improve the reliability and security of modern software systems. His recent interest lies at the intersection of programming languages, natural languages, and machine learning. Specifically, his current research projects include secure machine learning, and software analysis and verification for bug detection through data mining and deep learning.

Visually Explaining and Debugging the Training Process of Deep Learning Models

LIN Yun, Shanghai Jiao Tong University

Given that deep learning models are widely used in all walks of life, an increasing number of developers will train their own deep learning models. Thus, it is important for developers to understand *how a prediction is formed*. However, understanding how the model weights are updated in each training epoch is a challenging task. In this work, we tackle this challenge by visualizing the representation space of a partially trained deep learning model. Specifically, we transform the high-dimensional representation space of a model into a visible low-dimensional canvas where the classification landscape can be manifested. Further, we convert the model-training process into an animation in which the movement of dots represents changes in sample representations. In addition, we support interactive recommendations to help developers locate training events of interest.
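A minimal sketch of the projection step, using plain PCA as a stand-in for whatever dimensionality-reduction technique the actual tool employs: each sample's high-dimensional representation becomes a dot on a 2-D canvas, and projecting the representations captured at successive epochs yields the frames of the training animation. The shapes and data below are invented for illustration.

```python
import numpy as np

def project_2d(reps):
    """Project high-dimensional sample representations onto the top two
    principal components so each sample becomes a dot on a 2-D canvas."""
    centered = reps - reps.mean(axis=0)
    # SVD of the centered matrix gives the principal directions; keep two.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(2)
# Pretend penultimate-layer representations of 100 samples at two epochs.
epoch_1 = rng.normal(size=(100, 64))
epoch_2 = epoch_1 + 0.5 * rng.normal(size=(100, 64))  # weights have moved

frames = [project_2d(e) for e in (epoch_1, epoch_2)]  # animation frames
```

Rendering each frame as a scatter plot and interpolating between them produces the dot-movement animation described above; a real implementation would also need to keep the projection consistent across epochs so that dot motion reflects representation change rather than axis change.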

LIN Yun is an Associate Professor in the Department of Computer Science and Engineering, Shanghai Jiao Tong University. Before joining SJTU, he was a Research Assistant Professor working with Prof. Jin Song Dong at the National University of Singapore. His expertise spans AI/software debugging and analysis (ICSE'17, ASE'18, TSE'19, AAAI'22, IJCAI'22, ISSTA'22, NeurIPS'22, ICSE'23), software testing (ESEC/FSE'21, ISSTA'20, ICSE'20, ICSE'18), malicious webpage analysis (USENIX Security'21, USENIX Security'22, USENIX Security'23), and code recommendation and analysis (ICSE'14, ICSME'14, FSE'15, FSE'16, ASE'17, EMNLP'22).

Effective Malware Detection and Classification Methods

Dima Rabadi, Penn State Shenango

Nowadays, and particularly after the COVID-19 pandemic, cybersecurity has become an indispensable commitment for everyone. Although most organizations have achieved outstanding results in detecting cyber-attacks, individuals need to be better educated on strengthening their human firewalls, which are now targeted more often than conventional computer firewalls. The malware landscape has evolved persistently throughout its four-decade existence, rendering detection a progressively challenging endeavor. A triad of developments underscores this challenge: the continuous emergence of novel malware types and families, the propagation of new variants stemming from existing malware, and the innovation of unknown propagation vectors across the internet, encompassing avenues such as app notifications, email attachments, USB drives, and surreptitious software installations. The present era, compounded by the COVID-19 pandemic, has witnessed a discernible surge in cyberattacks that exploit emotions such as fear and confusion to entice potential victims into engaging with malicious links or attachments. As elucidated by the 2021 State of Malware report by Malwarebytes, the pandemic and the widespread adoption of remote work have given cybercriminals greater latitude to pursue their malicious objectives.

Dima Rabadi is an Assistant Professor of Cybersecurity at Penn State Shenango, USA. Her research expertise is highlighted by her engagement in diverse R&D malware-related projects, forging collaborations with prominent public and private entities. Her research converges at the crossroads of cybersecurity and digital forensics, encompassing investigations into adversarial machine learning for malware detection and the development of AI-based antivirus tools. Her work also addresses advanced persistent threat detection within enterprise security.

Call for Papers

In the past decades, cybersecurity threats have been among the most significant challenges to social development, resulting in financial loss, privacy violations, damage to infrastructure, and more. Organizations, governments, and cyber practitioners increasingly leverage state-of-the-art Artificial Intelligence technologies to analyze, prevent, and protect their data and services against cyber threats and attacks. Due to the complexity and heterogeneity of security systems, cybersecurity researchers and practitioners have shown increasing interest in applying data mining methods to mitigate cyber risks in many security areas, such as malware detection and key player identification in underground forums. To protect the cyber world, we need more effective and efficient algorithms and tools capable of automatically and intelligently analyzing and classifying the massive amounts of data in complex cybersecurity scenarios. This workshop will focus on empirical findings, methodological papers, and theoretical and conceptual insights related to data mining in the field of cybersecurity.

The workshop aims to bring together researchers from the cybersecurity, data mining, and machine learning domains. We encourage a lively exchange of ideas and perspectives focused on cybersecurity and data mining. Topics of interest include, but are not limited to:

We are interested in new applications of data mining and AI for cybersecurity. Submitted papers will be evaluated based on criteria such as technical originality, creativity, and applicability. Methodological and application topics of interest include, but are not limited to:

Important Dates

Paper Submission

All accepted workshop papers will be published in formal IEEE proceedings, included in the IEEE Computer Society Digital Library (CSDL) and IEEE Xplore, and indexed by EI. Paper submissions should be limited to a maximum of 8 pages, plus 2 extra pages (for references, appendix, etc.), and follow the IEEE ICDM format. More detailed information is available in the IEEE ICDM 2022 Submission Guidelines. All submissions will be triple-blind reviewed by the Program Committee based on technical quality, relevance to the scope of the conference, originality, significance, and clarity. Please submit your papers via the submission link.

Organizers

Steering Chairs

Program Chairs

Publicity Chairs

Program Committee