Software Intelligence

Recent Updates

Sep, 2025 Our ICSME'25 paper "Code Property Graph Meets Typestate: A Scalable Framework to Behavioral Bug Detection" won the IEEE TCSE Distinguished Paper Award.
Jul, 2025 [AI] Our research paper "NeMo: A Neuron-Level Modularizing-While-Training Approach for Decomposing DNN Models" has been accepted by ACM Transactions on Software Engineering and Methodology (TOSEM).
Mar, 2024 Our papers "API Misuse Detection via Probabilistic Graphical Model" and "Model-less Is The Best Model: Generating Pure Code Implementations to Replace On-device DL Models" have been accepted by the ISSTA 2024 research papers track.
Mar, 2024 Our SANER'24 paper "Investigating and Detecting Silent Bugs in PyTorch Programs" won the IEEE TCSE Distinguished Paper Award.
Feb, 2024 Our ICSE'24 paper "Modularizing while Training: A New Paradigm for Modularizing DNN Models" won the ACM SIGSOFT Distinguished Paper Award.
Jan, 2024 Our paper "ProveNFix: Temporal Property guided Program Repair" has been accepted by the FSE'24 research papers track.
Dec, 2023 Our paper "Investigating White-Box Attacks for On-Device Models" has been accepted by the ICSE'24 research papers track.
Nov, 2023 Our paper "Reusing Convolutional Neural Network Models through Modularization and Composition" has been accepted by TOSEM as a research paper.
Aug, 2023 Our paper "Automated Fixing of Web UI Tests via Iterative Element Matching" has been accepted by the ASE'23 research papers track.
Jun, 2023 Our paper "Modularizing while Training: A New Paradigm for Modularizing DNN Models" has been accepted by the ICSE'24 research papers track.
May, 2023 Our paper "ModelObfuscator: Obfuscating Model Information to Protect Deployed ML-based Systems" has been accepted by the ISSTA'23 technical papers track.

Research

Research Directions

Software Supply Chain Security via AI4SE

To improve development efficiency and reduce cost, many enterprises rely on third-party software, which constitutes the software supply chain (SSC). Because of the dependency relations among software components, vulnerabilities in the SSC can cause more serious security threats than those in independent software systems. This poses new challenges for ensuring software security. Our research focuses on designing novel intelligent software engineering and program analysis techniques, including but not limited to software dependency analysis, software vulnerability detection, and vulnerability repair. The ultimate goal is to improve the security of the software supply chain and the software ecosystem.

Usability and Robustness of DNN via SE4AI

Because of poor interpretability, large numbers of parameters, heavy data requirements, and poor reliability, AI models, the core of intelligent software systems, suffer from poor reusability, high testing overhead, and high security risks during development, testing, and deployment. AI models are often regarded as "Software 2.0". In this direction, we address these problems from a software engineering perspective: we aim to apply software engineering techniques and notions to AI model engineering to improve models' usability and robustness.

AI/Crowd Intelligence

Crowd intelligence aims to achieve powerful intelligence by aggregating diverse contributions from many heterogeneous individuals, and it is a typical form of AI. In fact, the success of many AI technologies relies heavily on large-scale datasets that are often built through crowd intelligence. We study crowd intelligence from various perspectives, including theories, techniques, and applications. Research topics include, but are not limited to, crowdsourcing, federated learning, human-in-the-loop AI, and open source.

Selected Research Papers

Backdoor Defense via Enhanced Splitting and Trap Isolation
Hongrui Yu, Lu Qi, Wanyu Lin, Jian Chen, Hailong Sun, Chengbin Sun.
International Conference on Computer Vision (ICCV), 2025.
CABS: Conflict-Aware and Balanced Sparsification for Enhancing Model Merging
Zongzhen Yang, Binhang Qi, Hailong Sun, Wenrui Long, Ruobin Zhao, Xiang Gao.
The 42nd International Conference on Machine Learning (ICML), 2025.
Code Property Graph Meets Typestate: A Scalable Framework to Behavioral Bug Detection
Xingjing Deng, Zhengyao Liu, Xitong Zhong, Shuo Hong, Yixin Yang, Xiang Gao, Xuhui Yan, Hailong Sun.
The 41st IEEE International Conference on Software Maintenance and Evolution (ICSME), 2025.

IEEE Computer Society TCSE Distinguished Paper Award

NeMo: A Neuron-Level Modularizing-While-Training Approach for Decomposing DNN Models
Xiaohan Bi, Binhang Qi, Hailong Sun, Xiang Gao, Yue Yu, Xiaojun Liang.
ACM Transactions on Software Engineering and Methodology (TOSEM), 2025.
Reference-Based Retrieval Augmentation for Unit Test Generation
Zhe Zhang, Xingyu Liu, Yuanzhang Lin, Xiang Gao, Hailong Sun, Yuan Yuan.
ACM Transactions on Software Engineering and Methodology (TOSEM), 2025.
API Misuse Detection via Probabilistic Graphical Model
Yunlong Ma, Wentong Tian, Xiang Gao, Hailong Sun, Li Li.
International Symposium on Software Testing and Analysis (ISSTA), 2024.
FedEvalFair: A Privacy-Preserving and Statistically Grounded Federated Fairness Evaluation Framework
Zhongchi Wang, Hailong Sun, Zhengyang Zhao.
The 32nd ACM International Conference on Multimedia (MM), 2024.
Investigating and Detecting Silent Bugs in PyTorch Programs
Shuo Hong, Hailong Sun, Xiang Gao, Shin Hwei Tan.
The 31st IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2024.

IEEE Computer Society TCSE Distinguished Paper Award

ModelFoundry: A Tool for DNN Modularization and On-demand Model Reuse
Xiaohan Bi, Ruobing Zhao, Binhang Qi, Hailong Sun, Xiang Gao, Yue Yu, Xiaojun Liang.
The ACM International Conference on the Foundations of Software Engineering (FSE) - Demonstration track, 2024.
ModelGalaxy: A Versatile Model Retrieval Platform
Wenling Zhang, Yixiao Li, Zhaotian Li, Hailong Sun, Xiang Gao, Xudong Liu.
The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) - Demonstration track, 2024.
Modularizing while Training: A New Paradigm for Modularizing DNN Models
Binhang Qi, Hailong Sun, Hongyu Zhang, Ruobing Zhao, Xiang Gao.
International Conference on Software Engineering (ICSE), 2024.

ACM SIGSOFT Distinguished Paper Award

Automated Repair of Programs from Large Language Models
Zhiyu Fan, Xiang Gao#, Martin Mirchev, Abhik Roychoudhury, Shin Hwei Tan.
International Conference on Software Engineering (ICSE), 2023.
RA3: Human-in-the-loop Framework for Interpreting and Improving Image Captioning with Relation-Aware Attribution Analysis
Lei Chai, Lu Qi, Hailong Sun, Jingzheng Li.
The 40th IEEE International Conference on Data Engineering (ICDE), 2024.
Reusing Convolutional Neural Network Models through Modularization and Composition
Binhang Qi, Hailong Sun, Hongyu Zhang, Xiang Gao.
ACM Transactions on Software Engineering and Methodology (TOSEM), 2024.
Target Structure Learning Framework for Unsupervised Multi-Class Domain Adaptation
Jingzheng Li, Hailong Sun, Lei Chai, Jiyi Li.
ACM Transactions on Multimedia Computing Communications and Applications (TOMM), 2024.
AutoMRM: A Model Retrieval Method Based on Multimodal Query and Meta-learning
Zhaotian Li, Binhang Qi, Hailong Sun, Xiang Gao.
The 32nd ACM International Conference on Information and Knowledge Management (CIKM), 2023.
Black-Box Data Poisoning Attacks on Crowdsourcing
Pengpeng Chen, Yongqiang Yang, Dingqi Yang, Hailong Sun, Zhijun Chen, Peng Lin.
The 32nd International Joint Conference on Artificial Intelligence (IJCAI), 2023.
Learning from Noisy Crowd Labels with Logics
Zhijun Chen, Hailong Sun, Haoqian He, Pengpeng Chen.
The 39th IEEE International Conference on Data Engineering (ICDE), 2023.
Reusing Deep Neural Network Models through Model Re-engineering
Binhang Qi, Hailong Sun, Xiang Gao, Hongyu Zhang, Zhaotian Li, Xudong Liu.
The 45th International Conference on Software Engineering (ICSE), 2023.
Template-based Neural Program Repair
Xiangxin Meng, Xu Wang, Hongyu Zhang, Hailong Sun, Xudong Liu, Chunming Hu.
The 45th International Conference on Software Engineering (ICSE), 2023.
A Collaboration-Aware Approach to Profiling Developer Expertise with Cross-Community Data
Xiaotao Song, Jiafei Yan, Yuexin Huang, Hailong Sun, Hongyu Zhang.
The 22nd IEEE International Conference on Software Quality, Reliability, and Security (QRS), 2022.
Improving Fault Localization and Program Repair with Deep Semantic Features and Transferred Knowledge
Xiangxin Meng, Xu Wang, Hongyu Zhang, Hailong Sun, Xudong Liu.
The 44th International Conference on Software Engineering (ICSE), 2022.
Retrieval-based Neural Source Code Summarization
Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Xudong Liu.
The 42nd International Conference on Software Engineering (ICSE), 2020.

Teaching

Introduction to Open Source Software Development [Fall 2022, 2023]

Object-Oriented Programming [Fall 2022]

Intelligent Software Engineering [Spring 2024]

Hiring