Software Intelligence

We are

Software Intelligence Group

Our research focuses on using program analysis and machine/deep learning to improve the security, reliability, and quality of software systems and AI models.

Recent Updates


  • Sep, 2025 Our ICSME'25 paper "Code Property Graph Meets Typestate: A Scalable Framework to Behavioral Bug Detection" won the IEEE TCSE Distinguished Paper Award.
  • Jul, 2025 [AI] Our research paper "NeMo: A Neuron-Level Modularizing-While-Training Approach for Decomposing DNN Models" has been accepted by ACM Transactions on Software Engineering and Methodology (TOSEM).
  • Mar, 2024 Our papers "API Misuse Detection via Probabilistic Graphical Model" and "Model-less Is The Best Model: Generating Pure Code Implementations to Replace On-device DL Models" have been accepted by the ISSTA 2024 research papers track.
  • Mar, 2024 Our SANER'24 paper "Investigating and Detecting Silent Bugs in PyTorch Programs" won the IEEE TCSE Distinguished Paper Award.
  • Feb, 2024 Our ICSE'24 paper "Modularizing while Training: A New Paradigm for Modularizing DNN Models" won the ACM SIGSOFT Distinguished Paper Award.
  • Jan, 2024 Our paper "ProveNFix: Temporal Property guided Program Repair" has been accepted by the FSE'24 research papers track.
  • Dec, 2023 Our paper "Investigating White-Box Attacks for On-Device Models" has been accepted by the ICSE'24 research papers track.
  • Nov, 2023 Our paper "Reusing Convolutional Neural Network Models through Modularization and Composition" has been accepted by TOSEM as a research paper.
  • Aug, 2023 Our paper "Automated Fixing of Web UI Tests via Iterative Element Matching" has been accepted by the ASE'23 research papers track.
  • Jun, 2023 Our paper "Modularizing while Training: A New Paradigm for Modularizing DNN Models" has been accepted by the ICSE'24 research papers track.
  • May, 2023 Our paper "ModelObfuscator: Obfuscating Model Information to Protect Deployed ML-based Systems" has been accepted by the ISSTA'23 technical papers track.

Research


Research Directions

To improve development efficiency and reduce cost, many enterprises rely on third-party software, which forms the software supply chain (SSC). Because of the dependency relations among software components, a vulnerability in the SSC can pose a more serious security threat than one in an independent software system. This poses new challenges for ensuring software security. Our research focuses on designing novel intelligent software engineering and program analysis techniques, including but not limited to software dependency analysis, software vulnerability detection, and vulnerability repair. The ultimate goal is to increase the security of the software supply chain and the software ecosystem.

Usability and Robustness of DNN via SE4AI

As the core of intelligent software systems, AI models suffer from poor interpretability, large parameter counts and data requirements, and poor reliability. As a result, they have poor reusability, high testing overhead, and high security risks in development, testing, and deployment. AI models are often regarded as "Software 2.0". In this direction, we tackle these problems from the perspective of software engineering: we aim to apply software engineering techniques and notions to AI model engineering to improve the models' usability and robustness.

AI/Crowd Intelligence

Crowd intelligence aims to achieve powerful intelligence by aggregating diverse contributions from many heterogeneous individuals, and it is a typical form of AI. In fact, the success of many AI technologies relies heavily on large-scale datasets that are often built through crowd intelligence. We study various aspects of crowd intelligence, including theories, techniques, and applications. Research topics include, but are not limited to, crowdsourcing, federated learning, human-in-the-loop AI, and open source.

Selected Research Papers

  • Backdoor Defense via Enhanced Splitting and Trap Isolation

    Hongrui Yu, Lu Qi, Wanyu Lin, Jian Chen, Hailong Sun, Chengbin Sun.

    International Conference on Computer Vision (ICCV), 2025.

  • CABS: Conflict-Aware and Balanced Sparsification for Enhancing Model Merging

    Zongzhen Yang, Binhang Qi, Hailong Sun, Wenrui Long, Ruobing Zhao, Xiang Gao.

    The 42nd International Conference on Machine Learning (ICML), 2025.

  • Code Property Graph Meets Typestate: A Scalable Framework to Behavioral Bug Detection

    Xingjing Deng, Zhengyao Liu, Xitong Zhong, Shuo Hong, Yixin Yang, Xiang Gao, Xuhui Yan, Hailong Sun.

    The 41st IEEE International Conference on Software Maintenance and Evolution (ICSME), 2025.

    IEEE Computer Society TCSE Distinguished Paper Award

  • NeMo: A Neuron-Level Modularizing-While-Training Approach for Decomposing DNN Models

    Xiaohan Bi, Binhang Qi, Hailong Sun, Xiang Gao, Yue Yu, Xiaojun Liang.

    ACM Transactions on Software Engineering and Methodology (TOSEM), 2025.

  • Reference-Based Retrieval Augmentation for Unit Test Generation

    Zhe Zhang, Xingyu Liu, Yuanzhang Lin, Xiang Gao, Hailong Sun, Yuan Yuan.

    ACM Transactions on Software Engineering and Methodology (TOSEM), 2025.

  • API Misuse Detection via Probabilistic Graphical Model

    Yunlong Ma, Wentong Tian, Xiang Gao, Hailong Sun, Li Li.

    International Symposium on Software Testing and Analysis (ISSTA), 2024.

  • FedEvalFair: A Privacy-Preserving and Statistically Grounded Federated Fairness Evaluation Framework

    Zhongchi Wang, Hailong Sun, Zhengyang Zhao.

    The 32nd ACM International Conference on Multimedia (MM), 2024.

  • Investigating and Detecting Silent Bugs in PyTorch Programs

    Shuo Hong, Hailong Sun, Xiang Gao, Shin Hwei Tan.

    The 31st IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2024.

    IEEE Computer Society TCSE Distinguished Paper Award

  • ModelFoundry: A Tool for DNN Modularization and On-demand Model Reuse

    Xiaohan Bi, Ruobing Zhao, Binhang Qi, Hailong Sun, Xiang Gao, Yue Yu, Xiaojun Liang.

    The ACM International Conference on the Foundations of Software Engineering (FSE) - Demonstration track, 2024.

  • ModelGalaxy: A Versatile Model Retrieval Platform

    Wenling Zhang, Yixiao Li, Zhaotian Li, Hailong Sun, Xiang Gao, Xudong Liu.

    The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) - Demonstration track, 2024.

  • Modularizing while Training: A New Paradigm for Modularizing DNN Models

    Binhang Qi, Hailong Sun, Hongyu Zhang, Ruobing Zhao, Xiang Gao.

    International Conference on Software Engineering (ICSE), 2024.

    ACM SIGSOFT Distinguished Paper Award

  • Automated Repair of Programs from Large Language Models

    Zhiyu Fan, Xiang Gao#, Martin Mirchev, Abhik Roychoudhury, Shin Hwei Tan.

    International Conference on Software Engineering (ICSE), 2023.

  • RA3: Human-in-the-loop Framework for Interpreting and Improving Image Captioning with Relation-Aware Attribution Analysis

    Lei Chai, Lu Qi, Hailong Sun, Jingzheng Li.

    The 40th IEEE International Conference on Data Engineering (ICDE), 2024.

  • Reusing Convolutional Neural Network Models through Modularization and Composition

    Binhang Qi, Hailong Sun, Hongyu Zhang, Xiang Gao.

    ACM Transactions on Software Engineering and Methodology (TOSEM), 2024.

  • Target Structure Learning Framework for Unsupervised Multi-Class Domain Adaptation

    Jingzheng Li, Hailong Sun, Lei Chai, Jiyi Li.

    ACM Transactions on Multimedia Computing Communications and Applications (TOMM), 2024.

  • AutoMRM: A Model Retrieval Method Based on Multimodal Query and Meta-learning

    Zhaotian Li, Binhang Qi, Hailong Sun, Xiang Gao.

    The 32nd ACM International Conference on Information and Knowledge Management (CIKM), 2023.

  • Black-Box Data Poisoning Attacks on Crowdsourcing

    Pengpeng Chen, Yongqiang Yang, Dingqi Yang, Hailong Sun, Zhijun Chen, Peng Lin.

    The 32nd International Joint Conference on Artificial Intelligence (IJCAI), 2023.

  • Learning from Noisy Crowd Labels with Logics

    Zhijun Chen, Hailong Sun, Haoqian He, Pengpeng Chen.

    The 39th IEEE International Conference on Data Engineering (ICDE), 2023.

  • Reusing Deep Neural Network Models through Model Re-engineering

    Binhang Qi, Hailong Sun, Xiang Gao, Hongyu Zhang, Zhaotian Li, Xudong Liu.

    The 45th International Conference on Software Engineering (ICSE), 2023.

  • Template-based Neural Program Repair

    Xiangxin Meng, Xu Wang, Hongyu Zhang, Hailong Sun, Xudong Liu, Chunming Hu.

    The 45th International Conference on Software Engineering (ICSE), 2023.

  • A Collaboration-Aware Approach to Profiling Developer Expertise with Cross-Community Data

    Xiaotao Song, Jiafei Yan, Yuexin Huang, Hailong Sun, Hongyu Zhang.

    The 22nd IEEE International Conference on Software Quality, Reliability, and Security (QRS), 2022.

  • Improving Fault Localization and Program Repair with Deep Semantic Features and Transferred Knowledge

    Xiangxin Meng, Xu Wang, Hongyu Zhang, Hailong Sun, Xudong Liu.

    The 44th International Conference on Software Engineering (ICSE), 2022.

  • Retrieval-based Neural Source Code Summarization

    Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Xudong Liu.

    The 42nd International Conference on Software Engineering (ICSE), 2020.



Teaching


Introduction to Open Source Software Development [Fall 2022, 2023]

Object-Oriented Programming [Fall 2022]

Intelligent Software Engineering [Spring 2024]



Hiring


We are looking for highly self-motivated interns, master's students, and PhD students to work with us. If you are interested, please email sunhl@buaa.edu.cn or xiang_gao@buaa.edu.cn. We also have postdoc positions available.