publications
(*) denotes equal contribution
2024
- When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search. Yuzhou Nie*, Xuan Chen*, Wenbo Guo, and Xiangyu Zhang. 2024.
Recent studies have developed jailbreaking attacks, which construct jailbreaking prompts to “fool” LLMs into responding to harmful questions. Early jailbreaking attacks require access to model internals or significant human effort. More advanced attacks use genetic algorithms for automatic, black-box attacks. However, the inherent randomness of genetic algorithms significantly limits their effectiveness. In this paper, we propose RLbreaker, a black-box jailbreaking attack driven by deep reinforcement learning (DRL). We model jailbreaking as a search problem and design a DRL agent to guide the search, which is more effective and less random than stochastic search methods such as genetic algorithms. Specifically, we design a DRL system customized for the jailbreaking problem, including a novel reward function and a customized proximal policy optimization (PPO) algorithm. Through extensive experiments, we demonstrate that RLbreaker is much more effective than existing jailbreaking attacks against six state-of-the-art (SOTA) LLMs. We also show that RLbreaker is robust against three SOTA defenses and that its trained agents transfer across different LLMs. We further validate the key design choices of RLbreaker via a comprehensive ablation study.
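The sketch below illustrates the general idea of DRL-guided prompt search: a small policy network picks among prompt mutators, collects rewards from a scoring function, and is updated with a policy-gradient step. Everything here is a hypothetical placeholder (the mutator list, `embed`, `mutate`, `reward_fn`, and the plain REINFORCE update); the paper itself uses a customized PPO algorithm and a reward that scores the target LLM's actual responses, neither of which is reproduced here.

```python
# Hypothetical sketch of DRL-guided jailbreaking-prompt search.
# All helpers below (MUTATORS, embed, mutate, reward_fn) are illustrative
# placeholders, NOT the paper's implementation; RLbreaker uses a customized
# PPO algorithm and a response-based reward, both omitted for brevity.
import random
import torch
import torch.nn as nn

MUTATORS = ["rephrase", "expand", "crossover", "shorten"]  # toy action set

class Policy(nn.Module):
    """Maps a (toy) prompt embedding to a distribution over mutators."""
    def __init__(self, state_dim=16, n_actions=len(MUTATORS)):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(),
                                 nn.Linear(32, n_actions))

    def forward(self, state):
        return torch.distributions.Categorical(logits=self.net(state))

def embed(prompt: str, dim=16) -> torch.Tensor:
    # Placeholder featurizer; a real system would embed the prompt with an encoder.
    g = torch.Generator().manual_seed(hash(prompt) % (2**31))
    return torch.randn(dim, generator=g)

def mutate(prompt: str, action: int) -> str:
    # Placeholder mutation; real mutators would rewrite the jailbreak template.
    return f"{prompt} [{MUTATORS[action]}]"

def reward_fn(prompt: str) -> float:
    # Placeholder reward; a real reward would check whether the target LLM's
    # response actually answers the harmful question.
    return random.random()

policy = Policy()
optim = torch.optim.Adam(policy.parameters(), lr=1e-3)

for episode in range(100):           # each episode = one guided search trajectory
    prompt, log_probs, rewards = "seed template", [], []
    for step in range(5):            # refine the prompt for a few steps
        dist = policy(embed(prompt))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        prompt = mutate(prompt, action.item())
        rewards.append(reward_fn(prompt))
    # Simple REINFORCE update (the paper uses a customized PPO instead).
    returns = torch.tensor(rewards).flip(0).cumsum(0).flip(0)
    loss = -(torch.stack(log_probs) * returns).sum()
    optim.zero_grad(); loss.backward(); optim.step()
```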
2022
- Adversarial and Implicit Modality Imputation with Applications to Depression Early Detection. Yuzhou Nie*, Chengyue Huang*, Hailun Liang, and Hongteng Xu. 2022.
Depression early detection is an important healthcare task that relies heavily on high-quality multi-modal medical data. In practice, however, learning a robust detection model is challenging because real-world data often suffer from severe modality-level missingness caused by imperfect data collection and strict data-sharing policies. In this study, we propose an Adversarial and Implicit Modality Imputation (AIMI) method to address this challenge. In particular, when training multi-modal predictive models, we simultaneously learn an implicit mechanism that imputes the missing modalities of the training data. These two learning objectives are achieved jointly in an adversarial learning framework. On the UK Biobank dataset, we demonstrate the effectiveness and superiority of our method for the early detection of depression. Code is available at https://github.com/rucnyz/AIMI.
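The following is a minimal sketch of the joint objective described above: a generator imputes the missing modality, a discriminator distinguishes observed from imputed features (the adversarial part), and a predictor is trained on the completed inputs at the same time. All module names, dimensions, and loss weights are toy assumptions for illustration and do not reflect the AIMI architecture; see the linked repository for the actual code.

```python
# Hypothetical sketch of adversarial modality imputation trained jointly with a
# predictor. Dimensions, networks, and the 0.1 adversarial weight are
# illustrative assumptions, not the AIMI implementation.
import torch
import torch.nn as nn

D_A, D_B = 32, 24  # toy feature dimensions of two modalities

imputer = nn.Sequential(nn.Linear(D_A, 64), nn.ReLU(), nn.Linear(64, D_B))     # generator: A -> B_hat
critic  = nn.Sequential(nn.Linear(D_B, 64), nn.ReLU(), nn.Linear(64, 1))       # discriminator: real vs imputed B
clf     = nn.Sequential(nn.Linear(D_A + D_B, 64), nn.ReLU(), nn.Linear(64, 1)) # downstream predictor

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(list(imputer.parameters()) + list(clf.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(critic.parameters(), lr=1e-3)

for step in range(200):
    # Toy batch: modality A always observed, modality B missing for roughly half the batch.
    xa = torch.randn(64, D_A)
    xb = torch.randn(64, D_B)
    y = torch.randint(0, 2, (64, 1)).float()
    missing = torch.rand(64, 1) < 0.5

    xb_hat = imputer(xa)
    xb_full = torch.where(missing, xb_hat, xb)  # fill missing B with the imputation

    # --- discriminator step: tell observed B apart from imputed B ---
    d_real = critic(xb[~missing.squeeze(1)])
    d_fake = critic(xb_hat[missing.squeeze(1)].detach())
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- generator + predictor step: fool the critic and predict the label ---
    d_fake = critic(xb_hat[missing.squeeze(1)])
    adv = bce(d_fake, torch.ones_like(d_fake))
    pred = clf(torch.cat([xa, xb_full], dim=1))
    loss_g = bce(pred, y) + 0.1 * adv
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```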
- Gromov-Wasserstein Multi-Modal Alignment and Clustering. Fengjiao Gong*, Yuzhou Nie*, and Hongteng Xu. 2022.
Multi-modal clustering aims to find a clustering structure shared by the data of different modalities in an unsupervised way. Solving this problem currently relies on two assumptions: i) the multi-modal data share the same latent distribution, and ii) the observed multi-modal data are well aligned and free of missing modalities. Unfortunately, these two assumptions are often questionable in practice and thus limit the applicability of many multi-modal clustering methods. In this work, we develop a new multi-modal clustering method based on the Gromovization of the optimal transport distance, which relaxes the dependence on the above two assumptions. In particular, given data of different modalities whose correspondence is unknown, our method learns the Gromov-Wasserstein (GW) barycenter of their kernel matrices. Driven by the modularity-maximization principle, the GW barycenter helps explore the clustering structure shared by the different modalities. Moreover, the GW barycenter is associated with the GW distances from the different modalities to the clusters, and the optimal transport plans corresponding to these GW distances help achieve the alignment and clustering of the multi-modal data jointly. Experimental results show that our method outperforms state-of-the-art multi-modal clustering methods, especially when the data are (partially or completely) unaligned. The code is available at https://github.com/rucnyz/GWMAC.
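A simplified illustration of the idea follows: compute a GW barycenter of per-modality kernel matrices with the POT library, then read cluster labels off the optimal transport plans from each modality to the barycenter, which also induces a cross-modal alignment. The toy data, kernel choice, and barycenter size are assumptions for the sketch; this is not the GWMAC implementation (see the linked repository).

```python
# Hypothetical sketch: GW barycenter of per-modality kernel matrices via POT,
# with cluster labels read off the transport plans. Toy data and kernels are
# illustrative assumptions, not the GWMAC code.
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
n1, n2, k = 40, 50, 3                      # two unaligned modalities, k clusters

# Toy per-modality features and their (RBF-like) kernel matrices.
X1, X2 = rng.normal(size=(n1, 5)), rng.normal(size=(n2, 8))
C1 = np.exp(-ot.dist(X1, X1))              # kernel matrix of modality 1
C2 = np.exp(-ot.dist(X2, X2))              # kernel matrix of modality 2

p1, p2 = ot.unif(n1), ot.unif(n2)          # uniform weights over samples
pb = ot.unif(k)                            # weights over the k barycenter "clusters"

# GW barycenter of the two kernel matrices, with k support points.
Cb = ot.gromov.gromov_barycenters(
    k, [C1, C2], ps=[p1, p2], p=pb, lambdas=[0.5, 0.5], loss_fun="square_loss"
)

# Transport plans from each modality to the barycenter: the row-wise argmax
# gives a cluster label per sample, and matching labels across modalities
# yields an unsupervised alignment of the two modalities.
T1 = ot.gromov.gromov_wasserstein(C1, Cb, p1, pb, loss_fun="square_loss")
T2 = ot.gromov.gromov_wasserstein(C2, Cb, p2, pb, loss_fun="square_loss")
labels1, labels2 = T1.argmax(axis=1), T2.argmax(axis=1)
print(labels1[:10], labels2[:10])
```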