Main page Research activities Publications Talks MSc thesis projects Courses Mentoring Hobby and spare time Write me This site uses
Google Analytics
Last updated on
06 August 2025

Publication details

P. Bushipaka, L. Passaro, T. Cucinotta. "Standard vs. Modular Sampling: Best Practices for Reliable LLM Unlearning," (to appear) Workshop on Innovation, Privacy-preservation, and Evaluations of machine Unlearning Techniques (WIPE-OUT), held jointly with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2025), September 15th, 2025, Porto, Portugal.

Abstract

A conventional LLM unlearning setting consists of two subsets, "forget" and "retain", with the objectives of removing the undesired knowledge in the forget set while preserving the remaining knowledge in the retain set. In privacy-focused unlearning research, the retain set is often further divided into neighbor sets, containing data either directly or indirectly connected to the forget targets, and augmented by a general-knowledge set. A common practice in existing benchmarks is to employ only a single neighbor set together with general knowledge, which fails to reflect the complexity of real-world data relationships. The implementation of LLM unlearning typically pairs forget and retain samples through 1:1 matching or cyclic iteration. However, the efficacy and stability of these de facto standards have not been critically examined. In this study, we systematically evaluate these common practices. Our findings reveal that relying on a single neighbor set is suboptimal and that a standard sampling approach can obscure performance trade-offs. Based on this analysis, we propose and validate an initial set of best practices: (1) incorporating diverse neighbor sets to balance forget efficacy and model utility; (2) avoiding standard 1:1 sampling methods, which are inefficient and produce poor results; and (3) adopting our proposed Modular Entity-Level Unlearning (MELU) strategy as an alternative to cyclic sampling. We demonstrate that this modular approach, combined with robust algorithms, provides a clear and stable path towards effective unlearning. Our code can be found at MELU.
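
The abstract names 1:1 matching and cyclic iteration without defining them. As a rough illustration only, the sketch below shows one plausible reading of the two pairing schemes for forget and retain samples. The function names and the toy sample lists are hypothetical, this is not the paper's code, and it does not attempt to reproduce MELU.

```python
from itertools import cycle


def one_to_one_pairs(forget, retain):
    """Assumed reading of 1:1 matching: pair the i-th forget sample with
    the i-th retain sample. Samples beyond the shorter list are dropped."""
    return list(zip(forget, retain))


def cyclic_pairs(forget, retain):
    """Assumed reading of cyclic iteration: walk the longer list once and
    recycle the shorter one so that every sample is used at least once."""
    if len(forget) >= len(retain):
        return list(zip(forget, cycle(retain)))
    # Retain is longer: iterate over it and recycle the forget samples,
    # keeping the (forget, retain) order in each pair.
    return [(f, r) for r, f in zip(retain, cycle(forget))]


# Hypothetical toy data: 2 forget samples, 5 retain samples.
forget_set = ["f1", "f2"]
retain_set = ["r1", "r2", "r3", "r4", "r5"]

print(one_to_one_pairs(forget_set, retain_set))
print(cyclic_pairs(forget_set, retain_set))
```

Under this reading, 1:1 matching leaves three of the five retain samples unused in the toy data, while cyclic iteration covers all of them by revisiting each forget sample several times.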


Last updated on
13 August 2025