Memory Self-Regeneration: Uncovering Hidden Knowledge in Unlearned Models

1Silesian University of Technology 2Jagiellonian University 3IDEAS Research Institute
*Indicates Equal Contribution

Abstract

The impressive capability of modern text-to-image models to generate realistic visuals comes with a serious drawback: they can be misused to create harmful, deceptive, or unlawful content. This has accelerated the push for machine unlearning, a new field that seeks to selectively remove specific knowledge from a model without degrading its overall performance. However, truly forgetting a given concept turns out to be extremely difficult: models exposed to adversarial-prompt attacks can still generate supposedly unlearned concepts, which may be not only harmful but also illegal. In this paper, we examine models' ability to forget and recall knowledge, introducing the Memory Self-Regeneration task. Furthermore, we present the MemoRa strategy, a regenerative approach that supports the effective recovery of previously lost knowledge. Moreover, we argue that robustness of knowledge retrieval is a crucial yet underexplored evaluation measure for developing more robust and effective unlearning techniques. Finally, we demonstrate that forgetting occurs in two distinct ways: short-term, where concepts can be quickly recalled, and long-term, where recovery is more challenging.

Methodology

Methodology overview

Our method aims to recover unlearned information using only a few images containing the removed concepts. We first expand the training set using DDIM inversion and diversify it via spherical interpolation. Next, we fine-tune a LoRA adapter to restore the erased concept. The results reveal two types of forgetting: short-term, where knowledge is quickly recovered, and long-term, where recovery is harder.
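The diversification step can be illustrated with spherical linear interpolation (slerp) between inverted latents. The sketch below is a minimal, self-contained NumPy version; the function name, latent shape, and interpolation weights are illustrative choices, not taken from the paper's implementation:

```python
import numpy as np

def slerp(z1, z2, t):
    """Spherical linear interpolation between two latent tensors.

    Interpolates along the great circle connecting z1 and z2, which
    better preserves the norm statistics expected of Gaussian diffusion
    latents than plain linear interpolation would.
    """
    z1_flat, z2_flat = z1.ravel(), z2.ravel()
    # Angle between the two latents, clipped for numerical safety.
    cos_theta = np.dot(z1_flat, z2_flat) / (
        np.linalg.norm(z1_flat) * np.linalg.norm(z2_flat)
    )
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if np.isclose(theta, 0.0):
        # Nearly parallel vectors: fall back to linear interpolation.
        return (1 - t) * z1 + t * z2
    sin_theta = np.sin(theta)
    return (np.sin((1 - t) * theta) / sin_theta) * z1 + (
        np.sin(t * theta) / sin_theta
    ) * z2

# Diversify a small set of DDIM-inverted latents by interpolating pairs
# (random tensors stand in for real inverted latents here).
rng = np.random.default_rng(0)
latents = [rng.standard_normal((4, 64, 64)) for _ in range(2)]
augmented = [slerp(latents[0], latents[1], t) for t in (0.25, 0.5, 0.75)]
```

In practice the interpolated latents would then be decoded (or denoised) to produce additional training images for the LoRA fine-tuning step.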

We hypothesize that short-term forgetting corresponds to parts of the manifold moving away (the class is erased, but other classes show lower FID), while long-term forgetting reflects a displacement along the manifold.

Results

Visualizations of images generated by SD v1.4 and its variants for the nudity, church, and Van Gogh concepts.
First row: images generated by the unlearned models.
Second row: images generated using the MemoRa strategy.

Prompt Attack vs. MemoRa

Image grids for three concepts (Objects, Nudity, Van Gogh), each comparing four settings: Original SD, the unlearned model, the unlearned model under UnlearnDiffAtk, and MemoRa.

Multi-MemoRa

What is Multi-MemoRa?

Multi-MemoRa extends the MemoRa strategy to relearning multiple concepts (in this case, famous people) by combining two adapters. Visualizations are shown for MACE, which unlearned two celebrities.
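One simple way to combine two LoRA adapters is to fold both low-rank updates into the base weights as a weighted sum. The paper does not specify its exact combination rule, so the sketch below is an illustrative assumption; all names, shapes, and weights are hypothetical:

```python
import numpy as np

def apply_lora(W, A, B, scale=1.0):
    """Apply a low-rank LoRA update: W' = W + scale * (B @ A)."""
    return W + scale * (B @ A)

def merge_adapters(W, adapters, weights):
    """Fold several LoRA adapters into one weight matrix by summing
    their weighted low-rank updates (one simple merging strategy)."""
    W_merged = W.copy()
    for (A, B), w in zip(adapters, weights):
        W_merged = apply_lora(W_merged, A, B, scale=w)
    return W_merged

# Toy example: a base weight matrix plus two rank-2 adapters,
# e.g. one per relearned celebrity.
rng = np.random.default_rng(0)
d, r = 8, 2
W = rng.standard_normal((d, d))
adapter1 = (rng.standard_normal((r, d)), rng.standard_normal((d, r)))
adapter2 = (rng.standard_normal((r, d)), rng.standard_normal((d, r)))
W_multi = merge_adapters(W, [adapter1, adapter2], weights=[0.5, 0.5])
```

A weighted sum keeps each adapter's contribution adjustable; down-weighting one adapter trades off fidelity between the two recovered concepts.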

BibTeX

@misc{polowczyk2025memoryselfregenerationuncoveringhidden,
      title={Memory Self-Regeneration: Uncovering Hidden Knowledge in Unlearned Models}, 
      author={Agnieszka Polowczyk and Alicja Polowczyk and Joanna Waczyńska and Piotr Borycki and Przemysław Spurek},
      year={2025},
      eprint={2510.03263},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.03263}
}