
These five papers, published between 2022 and 2025, discuss **knowledge distillation techniques** for transferring the capabilities of large language models (LLMs) to smaller, more efficient models, often without the need for explicit context at inference time. One paper introduces **Contextualization Distillation** (CD) for Knowledge Graph Completion (KGC), showing that using LLMs such as PaLM2 to generate descriptive context for triplets significantly improves smaller, specialized KGC models, often outperforming direct use of LLMs on the task. Another proposes **Context Distillation** as a general method for language models to internalize abstract instructions, step-by-step reasoning (scratchpads), and concrete examples, eliminating the need for lengthy prompts and improving inference efficiency; a 2025 follow-up examines how to apply context distillation efficiently. A third paper details **In-context Learning Distillation**, a framework that combines an in-context learning objective with standard language modeling to transfer few-shot learning ability from large to smaller models under different tuning paradigms. Finally, **Generative Prompt Internalization** (GenPI) fully embeds long, complex prompts into a smaller model by training it to generate both the prompt content and the reasoning behind the corresponding behavior, greatly increasing efficiency in agent-based applications.
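The shared idea behind these context/prompt distillation methods can be sketched as a divergence loss between a teacher that sees the prompt or context and a student that does not: the student is trained so that its prompt-free output distribution matches the teacher's prompted one. A minimal sketch, assuming a KL objective over toy next-token logits (function names and numbers are illustrative, not taken from any of the papers):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def context_distillation_loss(teacher_logits_with_ctx, student_logits_no_ctx):
    """KL(teacher || student), averaged over the batch: pushes the
    prompt-free student toward the next-token distribution the teacher
    produces WHEN it sees the full prompt/context."""
    p = softmax(teacher_logits_with_ctx)
    q = softmax(student_logits_no_ctx)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean())

# Toy example: vocabulary of 4 tokens, batch of 2 positions.
teacher = np.array([[2.0, 0.5, 0.1, -1.0], [0.0, 1.5, -0.5, 0.2]])
student = np.array([[1.0, 0.9, 0.2, -0.5], [0.1, 1.2, -0.3, 0.1]])
loss = context_distillation_loss(teacher, student)
# loss is positive here and shrinks to 0 as the student matches the teacher
```

In practice this loss would be minimized with gradient descent over the student's parameters on many (prompted-teacher, unprompted-student) pairs; the sketch only shows the objective itself.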
2022: Learning by Distilling Context
https://arxiv.org/pdf/2209.15189
2022: In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models
https://arxiv.org/pdf/2212.10670
2024: Contextualization Distillation from Large Language Model for Knowledge Graph Completion
https://aclanthology.org/2024.findings-eacl.32.pdf
May 12, 2025: Efficient LLM Context Distillation
https://arxiv.org/pdf/2409.01930
March 25, 2025: Generative Prompt Internalization
https://arxiv.org/pdf/2411.15927