[TOC]

  1. Title: Knowledge Is a Region in Weight Space for Fine-Tuned Language Models
  2. Author: Almog Gueta et al.
  3. Publish Date: 12 Feb 2023
  4. Review Date: Wed, Mar 1, 2023
  5. url: https://arxiv.org/pdf/2302.04863.pdf

Summary of paper


Motivation

  • relatively little is known about the relationships between different models, especially those trained or tested on different datasets.

Contribution

  • we demonstrate that fine-tuned models that were optimized for high performance reside in well-defined regions in weight space, and vice versa: models sampled from such regions also perform well
  • language models fine-tuned on the same dataset form a tight cluster in weight space, while models fine-tuned on different datasets from the same underlying task form a looser cluster.
  • traversing the region between the models reaches new models that perform comparably to, or even better than, models found via fine-tuning
  • Our findings demonstrate that a model positioned between two similar models can acquire the knowledge of both. We leverage this finding and design a method to pick a better model for efficient fine-tuning.

more findings

  1. we show that after a pre-trained model is fine-tuned on similar datasets, the resulting fine-tuned models are close to each other in the weight space.
  2. models fine-tuned on the same data are closer to each other than to other models
  3. models that were fine-tuned on the same task also cluster together
  4. models fine-tuned on language tasks are not spread arbitrarily around the pre-trained model but rather occupy a constrained region in weight space (one simple way to quantify this closeness is sketched after this list)
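
One way to make "closeness in weight space" concrete is to compare flattened parameter vectors of models that share an architecture. This is a minimal sketch under that assumption, not the paper's measurement code; `models` is a hypothetical list of fine-tuned models.

```python
import torch

def weight_vector(model):
    """Flatten all parameters of a model into a single 1-D tensor."""
    return torch.cat([p.detach().flatten() for p in model.parameters()])

def pairwise_weight_distances(models):
    """L2 distance between every pair of models' flattened weight vectors."""
    vecs = torch.stack([weight_vector(m) for m in models])  # shape: (n_models, n_params)
    return torch.cdist(vecs, vecs)  # entry [i, j] = distance between model i and model j
```

With such a distance matrix, a "tight cluster" simply means that within-dataset distances are small relative to distances between models fine-tuned on different datasets.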

Some key terms

rather than fine-tuning

  • Notably, such points in weight space might not necessarily be reached via fine-tuning, but rather via spatial transformation.

points on a line between the two points representing two models fine-tuned on the same dataset

  • we find that points on the line connecting two models fine-tuned on the same dataset attain similar or even lower loss than the two individual models (a minimal interpolation sketch follows).
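
This claim can be checked directly by interpolating the two models' weights and evaluating the loss along the line. A minimal sketch, assuming both models share one architecture; `eval_loss` and `loader` are hypothetical placeholders for an evaluation routine and a data loader.

```python
import copy
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Elementwise (1 - alpha) * sd_a + alpha * sd_b over matching tensors."""
    out = {}
    for k in sd_a:
        if torch.is_floating_point(sd_a[k]):
            out[k] = (1.0 - alpha) * sd_a[k] + alpha * sd_b[k]
        else:
            out[k] = sd_a[k]  # leave integer buffers untouched
    return out

@torch.no_grad()
def loss_along_line(model_a, model_b, eval_loss, loader, steps=11):
    """Evaluate the loss at evenly spaced points on the line between two models."""
    probe = copy.deepcopy(model_a)
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    losses = []
    for i in range(steps):
        alpha = i / (steps - 1)
        probe.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha))
        losses.append(eval_loss(probe, loader))  # hypothetical evaluation callback
    return losses
```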

empirical findings

  • this suggests, for example, that the best models may not lie at the edges of the region but rather closer to its center, whereas fine-tuning often yields models at the edge of the region
  • motivated by these findings, we demonstrate that a model created by averaging the weights of fine-tuned models from the same region outperforms the pre-trained model on a variety of tasks after subsequent fine-tuning (a weight-averaging sketch follows this list).
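
A minimal weight-averaging sketch, assuming all fine-tuned models start from the same pre-trained checkpoint and share an architecture; this illustrates the idea and is not the paper's released code.

```python
import torch

def average_state_dicts(state_dicts):
    """Uniform average of a list of state dicts with identical keys and shapes."""
    avg = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        avg[key] = stacked.mean(dim=0).to(state_dicts[0][key].dtype)
    return avg

# Hypothetical usage: load the centroid into a fresh model, then fine-tune it
# on the target task instead of starting from the original pre-trained weights.
# centroid = average_state_dicts([m.state_dict() for m in fine_tuned_models])
# new_model.load_state_dict(centroid)
```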

Comparing models

  • comparing losses is the core idea, but the loss of a given model is often not directly comparable across datasets or tasks
  • to define a loss that is comparable across models, we first adopt the typical perspective that the model $f_\theta$ consists of a representation encoder $f_w$ followed by a task-specific classifier $f_\phi$, i.e., $f_\theta = f_\phi \circ f_w$
  • to calculate the loss we do the following (a minimal sketch follows this list)
    • first, remove any existing masked language modeling layers or classification heads and replace them with a new, randomly initialised classification head, keeping the rest of the weights, i.e., the encoder $f_w$, fixed.
    • we then perform linear probing, i.e., we train only the new classification head on the desired target data and its labels
    • lastly, we pass the test data through the model and report the loss with respect to the labels.
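
A minimal sketch of that protocol, assuming `encoder` is the frozen encoder $f_w$ returning pooled features of size `hidden`, and that `train_loader` / `test_loader` yield `(inputs, labels)` batches; all names here are hypothetical, not the paper's code.

```python
import torch
import torch.nn as nn

def linear_probe_loss(encoder, hidden, num_labels, train_loader, test_loader,
                      epochs=3, lr=1e-3, device="cpu"):
    encoder.to(device).eval()                        # encoder f_w stays frozen
    head = nn.Linear(hidden, num_labels).to(device)  # new randomly initialised head f_phi
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()

    for _ in range(epochs):                          # linear probing: train only the head
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                feats = encoder(x)                   # frozen features
            loss = ce(head(feats), y)
            opt.zero_grad()
            loss.backward()
            opt.step()

    total, n = 0.0, 0                                # report test loss, comparable across models
    with torch.no_grad():
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            total += ce(head(encoder(x)), y).item() * y.size(0)
            n += y.size(0)
    return total / n
```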

Projection by t-SNE

  • t-distributed stochastic neighbor embedding (t-SNE) is a statistical method for visualising high-dimensional data by giving each datapoint a location in a two- or three-dimensional map (a sketch of projecting flattened model weights follows).
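
A minimal sketch of how such a projection over model weights could look, using scikit-learn's TSNE on flattened parameter vectors; `models` is a hypothetical list of fine-tuned models sharing one architecture, and the hyperparameters are illustrative.

```python
import numpy as np
import torch
from sklearn.manifold import TSNE

def flatten_weights(model):
    """Concatenate all parameters of a model into one 1-D numpy vector."""
    return torch.cat([p.detach().flatten() for p in model.parameters()]).cpu().numpy()

# Hypothetical usage: stack one vector per fine-tuned model and project to 2-D.
# X = np.stack([flatten_weights(m) for m in models])
# coords = TSNE(n_components=2, perplexity=5, init="pca").fit_transform(X)
# coords[i] gives the 2-D location of model i; nearby points correspond to
# models with similar weights, so dataset/task clusters become visible.
```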