[TOC]
- Title: Knowledge Is a Region in Weight Space for Fine-Tuned Language Models
- Author: Almog Gueta et al.
- Publish Year: 12 Feb 2023
- Review Date: Wed, Mar 1, 2023
- url: https://arxiv.org/pdf/2302.04863.pdf
Summary of paper
Motivation
- relatively little is known about the relationships between different models, especially those trained or tested on different datasets.
Contribution
- we demonstrate that fine-tuned models optimized for high performance reside in well-defined regions in weight space, and vice versa: models sampled from those regions also perform well
- language models that have been fine-tuned on the same dataset form a tight cluster in weight space, while models fine-tuned on different datasets from the same underlying task form a looser cluster.
- traversing the region between the models reaches new models that perform comparably to or even better than models found via fine-tuning
- Our findings demonstrate that a model positioned between two similar models can acquire the knowledge of both. We leverage this finding to design a method for picking a better starting model for efficient fine-tuning.
more findings
- we show that when a pre-trained model is fine-tuned on similar datasets, the resulting fine-tuned models are close to each other in weight space.
- models fine-tuned on the same dataset are closer to each other than to other models
- models that were fine-tuned on the same task also cluster together
- models fine-tuned on language tasks are not spread arbitrarily around the pre-trained model but rather occupy a constrained region in weight space
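A quick way to probe such clustering claims on one's own checkpoints is to treat each model as a single flattened weight vector and compare pairwise distances. Below is a minimal sketch of that idea; the `models` list is a hypothetical placeholder, not the paper's exact measurement.

```python
# Minimal sketch: pairwise Euclidean distances between same-architecture
# checkpoints in weight space. The `models` list is a hypothetical placeholder.
import torch

def flatten_weights(model):
    """Concatenate all parameters of a model into one 1-D vector."""
    return torch.cat([p.detach().flatten() for p in model.parameters()])

def pairwise_distances(models):
    """Return an (n, n) matrix of Euclidean distances between model weights."""
    vectors = torch.stack([flatten_weights(m) for m in models])
    return torch.cdist(vectors, vectors)

# Usage (hypothetical): models fine-tuned on the same dataset should sit
# noticeably closer to each other than to models fine-tuned on other tasks.
# dists = pairwise_distances([model_sst2_a, model_sst2_b, model_mnli])
```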
Some key terms
rather than fine-tuning
- Notably, such points in weight space are not necessarily reached via fine-tuning, but rather via spatial transformations of existing fine-tuned models (e.g., interpolating their weights).
points on a line between the two points representing two models fine-tuned on the same dataset
- we find that points on the line between two models fine-tuned on the same dataset attain loss similar to, or even lower than, that of the two endpoint models, as illustrated in the sketch below.
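As a concrete illustration of "points on a line", here is a minimal sketch of linear interpolation in weight space between two fine-tuned checkpoints of the same architecture; `model_a`, `model_b`, and the `evaluate` call are hypothetical placeholders.

```python
# Minimal sketch: linear interpolation between two fine-tuned models
# that share the same architecture.
import copy
import torch

def interpolate(model_a, model_b, alpha):
    """Return a model whose weights are (1 - alpha) * A + alpha * B."""
    model_c = copy.deepcopy(model_a)
    state_a, state_b = model_a.state_dict(), model_b.state_dict()
    mixed = {k: (1 - alpha) * state_a[k] + alpha * state_b[k] for k in state_a}
    model_c.load_state_dict(mixed)
    return model_c

# Usage (hypothetical): sweep alpha over [0, 1] and evaluate each interpolated
# model to trace the loss along the line between the two endpoints.
# for alpha in torch.linspace(0, 1, 11):
#     loss = evaluate(interpolate(model_a, model_b, alpha.item()))
```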
empirical findings
- suggesting, for example, that the best models may not lie at the edges of the region but rather closer to its center, whereas fine-tuning often yields models at the edge of the region
- motivated by these findings, we demonstrate that a model created by averaging the weights of fine-tuned models from the same region outperforms the pre-trained model on a variety of tasks after subsequent fine-tuning (see the sketch below).
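A minimal sketch of that weight-averaging idea follows; it assumes a list of same-architecture fine-tuned checkpoints (a hypothetical placeholder) and returns a model intended as the starting point for subsequent fine-tuning.

```python
# Minimal sketch: average the weights of several fine-tuned models from the
# same region; the result serves as a new starting point for fine-tuning.
import copy
import torch

def average_weights(models):
    """Return a model whose weights are the element-wise mean of all inputs."""
    avg_model = copy.deepcopy(models[0])
    states = [m.state_dict() for m in models]
    averaged = {
        key: torch.stack([s[key].float() for s in states]).mean(dim=0)
        for key in states[0]
    }
    avg_model.load_state_dict(averaged)
    return avg_model

# Usage (hypothetical): fine-tune average_weights([m1, m2, m3]) on the target
# task instead of starting from the original pre-trained checkpoint.
```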
Comparing models
- comparing loss differences is the core idea, but the raw loss of a given model is often incomparable across datasets or tasks
- to define a loss that is comparable across models, we first adopt the typical perspective that the model $f_\theta$ consists of a representation encoder $f_w$ followed by a task-specific head $f_\phi$, i.e., $f_\theta = f_\phi \circ f_w$
- to calculate the loss we do the following
- first, remove any existing masked language modeling layer or classification head and replace it with a new, randomly initialised classification head, leaving the rest of the weights, i.e., the encoder $f_w$, fixed.
- we then perform linear probing, i.e., train only the new classification head on the desired target data and its labels
- lastly, we pass the test data through the model and report the loss with respect to the labels (see the sketch below).
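A minimal sketch of this comparable-loss procedure using a Hugging Face encoder: the checkpoint name, label count, and toy data below are placeholders rather than the paper's setup; the frozen encoder plays the role of $f_w$ and the freshly trained head the role of $f_\phi$.

```python
# Minimal sketch: linear probing to obtain a loss that is comparable across
# models. The checkpoint name, label count, and toy data are placeholders.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

encoder_name = "roberta-base"              # placeholder: any fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(encoder_name)
encoder = AutoModel.from_pretrained(encoder_name)   # existing heads dropped; only f_w kept

# 1. Freeze the encoder f_w and attach a fresh, randomly initialised head f_phi.
for p in encoder.parameters():
    p.requires_grad = False
num_labels = 2                             # assumption: binary classification target
head = nn.Linear(encoder.config.hidden_size, num_labels)

def logits(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state[:, 0]   # first-token representation
    return head(hidden)

# 2. Linear probing: train only the new head on the target data.
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
train_texts, train_labels = ["great movie", "terrible plot"], torch.tensor([1, 0])  # toy data
for _ in range(100):
    optimizer.zero_grad()
    loss_fn(logits(train_texts), train_labels).backward()
    optimizer.step()

# 3. Pass the test data through the model and report the comparable loss.
test_texts, test_labels = ["loved it", "awful"], torch.tensor([1, 0])               # toy data
with torch.no_grad():
    print("comparable loss:", loss_fn(logits(test_texts), test_labels).item())
```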
Projection by t-SNE
- t-distributed stochastic neighbor embedding (t-SNE) is a statistical method for visualising high-dimensional data by giving each datapoint a location in a two- or three-dimensional map.
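To connect this to the paper's setting, here is a minimal sketch of projecting flattened model weights to 2D with scikit-learn's t-SNE; the random matrix stands in for real checkpoints and is purely a placeholder so the sketch runs on its own.

```python
# Minimal sketch: project flattened model weights into 2D with t-SNE.
import numpy as np
from sklearn.manifold import TSNE

# In practice, X would stack one flattened weight vector per fine-tuned model,
# e.g. X = np.stack([flatten_weights(m) for m in models]); the random matrix
# below is only a placeholder.
X = np.random.randn(20, 1000)              # shape: (n_models, n_params)

coords = TSNE(n_components=2, perplexity=5, init="pca").fit_transform(X)
print(coords.shape)                        # (20, 2): one 2-D point per model
```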