CMU-CS-24-136
Computer Science Department
School of Computer Science, Carnegie Mellon University




On Resource Efficient Transfer Learning
via End Task Aware Training

Lucio Mwinmaarong Dery

Ph.D. Thesis

July 2024

CMU-CS-24-136.pdf


Keywords: Transfer Learning, Auxiliary Learning, Meta-Learning, Machine Learning Efficiency, Structured Pruning

Transfer learning is a machine learning (ML) paradigm in which performance on a desired end task1 is improved by exploiting "knowledge" from other tasks. The technique has become a critical workhorse driving many of the advances at the frontier of machine learning models' capabilities. The current formula is relatively simple – train a large model on large amounts of data from the transfer task(s); then apply the learned model to the desired downstream task(s), either zero-shot or after adaptation.

This thesis recognizes that these powerful models are not developed in vacuo but rather require non-trivial resources to train and deploy. As such, there is a wide range of salient problems, and communities of researchers, that the status quo leaves behind. In the first part of this thesis, we will focus on the training-time problem of data-efficient transfer learning. We will begin by making a case for exploiting advance knowledge of the desired downstream task(s) – which is available in many ML settings – to inform different dimensions of transfer learning. We dub this end task aware transfer learning. Next, we will present a set of novel end task aware optimization algorithms that bias the learning trajectory towards data-efficient solutions with strong generalization on the end task. We will close this part by providing an automated approach to constructing and searching over task-relevant transfer objectives when only end task data is available, and in limited amounts.
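To make the idea of end task awareness concrete, the sketch below shows one common flavor of end task aware auxiliary learning in PyTorch: an auxiliary loss is re-weighted by how well its gradient on the shared encoder aligns with the end task's gradient. The toy model, data, and cosine-alignment heuristic are assumptions for illustration only, not the specific algorithms developed in the thesis.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Toy setup: a shared encoder with an end-task head and one auxiliary head.
    # All dimensions, data, and the alignment heuristic are illustrative assumptions.
    encoder = nn.Linear(16, 32)
    end_head = nn.Linear(32, 2)     # end task: binary classification
    aux_head = nn.Linear(32, 16)    # auxiliary task: input reconstruction

    shared_params = list(encoder.parameters())
    optimizer = torch.optim.Adam(
        shared_params + list(end_head.parameters()) + list(aux_head.parameters()),
        lr=1e-3,
    )

    def flat_grad(loss):
        """Flatten the gradients of `loss` w.r.t. the shared encoder into one vector."""
        grads = torch.autograd.grad(loss, shared_params, retain_graph=True)
        return torch.cat([g.reshape(-1) for g in grads])

    for step in range(100):
        x = torch.randn(8, 16)
        y = torch.randint(0, 2, (8,))

        h = encoder(x)
        end_loss = F.cross_entropy(end_head(h), y)
        aux_loss = F.mse_loss(aux_head(h), x)

        # End task awareness (illustrative): weight the auxiliary loss by the cosine
        # similarity between its encoder gradient and the end task's encoder gradient.
        align = F.cosine_similarity(flat_grad(end_loss), flat_grad(aux_loss), dim=0).detach()
        aux_weight = torch.clamp(align, min=0.0)  # ignore conflicting auxiliary signal

        optimizer.zero_grad()
        (end_loss + aux_weight * aux_loss).backward()
        optimizer.step()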

For the second part of this thesis, we will develop algorithms for compute- and memory-efficient transfer learning. Our goal will be to deliver a small and efficient yet performant task-specific model for deployment, seeded from a large, generalist model that has already been pre-trained on a transfer task (or set of tasks). Focusing on structured pruning as the technique for making models smaller, we will investigate pruning under two resource-constrained settings: (1) limited task data, where we will exploit extra transfer tasks to learn pruning structures that, at the same task performance, lead to more compute- and memory-efficient models; and (2) limited memory, where many classical pruning techniques break down because they require gradient-based optimization, which can carry prohibitive memory overhead.
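As a rough illustration of structured pruning (again, not the thesis's specific methods), the following sketch removes whole hidden units of a small MLP by a simple weight-magnitude score, yielding a genuinely smaller model without any gradient-based mask optimization; the layer sizes and scoring criterion are assumptions chosen only for illustration.

    import torch
    import torch.nn as nn

    def prune_hidden_units(fc1: nn.Linear, fc2: nn.Linear, keep_ratio: float):
        """Structured pruning sketch: drop whole hidden units of a two-layer MLP
        by the L2 magnitude of their incoming weights. Purely illustrative; the
        thesis's pruning criteria and settings differ."""
        scores = fc1.weight.norm(dim=1)                      # one score per hidden unit
        k = max(1, int(keep_ratio * fc1.out_features))
        keep = torch.topk(scores, k).indices.sort().values   # indices of units to keep

        new_fc1 = nn.Linear(fc1.in_features, k)
        new_fc2 = nn.Linear(k, fc2.out_features)
        with torch.no_grad():
            new_fc1.weight.copy_(fc1.weight[keep])
            new_fc1.bias.copy_(fc1.bias[keep])
            new_fc2.weight.copy_(fc2.weight[:, keep])
            new_fc2.bias.copy_(fc2.bias)
        return new_fc1, new_fc2

    # Example: prune a toy MLP to half its hidden width.
    fc1, fc2 = nn.Linear(64, 128), nn.Linear(128, 10)
    fc1, fc2 = prune_hidden_units(fc1, fc2, keep_ratio=0.5)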

This thesis concludes by presenting further avenues for future work on resource-efficient transfer learning, building on our past work and suggesting novel branches of investigation.


______
1 "End task" here may encompass an aggregated suite of tasks.

157 pages

Thesis Committee:
Graham Neubig (Co-Chair)
Ameet Talwalkar (Co-Chair)
Zico Kolter
Luke Zettlemoyer (University of Washington / Meta)
Marc'Aurelio Ranzato (Google DeepMind)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

