
Implement Early Exit #388

@gkielian

Description


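This feature would monitor LM head overlaps for each of the earlier layers. An LM head would be applied to each intermediate layer's output, and the feature would track how closely those predictions agree with the final layer's.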
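Below is a minimal sketch of the monitoring. It assumes the head is the tied WTE (the second variation below) and reads "overlap" as top-k agreement between an earlier layer's predictions and the final layer's. The metric, the exit rule, and every function name here (`per_layer_overlap`, `earliest_exit_layer`, etc.) are illustrative assumptions, not existing code in this repo. Plain Python lists stand in for tensors to keep the logic visible. In the model itself, the loop would be a matmul of each layer's normalized hidden states with the embedding matrix.

```python
def logits_from_hidden(hidden, wte):
    """Tied LM head: the logit for token v is the dot product of `hidden` with WTE row v."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in wte]


def top_k_ids(logits, k):
    """Token ids of the k largest logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def top_k_overlap(early_logits, final_logits, k):
    """Fraction of the final layer's top-k tokens that the early layer also ranks in its top-k."""
    shared = set(top_k_ids(early_logits, k)) & set(top_k_ids(final_logits, k))
    return len(shared) / k


def per_layer_overlap(hidden_states, wte, k=5):
    """Overlap of every earlier layer with the final layer at a single position.

    hidden_states: one vector per layer, ordered first to last; the last entry is
    the final layer. Assumes each vector has already gone through the final
    LayerNorm, as the real LM head would see it.
    """
    final_logits = logits_from_hidden(hidden_states[-1], wte)
    return [top_k_overlap(logits_from_hidden(h, wte), final_logits, k)
            for h in hidden_states[:-1]]


def earliest_exit_layer(overlaps, threshold):
    """Hypothetical exit rule: index of the first earlier layer whose overlap reaches `threshold`, else None."""
    for layer, score in enumerate(overlaps):
        if score >= threshold:
            return layer
    return None


# Toy example: vocabulary of 4 tokens, embedding dim 2, three layers.
wte = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
hidden_states = [[-0.5, 0.9], [0.8, 0.3], [1.0, 0.5]]
overlaps = per_layer_overlap(hidden_states, wte, k=2)
print(overlaps)                            # [0.5, 1.0]
print(earliest_exit_layer(overlaps, 1.0))  # 1
```

Other agreement measures, such as the KL divergence between the two softmax distributions, could be swapped in where `top_k_overlap` is used.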

Two variations:

  1. Learned LM heads per layer, included in backprop in some way (the exact mechanism is still open).
  2. Reuse the word embedding table (WTE) as the head, but exclude it from backprop.
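The two variations differ mainly in whether the per-layer losses feed the training objective. The sketch below rests on two assumptions. For variation 1, the early-layer losses are averaged and added to the final loss with a hypothetical weight `aux_weight`. For variation 2, they are only returned for logging. The function names and weighting scheme are illustrative, not taken from the repo. In PyTorch, variation 2 would compute the early logits inside `torch.no_grad()` so they contribute no gradients.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one position, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def early_exit_objective(per_layer_logits, target, variant, aux_weight=0.5):
    """Return (training loss, per-layer early losses) for one position.

    per_layer_logits: logits from each layer's head, ordered first to last;
    the last entry comes from the final LM head.
    variant 1: learned per-layer heads; early losses are added to the loss (trained on).
    variant 2: tied WTE head; early losses are only reported for monitoring.
    aux_weight is a hypothetical weighting, not a value from the issue.
    """
    final_loss = cross_entropy(per_layer_logits[-1], target)
    early_losses = [cross_entropy(l, target) for l in per_layer_logits[:-1]]
    if variant == 1:
        aux = sum(early_losses) / len(early_losses) if early_losses else 0.0
        loss = final_loss + aux_weight * aux
    elif variant == 2:
        loss = final_loss  # early losses are excluded from the objective
    else:
        raise ValueError("variant must be 1 or 2")
    return loss, early_losses


# Usage: two layers' logits over a 3-token vocabulary, target token 0.
per_layer_logits = [[2.0, 0.5, -1.0], [3.0, 0.0, -2.0]]
for variant in (1, 2):
    loss, early = early_exit_objective(per_layer_logits, target=0, variant=variant)
    print(variant, loss, early)
```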

Bonus: include the early-exit innovations from the SLED paper:
https://arxiv.org/pdf/2411.02433

Metadata


Assignees

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
