a.k.a. Joint Learning, Learning with auxiliary tasks.
Using two or more loss functions ⇒ MTL!
<aside> 💡 “MTL improves generalization by leveraging the domain-specific information contained in the training signals of related tasks.” (Caruana, 1997)
</aside>
Although acceptable performance can be achieved by focusing solely on a single task, we sometimes ignore information that could help us do even better on the metric we care about. Specifically, this information comes from the training signals of other related tasks. By sharing representations among these related tasks, we can achieve better generalization on our intended task. This approach is known as multitask learning (MTL).
Sharing representation → generalization!
Hard Parameter Sharing
Share the hidden layers across all tasks while keeping separate task-specific output layers
→ Reduces the risk of overfitting
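A minimal NumPy sketch of hard parameter sharing (all names here are illustrative, not from any particular library): one shared hidden layer feeds two task-specific heads, and the two task losses are summed into a single MTL objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Shared parameters: one hidden layer used by BOTH tasks
W_shared = rng.normal(size=(16, 32)) * 0.1
# Task-specific output heads
W_task_a = rng.normal(size=(32, 1)) * 0.1   # e.g. a regression head
W_task_b = rng.normal(size=(32, 3)) * 0.1   # e.g. a 3-class head

def forward(x):
    h = relu(x @ W_shared)          # shared representation
    return h @ W_task_a, h @ W_task_b

x = rng.normal(size=(8, 16))        # a batch of 8 examples
y_a = rng.normal(size=(8, 1))       # regression targets for task A
y_b = rng.integers(0, 3, size=8)    # class labels for task B

pred_a, logits_b = forward(x)

# Task A loss: mean squared error
loss_a = np.mean((pred_a - y_a) ** 2)

# Task B loss: softmax cross-entropy
z = logits_b - logits_b.max(axis=1, keepdims=True)
log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
loss_b = -np.mean(log_probs[np.arange(8), y_b])

# Two loss functions combined into one objective => MTL
mtl_loss = loss_a + loss_b
print(mtl_loss)
```

Gradients of `mtl_loss` flow from both heads into `W_shared`, so the shared representation must work for both tasks; this is the source of the overfitting-reduction effect noted above.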
Soft Parameter Sharing
Each task has its own model with its own parameters
Regularize the distance between the models' parameters → encourages the parameters to stay similar across tasks
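A minimal NumPy sketch of soft parameter sharing (names and the choice of a squared-L2 penalty are illustrative): each task owns a full copy of the parameters, and a distance penalty between the copies is added to the per-task losses.

```python
import numpy as np

rng = np.random.default_rng(1)

# Separate models: each task has its OWN parameters
W_a = rng.normal(size=(16, 8)) * 0.1
W_b = rng.normal(size=(16, 8)) * 0.1

x = rng.normal(size=(4, 16))
y_a = rng.normal(size=(4, 8))
y_b = rng.normal(size=(4, 8))

# Per-task losses (plain MSE on each task's own linear model)
loss_a = np.mean((x @ W_a - y_a) ** 2)
loss_b = np.mean((x @ W_b - y_b) ** 2)

# Soft-sharing regularizer: squared L2 distance between the two
# parameter sets; lam controls how strongly they are pulled together
lam = 0.01
share_penalty = lam * np.sum((W_a - W_b) ** 2)

total_loss = loss_a + loss_b + share_penalty
print(total_loss)
```

With `lam = 0` this reduces to two independent models; as `lam` grows, the parameters are pulled toward each other, approaching the hard-sharing regime.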