In artificial neural networks, many models are trained for a narrow task using a specific dataset. They face difficulties in solving problems that include dynamic input/output data types and changing objective functions. Whenever the input/output tensor dimension or the data type is modified, the machine learning models need to be rebuilt and subsequently retrained from scratch. Furthermore, many machine learning algorithms that are trained for a specific objective, such as classification, may perform poorly at other tasks, such as reinforcement learning or quantification.
Even if the input/output dimensions and the objective functions remain constant, the algorithms do not generalize well across different datasets. For example, a neural network trained on classifying cats and dogs does not perform well on classifying humans and horses despite both of the datasets having the exact same image input1. Moreover, neural networks are highly susceptible to adversarial attacks2. A small deviation from the training dataset, such as changing one pixel, could cause the neural network to have significantly worse performance. This problem is known as the generalization problem3, and the field of transfer learning can help to solve it.
Transfer learning4,5,6,7,8,9,10 solves the problems presented above by allowing knowledge transfer from one neural network to another. A common way to use supervised transfer learning is obtaining a large pre-trained neural network and retraining it for a different but closely related problem. This significantly reduces training time and allows the model to be trained on a less powerful computer. Many researchers used pre-trained neural networks such as ResNet-5011 and retrained them to classify malicious software12,13,14,15. Another application of transfer learning is tackling the generalization problem, where the testing dataset is completely different from the training dataset. For example, every human has unique electroencephalography (EEG) signals due to them having distinctive brain structures. Transfer learning solves the generalization problem by pretraining on a general population EEG dataset and retraining the model for a specific patient16,17,18,19,20. As a result, the neural network is dynamically tailored for a specific person and can interpret their specific EEG signals properly. Labeling large datasets by hand is tedious and time-consuming. In semi-supervised transfer learning21,22,23,24, either the source dataset or the target dataset is unlabeled. That way, the neural networks can self-learn which pieces of information to extract and process without many labels.