As the name suggests, semi-supervised learning sits between supervised and unsupervised learning: it uses both labeled and unlabeled data for training. Problems where you have a large amount of input data, only some of which is labeled, are semi-supervised learning problems.
Many real-world machine learning problems fall into this category because labeling all of the data properly can be difficult, expensive, or time-consuming, whereas unlabeled data is comparatively easy to collect.
In these situations, only a small portion of the training data is labeled, and you can draw on both supervised and unsupervised learning techniques:
- You can use unsupervised learning techniques to discover and learn the structure in the input variables.
- You can use supervised learning techniques to train a classifier on the labeled data and then use that model to predict labels for the unlabeled data. You can then feed those predictions back into the supervised learning algorithm as additional training data, iteratively growing the labeled set, and use the retrained model to make predictions on new unlabeled data.
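
The second approach is often called self-training or pseudo-labeling. The sketch below is one minimal way it might look, using only the Python standard library. Everything in it is an illustrative assumption rather than a prescribed method: the nearest-centroid classifier stands in for whatever supervised model you would actually use, the function names (`fit_centroids`, `predict`, `self_train`) are hypothetical, the data is synthetic, and the confidence `threshold` is an added safeguard that keeps only predictions the model is fairly sure about.

```python
import math
import random


def fit_centroids(points, labels):
    """Supervised step: compute the mean point of each class."""
    sums, counts = {}, {}
    for (x, y), label in zip(points, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sx / counts[label], sy / counts[label])
            for label, (sx, sy) in sums.items()}


def predict(centroids, point):
    """Return (label, confidence) for the nearest centroid.

    Confidence is d2 / (d1 + d2), where d1 and d2 are the distances to the
    nearest and second-nearest centroids: 0.5 means a tie, 1.0 means the
    point sits exactly on a centroid. Requires at least two classes.
    """
    ranked = sorted((math.dist(point, c), label) for label, c in centroids.items())
    (d1, label), (d2, _) = ranked[0], ranked[1]
    confidence = d2 / (d1 + d2) if d1 + d2 > 0 else 0.5
    return label, confidence


def self_train(points, labels, unlabeled, threshold=0.8, max_rounds=10):
    """Iteratively move confidently predicted points into the labeled set."""
    points, labels, unlabeled = list(points), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(points, labels)      # train on labeled data
        remaining = []
        for p in unlabeled:
            label, confidence = predict(centroids, p)  # predict unlabeled data
            if confidence >= threshold:
                points.append(p)                       # feed prediction back in
                labels.append(label)
            else:
                remaining.append(p)
        if len(remaining) == len(unlabeled):           # nothing new was labeled
            break
        unlabeled = remaining
    return fit_centroids(points, labels), points, labels, unlabeled


# Synthetic data (assumed for illustration): four labeled seed points and
# one hundred unlabeled points drawn around (0, 0) and (5, 5).
random.seed(0)
seed_points = [(0.0, 0.5), (0.5, 0.0), (5.0, 4.5), (4.5, 5.0)]
seed_labels = [0, 0, 1, 1]
unlabeled = ([(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
             + [(random.gauss(5, 1), random.gauss(5, 1)) for _ in range(50)])

model, points, labels, leftover = self_train(seed_points, seed_labels, unlabeled)
print(f"labeled set grew from {len(seed_points)} to {len(points)}; "
      f"{len(leftover)} points left unlabeled")
print(predict(model, (4.0, 4.2)))  # use the retrained model on a new point
```

Filtering by confidence is a design choice, not a requirement of the approach described above: without it, every prediction is fed back as a label, so early mistakes can be reinforced in later rounds. Raising `threshold` trades a slower-growing labeled set for cleaner pseudo-labels.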