Semi-supervised Learning (I): Label Propagation

Posted by Kaiyuan Chen on May 18, 2018


Context: SSL

Given a relatively small labelled dataset {(x, y)} and a large unlabeled dataset {x}, one needs to devise ways to use both for classification.

Traditional approaches include

  • GMM: assume a Gaussian mixture over the data, then for the unlabeled dataset
    • determine the label of each cluster from the labelled examples
    • use the labelled examples to help fit the mixture model
  • hard EM / self-training (see the sketch after this list)
    • build a classifier with the labelled data, then classify the unlabeled data; the classifier is retrained on the expanded label set and the procedure is repeated
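A minimal sketch of that self-training loop, assuming scikit-learn and placeholder arrays X_l, y_l (labelled) and X_u (unlabeled) that are not part of the post:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_l, y_l, X_u, n_rounds=5, threshold=0.9):
    """Hard-EM style self-training: train, pseudo-label confident points, retrain."""
    X_l, y_l, X_u = np.asarray(X_l), np.asarray(y_l), np.asarray(X_u)
    clf = LogisticRegression().fit(X_l, y_l)
    for _ in range(n_rounds):
        if len(X_u) == 0:
            break
        proba = clf.predict_proba(X_u)
        confident = proba.max(axis=1) >= threshold  # keep only high-confidence pseudo-labels
        if not confident.any():
            break
        # move the confidently pseudo-labelled points into the labelled set
        X_l = np.vstack([X_l, X_u[confident]])
        y_l = np.concatenate([y_l, clf.classes_[proba[confident].argmax(axis=1)]])
        X_u = X_u[~confident]
        clf = LogisticRegression().fit(X_l, y_l)  # retrain and continue
    return clf
```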

Then graph-based models gained popularity. I think of them as a variant of soft EM, which assumes a graphical structure and fills in labels in a separate step, as described in my previous blog about missing data (kind of connecting the dots here).

Does unlabeled data always help?

No. People have long realized that training a Hidden Markov Model with unlabeled data (the Baum-Welch algorithm, which by the way qualifies as semi-supervised learning on sequences) can reduce accuracy under certain initial conditions.

How does SSL use unlabeled data?

Unlabeled data lets us estimate P(x), which can modify or re-prioritize the hypotheses obtained from labeled data alone.

  • For generative models, P(x, y) = P(y|x)P(x), so knowledge of P(x) influences the result
  • Discriminative models estimate P(y|x) directly, so P(x) does not
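A toy sketch of the generative case (my own illustration, not the figure from [2]): P(x) is estimated from all points, labelled and unlabeled, while the handful of labelled points only serve to name the mixture components. The data and the assumption that each component contains at least one labelled point are mine.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# two toy 1-D classes; only two points per class are labelled
X_l = np.array([[-2.0], [-1.8], [2.0], [1.9]])
y_l = np.array([0, 0, 1, 1])
X_u = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(2, 0.5, 200)]).reshape(-1, 1)

# P(x) is estimated from ALL points, labelled and unlabeled
gmm = GaussianMixture(n_components=2, random_state=0).fit(np.vstack([X_l, X_u]))

# the labelled points only decide which component corresponds to which class
comp_of_labelled = gmm.predict(X_l)
class_of_comp = {c: np.bincount(y_l[comp_of_labelled == c]).argmax() for c in range(2)}

def predict(X_new):
    return np.array([class_of_comp[c] for c in gmm.predict(X_new)])

print(predict(np.array([[-0.5], [0.5]])))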

Intuitive example from [2]


Algorithms

  • EM with generative mixtures
  • self-training
  • co-training
  • transductive SVM
  • graph-based models

Label Propagation

Assumption: as in kNN, closer data points tend to have similar class labels.

We have a fully connected graph whose nodes are the data points, both labelled and unlabeled, and we propagate labels through the edges. Larger edge weights allow labels to travel through more easily.
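The concrete choice of weights is the topic of the next post; as a placeholder sketch, one common choice for a fully connected graph is an RBF kernel (sigma here is a hypothetical bandwidth, not a value from the post):

```python
import numpy as np

def rbf_weights(X, sigma=1.0):
    """Fully connected graph: w_ij = exp(-||x_i - x_j||^2 / sigma^2)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / sigma ** 2)
```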

Algorithm Pseudocode

Define a transition matrix T, where T_{ij} is the probability of a label transiting from node i to node j.

  1. Propagate by f = Tf, so labels propagate to their neighbors.
  2. "Clamp" the labelled data, i.e. reset f's labelled entries to their known labels.
  3. Repeat until convergence.
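A minimal, self-contained sketch of these three steps, assuming RBF edge weights, a row-normalized T, and placeholder sigma/tolerance values (the actual transition-matrix formulation is deferred to the next post):

```python
import numpy as np

def label_propagation(X, y, labelled_mask, sigma=1.0, tol=1e-6, max_iter=1000):
    """Propagate labels over a fully connected graph (minimal sketch).

    X: (n, d) data points; y: (n,) integer labels (ignored where unlabelled);
    labelled_mask: boolean (n,) array marking the labelled points.
    """
    n_classes = int(y[labelled_mask].max()) + 1

    # edge weights and a row-normalized transition matrix T
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / sigma ** 2)
    T = W / W.sum(axis=1, keepdims=True)

    # F holds soft label distributions; labelled rows start as one-hot
    F = np.full((len(X), n_classes), 1.0 / n_classes)
    F[labelled_mask] = np.eye(n_classes)[y[labelled_mask]]

    for _ in range(max_iter):
        F_new = T @ F                                               # 1. propagate: f = Tf
        F_new[labelled_mask] = np.eye(n_classes)[y[labelled_mask]]  # 2. clamp labelled data
        converged = np.abs(F_new - F).max() < tol
        F = F_new
        if converged:                                               # 3. repeat until convergence
            break
    return F.argmax(axis=1)
```

The clamping step is what keeps the labelled points from drifting; without it the repeated averaging would eventually give every point the same label distribution.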

Next: Transition Matrix Formulation

References

[1] A good thesis that describes pretty much everything in SSL: http://pages.cs.wisc.edu/~jerryzhu/pub/thesis.pdf

[2] A comprehensive SSL survey: http://pages.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf

[3] Graph construction: https://www.ijcai.org/Proceedings/15/Papers/619.pdf