
FIXMATCH: SIMPLIFYING SEMI-SUPERVISED LEARNING WITH CONSISTENCY AND CONFIDENCE


Updated: Aug 16, 2021


Authors: Karthik Konar, Saranda, Prasoon, Prem, Rupesh Kumar


Deep learning networks have become the most important and widely used models in domains like computer vision, and the larger the datasets they are trained on, the better they tend to perform. These deep networks perform very well in supervised learning, but they require large labeled datasets. Labeling datasets comes at a high cost, since it often requires human labor, especially for domain-specific datasets in fields like healthcare and medical applications, where the expertise of a professional is required. This is where semi-supervised learning comes in. Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training, leveraging the unlabeled data alongside the labeled data to improve the model's performance.


Fig 1.1


FixMatch is a semi-supervised learning approach that borrows many of its processing methods from UDA and ReMixMatch. In simple terms, semi-supervised learning methods of this kind produce an artificial label for an unlabeled image, called a pseudo-label, and the model is then trained to predict these artificial labels when it is fed the unlabeled images.


 

SYNOPSIS:


This algorithm is a simplified version of existing semi-supervised learning methods. In the first step, the FixMatch algorithm generates pseudo-labels (labels predicted by the trained model when given unlabeled data as input). For a given input image, the pseudo-label is only retained when the model produces a high-confidence prediction. In the next step, the model is retrained to predict that pseudo-label when given a strongly augmented version of the same image as input. The FixMatch algorithm achieves 94.93% accuracy on CIFAR-10 with 250 labels and 88.61% accuracy with 40 labels (4 labels per class).



Fig 1.2

Step 1: Train the model by feeding it labeled data as input.


Step 2: Using the trained model, predict labels for the unlabeled data. The outputs obtained from this step are called pseudo-labels.

Step 3: Retrain the model using the pseudo-labels together with the labeled data (a minimal sketch of these three steps follows).
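
As an illustration, the three steps above can be sketched in a few lines of Python. This is a minimal sketch, not the paper's implementation: model, labeled_X, labeled_y, and unlabeled_X are hypothetical names, and we assume model is any classifier with scikit-learn-style fit and predict_proba methods.

import numpy as np

TAU = 0.95  # confidence threshold; 0.95 is the value used in the paper

model.fit(labeled_X, labeled_y)              # Step 1: train on labeled data

probs = model.predict_proba(unlabeled_X)     # Step 2: predict on unlabeled data
pseudo_labels = probs.argmax(axis=1)         # highest-scoring class per image
confident = probs.max(axis=1) >= TAU         # retain only high-confidence labels

# Step 3: retrain on labeled data plus the confident pseudo-labeled examples
X_new = np.concatenate([labeled_X, unlabeled_X[confident]])
y_new = np.concatenate([labeled_y, pseudo_labels[confident]])
model.fit(X_new, y_new)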


 

FixMatch Model:


Fig 1.3

We feed the model different augmented versions of the same image, since the predictions for both versions should be the same; this also improves the model's generalization.


BACKGROUND:


FixMatch uses two approaches to semi-supervised learning (SSL):

  • Consistency Regularization

  • Pseudo-labeling

Consistency Regularization:


This is the use of unlabeled images, based on the assumption that the model should predict the same label when fed different perturbations of the same image (as shown above). The loss function defined for consistency regularization in the FixMatch paper is:

∑b=1..µB ‖ pm(y | α(ub)) − pm(y | α(ub)) ‖²

where α is a stochastic augmentation function and pm is the model's predicted class distribution. Since both are stochastic, the two terms inside the norm come from two different random augmentations of the same unlabeled image ub, and minimizing the loss pushes their predictions together.
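
As a rough illustration, this loss can be written in a few lines of NumPy. This is a sketch under assumptions: predict_probs and weak_augment are hypothetical functions standing in for the model's softmax output and a random augmentation.

import numpy as np

def consistency_loss(unlabeled_batch, predict_probs, weak_augment):
    # Two independent random augmentations of the same unlabeled images
    p1 = predict_probs(weak_augment(unlabeled_batch))
    p2 = predict_probs(weak_augment(unlabeled_batch))
    # Squared L2 distance between the two predicted distributions,
    # summed over classes and averaged over the batch
    return np.mean(np.sum((p1 - p2) ** 2, axis=1))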


Pseudo-Labeling:


First, the weakly augmented image is passed to the model, and a class prediction above a given threshold is taken as the pseudo-label. Then the strongly augmented version of the same image is passed to the model to get its prediction, and the two are compared using cross-entropy loss.

By taking qb = pm(y | α(ub)) and q̂b = argmax(qb), the loss function is:

(1/µB) ∑b=1..µB 1(max(qb) ≥ τ) · H(q̂b, pm(y | A(ub)))

where τ is the confidence threshold, 1(·) is the indicator function, α denotes weak augmentation, and A denotes strong augmentation.

ALGORITHM:

The loss function for the algorithm is divided into two parts:

(i) ℓs

(ii) ℓu

the supervised and unsupervised losses respectively. They are given by the following equations:

ℓs = (1/B) ∑b=1..B H(pb, pm(y | α(xb)))

ℓu = (1/µB) ∑b=1..µB 1(max(qb) ≥ τ) · H(q̂b, pm(y | A(ub)))

where xb are the labeled images with labels pb, ub are the unlabeled images, and H(·, ·) is the cross-entropy loss.
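
The two losses translate directly into NumPy. This is a minimal sketch, assuming the model's class-probability outputs for each batch have already been computed (the probs_* arrays are hypothetical names):

import numpy as np

def cross_entropy(targets, probs, eps=1e-12):
    # Per-example cross-entropy for integer labels and rows of probabilities
    return -np.log(probs[np.arange(len(targets)), targets] + eps)

def fixmatch_losses(labels, probs_weak_labeled,
                    probs_weak_unl, probs_strong_unl, tau=0.95):
    # ℓs: average cross-entropy on weakly augmented labeled images
    ls = cross_entropy(labels, probs_weak_labeled).mean()
    # Pseudo-labels and confidence mask from weakly augmented unlabeled images
    q_hat = probs_weak_unl.argmax(axis=1)
    mask = probs_weak_unl.max(axis=1) >= tau
    # ℓu: cross-entropy on strongly augmented images, masked by confidence
    lu = (mask * cross_entropy(q_hat, probs_strong_unl)).mean()
    return ls, lu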


AUGMENTATION METHODS:


As we have seen above, FixMatch uses two types of augmentations:

  • Weak augmentation

  • Strong augmentation


WEAK AUGMENTATION:


  1. Randomly flip images horizontally with a probability of 50% (this step is skipped only for the SVHN dataset).

  2. Randomly translate images by up to 12.5%, vertically and horizontally (a simplified sketch of this weak augmentation follows).
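
A simplified NumPy version of this weak augmentation might look as follows; note that the wrap-around shift here is a simplification of the paper's pad-and-crop translation:

import numpy as np

def weak_augment(images, max_shift=0.125, rng=None):
    # Random horizontal flip (p = 0.5) plus a random shift of up to 12.5%
    if rng is None:
        rng = np.random.default_rng()
    out = []
    for img in images:                      # img: (H, W, C) array
        if rng.random() < 0.5:              # horizontal flip (skip for SVHN)
            img = img[:, ::-1, :]
        h, w = img.shape[:2]
        max_dy, max_dx = int(h * max_shift), int(w * max_shift)
        dy = rng.integers(-max_dy, max_dy + 1)
        dx = rng.integers(-max_dx, max_dx + 1)
        out.append(np.roll(img, (dy, dx), axis=(0, 1)))
    return np.stack(out)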


STRONG AUGMENTATION:

  1. AutoAugment: This is a reinforcement-learning-based algorithm that learns the best augmentation strategy for a particular task. FixMatch uses two variants of this algorithm: RandAugment and CTAugment.

  2. Cutout: This is an image processing technique that randomly masks out square parts of an image (see the sketch after this list).
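
Cutout is straightforward to sketch in NumPy; the patch size here is an arbitrary choice for illustration:

import numpy as np

def cutout(img, size=8, rng=None):
    # Zero out a random square patch of side `size`
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    cy, cx = rng.integers(0, h), rng.integers(0, w)   # random patch centre
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = img.copy()
    out[y0:y1, x0:x1] = 0
    return out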

MODEL ARCHITECTURE:

Wide ResNet-28-2 (depth = 28, widening factor = 2) is used. This has a total of 1.5 million parameters.


1. MAKING BATCHES:

Each training batch consists of B labeled examples and µB unlabeled examples, where the hyperparameter µ determines the relative sizes of the labeled and unlabeled portions of a batch (see the sketch below).
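
For concreteness, the paper's CIFAR-10 experiments use B = 64 and µ = 7, so each training step sees 64 labeled and 448 unlabeled images. A minimal sketch, where labeled_loader and unlabeled_loader are hypothetical batch iterators:

B, mu = 64, 7                                # values used in the paper

labeled_X, labeled_y = next(labeled_loader)  # B labeled examples
unlabeled_X = next(unlabeled_loader)         # µB unlabeled examples
assert len(labeled_X) == B and len(unlabeled_X) == mu * B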


2. SUPERVISED LEARNING:

Training is done on the labeled images, using cross-entropy loss H(·) for the classification of each input image x. The total loss for a batch is calculated by taking the average of the cross-entropy losses of the images in the batch.


3. PSEUDO-LABELING:

The weakly augmented unlabeled image is passed to the trained model, and the argmax of the model's prediction (its highest-scoring class) is taken as the pseudo-label.


4. CONSISTENCY REGULARIZATION:

This is where the strongly augmented images are fed to the model, and this output is compared with the pseudo-label to calculate the cross-entropy loss H(·).


5. FINAL LOSS MINIMIZATION:

The final loss minimized by FixMatch is:

loss = ℓs + λu · ℓu


where,

ℓs = cross-entropy loss on the model's predictions for the weakly augmented labeled images,

λu = fixed scalar hyperparameter denoting the relative weight of the unlabeled loss,

ℓu = cross-entropy loss on the model's predictions for the strongly augmented unlabeled images.
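
Using the fixmatch_losses sketch from the ALGORITHM section, the final objective is a one-liner; λu = 1 is the value used in the paper:

lambda_u = 1.0   # relative weight of the unlabeled loss (paper default)
ls, lu = fixmatch_losses(labels, probs_weak_labeled,
                         probs_weak_unl, probs_strong_unl)
loss = ls + lambda_u * lu   # the final FixMatch loss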


RESULTS:


1. CIFAR - 10, CIFAR - 100, SVHN

The results of FixMatch on CIFAR-10 and SVHN are state-of-the-art. ReMixMatch performs better than FixMatch on CIFAR-100 because of a component called Distribution Alignment (DA), which encourages the model to output the same class distribution as the labeled set. When FixMatch is combined with DA, it reaches an error rate of 40.14%, as opposed to the 44.28% error rate of ReMixMatch.


2. STL – 10

This dataset has 5,000 labeled images, 100,000 unlabeled images, and 10 classes. SSL algorithms were tested on 5 folds of 1,000 labeled images each, and FixMatch (with CTAugment) gave the lowest error rate.


3. IMAGENET

10% of the ImageNet dataset was used as labeled images, and the rest was used as unlabeled images. Here, when FixMatch is used with ResNet-50 as the base network and with RandAugment, it reaches a top-1 error rate of 28.54 ± 0.52%, which is 2.68% better than UDA.


The following images give the respective error rates on these different datasets, using different types of augmentations, and different baseline models:


Fig 1.4

AN SVM-BASED SEMI-SUPERVISED LEARNING IMPLEMENTATION


Dataset:



Fig 1.5

The above dataset consists of two columns, hours and scores. Let's rename them "No_of_Hours_Studied" and "Scores_Obtained". The variable "Scores_Obtained" depends on "No_of_Hours_Studied", and hence it is called the dependent variable. The variable "No_of_Hours_Studied" does not depend on anything, and hence it is called the independent variable.


Steps to build a semi-supervised learning model


Step 1: Construct a classifier on the labelled data.

Step 2: Use the classifier to predict labels for the unlabelled data.

Step 3: Take the predictions obtained in Step 2 and add those observations to your training data. This process is termed generating "pseudo-labels".

Step 4: Use the dataset obtained in Step 3 to train a new model, and use this newly built model.


IMPLEMENTATION:


Step 1: Divide your dataset into 3 parts: train (labelled), unl (unlabelled), and test.


from sklearn.model_selection import train_test_split

# Hold out 30% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=1)
# Treat 70% of the remaining training data as "unlabelled"
X_train, X_unl, y_train, y_unl = train_test_split(
    X_train, y_train, test_size=0.7, random_state=1)


Step 2: Now simply train a support vector machine classifier on the labelled portion.


from sklearn import svm

# Fit a linear SVM on the labelled data and score it on the test set
clf = svm.SVC(kernel='linear', probability=True, C=1).fit(X_train, y_train)
clf.score(X_test, y_test)



Step 3: Now we predict on the unlabelled data.


import pandas as pd

# Predict pseudo-labels for the unlabelled portion
# (assuming X_unl is a single-column array of study hours)
pseudo_labels = clf.predict(X_unl)
df = pd.DataFrame({'Hours': X_unl.ravel(), 'Scores': pseudo_labels})
df.head()

Step 4: Now we build the final model.


from sklearn.linear_model import LinearRegression

lm = LinearRegression().fit(X_train, y_train)  # fit the linear regression model
y_predict = lm.predict(X_test)
pd.crosstab(y_test, y_predict)                 # tabulate actual vs. predicted



First we fit our linear regression model by passing the training data as input, then we make the model predict on the test data, and finally we print the results in tabular format by passing the test labels and the predicted labels as parameters.


Fig 1.6


From the above figure we can observe that the student who scored 17 marks studied for 5 hours, the student who scored 24 marks studied for roughly 14 hours, and so on. So we can conclude that an increase in study hours leads to higher exam scores, which supports our hypothesis that Scores_Obtained depends on No_of_Hours_Studied, and our model has given accurate predictions.



GitHub:

  • https://github.com/karthik1998konar


References:

  • https://www.analyticsvidhya.com/blog/2017/09/pseudo-labelling-semi-supervised-learning-technique/


