Offline handwriting recognition, which is closely related to optical character recognition (OCR), is a process that converts an already handwritten document into digital form. Offline recognition has the advantage that it can be performed at any time after the document has been written, even years later. Its disadvantage is that it does not work in real time, i.e. it cannot process the characters as the user writes them down.
Offline handwriting recognition has a wide range of applications, including reading postal addresses, bank check amounts, and forms. OCR is also vital for digital libraries, since it allows image-based text to be digitized, preserved, and recognized by computers.
![](https://static.wixstatic.com/media/nsplsh_04f93483049e41a2bc3171ef5299c298~mv2.jpg/v1/fill/w_980,h_551,al_c,q_85,usm_0.66_1.00_0.01,enc_auto/nsplsh_04f93483049e41a2bc3171ef5299c298~mv2.jpg)
Now let’s take a look at a Machine Learning model that does this for us using a Convolutional Neural Network. We will go through the steps gradually and in the end, have a model that recognizes the characters!
Dataset
The dataset we will be using is the Kaggle A-Z handwritten alphabet dataset, which contains 26 folders (A-Z) of handwritten images, each 28×28 pixels in size, with each letter centre-fitted to a 20×20 pixel box. Each image is stored in grayscale and is labelled with the letter that it represents.
Let’s take a look at the .csv file.
![](https://static.wixstatic.com/media/6e3b57_f2a989fe4f7b4836a5beca8e13a13f79~mv2.jpeg/v1/fill/w_980,h_212,al_c,q_80,usm_0.66_1.00_0.01,enc_auto/6e3b57_f2a989fe4f7b4836a5beca8e13a13f79~mv2.jpeg)
FIG:1
It might not make a lot of sense right now, but we will take a closer look at the dataset by plotting it later on. For now, the important thing to keep in mind is that the first column holds the label, i.e. the letter, and the remaining columns hold the pixel values of the image of that letter.
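To make that layout concrete, here is a minimal sketch in plain Python that parses one made-up row in the same shape: a label followed by 28 × 28 = 784 pixel values. The row contents and the helper name `parse_row` are illustrative assumptions, not taken from the dataset or from the article's code.

```python
import csv
import io

IMAGE_SIDE = 28  # images are 28x28 pixels, so each row has 1 + 784 values


def parse_row(row):
    """Split one CSV row into (label, 28x28 grid of pixel values)."""
    label = int(row[0])
    pixels = [int(value) for value in row[1:]]
    if len(pixels) != IMAGE_SIDE * IMAGE_SIDE:
        raise ValueError(
            f"expected {IMAGE_SIDE * IMAGE_SIDE} pixels, got {len(pixels)}"
        )
    # Cut the flat pixel list into 28 rows of 28 values each.
    grid = [
        pixels[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE] for r in range(IMAGE_SIDE)
    ]
    return label, grid


# A made-up row: label 2 (the letter C) followed by 784 pixel values.
fake_pixels = [0] * 784
fake_pixels[28 * 5 + 3] = 255  # one bright pixel at row 5, column 3
fake_csv = io.StringIO(",".join(str(v) for v in [2] + fake_pixels))

label, grid = parse_row(next(csv.reader(fake_csv)))
print(label, len(grid), len(grid[0]), grid[5][3])
```

The same idea applies to the real file: every row unpacks into one label and one 28×28 image.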
For convenience, let’s rename the first column as ‘label’.
data.rename(columns={'0':'label'}, inplace=True)
We are now ready to split it into training and testing datasets.
Splitting the dataset
First, we separate the ‘label’ column from the feature columns, so that the labels form their own set. We then split the data in an 80:20 ratio, where 80% of the data is used to train the model and the remaining 20% is used to test it.
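The article does not show the splitting code itself. A commonly used tool for this step is scikit-learn's `train_test_split` with `test_size=0.2`; the plain-Python sketch below only illustrates what such a split does. The function name `split_80_20`, the seed, and the toy data are assumptions made for the example.

```python
import random


def split_80_20(features, labels, train_fraction=0.8, seed=42):
    """Shuffle the samples and split them into train and test parts."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split repeatable
    cut = int(len(indices) * train_fraction)
    train_idx, test_idx = indices[:cut], indices[cut:]
    X_train = [features[i] for i in train_idx]
    X_test = [features[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Ten made-up samples with one feature each; the label is the index mod 26.
features = [[i] for i in range(10)]
labels = [i % 26 for i in range(10)]
X_train, X_test, y_train, y_test = split_80_20(features, labels)
print(len(X_train), len(X_test))
```

Shuffling before cutting matters here: if the rows are stored letter by letter, an unshuffled split could leave some letters entirely out of the test set.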
Scaling
We then transform the features by scaling each one to a given range using scikit-learn's MinMaxScaler. This estimator scales and translates each feature individually so that it falls within the given range on the training set, e.g. between zero and one.
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaler.fit(X_train)  # learn each feature's min and max from the training data
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
X_train[1:10]
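To show what min-max scaling computes, here is a minimal plain-Python sketch of the formula (x − min) / (max − min), fitted on the training rows and then applied to both sets. It is an illustration rather than the MinMaxScaler implementation; the helper names and the toy values are assumptions.

```python
def fit_min_max(rows):
    """Return per-column (min, max) pairs learned from the training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def transform_min_max(rows, stats):
    """Scale each column to [0, 1] using the training min and max."""
    scaled = []
    for row in rows:
        scaled_row = []
        for value, (lo, hi) in zip(row, stats):
            span = hi - lo
            # Assumption: a constant column (span of 0) maps to 0.0
            # so that we never divide by zero.
            scaled_row.append((value - lo) / span if span else 0.0)
        scaled.append(scaled_row)
    return scaled


train = [[0, 10], [255, 20], [51, 30]]  # made-up pixel-like values
test = [[102, 40]]                      # test values may fall outside [0, 1]

stats = fit_min_max(train)
train_scaled = transform_min_max(train, stats)
test_scaled = transform_min_max(test, stats)
print(train_scaled)
print(test_scaled)
```

Note that the minimum and maximum come from the training rows only, which is why a test value can land outside the 0 to 1 range.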
We then reshape the NumPy arrays to the shape (number of samples, 28, 28, 1), which tells us the input size for the first layer of our model.
X_train = np.reshape(X_train, (X_train.shape[0], 28, 28, 1)).astype('float32')
X_test = np.reshape(X_test, (X_test.shape[0], 28, 28, 1)).astype('float32')
Convert the labels to categorical values.
As discussed in preprocessing of datasets, we convert the numerical labels to categorical (one-hot) values so that each sample can be classified into one of 26 categories. We do this using the np_utils.to_categorical function.
y_train = np_utils.to_categorical(y_train, num_classes=26, dtype=int)
y_test = np_utils.to_categorical(y_test, num_classes=26, dtype=int)
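A one-hot vector for a label is a list of 26 zeros with a single 1 at the label's index. The sketch below shows that idea in plain Python; the helper `one_hot` is a hypothetical stand-in for illustration, not the Keras function.

```python
def one_hot(label, num_classes=26):
    """Return a list of num_classes zeros with a single 1 at index `label`."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
    vector = [0] * num_classes
    vector[label] = 1
    return vector


encoded = one_hot(2)  # label 2 stands for the letter C
print(encoded.index(1), sum(encoded), len(encoded))
```

Taking the index of the 1 recovers the original label, which is what the `argmax()` calls later in the article rely on.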
We then define a dictionary that maps each numerical label to its letter, so that we can show letters in the plots, and plot some of the training images.
import matplotlib.pyplot as plt

letters_dict = {0:'A',1:'B',2:'C',3:'D',4:'E',5:'F',6:'G',
                7:'H',8:'I',9:'J',10:'K',11:'L',12:'M',13:'N',
                14:'O',15:'P',16:'Q',17:'R',18:'S',19:'T',20:'U',
                21:'V',22:'W',23:'X',24:'Y',25:'Z'}
fig, axis = plt.subplots(3, 3, figsize=(20, 20))
for i, ax in enumerate(axis.flat):
    ax.imshow(X_train[i].reshape(28,28))
    ax.axis('off')
    ax.set(title = f"Alphabet : {letters_dict[y_train[i].argmax()]}")
And now let’s take a look at what we plotted, to get a better view of our dataset.
![](https://static.wixstatic.com/media/6e3b57_6458d8ca96514cdd997efd0c27db29c8~mv2.jpeg/v1/fill/w_980,h_570,al_c,q_85,usm_0.66_1.00_0.01,enc_auto/6e3b57_6458d8ca96514cdd997efd0c27db29c8~mv2.jpeg)
FIG:2
As you can see, we have the images with their respective labels inscribed on top of them.
That’s all the preprocessing this dataset needs! Let us now move on to the model definition and training part.
The CNN model
We now define a convolutional neural network that takes images as input and classifies the character in each image as one of the 26 letters of the alphabet. Let’s begin by writing a Keras sequential model. For the full code of the model, go to the GitHub page. For simplicity, only the summary of the model is presented here.
![](https://static.wixstatic.com/media/6e3b57_73e2a6299c8c41249c55d1d5d41262d9~mv2.jpeg/v1/fill/w_980,h_824,al_c,q_85,usm_0.66_1.00_0.01,enc_auto/6e3b57_73e2a6299c8c41249c55d1d5d41262d9~mv2.jpeg)
FIG:3
Here we have defined a model that takes an input of shape (28, 28, 1) and outputs a 26-element NumPy array, in which the index of the largest value indicates the predicted letter. Now let’s train the model for 10 epochs with a batch size of 128.
import time

start = time.time()
history = model.fit(X_train, y_train, epochs=10, batch_size=128, verbose=2,
                    validation_data=(X_test, y_test))
end = time.time()
print('\n')
print(f'Execution Time :{round((end-start)/60,3)} minutes')
![](https://static.wixstatic.com/media/6e3b57_e0d3eacfcf184879973808459eea8b1f~mv2.jpeg/v1/fill/w_980,h_514,al_c,q_85,usm_0.66_1.00_0.01,enc_auto/6e3b57_e0d3eacfcf184879973808459eea8b1f~mv2.jpeg)
FIG:4
We see that we reach a validation accuracy of 0.99 and a validation loss of 0.05, which is quite impressive! You can tweak the hyperparameters to see how that changes the results.
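Assuming the reported metric is the usual classification accuracy, it is the fraction of validation images whose highest-scoring output matches the true label. Here is a minimal plain-Python sketch of that calculation; the helper names and the three-class toy scores are made up for readability and are not the model's real outputs.

```python
def argmax(scores):
    """Index of the largest value in a list of scores."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(true_one_hot, predicted_scores):
    """Fraction of samples whose top-scoring class matches the true class."""
    matches = sum(
        argmax(truth) == argmax(scores)
        for truth, scores in zip(true_one_hot, predicted_scores)
    )
    return matches / len(true_one_hot)


# Made-up example with 3 classes instead of 26 to keep it readable.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_pred = [
    [0.9, 0.05, 0.05],  # top class 0, correct
    [0.2, 0.7, 0.1],    # top class 1, correct
    [0.6, 0.3, 0.1],    # top class 0, wrong (true class is 2)
    [0.1, 0.8, 0.1],    # top class 1, correct
]
print(accuracy(y_true, y_pred))
```

Three of the four toy predictions match their labels, so the sketch reports an accuracy of 0.75.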
Results
Now let’s check how the model works when we feed in some data.
preds = model.predict(X_test)
X_test_ = X_test.reshape(X_test.shape[0], 28, 28)
fig, axis = plt.subplots(3, 3, figsize=(20, 20))
for i, ax in enumerate(axis.flat):
    ax.imshow(X_test_[i])
    ax.axis('off')
    ax.set(title = f"Real Alphabet : {letters_dict[y_test[i].argmax()]}\nPredicted Alphabet : {letters_dict[preds[i].argmax()]}");
![](https://static.wixstatic.com/media/6e3b57_a39ffab60ca94131ba25a79e944cb99f~mv2.jpeg/v1/fill/w_980,h_775,al_c,q_85,usm_0.66_1.00_0.01,enc_auto/6e3b57_a39ffab60ca94131ba25a79e944cb99f~mv2.jpeg)
FIG:5
As we can see, the model has predicted all of the letters shown correctly, so this was a successful run. If the model isn’t accurate enough, you can always play around with the hyperparameters, or increase the size of your dataset to get more variety and less overfitting. If you want, you can save the results in a .csv file, as the last block of code suggests. However, that is purely optional.
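Saving results to a .csv file can be sketched with Python's built-in csv module. The article's own saving code is not shown here, so the helper `write_predictions`, the column names, and the file name mentioned in the comment are all assumptions made for illustration.

```python
import csv
import io
import string


def write_predictions(handle, true_indices, predicted_indices):
    """Write one row per sample: the true letter and the predicted letter."""
    letters = string.ascii_uppercase  # index 0 -> 'A', ..., index 25 -> 'Z'
    writer = csv.writer(handle)
    writer.writerow(["real", "predicted"])
    for truth, guess in zip(true_indices, predicted_indices):
        writer.writerow([letters[truth], letters[guess]])


# Made-up class indices; in practice they would come from argmax over
# y_test and preds.
buffer = io.StringIO()  # stands in for open("predictions.csv", "w", newline="")
write_predictions(buffer, [0, 1, 25], [0, 3, 25])
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the example free of side effects; swapping in a real file handle is the only change needed to write to disk.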
That’s it! Now you have yourself a simple functional handwriting recognition system!
GitHub:
https://github.com/Viswonathan06/Handwriting-Recognition-System
References
https://www.kaggle.com/sachinpatel21/az-handwritten-alphabets-in-csv-format
https://www.sciencedirect.com/topics/computer-science/handwriting-recognition#:~:text=Applications%20of%20offline%20handwriting%20recognition,image%20restoration%2C%20and%20recognition%20methods.
https://towardsdatascience.com/build-a-handwritten-text-recognition-system-using-tensorflow-2326a3487cd5