Exploring Handwritten Digit Recognition with TensorFlow

In this post, we'll explore how to build a multi-layer perceptron (MLP) for recognizing handwritten digits using TensorFlow, a popular machine learning library. We'll also investigate how different hyperparameters affect the accuracy of the model. This project was prepared as an assignment for my master's program at Colorado State University.

Building and Evaluating the Model

We started by building a simple MLP with one hidden layer of 512 neurons. The model was trained on the MNIST dataset, a collection of 70,000 grayscale images of handwritten digits (60,000 for training and 10,000 for testing). After training the model for 20 epochs, we achieved an accuracy of 94.7% on the test set. This means that our model correctly classified 94.7% of the test images, a promising start!

import tensorflow as tf
import numpy as np

EPOCH = 20        # training epochs used throughout the experiments
BATCH_SIZE = 100  # mini-batch size used throughout the experiments

# Load and preprocess data: scale pixels to [0, 1], flatten each 28x28 image
# to a 784-vector, and one-hot encode the labels
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
x_train = x_train.reshape(-1, 784).astype('float32')
x_test = x_test.reshape(-1, 784).astype('float32')
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

# Create tf.data.Dataset pipelines
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))

# Define the model: one hidden layer of 512 ReLU units
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(512, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(train_dataset.batch(BATCH_SIZE), epochs=EPOCH)

# Evaluate the model (evaluate returns [loss, accuracy])
print("Test accuracy: ", model.evaluate(test_dataset.batch(BATCH_SIZE))[1])

Investigating Misclassified Images

Next, we took a closer look at the images that our model misclassified. Many of these digits were written in unusual or unclear handwriting. For example, some of the digits were written with extra loops or lines, or were partially obscured or incomplete. These factors likely made it difficult for the model to correctly classify these images. This suggests that our model might benefit from additional training data that includes more examples of unusual or unclear handwriting.
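
The display_sample helper used below isn't shown in the original code, so here's a minimal matplotlib sketch of what it's assumed to do: render one flattened test image alongside its true and predicted labels.

import matplotlib.pyplot as plt

def display_sample(index):
    # Hypothetical helper: reshape the flattened 784-vector back to 28x28 and plot it
    plt.imshow(x_test[index].reshape(28, 28), cmap='gray')
    plt.title(f"True: {y_true_classes[index]}, Predicted: {y_pred_classes[index]}")
    plt.show()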

# Predict on the test set and compare predicted classes against the true labels
y_pred = model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)
y_true_classes = np.argmax(y_test, axis=1)
misclassified_indices = np.where(y_pred_classes != y_true_classes)[0]

# Show the first three misclassified images
for i in range(3):
    display_sample(misclassified_indices[i])
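
Beyond eyeballing individual images, a confusion matrix shows which digit pairs the model mixes up most often. This is a small sketch using tf.math.confusion_matrix; it wasn't part of the original assignment.

# Rows are true digits, columns are predicted digits; large off-diagonal
# entries reveal systematic confusions (e.g. 4 vs. 9)
confusion = tf.math.confusion_matrix(y_true_classes, y_pred_classes, num_classes=10)
print(confusion.numpy())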

Tuning the Model: Hidden Neurons

We then experimented with changing the number of hidden neurons in our model. When the number of hidden neurons was increased from 256 to 512, the accuracy increased from 94.5% to 94.7%. When the number of hidden neurons was further increased to 1024, the accuracy only increased slightly to 94.8%. This suggests that increasing the number of hidden neurons can improve the model's accuracy, but there are diminishing returns beyond a certain point.

# Rebuild and retrain the model for each hidden-layer width
for hidden_nodes in [256, 512, 1024]:
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(hidden_nodes, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=EPOCH, batch_size=BATCH_SIZE)
    print("Accuracy with", hidden_nodes, "hidden neurons: ", model.evaluate(x_test, y_test)[1])

Tuning the Model: Learning Rate

The learning rate of our model also had a significant impact on accuracy. When the learning rate was increased from 0.01 to 0.1, the accuracy jumped from 94.8% to 98.0%. Increasing it further to 0.5 raised the accuracy only slightly, to 98.3%, and increasing it to 1.0 nudged it up to 98.4%. Within the range we tested, larger learning rates helped, but with clearly diminishing returns; in general, a learning rate that is too low converges slowly while one that is too high can destabilize training, so there is an optimal range to find.

hidden_nodes = 512  # the width used for the remaining experiments (see Conclusion)
for learning_rate in [0.01, 0.1, 0.5, 1.0]:
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(hidden_nodes, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=EPOCH, batch_size=BATCH_SIZE)
    print("Accuracy with learning rate", learning_rate, ": ", model.evaluate(x_test, y_test)[1])

Tuning the Model: Adding a Hidden Layer

Adding a second hidden layer to our model improved the accuracy from 94.8% to 96.1%. This suggests that the extra layer let the model capture more complex patterns in the data. The improvement was modest, though, suggesting that a single hidden layer already captured most of the relevant structure.

# Same configuration, but with a second hidden layer of equal width
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(hidden_nodes, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(hidden_nodes, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=EPOCH, batch_size=BATCH_SIZE)
print("Accuracy with two hidden layers: ", model.evaluate(x_test, y_test)[1])

Tuning the Model: Batch Size

Finally, we experimented with different batch sizes for training our model. When the batch size was increased from 50 to 100, the accuracy dropped from 96.2% to 94.9%, and increasing it further to 200 dropped it to 93.4%. With plain SGD at a fixed learning rate and epoch count, a larger batch means fewer weight updates per epoch, which likely explains the decline: the model simply takes fewer optimization steps.

# Retrain with each batch size; epochs and learning rate stay fixed
for batch_size in [50, 100, 200]:
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(hidden_nodes, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=EPOCH, batch_size=batch_size)
    print("Accuracy with batch size", batch_size, ": ", model.evaluate(x_test, y_test)[1])

Conclusion

After conducting a series of experiments with different hyperparameters, we achieved the highest accuracy with a multi-layer perceptron that had two hidden layers of 512 neurons each and a learning rate of 1.0. The best test-set accuracy was 98.4%, which indicates that even a simple MLP can be highly effective at recognizing handwritten digits.
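
For reference, the winning configuration can be assembled in one place. This is a sketch: the epoch count and batch size for the final run aren't restated above, so the baseline values (EPOCH = 20, BATCH_SIZE = 100) are assumed.

# Best configuration found: two hidden layers of 512 units, SGD with learning rate 1.0
best_model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(512, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
best_model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1.0),
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])
best_model.fit(x_train, y_train, epochs=EPOCH, batch_size=BATCH_SIZE)  # assumed baseline settings
print("Best-model test accuracy:", best_model.evaluate(x_test, y_test)[1])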

While we achieved high accuracy with our multi-layer perceptron, there's always room for improvement and exploration in the field of machine learning. The journey to optimize a model is a continuous process of learning and tweaking, and that's what makes it exciting!

Through this exploration, we've seen how different factors can affect the accuracy of a machine learning model, and how careful tuning can improve performance. We hope this post has provided some useful insights into the process of building and tuning an MLP with TensorFlow!