3.4.8.13. Simple visualization and classification of the digits dataset

Plot the first few samples of the digits dataset and a 2D representation built using PCA, then do a simple classification

from sklearn.datasets import load_digits
digits = load_digits()

Plot the data: images of digits

Each data point is an 8x8 image of a handwritten digit:
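
A quick look at the array shapes confirms this structure (a minimal check on the digits object loaded above): digits.images holds the 8x8 images, while digits.data holds the same pixels flattened into 64-feature vectors, the layout scikit-learn estimators expect.

print(digits.images.shape)  # (1797, 8, 8): 1797 images of 8x8 pixels
print(digits.data.shape)    # (1797, 64): the same pixels, flattened
print(digits.target.shape)  # (1797,): one label per image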

import matplotlib.pyplot as plt
fig = plt.figure(figsize=(6, 6)) # figure size in inches
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
for i in range(64):
    ax = fig.add_subplot(8, 8, i + 1, xticks=[], yticks=[])
    ax.imshow(digits.images[i], cmap=plt.cm.binary, interpolation="nearest")
    # label the image with the target value
    ax.text(0, 7, str(digits.target[i]))
[Figure: the first 64 digit images, each labeled with its target value]

Plot a projection onto the first two principal axes

plt.figure()
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
proj = pca.fit_transform(digits.data)
plt.scatter(proj[:, 0], proj[:, 1], c=digits.target, cmap="Paired")
plt.colorbar()
[Figure: the digits projected onto the first two principal components, colored by target value]
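
Two components are convenient for plotting, but they keep only part of the information. As a rough check, the fitted pca object exposes explained_variance_ratio_ (a standard PCA attribute); the values in the comments below are approximate:

print(pca.explained_variance_ratio_)        # roughly [0.15, 0.14]
print(pca.explained_variance_ratio_.sum())  # about 0.28 of the total variance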

Classify with Gaussian naive Bayes

from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
# split the data into training and validation sets
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target)
# train the model
clf = GaussianNB()
clf.fit(X_train, y_train)
# use the model to predict the labels of the test data
predicted = clf.predict(X_test)
expected = y_test
# Plot the prediction
fig = plt.figure(figsize=(6, 6)) # figure size in inches
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
# plot the digits: each image is 8x8 pixels
for i in range(64):
    ax = fig.add_subplot(8, 8, i + 1, xticks=[], yticks=[])
    ax.imshow(X_test.reshape(-1, 8, 8)[i], cmap=plt.cm.binary, interpolation="nearest")
    # label the image with the predicted value: green if correct, red if wrong
    if predicted[i] == expected[i]:
        ax.text(0, 7, str(predicted[i]), color="green")
    else:
        ax.text(0, 7, str(predicted[i]), color="red")
[Figure: test-set digits labeled with their predicted values, green for correct and red for incorrect predictions]
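
As an aside, the fitted classifier itself is easy to inspect: GaussianNB models each pixel, for each class, as a Gaussian, and its documented theta_ attribute stores the per-class feature means, which can be viewed as ten 64-pixel "average digit" templates.

print(clf.theta_.shape)  # (10, 64): one mean vector per class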

Quantify the performance

First print the number of correct matches

matches = predicted == expected
print(matches.sum())
395

The total number of points in the test set (by default, train_test_split holds out 25% of the data, here 450 of the 1797 samples)

print(len(matches))
450

And now, the ratio of correct predictions

matches.sum() / float(len(matches))
np.float64(0.8777777777777778)
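
As a cross-check, scikit-learn computes the same quantity directly: metrics.accuracy_score compares two label arrays, and clf.score runs prediction and scoring in one call (both are standard scikit-learn APIs; the exact value depends on the random split above).

from sklearn import metrics
print(metrics.accuracy_score(expected, predicted))  # same ratio as above
print(clf.score(X_test, y_test))                    # predict + accuracy in one step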

Print the classification report, which summarizes precision, recall, and F1-score for each class

from sklearn import metrics
print(metrics.classification_report(expected, predicted))
              precision    recall  f1-score   support

           0       0.97      0.95      0.96        37
           1       0.83      0.85      0.84        41
           2       0.89      0.84      0.86        49
           3       0.93      0.83      0.88        47
           4       0.93      0.90      0.92        42
           5       0.89      0.95      0.92        42
           6       0.98      0.97      0.97        60
           7       0.81      0.98      0.88        47
           8       0.65      0.87      0.75        39
           9       0.97      0.63      0.76        46

    accuracy                           0.88       450
   macro avg       0.89      0.88      0.87       450
weighted avg       0.89      0.88      0.88       450

Print the confusion matrix

print(metrics.confusion_matrix(expected, predicted))
[[35  0  0  0  1  0  0  1  0  0]
 [ 0 35  0  0  0  0  1  1  4  0]
 [ 0  1 41  0  0  0  0  0  7  0]
 [ 0  0  2 39  0  1  0  2  2  1]
 [ 0  1  0  0 38  0  0  2  1  0]
 [ 0  0  0  0  1 40  0  1  0  0]
 [ 0  0  1  0  1  0 58  0  0  0]
 [ 0  0  0  0  0  1  0 46  0  0]
 [ 0  2  0  1  0  1  0  1 34  0]
 [ 1  3  2  2  0  2  0  3  4 29]]
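
Rows are true digits and columns are predicted digits, so off-diagonal entries locate the errors: class 9 is most often confused with 8, 1, and 7, consistent with its low recall (0.63) in the report above. A heatmap can make this easier to read; a minimal sketch, assuming scikit-learn >= 1.0, where ConfusionMatrixDisplay.from_predictions is available:

from sklearn.metrics import ConfusionMatrixDisplay
# draw the confusion matrix as an annotated heatmap
ConfusionMatrixDisplay.from_predictions(expected, predicted)
plt.show()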
