Texture analysis is one method of understanding and classifying images. The goal is to quantify relationships between the pixels of a given image. This can be achieved using so-called “texture matrices” (described in detail here).
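To make the idea of a texture matrix concrete, here is a minimal base R sketch of a gray level co-occurrence matrix. It counts how often each pair of gray levels appears as horizontal neighbours. The function `simple_glcm()` and the toy image are hypothetical illustrations, not the radiomics implementation, which handles angles, distances, and gray level discretization for you.

```r
# Count how often gray level i has gray level j as its right-hand neighbour
# (horizontal offset of one pixel), then add the transpose so the matrix is
# symmetric (each pair is counted in both directions).
simple_glcm <- function(img, n_levels) {
  counts <- matrix(0, nrow = n_levels, ncol = n_levels)
  for (row in seq_len(nrow(img))) {
    for (col in seq_len(ncol(img) - 1)) {
      i <- img[row, col]
      j <- img[row, col + 1]
      counts[i, j] <- counts[i, j] + 1
    }
  }
  counts + t(counts)
}

# Toy 4x4 image with gray levels 1..4 (hypothetical data)
toy_img <- matrix(c(1, 1, 2, 2,
                    1, 1, 2, 2,
                    1, 3, 3, 3,
                    3, 3, 4, 4), nrow = 4, byrow = TRUE)

toy_glcm <- simple_glcm(toy_img, n_levels = 4)
toy_glcm
```

Large values on the diagonal indicate smooth regions where neighbouring pixels share a gray level; off-diagonal mass indicates edges and contrast.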
The radiomics package for R provides tools for calculating image texture, and also for calculating first order image features, such as kurtosis, skewness, and mean deviation. We will compare the predictive ability of the first order features to the texture features.
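First order features ignore where pixels sit and only summarise the distribution of gray values. As a rough sketch, the three named above can be computed in base R as follows. The helper `first_order()` is hypothetical, and the exact definitions used by the radiomics package (for example, whether kurtosis is reported as excess kurtosis) may differ.

```r
# Moment-based summaries of the pixel intensities, using population
# (divide-by-n) moments. Spatial arrangement is ignored entirely.
first_order <- function(img) {
  x <- as.numeric(img)
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))  # population standard deviation
  c(mean_deviation = mean(abs(x - m)),
    skewness       = mean((x - m)^3) / s^3,
    kurtosis       = mean((x - m)^4) / s^4)
}

# Hypothetical 2x3 image
example_img <- matrix(c(1, 2, 3, 4, 5, 3), nrow = 2)
first_order(example_img)
```

Because these statistics are unchanged if the pixels are shuffled, two visually different textures with the same histogram would be indistinguishable to them.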
In this post, I will test the efficacy of predictions made using texture features of a sample of images from the Kylberg Texture Dataset V.1.0. The data we will classify contains 40 images from each of the “Canvas”, “Cushion”, “Linseeds”, “Sand”, “Seat”, and “Stone” categories. A representative sample is shown below:
Load Appropriate Packages
Calculate Features
Here we calculate the gray level co-occurrence, gray level run-length, and gray level size-zone matrices using the commands glcm(), glrlm(), and glszm(). We then calculate the features of each matrix using the calc_features() command on each matrix, and on the image itself (giving the first order features).
For reference, the file structure is such that each of the different image classes has its own folder, and the filenames are structured as “class-a-ID” (e.g. “canvas1-a-p017”).
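The steps just described might be wrapped up roughly as below. This is a sketch under assumptions: it presumes the radiomics package is installed, that `img` is a numeric matrix of gray values, that each function can be called with its default arguments, and that calc_features() returns a one-row data frame each time. The wrapper `extract_features()` is a hypothetical name, not part of the package.

```r
# Sketch only: relies on the radiomics functions named in the text and on
# their default arguments. `img` is assumed to be a numeric gray level matrix.
extract_features <- function(img) {
  cbind(
    radiomics::calc_features(img),                    # first order features
    radiomics::calc_features(radiomics::glcm(img)),   # co-occurrence features
    radiomics::calc_features(radiomics::glrlm(img)),  # run-length features
    radiomics::calc_features(radiomics::glszm(img))   # size-zone features
  )
}

# Usage, once an image has been read into a matrix (not run here):
# features <- extract_features(img)
```

Applying this to every image and stacking the rows would give one feature table with a row per image.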
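Given that naming scheme, the class label can be recovered from each filename. A minimal sketch, assuming paths of the form below (the folder names and the `.png` extension are placeholders, and `parse_label()` is a hypothetical helper):

```r
# Hypothetical paths following the "class-a-ID" pattern
files <- c("canvas1/canvas1-a-p017.png",
           "sand1/sand1-a-p003.png",
           "stone1/stone1-a-p040.png")

# Split the file stem on "-" and keep the first piece (class) and the
# third piece (image ID).
parse_label <- function(path) {
  stem  <- tools::file_path_sans_ext(basename(path))
  parts <- strsplit(stem, "-", fixed = TRUE)
  data.frame(
    class = vapply(parts, `[`, character(1), 1),
    id    = vapply(parts, `[`, character(1), 3),
    stringsAsFactors = FALSE
  )
}

parse_label(files)
```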
A Quick Peek at the Data
We can gain an understanding of the data by looking at the first two principal components of the data. By splitting the analysis by feature type (i.e. first order, glcm, glrlm, and glszm), we can get some intuition as to which feature type will best separate the classes:
From the plots it is clear that the data is separable; qualitatively more so by the texture features than by the first order features.
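For readers who want to reproduce this kind of view, a principal component projection can be sketched with base R's prcomp(). The feature table here is simulated, with three made-up classes and four made-up features, purely to show the mechanics. It is not the Kylberg data.

```r
set.seed(42)

# Hypothetical feature table: 3 classes, 20 images each, 4 features.
# Each class is shifted by a different mean so the classes are separable.
classes <- rep(c("canvas", "sand", "stone"), each = 20)
centers <- c(canvas = 0, sand = 3, stone = 6)
feats <- matrix(rnorm(60 * 4, mean = rep(centers[classes], times = 4)), ncol = 4)
colnames(feats) <- paste0("feature_", 1:4)

# Centre and scale so that no single feature dominates the components
pca <- prcomp(feats, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]

# Colour the points by class when running interactively
if (interactive()) {
  plot(scores, col = as.integer(factor(classes)), pch = 19,
       xlab = "PC1", ylab = "PC2")
}
```

Running the same projection separately on each group of feature columns gives one panel per feature type.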
Simple Random Forest Classification
We now turn our attention to building a classification model. We will make some use of the wonderful caret package to take care of splitting the data into training (one third) and testing (two thirds), and also to take care of some very basic model tuning.
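The split itself is stratified, so that every class keeps the same share in training and testing. caret provides createDataPartition() for this; the base R sketch below is a stand-in that shows the idea without the dependency. The helper `stratified_split()` is hypothetical, and its rounding of the per-class training size may not match caret's.

```r
set.seed(123)

# Six classes with 40 images each, as in the data described in this post
labels <- factor(rep(c("Canvas", "Cushion", "Linseeds", "Sand", "Seat", "Stone"),
                     each = 40))

# Sample a proportion p of the row indices within each class
stratified_split <- function(y, p) {
  idx_by_class <- split(seq_along(y), y)
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    sample(idx, size = round(length(idx) * p))
  }), use.names = FALSE)
  sort(train_idx)
}

train_idx <- stratified_split(labels, p = 1/3)
table(labels[train_idx])   # training images per class
table(labels[-train_idx])  # testing images per class
```

Training on the smaller share is a deliberately hard setup: the model sees only about a third of the images and is judged on the rest.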
For definitions of the feature suffixes, see here.
Cross validation accuracy is extremely high, hovering in the high 90s. From the variable importance plot we can see that, of the top 10 most important variables used in the random forest, 7 are texture features.
Of course, the more important metric is the test set accuracy:
Only 2 were incorrectly classified, for an accuracy of 98.7%!
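Test set accuracy is simply the share of correct predictions, which can be read off the diagonal of a confusion matrix. A minimal sketch with made-up labels (six toy predictions, not the results above):

```r
# Hypothetical true and predicted labels for six test images
actual    <- factor(c("Canvas", "Canvas", "Sand", "Sand", "Stone", "Stone"))
predicted <- factor(c("Canvas", "Canvas", "Sand", "Stone", "Stone", "Stone"),
                    levels = levels(actual))

# Rows are predictions, columns are the truth; correct calls sit on the diagonal
conf_mat <- table(predicted = predicted, actual = actual)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

conf_mat
accuracy
```

The off-diagonal cells are worth a look as well, since they show which classes get confused with each other.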
Conclusions
Texture analysis features make excellent input into models for classifying images. Furthermore, the radiomics package makes it simple to calculate them, and get results quickly!
A reproducible document (provided you download the data and set the correct path) can be found here.