Prof. Dr. Stefan Bosse
University of Siegen - Dept. Maschinenbau
University of Koblenz - Dept. Computer Science
Stefan Bosse - AFEML - Module E: Image Analysis with CNN -
CNNs are a useful class of models for both supervised and unsupervised learning paradigms.
The CNN learns to map a given image to its corresponding category by detecting a number of abstract feature representations, ranging from simple to more complex ones.
These discriminative features are then used within the network to predict the correct category of an input image.
Stefan Bosse - AFEML - Module E: Image Analysis with CNN - Applications of Convolutional Neural Networks
Classification of entire images
Detection of objects (partial segments of an image)
Detection and classification of objects
Regression of a numerical target variable
Anomaly Detection (non-classified)
Stefan Bosse - AFEML - Module E: Image Analysis with CNN - Layers of CNN
Stefan Bosse - AFEML - Module E: Image Analysis with CNN - Pre-processing
Khan et al., A Guide to Convolutional Neural Networks for Computer Vision, 2018
$$\hat{x}_0 = \hat{x} - \bar{x}, \qquad \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
$$\hat{x}_n = \frac{\hat{x}_0}{\sqrt{\dfrac{\sum_{i=1}^{N}(x_i-\bar{x})^2}{N-1}}}$$
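As a minimal sketch of the two normalization steps (mean subtraction followed by division by the sample standard deviation), assuming the image is given as a flat list of intensity values. The function name `standardize` and the sample values are illustrative and not part of the lecture material.

```python
import math

def standardize(values):
    """Zero-center the samples, then scale by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]                     # mean subtraction
    std = math.sqrt(sum(c * c for c in centered) / (n - 1))   # N - 1 in the denominator
    return [c / std for c in centered]

pixels = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical intensity values
normalized = standardize(pixels)
```

After this step the values have zero mean and unit sample variance, which is what the two formulas aim for.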
The covariance matrix of the zero-centered data is decomposed via the Singular Value Decomposition (SVD), and the data is decorrelated by projecting it onto the eigenvectors found via SVD.
Afterward, each dimension is divided by the square root of its corresponding eigenvalue to normalize all dimensions in the data space (whitening).
Xuli Rao et al., 2021
Sumeet Saurav et al., 2021
Stefan Bosse - AFEML - Module E: Image Analysis with CNN - Convolutional Layer
In contrast to classical kernel-based filtering, which commonly uses 3 × 3 two-dimensional filters, convolution here can be performed with any kernel size and dimension.
In contrast to classical kernel-based filtering, the kernel parameters (weights) are not pre-determined. They are learned during the ML training process.
Convolution with N filters applied to one input image (stride: shift of filter position in each dimension)
The filter is slid across the input feature map to compute the corresponding values in the output feature map. At each convolution step, the 2 × 2 filter (shown in green) is multiplied element-wise with the same-sized region (shown in orange) within a 4 × 4 input feature map, and the resulting values are summed up to obtain the corresponding entry (shown in blue) in the output feature map.
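The sliding product-sum described here can be sketched in plain Python. This is an illustrative sketch, not the lecture's implementation: the function name `conv2d_valid`, the example values, and the restriction to square kernels are assumptions. As is common in CNN practice, the kernel is applied without flipping (cross-correlation).

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide a square kernel over the image (no padding) and sum the products."""
    h, w = len(image), len(image[0])
    f = len(kernel)  # square f x f kernel assumed
    out = []
    for i in range(0, h - f + 1, stride):
        row = []
        for j in range(0, w - f + 1, stride):
            acc = 0
            for a in range(f):
                for b in range(f):
                    acc += image[i + a][j + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out

# Hypothetical 4 x 4 input feature map and 2 x 2 filter
image = [[2, 0, 1, 1],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 3, 0, 0]]
kernel = [[2, 0],
          [-1, 3]]
feature_map = conv2d_valid(image, kernel)            # 3 x 3 output
strided_map = conv2d_valid(image, kernel, stride=2)  # 2 x 2 output
```

With a stride of 1 the 4 × 4 input shrinks to a 3 × 3 output; with a stride of 2 only every second filter position is evaluated.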
Convolution layer with a zero padding of 1 and a stride of 2
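For a filter of size f × f pixels, an input feature map of size h × w pixels, a stride length s, and a zero-padding of p pixels on each side, the output feature dimensions are given by:
$$h_o = \left\lfloor \frac{h - f + s + 2p}{s} \right\rfloor, \qquad w_o = \left\lfloor \frac{w - f + s + 2p}{s} \right\rfloor$$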
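A small helper makes the output-size rule easy to check. It is a sketch under the assumption that p counts the zeros added on each side of a dimension, so the total padding per dimension is 2p; the function name `conv_output_size` is hypothetical.

```python
def conv_output_size(size, f, s=1, p=0):
    """Output length along one dimension: floor((size - f + s + 2p) / s).

    Assumes p zeros are added on EACH side of the dimension.
    """
    return (size - f + s + 2 * p) // s

no_padding = conv_output_size(4, f=2, s=1, p=0)     # 4 x 4 input, 2 x 2 filter
padded_strided = conv_output_size(4, f=2, s=2, p=1)  # zero padding 1, stride 2
large_image = conv_output_size(224, f=7, s=2, p=3)
```

The same function is applied separately to the height and the width of the input feature map.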
Convolutions are usually categorized into three types according to the amount of zero-padding involved.
Valid Convolution is the simplest case where no zero-padding is involved. The filter always stays within “valid” positions (i.e., no zero-padded values) in the input feature map, and for a stride of 1 the output size is reduced by f − 1 along the height and the width.
Same Convolution ensures that the output and input feature maps have equal (the “same”) sizes. To achieve this, inputs are zero-padded appropriately. For example, for a stride of 1, the padding is given by p = ⌊f/2⌋. This is why it is also called “half” convolution.
Full Convolution applies the maximum possible padding to the input feature maps before convolution. The maximum possible padding is the one where at least one valid input value is involved in all convolution cases. Therefore, it is equivalent to padding f − 1 zeros on each side for a filter size f, so that at the extreme corners at least one valid value will be included in the convolutions.
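The three padding types can be compared numerically. The sketch below assumes a stride of 1, an odd filter size, and padding counted per side; the names `padding_for` and `output_size` are illustrative.

```python
def padding_for(mode, f):
    """Per-side zero-padding for the three convolution types (stride 1, odd f)."""
    if mode == "valid":
        return 0
    if mode == "same":
        return f // 2
    if mode == "full":
        return f - 1
    raise ValueError("unknown mode: " + mode)

def output_size(size, f, p, s=1):
    """Output length along one dimension with p zeros on each side."""
    return (size - f + s + 2 * p) // s

# Hypothetical example: input length 5, filter size 3
sizes = {mode: output_size(5, 3, padding_for(mode, 3))
         for mode in ("valid", "same", "full")}
```

For this example, valid convolution shrinks the map by f − 1, same convolution keeps the size, and full convolution enlarges it by f − 1.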
Instead of defining convolutional filters that are equal to the spatial size of the inputs, we define them to be of a significantly smaller size compared to the input images (e.g., in practice 3 × 3, 5 × 5, and 7 × 7 filters are used to process images with sizes such as 110 × 110, 224 × 224, and even larger).
This design provides two key benefits: (a) the number of learnable parameters is greatly reduced when smaller kernels are used; and (b) small filters ensure that distinctive patterns are learned from local regions corresponding to, e.g., different object parts in an image.
The size (height and width) of the filter defines the spatial extent of the region that the filter covers at each convolution step. This region is called the “receptive field” of the filter.
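Benefit (a) can be illustrated with a simple count. The sketch assumes a single-channel input and one bias per filter; the helper `filter_parameters` is hypothetical.

```python
def filter_parameters(f_h, f_w, in_channels=1, with_bias=True):
    """Number of learnable parameters of one filter (weights plus optional bias)."""
    return f_h * f_w * in_channels + (1 if with_bias else 0)

input_sized = filter_parameters(224, 224)  # filter as large as a 224 x 224 image
small = filter_parameters(3, 3)            # typical 3 x 3 filter
ratio = input_sized / small
```

A filter as large as the input carries several thousand times more weights than a 3 × 3 filter, and the small filter is additionally reused at every image position.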
To enable very deep models with a relatively small number of parameters, a successful strategy is to stack many convolution layers with small receptive fields.
Dilated convolution is an approach which extends the receptive field size, without increasing the number of parameters.
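How stacking and dilation enlarge the receptive field can be sketched with the standard recurrence for the effective receptive field. The tuple layout `(filter_size, stride, dilation)` and the function name are assumptions made for this illustration.

```python
def effective_receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    layers: list of (filter_size, stride, dilation), ordered from input to output.
    """
    field, jump = 1, 1  # jump = distance in input pixels between adjacent units
    for f, s, d in layers:
        f_eff = f + (f - 1) * (d - 1)  # filter extent including dilation gaps
        field += (f_eff - 1) * jump
        jump *= s
    return field

three_small = effective_receptive_field([(3, 1, 1)] * 3)  # three stacked 3 x 3 layers
one_dilated = effective_receptive_field([(3, 1, 2)])      # one 3 x 3 layer, d = 2
```

Three stacked 3 × 3 layers cover the same 7 × 7 region as a single 7 × 7 filter but need 27 instead of 49 weights per channel, and a dilated 3 × 3 filter covers a 5 × 5 region with only 9 weights.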
Convolution with a dilated filter where the dilation factor is d = 2
For a filter with size f × f pixels, an input feature map with size h × w pixels, a stride length s, zero-padding of p, and dilation d, the output feature dimensions are given by:
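$$h_o = \left\lfloor \frac{h - f - (d-1)(f-1) + s + 2p}{s} \right\rfloor, \qquad w_o = \left\lfloor \frac{w - f - (d-1)(f-1) + s + 2p}{s} \right\rfloor$$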
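The dilated output size can be checked with a short function. This is a sketch that assumes p zeros on each side; the name `dilated_output_size` is hypothetical. With d = 1 it reduces to the ordinary convolution case.

```python
def dilated_output_size(size, f, s=1, p=0, d=1):
    """floor((size - f - (d - 1)(f - 1) + s + 2p) / s) along one dimension."""
    return (size - f - (d - 1) * (f - 1) + s + 2 * p) // s

plain = dilated_output_size(7, f=3)         # ordinary 3 x 3 filter
dilated = dilated_output_size(7, f=3, d=2)  # same filter with dilation factor 2
```

The dilated filter behaves like a filter of extent f + (d − 1)(f − 1), here 5, so the output shrinks more than with the plain filter.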
The effective receptive field with respect to the input image is shown in orange at each convolution layer.
use math,plot
m = matrix(runif(100*100),100,100)
plot(m,auto.scale=TRUE)
k = [| 1,0,2; 3,1,-3; 2,0,-1|]
m.conv = convolution(m,k,padding=0)
print(summary(m.conv))
plot(m.conv,auto.scale=TRUE)
Convolution operation in R(+)
The weight layers in a CNN (e.g., convolutional and fully connected layers) are often followed by a nonlinear transfer (or a piece-wise linear) function.
The transfer (or activation) function takes a real-valued input and squashes it within a small range such as [0, 1] or [−1, +1].
The sigmoid activation function takes in a real number as its input, and outputs a number in the range of [0,1]. It is defined as:
$$f_{sigm}(x) = \frac{1}{1 + e^{-x}}$$
The tanh activation function implements the hyperbolic tangent function to squash the input values within the range of [−1, 1]. It is represented as follows:
$$f_{tanh}(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
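Both squashing functions can be written down directly. The sketch uses the standard exponential definition of the hyperbolic tangent; the function names are illustrative, and the naive formulas are only meant for moderate input values (very large magnitudes would overflow `math.exp`).

```python
import math

def sigmoid(x):
    """Logistic sigmoid: maps any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent: maps any real input into the interval (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

samples = [-2.0, 0.0, 2.0]
sigmoid_values = [sigmoid(x) for x in samples]
tanh_values = [tanh(x) for x in samples]
```

The two functions are closely related: tanh(x) equals 2 · sigmoid(2x) − 1, so tanh is a rescaled, zero-centered sigmoid.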
The ReLU is a simple activation function that is of special practical importance because of its quick computation. A ReLU function maps the input to 0 if it is negative and keeps its value unchanged if it is positive. This can be represented as follows:
$$f_{relu}(x) = \max(0, x)$$
The noisy version of ReLU adds to the input a sample drawn from a Gaussian distribution with zero mean and a variance that depends on the input value (σ(x)). It can be represented as follows:
$$f_{nrelu}(x) = \max(0, x + \epsilon), \qquad \epsilon \sim \mathcal{N}(0, \sigma(x))$$
The rectifier function completely switches off the output if the input is negative. A leaky ReLU function does not reduce the output to zero; rather, it outputs a down-scaled version of the negative input. This function (and, more generally, the parametric version with parameter p) is represented as:
$$f_{p\text{-}relu}(x) = \begin{cases} x & \text{if } x \ge 0 \\ p\,x & \text{if } x < 0 \end{cases}$$
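The rectifier variants differ only in how they treat negative inputs and noise. In this sketch the noise level `sigma` is passed in by the caller as a standard deviation, which is an assumption, since the slides only state that it depends on the input; the function names and the default slope are illustrative.

```python
import random

def relu(x):
    """Rectified linear unit: negative inputs are set to zero."""
    return max(0.0, x)

def noisy_relu(x, sigma):
    """ReLU with zero-mean Gaussian noise added to the input (sigma = std. dev.)."""
    return max(0.0, x + random.gauss(0.0, sigma))

def parametric_relu(x, p=0.01):
    """Leaky/parametric ReLU: negative inputs are scaled by p instead of zeroed."""
    return x if x >= 0 else p * x

activations = [parametric_relu(x, p=0.1) for x in (-2.0, 0.0, 3.0)]
```

Choosing a small fixed p (for example 0.01) gives the leaky ReLU, while treating p as a trainable parameter gives the parametric variant.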
Different transfer/activation functions applied to product-sums (convolutional or neural network layer)
Stefan Bosse - AFEML - Module E: Image Analysis with CNN - Pooling Layer
A pooling layer operates on blocks of the input feature map and combines the feature activations. This combination operation is defined by a pooling function such as the average or the max function. Similar to the convolution layer, we need to specify the size of the pooled region and the stride.
Convolution can be viewed as pooling with a weighted sum (product sum); pooling applies other mapping functions, e.g., the maximum or the average function.
The max pooling operation is commonly used, where the maximum activation is chosen from the selected block of values.
The operation of a max-pooling layer when the size of the pooling region is 2 × 2 and the stride is 1.
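Max pooling over square blocks can be sketched as follows. The function name `max_pool2d` and the example feature map are hypothetical; the default arguments match the 2 × 2 region and stride of 1 named in the caption.

```python
def max_pool2d(fmap, size=2, stride=1):
    """Take the maximum activation of each size x size block of the feature map."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, w - size + 1, stride)]
            for i in range(0, h - size + 1, stride)]

# Hypothetical 4 x 4 feature map
fmap = [[1, 3, 2, 0],
        [4, 6, 5, 1],
        [7, 2, 9, 3],
        [0, 8, 4, 5]]
pooled = max_pool2d(fmap)                 # overlapping blocks, 3 x 3 output
downsampled = max_pool2d(fmap, stride=2)  # non-overlapping blocks, 2 x 2 output
```

A stride equal to the block size gives the usual non-overlapping pooling that halves each spatial dimension.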
Stefan Bosse - AFEML - Module E: Image Analysis with CNN - (Fully Connected) Neural network Layer
Fully connected layers correspond essentially to convolution layers with filters of size 1 × 1.
Each unit in a fully connected layer is densely connected to all the units of the previous layer.
In a typical CNN, fully-connected layers are usually placed toward the end of the architecture.
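$$\vec{y} = f(W^T \vec{x} + \vec{b})$$
with $\vec{x}$ as the input vector, $W$ as the weight matrix, and $\vec{b}$ as the bias vector (offset shift). For a single unit with inputs $u_i$ and weights $w_i$:
$$v = f\left(\sum_{i=1}^{n} w_i u_i + b\right)$$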
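The forward pass of a fully connected layer applies the single-unit formula once per output unit. In this sketch each row of `weights` holds the weights of one output unit, which is a layout assumption; all names and values are illustrative.

```python
def dense_layer(inputs, weights, biases, activation):
    """Compute f(sum_i w_i * u_i + b) for every output unit of the layer.

    weights[j] is the weight vector of output unit j (assumed layout).
    """
    return [activation(sum(w * u for w, u in zip(unit_weights, inputs)) + b)
            for unit_weights, b in zip(weights, biases)]

def relu(x):
    return max(0.0, x)

inputs = [1.0, 2.0, 3.0]                 # hypothetical activations of the previous layer
weights = [[0.5, -1.0, 0.0],
           [1.0, 1.0, 1.0]]
biases = [0.5, -2.0]
outputs = dense_layer(inputs, weights, biases, relu)
```

Every output unit sees all inputs, which is why the number of weights grows with the product of input and output sizes.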
A Fully-connected Neural Network architecture
Stefan Bosse - AFEML - Module E: Image Analysis with CNN - Softmax Layer
$$\sigma(\vec{z})_i = \frac{e^{z_i}}{\sum_{k=1}^{|z|} e^{z_k}}, \qquad \vec{z} = (z_1, z_2, \ldots, z_n), \quad |z| = n$$
The outputs of the softmax transfer function can be interpreted as the probabilities associated with each class normalized with all other probabilities. Each output will fall between 0 and 1, and the sum of the outputs will equal 1.
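A minimal softmax sketch follows. Subtracting the largest score before exponentiation is a common numerical safeguard that leaves the result unchanged; it is an implementation choice of this example, not something stated in the slides. The input scores are hypothetical.

```python
import math

def softmax(z):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    m = max(z)                             # shift for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])           # hypothetical class scores
predicted_class = probs.index(max(probs))  # index of the most probable class
```

The ordering of the scores is preserved, so the class with the largest score also receives the largest probability.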
Stefan Bosse - AFEML - Module E: Image Analysis with CNN - Locality and Invariance
Locality can be defined in terms of the adjacency of signal dimensions (components) under a specific ordering.
This notion is also used for signals other than images. For instance, in the case of audio signals the ordering is by time. In the case of images, the ordering is naturally the spatial ordering of the pixels in the image itself.
A classifier or regression function applied to images should be independent of (invariant to) absolute position, rotation, and scaling.
Stefan Bosse - AFEML - Module E: Image Analysis with CNN - Training Classes
Stefan Bosse - AFEML - Module E: Image Analysis with CNN - ROI and Anomaly Detection in Radiography Data
Goal. Automatically detect pores in aluminum die-cast plates in X-ray radiography data.
System. Industrial X-ray Radiography devices providing different resolutions and X-ray energies, prepared AluDC plates.
Methods and Algorithms. Semantic Pixel Classifier with a simple CNN, DBSCAN pixel clustering, Ellipse Fitting.
Stefan Bosse and Dirk Lehmhus. Automated Detection of hidden Damages and Impurities in Aluminum Die Casting Materials and Fibre-Metal Laminates using Low-quality X-ray Radiography, Synthetic X-ray Data Augmentation by Simulation, and Machine Learning, arXiv:2311.12041 [cs.CV] (2023)
Pore marking in X-ray images by using a moving window semantic pixel classifier (CNN)
Examples of pore marking using a moving window semantic pixel classifier (CNN) and synthetic X-ray image data
Stefan Bosse - AFEML - Module E: Image Analysis with CNN - ROI and Anomaly Detection in 3D CT Data
Goal. Automatically detect regions of interest (ROI) in CT data volumes. An ROI is based on anomaly detection and is a candidate for a damage: breakage, impurity, delamination, or cracks.
System. Micro X-ray CT devices providing different resolutions and X-ray energies, prepared composite plates (e.g., GLARE).
Methods and Algorithms. Edge detection using kernel filters and gradient algorithms, Z-profiling (slicing the CT volume along the z-axis, i.e., depth), anomaly marking by LSTM, CNN, and SOM, threshold discrimination.
Chirag Shah, Stefan Bosse, and Axel von Hehl. Taxonomy of Damage Patterns in Composite Materials, Measuring Signals, and Methods for Automated Damage Diagnostics, Materials 15 (MDPI), no. 13 (2022): 4645
Z-profile signals as 1D images as input for a CNN damage classifier (ND: No damage class, D1: Damage 1, D2: Damage 2, and so on)
(Left) Damage feature maps retrieved from four different CNN classifiers for the specimens A (training and prediction), B, C, and D. (Right) CT image volume and selected x-y slice visualization (A-B) with centred resin defect in the PREG layer.
Basic principle of Self-organising Maps (SOM). The neural node set {n} (squares, left side) represents a feature map {f} (circles, right side).
SOM feature maps of the z-signal volumes for different specimens and with different SOM network sizes (rows × columns); Specimen A: sharp resin washout; B: fuzzy resin washout; C: base-line; D: large-area delamination
Stefan Bosse - AFEML - Module E: Image Analysis with CNN - Summary
Further in-depth reading: A Guide to Convolutional Neural Networks for Computer Vision, Khan et al., 2018
A CNN consists of different layers: stacked convolutional layers, pooling layers, fully-connected neural layers, and softmax layers for classification.
The CNN learns to map a given image to its corresponding category by detecting a number of abstract feature representations, ranging from simple to more complex ones.
These discriminative features are then used within the network to predict the correct category of an input image.