Automated Feature Extraction with Machine Learning and Image Processing

PD Stefan Bosse

University of Siegen - Dept. Maschinenbau
University of Bremen - Dept. Mathematics and Computer Science


PD Stefan Bosse - AFEML - Module G: Advanced Object Detectors and Classifiers

Advanced Object Detectors and Classifiers

How can we find, localize, and classify objects in engineering images?

What are the challenges and pitfalls?

Which models already exist, and are they suitable?

Object Recognition in Images

Classifying the entire image containing one object ⇒ Simple Task (e.g., by using CNN or ANN models)

Finding (detecting) and classifying (multiple) objects ⇒ Challenging task!

Object classification output: A discrete class label c ∈ C
Object localization output: A position, i.e., a bounding box b ∈ ℝⁿ, or a centre point p_c(x,y,..)
Object detection output: Probability that there is one (or more) object(s)
Object recognition: Class and bounding box
Region proposal: An ROI either contains an object (foreground) or contains no object (background)

Object classification, localization, detection, and higher-order feature prediction (e.g., geometrical features)

How do we find objects in an image? The search parameter space is very large and high-dimensional!

This is a chicken-and-egg problem: to find relevant regions, we first need some understanding (i.e., classification) of the objects, and only then can we estimate the bounding box.
Generic features such as edges, geometries (closed polygon paths), color boundaries, or color similarities can help to find ROIs

Object recognition is a hybrid classification and regression problem!

Different training loss (error) functions must be used to address the different classes of output features

Taxonomy of object recognition: overview of object recognition computer vision tasks (source: machinelearningmastery.com/object-recognition-with-deep-learning/)

Region Proposal

First, we need region proposals that identify Regions-of-Interest (ROIs) ⇒ a (typically rectangular) bounding box region with a probability ρ > ρ_thres that the region contains
1. any interesting object (i.e., just foreground), or
2. an object of a specific class, or, inversely,
3. no object (background).

Many spatially distributed multi-object classifier models consist of two stages: region proposal and classification

Workflow

Given an input tensor of size CxHxW constructed from pixel values of some image ...

Identify content of interest;
Locate the interesting content;
Partition input (i.e. pixels) corresponding to identified content.

A. Brown, 2017, NVIDIA

Image Segmentation

Localization commonly provides dynamic (grid-less) bounding boxes
Segmentation partitions images into static square or rectangular regions that are initially unclassified and are finally associated with an object class (or background) ⇒ the smallest segment is an image pixel!
Semantic segmentation assigns pixels (or segments) to semantic classes, instance segmentation distinguishes different instances of one class:

A. Brown, 2017, NVIDIA

Region-proposal and Region-based Networks

Region-proposal networks (RPNs) deliver a map of bounding boxes: they simultaneously predict object bounding boxes and objectness scores at each position.
Each bounding box is assigned a probability that the region contains an object feature (or not).
An RPN is a fully convolutional network that is trained end-to-end to produce high-quality region proposals for object detection using Fast R-CNN.

The costs of false positive (FP) and false negative (FN) predictions are imbalanced: a higher FP rate is acceptable, a higher FN rate is bad (missed objects, e.g., smaller ones)!

Region proposals (boxes) originate in anchor points

Anchor box generation (source: https://towardsdatascience.com/region-proposal-network-a-detailed-view-1305c7875853)

Examples of region proposals originating from regularly spaced anchor points
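A minimal sketch of anchor-box generation, assuming anchor points at the centres of a feature-map grid with a fixed stride, boxes given as (x1, y1, x2, y2) in pixels, and hypothetical scale and aspect-ratio sets. Each anchor point spawns one box per (scale, ratio) pair, with the box area fixed by the scale:

```python
import math

def generate_anchors(feat_h, feat_w, stride, scales, ratios):
    """Return anchor boxes (x1, y1, x2, y2) for every feature-map cell.

    Anchor points are the centres of the feat_h x feat_w grid cells, spaced
    `stride` pixels apart. For each point, one box per (scale, ratio) pair is
    created, where ratio = width / height and width * height = scale**2.
    """
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx = (j + 0.5) * stride  # anchor point in image coordinates
            cy = (i + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w = s * math.sqrt(r)
                    h = s / math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = generate_anchors(2, 3, stride=16, scales=(32, 64), ratios=(0.5, 1.0, 2.0))
print(len(anchors))  # 2 * 3 anchor points * 2 scales * 3 ratios = 36
print(anchors[1])    # first point, scale 32, ratio 1.0 -> (-8.0, -8.0, 24.0, 24.0)
```

Anchors near the border extend beyond the image (negative coordinates here); in practice such boxes are usually clipped or ignored.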

R-CNN Model Family

The R-CNN family of methods refers to the R-CNN, which may stand for “Regions with CNN Features” or “Region-Based Convolutional Neural Network,” developed by Ross Girshick, et al.

This includes the techniques:

R-CNN,
Fast R-CNN, and
Faster R-CNN, all designed and demonstrated for object localization and object recognition.

R-CNN

The R-CNN model comprises three modules:

Module 1: Region Proposal
- Generate and extract category-independent region proposals, e.g., candidate bounding boxes.
Module 2: Feature Extractor
- Extract features from each candidate region, e.g., using a deep convolutional neural network.
Module 3: Classifier
- Classify the features as one of the known classes, e.g., using a linear SVM classifier model.

The anchor boxes are created a priori as a best guess, but they are dummy boxes that differ from the actual objects of interest.
Also, many boxes may not contain any object at all. So we need to learn whether a given box is foreground or background and, at the same time, learn the offsets that adjust the foreground boxes to fit the objects.
These two tasks are performed by two convolution layers applied to the feature map obtained from the backbone network. These layers are rpn_cls_score and rpn_bbox_pred; the architecture is shown below.

Once the fg/bg scores and offsets have been learned by the convolution layers, a subset of the fg and bg boxes is selected according to their confidence scores.
The offsets are applied to those boxes to get the actual ROIs to be processed further.
This post-processing of anchor boxes using offsets is called proposal generation.
- These final proposals are propagated forward through the ROI pooling layer and fc layers.
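The proposal-generation step can be sketched as follows, assuming the centre/log-size offset parametrization used by the R-CNN family (one offset tuple (dx, dy, dw, dh) per anchor), boxes as (x1, y1, x2, y2), and a hypothetical fixed foreground-score threshold. Real RPN implementations additionally clip boxes to the image and apply non-maximum suppression:

```python
import math

def apply_offsets(anchor, offsets):
    """Shift and rescale an anchor (x1, y1, x2, y2) by offsets (dx, dy, dw, dh).

    dx, dy move the box centre in units of the anchor width/height;
    dw, dh scale width/height logarithmically (w = w_a * exp(dw)).
    """
    x1, y1, x2, y2 = anchor
    wa, ha = x2 - x1, y2 - y1
    cxa, cya = x1 + wa / 2, y1 + ha / 2
    dx, dy, dw, dh = offsets
    cx, cy = cxa + dx * wa, cya + dy * ha
    w, h = wa * math.exp(dw), ha * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def generate_proposals(anchors, offsets, fg_scores, score_threshold=0.5):
    """Keep anchors whose foreground score passes the threshold and refine them."""
    proposals = []
    for anchor, off, score in zip(anchors, offsets, fg_scores):
        if score >= score_threshold:
            proposals.append((apply_offsets(anchor, off), score))
    return proposals


anchors = [(0, 0, 10, 10), (20, 20, 40, 40)]
offsets = [(0.1, 0.0, 0.0, 0.0), (0.0, 0.0, math.log(2), 0.0)]
print(generate_proposals(anchors, offsets, fg_scores=[0.9, 0.2]))
# only the first anchor survives; its centre moves 1 px to the right
```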

Image Warping

Scaling of images to a normalized image (tensor/volume) size
In order to extract features from a region of a given image, the region is first converted to make it compatible with the network input. More precisely, irrespective of the candidate region’s aspect ratio or size, all pixels are converted to the required size by warping them in a tight bounding box.
A CNN always applies convolution kernels of fixed size, and its fully connected layers require an input matrix of fixed size; hence every region must be warped to that size
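A minimal sketch of such warping with NumPy, assuming integer pixel box coordinates (x1, y1, x2, y2). It uses nearest-neighbour sampling for simplicity; other interpolation schemes such as bilinear are common in practice. As in the description above, the aspect ratio is not preserved:

```python
import numpy as np

def warp_region(image, box, out_h, out_w):
    """Crop box = (x1, y1, x2, y2) from image and warp it to out_h x out_w.

    Nearest-neighbour sampling; the region is stretched anisotropically to
    the fixed network input size. Works for (H, W) and (H, W, C) images.
    """
    x1, y1, x2, y2 = box
    region = image[y1:y2, x1:x2]
    h, w = region.shape[:2]
    rows = (np.arange(out_h) * h / out_h).astype(int)  # source row per output row
    cols = (np.arange(out_w) * w / out_w).astype(int)  # source col per output col
    return region[rows[:, None], cols[None, :]]


image = np.arange(100).reshape(10, 10)
patch = warp_region(image, (2, 3, 6, 5), out_h=4, out_w=4)  # 2x4 region -> 4x4
print(patch.shape)  # (4, 4)
```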

CNN Frameworks

ConvNetJS

https://cs.stanford.edu/people/karpathy/convnetjs/

Pure JavaScript framework
- Uses 3D volumes (with linear typed arrays)
- No matrix/tensor algebra
Supports:
- Convolutional layer
- Fully connected neural network layer
- Softmax layer
- Pooling layer
- No deconvolution layer (expansion)

TensorFlow

Native C/C++ implementation
Also available as tensorflow.js in JavaScript
Split into front-end and back-end stages
Different back-ends (CPU, GPU, ..) supported by the same model architecture and configuration
- But only selected/specific configurations are processed efficiently by a particular GPU
Always uses matrix/tensor algebra
Programming entry can be Python (but using native code libraries)
Hard API break from version 1 to 2

Structure model of TensorFlow

Runtime Platform Model

A simple way to understand and implement Object Detection from scratch, by pure CNN.

Object parts and fragmentation

Up to here, we assumed that each segment classifier can recognize one entire object.
Often, an object is partially covered by another object, and the "background" object can then be detected as several fragments output by different segments
Post-processing should try to merge fragments

Encoder-Decoder Networks

Typically symmetric networks consisting of two parts:
- An Encoder that reduces the input volume dimension and size (extracting relevant features from the input data) ⇒ Reduction
- A Decoder that expands the compressed intermediate feature vector to the original input dimension and size (or similar)

Region-proposal CNN (R-CNN)

Khan R-CNN object detection system. The input to R-CNN is an RGB image. It then extracts region proposals (Module A), computes features for each proposal using a deep CNN, e.g., AlexNet (Module B), and then classifies each region using class-specific linear SVMs (Module C).

Given an image, the first module (Module A) uses selective search [Uijlings et al., 2013] to generate category-independent region proposals, which represent the set of candidate detections available to the object detector.
The second module (Module B) is a deep CNN (e.g., AlexNet or VGGnet), which is used to extract a fixed-length feature vector from each region.
In both cases (AlexNet or VGGnet), the feature vectors are high-dimensional.

Next, features are computed by forward propagating a mean-subtracted RGB image through the network and reading off the output values of the last fully connected layer just before the soft-max classifier.
After feature extraction, one linear SVM per class is learned, which is the third module (C) of this detection system.
Note that selective search produces many region proposals. Multiple stages must be trained independently:
- Fine-tune network with softmax classifier (log loss)
- Train post-hoc linear SVMs (hinge loss)
- Train post-hoc bounding-box regressions (least squares)
Training is slow (84 h) and takes a lot of disk space
R-CNN runtime is roughly 47 seconds per image (even using GPUs)

Loss Functions

Multi-task training with 4 loss functions (obj/not obj, ROI bbox, classify, final obj bbox)

Brown, 2017

Region-proposal and classification networks use different loss (error) functions for different features

Quality Assessment and Metrics

Assessing the quality of a classification result is generally well defined
Quality assessment of object localization and segmentation results is more complex
Object localization output is a bounding box
- How to assess overlap between ground truth and computed bounding boxes?
- What about sloppy or loose ground truth bounding boxes?
Segmentation output is a polygon-like pixel region
- How to assess overlap of polygon-like ground truth and computed output region?
- What about sloppy or coarse ground truth regions?

Ground-truth and intersection over union (IoU)

Intersection over union (IoU)

Each bounding box (i.e. detection) is associated with a confidence (sometimes called rank)
Detections are assigned to ground truth objects and judged to be true/false positives by measuring overlap
To be considered a correct detection (i.e. true positive), the area of overlap a_ovl between predicted bounding box BB_p and the ground truth bounding box BB_gt (training label) must exceed 0.5 according to:

$a_{ovl}=\frac{\operatorname{area}(BB_{p} \cap BB_{gt})}{\operatorname{area}(BB_{p} \cup BB_{gt})}$

a_ovl is often called intersection over union (IoU)
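The criterion can be sketched in plain Python, assuming axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates (other conventions, e.g., top-left corner plus width/height, need a conversion first):

```python
def iou(box_p, box_gt):
    """a_ovl: intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_p[0], box_gt[0]), max(box_p[1], box_gt[1])
    ix2, iy2 = min(box_p[2], box_gt[2]), min(box_p[3], box_gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 if the boxes do not overlap
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_gt = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    union = area_p + area_gt - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(box_p, box_gt, threshold=0.5):
    """Detection criterion: the overlap must exceed the threshold (0.5 in the criterion above)."""
    return iou(box_p, box_gt) > threshold


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))               # 50 / 150 = 0.333...
print(is_true_positive((0, 0, 10, 10), (1, 1, 10, 10)))  # True (IoU = 0.81)
```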

Brown, 2017 Intersection over union illustration

Brown, 2017 Intersection over union examples

Common Issues with Algorithms

Compute performance often poor
- Too many region proposals to test and label
- Difficult to scale to larger image size and/or frame rate
- Cascading approaches help but do not solve the problem
- Aggressive region proposal suppression leads to accuracy issues
Accuracy problems
- Huge number of candidate regions inflates false-positive rates
- Illumination, occlusion, etc. can confuse test and label process
Not really scale invariant
- Early datasets were not very large, so feature variation was limited
- Today's training datasets comprise many TB, which helps but does not solve the problem
- Large variation of feature scale can inflate false-negative rates
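Region-proposal suppression is commonly done with greedy non-maximum suppression (NMS). The plain-Python sketch below assumes (x1, y1, x2, y2) boxes and illustrates the accuracy trade-off listed above: a low IoU threshold (aggressive suppression) also removes a second, nearby box that might be a distinct true object:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest remaining score
        keep.append(best)
        # drop every remaining box that overlaps the kept one too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, iou_threshold=0.5))  # [0, 2]: box 1 (IoU ~0.68) is suppressed
print(nms(boxes, scores, iou_threshold=0.7))  # [0, 1, 2]
```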

Anomaly Detectors

An anomaly detector is trained only with ground-truth baseline examples, i.e., without any of the objects to be detected later
An anomaly detector is basically a binary classifier: There is nothing / There is something.
Commonly, auto-encoder (AE, encoder-decoder) architectures are used, trained in such a way that the network reproduces its input (image) at its output.
The network learns the "background", e.g., micrograph images with a specific texture, but without cracks
It learns to reconstruct the original input data from the relevant information only, e.g., if the texture consists of a cross-line pattern, only a few geometric features (line distance, angle, intensity) are needed to reconstruct the pattern

The output of the AE is then compared with the input by calculating the mean absolute error (MAE)
If the MAE is above a threshold, something "not normal" was found!

Basic architecture of an anomaly detector
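The decision rule can be sketched as below. The trained auto-encoder itself is not shown: `x_rec` stands in for its output. The threshold rule in `calibrate_threshold` (mean plus k standard deviations of the MAE on normal, defect-free images) is a hypothetical choice for illustration, not a rule from the source:

```python
import numpy as np

def reconstruction_mae(x, x_rec):
    """Mean absolute error between an input image and its AE reconstruction."""
    return float(np.mean(np.abs(np.asarray(x, float) - np.asarray(x_rec, float))))


def calibrate_threshold(normal_errors, k=3.0):
    """Hypothetical rule: threshold = mean + k * std of MAEs on normal images."""
    errs = np.asarray(normal_errors, dtype=float)
    return float(errs.mean() + k * errs.std())


def is_anomalous(x, x_rec, threshold):
    """Something 'not normal' is present if the reconstruction error is too high."""
    return reconstruction_mae(x, x_rec) > threshold


# Toy stand-in for an AE output: the reconstruction misses one bright defect pixel.
x = np.zeros((8, 8))
x[4, 4] = 6.4
x_rec = np.zeros((8, 8))
threshold = calibrate_threshold([0.010, 0.012, 0.011, 0.009])
print(reconstruction_mae(x, x_rec))       # 0.1
print(is_anomalous(x, x_rec, threshold))  # True
```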

U-Net

Typical encoder-decoder architecture for pixel segmentation, trained with strong data augmentation!

There is broad consensus that successful training of deep networks requires many thousands of annotated training samples.

U-Net is a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently.
The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization.

Ronneberger et al. showed that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.

Ronneberger U-net architecture (example for 32x32 pixels in the lowest resolution). Each blue box corresponds to a multi-channel feature map. The number of channels is denoted on top of the box. The x-y-size is provided at the lower left edge of the box. White boxes represent copied feature maps. The arrows denote the different operations.

Fast R-CNN

Fast R-CNN combines stages B and C of R-CNN and is trained with multi-task loss (log loss and smooth L1 loss)
A Fast R-CNN network takes as input an image and a set of object proposals
The network first processes the whole image with conv and pooling layers to produce a conv feature map.
The input to Fast R-CNN is an entire image along with object proposals, which are extracted using the selective search algorithm
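A minimal sketch of such a two-term loss for a single RoI, assuming class index 0 is the background class and illustrative values λ = 1 (`lam`) and β = 1 for the smooth L1 term. As in Fast R-CNN, the box-regression term only contributes for non-background RoIs:

```python
import math

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss summed over box coordinates (quadratic below beta, linear above)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def log_loss(class_probs, true_class):
    """Negative log-likelihood of the true class (soft-max output assumed)."""
    return -math.log(max(class_probs[true_class], 1e-12))


def multi_task_loss(class_probs, true_class, box_pred, box_target, lam=1.0, background=0):
    """Classification loss plus box loss; the box term is skipped for background RoIs."""
    loss = log_loss(class_probs, true_class)
    if true_class != background:
        loss += lam * smooth_l1(box_pred, box_target)
    return loss


probs = [0.1, 0.7, 0.2]  # background, class 1, class 2
print(multi_task_loss(probs, 1, [0.5, 0.0, 3.0, 0.0], [0.0, 0.0, 0.0, 0.0]))
```

The smooth L1 term behaves quadratically for small box errors and linearly for large ones, which makes it less sensitive to outliers than a plain squared error.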

For each object proposal, a region of interest (RoI) pooling layer extracts the associated features from the conv feature map, i.e., a feature vector of fixed size is extracted from the feature maps (Module B).

The role of the RoI pooling layer is to convert the features in a valid RoI into a small feature map of fixed size (X × Y, e.g., 7 × 7) using max-pooling.
- X and Y are the layer hyper-parameters.
An RoI itself is a rectangular window that is characterized by a 4-tuple that defines its top-left corner (a, b) and its height and width (h, w).
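A NumPy sketch of RoI max pooling, assuming a feature map of shape (channels, height, width) and an RoI already projected into feature-map cells as (a, b, h, w), i.e., top-left row and column plus height and width. The RoI is divided into an out_h × out_w grid of roughly equal sub-windows, and each sub-window is max-pooled:

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_h=7, out_w=7):
    """Max-pool one RoI of a (C, H, W) feature map into a fixed (C, out_h, out_w) grid.

    roi = (a, b, h, w): top-left row a, top-left column b, height h, width w,
    all in feature-map cells (h, w >= 1).
    """
    a, b, h, w = roi
    region = feature_map[:, a:a + h, b:b + w]
    out = np.empty((feature_map.shape[0], out_h, out_w), dtype=feature_map.dtype)
    row_edges = np.linspace(0, h, out_h + 1)  # bin borders along the rows
    col_edges = np.linspace(0, w, out_w + 1)  # bin borders along the columns
    for i in range(out_h):
        r0 = int(np.floor(row_edges[i]))
        r1 = max(int(np.ceil(row_edges[i + 1])), r0 + 1)  # at least one row per bin
        for j in range(out_w):
            c0 = int(np.floor(col_edges[j]))
            c1 = max(int(np.ceil(col_edges[j + 1])), c0 + 1)
            out[:, i, j] = region[:, r0:r1, c0:c1].max(axis=(1, 2))
    return out


fmap = np.arange(16, dtype=float).reshape(1, 4, 4)
print(roi_max_pool(fmap, (0, 0, 4, 4), out_h=2, out_w=2)[0])  # bin maxima 5, 7 / 13, 15
```

Because the output size is fixed regardless of the RoI size, the following fully connected layers always receive a feature vector of the same length.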

Fast R-CNN still uses an independent region proposal stage.
Each feature vector is then given as input to fully connected neural layers, which branch into two sibling output layers.
- One of these sibling layers (Module C) gives estimates of the soft-max probability over object classes and a background class.
- The other layer (Module D) produces four values, which refine the bounding box positions, for each of the object classes.

Fast R-CNN architecture and data flow

Faster R-CNN

The Faster R-CNN revision introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network.
The RPN simultaneously predicts object bounds and objectness scores at each position.
The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN.
The RPN and Fast R-CNN are merged into a single network by sharing their convolutional features. Using “attention” mechanisms, the RPN component tells the unified network where to look.
Faster R-CNN generates about 300 proposals per image

Faster R-CNN: Combining RPN with fast R-CNN object detector

Semantic Segmentation

Typical classification networks take fixed-size images as input and produce non-spatial output maps, that are fed to a soft-max layer to perform classification.
The spatial information is lost, because these networks use fixed dimension fully connected layers in their architecture.

Pixel Classifier

Pixel classifier: Central pixel classification by neighbourhood (high spatial accuracy)
Segment classifier: Segment classification by entire segment pixels (medium spatial accuracy)
Fully convolutional networks can take (1) input images of any size and produce (2) spatial output maps. These two aspects make the fully convolutional models a natural choice for semantic segmentation.

Pure pixel classifiers are computationally intensive: each output pixel requires a full CNN evaluation of the input sub-image (segment) around it.

Difference between static segment classification and pixel classification using a moving and sliding window
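A sketch of the sliding-window pixel classifier that makes this cost explicit. The hypothetical `toy_classifier` (a mean-intensity threshold) stands in for a trained patch CNN; the loop invokes it once per output pixel:

```python
import numpy as np

def classify_pixels(image, patch_size, classifier):
    """Label every pixel from the patch_size x patch_size window centred on it.

    patch_size must be odd; borders are handled by edge padding.
    Returns the label map and the number of classifier calls.
    """
    r = patch_size // 2
    padded = np.pad(image, r, mode="edge")
    h, w = image.shape
    labels = np.zeros((h, w), dtype=int)
    calls = 0
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + patch_size, x:x + patch_size]  # centred on (y, x)
            labels[y, x] = classifier(patch)
            calls += 1
    return labels, calls


def toy_classifier(patch):
    """Hypothetical stand-in for a trained CNN: 1 if the patch is mostly bright."""
    return int(patch.mean() > 0.5)


image = np.zeros((6, 6))
image[:, 3:] = 1.0
labels, calls = classify_pixels(image, 3, toy_classifier)
print(calls)  # 36: one classifier evaluation per pixel
```

For a 1000 × 1000 image this already means one million CNN evaluations, which is why fully convolutional networks that share computation across overlapping windows are preferred.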

Fully Convolutional Network (FCN)

Khan Pixel classifier with convolutional layers only

Segmentation Examples

Brown, 2017 Ciresan - Neuronal membrane segmentation

Brown, 2017 Different levels of object detection and segmentation

Pre-trained Object Detectors

coco-ssd

COCO: Common Objects in Context, SSD: Single Shot Detector

yolo

YOLO: the "You Only Look Once" system, an open-source object detection method that can recognize objects in images and videos swiftly. Like YOLO, SSD runs a convolutional network on the input image only once and computes a feature map.

YOLO can struggle to localize objects properly, but produces fewer background errors (false positives)

Model-driven Segmentation

Up to here we only considered data-driven model-less region proposal networks
But in measurement technology and measurement data, we mostly have some prior idea about the geometric features of ROIs
Edge detection, e.g., combined with point clustering methods, can be used to propose ROIs very quickly and accurately
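A rough sketch of such a model-driven proposal step for a grayscale image: gradient-magnitude edges are thresholded, and the edge points are grouped by 8-connected component labelling (a simple stand-in for a point clustering method), yielding one bounding box (x1, y1, x2, y2) per group. The threshold value is a hypothetical parameter:

```python
from collections import deque

import numpy as np

def edge_points(image, threshold):
    """Boolean mask of pixels whose gradient magnitude exceeds the threshold."""
    gy, gx = np.gradient(image.astype(float))
    return np.hypot(gx, gy) > threshold


def cluster_boxes(mask):
    """Group 8-connected edge points and return one bounding box per group."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            queue = deque([(y, x)])  # breadth-first flood fill of one cluster
            seen[y, x] = True
            ys, xs = [], []
            while queue:
                cy, cx = queue.popleft()
                ys.append(cy)
                xs.append(cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            boxes.append((min(xs), min(ys), max(xs) + 1, max(ys) + 1))
    return boxes


img = np.zeros((20, 20))
img[2:6, 2:6] = 1.0      # first bright object
img[12:17, 12:17] = 1.0  # second bright object
print(cluster_boxes(edge_points(img, 0.1)))  # one box per object
```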

(Top) Region proposal by search and data-driven learning (Bottom) Model-based image transformation and point clustering

Anomaly Detection in Tomography Data

Supervised CNN

Three-dimensional data, e.g., from x-ray CT scans, can be reduced to a set of lower-dimensional data:
- 2D image slices indexed along a geometric axis, e.g., the z-axis (depth image stack)
- 1D signals along a geometric axis, e.g., z-profiles s_z(x,y), i.e., the values along the z-axis at a fixed position (x,y)
CNNs can be applied to data of arbitrary dimension; indeed, n-dimensional data is commonly processed as a linear one-dimensional array (linearly packed multi-dimensional data)
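A NumPy sketch of both ideas, assuming the CT volume is stored as an array indexed [z, y, x] (the toy volume below is made up): extracting the z-profile signal at one (x, y) position, and the row-major offset that a linearly packed volume uses for voxel (z, y, x):

```python
import numpy as np

def z_profile(volume, x, y):
    """1D signal s_z(x, y): all voxel values along z at a fixed (x, y) position.

    The volume is assumed to be indexed as volume[z, y, x].
    """
    return volume[:, y, x]


def linear_index(z, y, x, shape):
    """Row-major (C-order) offset of voxel (z, y, x) in the flattened volume."""
    _, ny, nx = shape
    return (z * ny + y) * nx + x


volume = np.arange(24).reshape(2, 3, 4)  # toy CT volume: 2 slices of 3 x 4 voxels
print(z_profile(volume, x=1, y=2))                          # [ 9 21]
print(volume.ravel()[linear_index(1, 2, 1, volume.shape)])  # 21, same as volume[1, 2, 1]
```

Each z-profile can then be fed to a 1D CNN as one input sample, one classification per (x, y) position.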

Z-profile signals as 1D images as input for a CNN damage classifier (ND: No damage class, D1: Damage 1, D2: Damage 2, and so on) ⇒ Pixel segmentation detector!

(Left) Damage feature maps retrieved from four different CNN classifiers for specimens A (training and prediction), B, C, and D. (Right) CT image volume and selected x-y slice visualization (A-B) with centred resin defect in the PREG layer
