Note 4 to 6

Perceptron

Maps w1x1 + w2x2 + ... + b to a binary output (0/1)
0 if wx + b <= 0
1 if wx + b > 0
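
A minimal sketch in Python (the weights and bias are made-up values for illustration; they happen to implement a 2-input AND gate):

```python
import numpy as np

def perceptron(x, w, b):
    """Perceptron: output 0 if w.x + b <= 0, else 1."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Made-up weights/bias implementing a 2-input AND gate
w = np.array([1.0, 1.0])
b = -1.5
print([perceptron(np.array(x), w, b) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# -> [0, 0, 0, 1]
```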

Rectified Linear Unit (ReLU)

f(x) = max(0, x)

<= 0 -> 0
> 0 -> x
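
The same piecewise rule, elementwise in NumPy:

```python
import numpy as np

def relu(x):
    """ReLU: max(0, x), applied elementwise."""
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # -> [0.  0.  0.  1.5]
```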

Stride

Stride is the step size by which the filter is shifted across the image at each move.

filter is 3*3
image is 5*5

if stride = 1
output is 3*3

if stride = 2
output is 2*2
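
Both cases follow from the usual no-padding output-size formula, floor((image - filter) / stride) + 1; a quick check:

```python
def conv_output_size(image, filt, stride):
    """Side length of the output of a valid (no-padding) convolution."""
    return (image - filt) // stride + 1

print(conv_output_size(5, 3, 1))  # -> 3 (3*3 output)
print(conv_output_size(5, 3, 2))  # -> 2 (2*2 output)
```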

Convolution Layer

The input image is convolved with a Feature Detector (filter/kernel): at each position, multiply elementwise and sum.

There are multiple filters, so there are also multiple output feature maps.

Then the ReLU function is applied to each feature map.
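
A minimal single-filter sketch in NumPy (stride 1, no padding; like most deep-learning layers this is technically cross-correlation):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image; multiply elementwise and sum."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)    # toy 5*5 image
kernel = np.array([[1., 0., -1.]] * 3)              # toy 3*3 edge filter
feature_map = np.maximum(0, conv2d(image, kernel))  # convolution, then ReLU
print(feature_map.shape)  # -> (3, 3)
```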

Pool Layer

Mainly Max Pooling or Average Pooling is used.

Max Pooling takes the max over each kernel window.
Average Pooling takes the average of all elements in each kernel window.

Reducing the resolution of the feature map speeds up computation.
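
A sketch of 2*2 max pooling with stride 2 in NumPy:

```python
import numpy as np

def max_pool(fmap, size=2):
    """Non-overlapping pooling: keep the max of each size*size block."""
    h, w = fmap.shape
    return (fmap[:h - h % size, :w - w % size]
            .reshape(h // size, size, w // size, size)
            .max(axis=(1, 3)))

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 7., 8.],
                 [9., 2., 1., 0.],
                 [3., 4., 5., 6.]])
print(max_pool(fmap))
# -> [[6. 8.]
#     [9. 6.]]
```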

Unpooling

1 2        1 1 2 2
3 4   ->   1 1 2 2
           3 3 4 4
           3 3 4 4

Max Unpooling

Remember the position of the max element during pooling,
and place each value back at that position when unpooling; all other values are 0.
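
A sketch of max unpooling in NumPy; the flat indices of the maxima would be remembered from the pooling pass (the values here continue the max-pooling example above):

```python
import numpy as np

def max_unpool(pooled, max_indices, out_shape):
    """Put each pooled value back at its remembered position; the rest stay 0."""
    out = np.zeros(out_shape)
    out.flat[max_indices.ravel()] = pooled.ravel()
    return out

pooled = np.array([[6., 8.],
                   [9., 6.]])
max_indices = np.array([[5, 7],      # flat positions of the maxima
                        [8, 15]])    # in the original 4*4 feature map
print(max_unpool(pooled, max_indices, (4, 4)))
```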

https://medium.com/cubo-ai/%E7%89%A9%E9%AB%94%E5%81%B5%E6%B8%AC-object-detection-740096ec4540

R-CNN

1. Use Selective Search to generate Region Proposals
2. Use a CNN to extract features from each Region Proposal
3. Use an SVM for classification
4. Refine the bounding box (bounding-box regression)

Improved versions -> Fast R-CNN, Faster R-CNN

Fast R-CNN

Use RoI Pooling (Region of Interest Pooling) to map the regions onto a shared feature map, so the CNN only needs to be applied once per image.

Faster R-CNN

Use a CNN (the Region Proposal Network) to generate the proposals.

YOLO

Divide the image into a grid
For each grid cell, output class probabilities and bounding-box offsets

Non-Maximum Suppression (NMS)

https://chih-sheng-huang821.medium.com/%E6%A9%9F%E5%99%A8-%E6%B7%B1%E5%BA%A6%E5%AD%B8%E7%BF%92-%E7%89%A9%E4%BB%B6%E5%81%B5%E6%B8%AC-non-maximum-suppression-nms-aa70c45adffa
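
A minimal greedy NMS sketch in NumPy (boxes as [x1, y1, x2, y2]; the 0.5 IoU threshold is a common but arbitrary choice):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[rest, 2] - boxes[rest, 0])
                 * (boxes[rest, 3] - boxes[rest, 1]))
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # -> [0, 2]; box 1 overlaps box 0 too much
```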

Harris corner detection

https://en.wikipedia.org/wiki/Harris_corner_detector

https://senitco.github.io/2017/06/18/image-feature-harris/

Ix = dI / dx
Iy = dI / dy

E(Δx, Δy) is the sum of squared differences (SSD) error when shifting the window W by (Δx, Δy)
E(Δx, Δy) = Σ (x, y) in W [I(x, y) - I(x + Δx, y + Δy)]^2

By first-order Taylor series,
I(x + Δx, y + Δy) ≈ I(x, y) + Ix(x, y)Δx + Iy(x, y)Δy

E(Δx, Δy) ≈ Σ (x, y) in W [Ix(x, y)Δx + Iy(x, y)Δy]^2
          = A Δx^2 + 2B Δx Δy + C Δy^2
where
A = Σw Ix^2
B = Σw IxIy
C = Σw Iy^2

E(Δx, Δy) = [Δx Δy] M(x, y) [Δx Δy]^T

M(x, y) = Σw [ Ix^2  IxIy ]  =  [ A  B ]
             [ IxIy  Iy^2 ]     [ B  C ]

Me = λe
(M - λI)e = 0

1. Compute the determinant of M - λI
2. Find the roots of polynomial det(M - λI) = 0
3. For each eigenvalue, solve (M - λI)e = 0
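
For the 2*2 symmetric M these steps reduce to a quadratic; in practice NumPy does them directly (the matrix entries below are made-up A, B, C values):

```python
import numpy as np

M = np.array([[2.0, 1.0],     # [ A  B ]
              [1.0, 3.0]])    # [ B  C ] (made-up values)
eigvals, eigvecs = np.linalg.eigh(M)  # roots of det(M - λI) = 0 and their e
print(eigvals)  # -> λ1, λ2 in ascending order
```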

R = det(M) - k trace(M)^2
det(M) = λ1λ2 = AC - B^2
trace(M) = λ1 + λ2 = A + C

It is an edge if λ1 >> λ2 or λ2 >> λ1.
It is a corner if λ1 and λ2 are both large and λ1 ≈ λ2.
It is flat if λ1 and λ2 are both small and λ1 ≈ λ2.

R = det(M) - k trace(M)^2    Harris & Stephens (1988)
R = min(λ1, λ2)              Kanade & Tomasi (1994)
R = det(M) / trace(M)        Noble (1998)

Threshold with value T

Non-maximum suppression with a window (e.g. 5 * 5 window)
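
A sketch of the whole pipeline with NumPy/SciPy; sigma, k, the threshold, and the 5*5 NMS window are typical but arbitrary choices:

```python
import numpy as np
from scipy import ndimage

def harris_corners(img, sigma=1.0, k=0.04, thresh=0.01, nms_size=5):
    """Harris response R = det(M) - k trace(M)^2, then threshold + NMS."""
    Ix = ndimage.sobel(img, axis=1)   # Ix = dI/dx
    Iy = ndimage.sobel(img, axis=0)   # Iy = dI/dy
    # Windowed sums A, B, C (Gaussian-weighted window W)
    A = ndimage.gaussian_filter(Ix * Ix, sigma)
    B = ndimage.gaussian_filter(Ix * Iy, sigma)
    C = ndimage.gaussian_filter(Iy * Iy, sigma)
    R = (A * C - B * B) - k * (A + C) ** 2
    R[R < thresh * R.max()] = 0                           # threshold with T
    local_max = ndimage.maximum_filter(R, size=nms_size)  # window NMS
    return np.argwhere((R == local_max) & (R > 0))        # corner coordinates
```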

Harris detector: Invariance properties

Shift invariant
Rotation invariant
Partially invariant to affine intensity change

Non-invariant to image scale
How can corner detection be made robust to scale change?
– Compute the Harris matrix for each level of a Gaussian pyramid, using the same window size

Laplacian of Gaussian (LoG)

https://zhuanlan.zhihu.com/p/92143464

LoG(x, y) = - 1 / (pi σ^4) [1 - (x^2 + y^2) / (2σ^2)] e^(- (x^2 + y^2) / (2σ^2))
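
A direct sampling of this formula as a convolution kernel (scipy.ndimage.gaussian_laplace is a ready-made alternative; the kernel size is an arbitrary choice):

```python
import numpy as np

def log_kernel(size, sigma):
    """Sample the LoG formula above on a size*size grid centered at 0."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    r2 = x ** 2 + y ** 2
    return (-1 / (np.pi * sigma ** 4)
            * (1 - r2 / (2 * sigma ** 2))
            * np.exp(-r2 / (2 * sigma ** 2)))

print(log_kernel(9, sigma=1.4).round(3))
```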

Difference of Gaussian (DoG)

https://en.wikipedia.org/wiki/Difference_of_Gaussians
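
DoG approximates LoG by subtracting two Gaussian blurs of different σ; a sketch with SciPy (the ratio k = 1.6 follows common SIFT practice):

```python
import numpy as np
from scipy import ndimage

def dog(img, sigma=1.0, k=1.6):
    """Difference of Gaussians: blur at two nearby scales and subtract."""
    return (ndimage.gaussian_filter(img, k * sigma)
            - ndimage.gaussian_filter(img, sigma))
```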

Scale Invariant Feature Transform (SIFT)

https://zh.wikipedia.org/wiki/%E5%B0%BA%E5%BA%A6%E4%B8%8D%E8%AE%8A%E7%89%B9%E5%BE%B5%E8%BD%89%E6%8F%9B

https://zhuanlan.zhihu.com/p/261697473

https://zhuanlan.zhihu.com/p/43543527

https://medium.com/data-breach/introduction-to-sift-scale-invariant-feature-transform-65d7f3a72d40

Scale-space peak selection
-> Gaussian pyramid
-> Subtract adjacent levels of Gaussian blurring
-> DoG pyramid

Finding keypoints
-> Compare each pixel with its 8 neighbors, plus 9 pixels in the next scale level and 9 pixels in the previous scale level
-> 8 + 9 + 9 = 26 comparisons

Keypoint Localization
-> Filter out low-contrast keypoints (DoG value < 0.03)

Orientation Assignment
-> Take a neighborhood around the keypoint, sized according to its scale
-> Compute the gradient magnitude and direction over that region
-> Build an orientation histogram with 36 bins covering 360 degrees
-> Take the peak of the histogram (any peak above 80% of it is also considered an orientation)

Keypoint descriptor
-> each keypoint has a location, scale, and orientation
-> with the keypoint as center, divide a 16 * 16 window into a 4 * 4 grid of sub-blocks
-> for each sub-block, an 8-bin orientation histogram is created
-> 16 * 8 = 128-dimensional descriptor

Keypoint Matching
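
A sketch with OpenCV, matching the 128-dim descriptors by Lowe's ratio test (the file names are placeholders, the 0.75 ratio is a typical choice, and cv2.SIFT_create needs OpenCV >= 4.4):

```python
import cv2

img1 = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)   # placeholder file names
img2 = cv2.imread("train.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)  # keypoints + 128-dim descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# For each descriptor take its 2 nearest neighbors; keep unambiguous matches
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(len(good), "good matches")
```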

k-means clustering

https://chih-sheng-huang821.medium.com/%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92-%E9%9B%86%E7%BE%A4%E5%88%86%E6%9E%90-k-means-clustering-e608a7fe1b43
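
A minimal Lloyd's-algorithm sketch in NumPy (the three cluster centers in the demo data are made up):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Assign points to the nearest centroid, recompute centroids, repeat."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)          # nearest centroid per point
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):        # converged
            break
        centroids = new
    return centroids, labels

X = np.vstack([np.random.randn(50, 2) + c for c in ([0, 0], [5, 5], [0, 5])])
centroids, labels = kmeans(X, k=3)
print(centroids.round(2))
```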