Note 4 to 6
Perceptron
w1x1 + w2x2 + ... + b to binary output (0/1)
0 if wx + b <= 0
1 if wx + b > 0
Rectified Linear Unit (ReLU)
f(x) = max(0, x)
<= 0 -> 0
> 0 -> x
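A minimal NumPy sketch of both units (the weights, bias, and inputs are made-up values):

```python
import numpy as np

def perceptron(x, w, b):
    # Step activation: 0 if w.x + b <= 0, else 1.
    return 1 if np.dot(w, x) + b > 0 else 0

def relu(x):
    # f(x) = max(0, x), element-wise.
    return np.maximum(0, x)

print(perceptron(np.array([1.0, 0.5]), np.array([0.4, -0.2]), 0.1))  # 1
print(relu(np.array([-2.0, 0.0, 3.0])))                              # [0. 0. 3.]
```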
Stride
Stride is the number of pixels the filter shifts each time it steps across the image
filter is 3*3
image is 5*5
if stride = 1
output is 3*3
if stride = 2
output is 2*2
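These sizes follow the general rule for a valid (no padding) convolution: output = (N - F) / stride + 1, where N is the image size and F the filter size. A quick check (function name is my own):

```python
def conv_output_size(n, f, stride):
    # Valid (no-padding) output size: floor((n - f) / stride) + 1.
    return (n - f) // stride + 1

print(conv_output_size(5, 3, 1))  # 3  (matches the stride = 1 case)
print(conv_output_size(5, 3, 2))  # 2  (matches the stride = 2 case)
```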
Convolution Layer
Convolve the original image with a Feature Detector (filter/kernel)
There are multiple filters, so there are also multiple output feature maps (one per filter)
Then apply the ReLU function
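A minimal sketch of one filter applied to one image, with ReLU at the end (NumPy; real layers also handle padding, channels, and a stack of filters):

```python
import numpy as np

def conv2d_relu(image, kernel, stride=1):
    # Slide the kernel over the image and sum the element-wise products
    # (cross-correlation, which is what deep-learning "convolution" does),
    # then apply ReLU to the resulting feature map.
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return np.maximum(0, out)
```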
Pool Layer
Mainly uses Max Pooling or Average Pooling
Max Pooling takes the max of the elements under the kernel
Average Pooling takes the average of all elements under the kernel
Reducing the resolution of the feature map speeds up computation
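A sketch covering both pooling modes (NumPy; the 2 x 2 window with stride 2 is the usual default):

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    # Downsample the feature map by taking the max (or average)
    # of each size x size window.
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = x[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out
```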
Unpooling
1 2        1 1 2 2
3 4   ->   1 1 2 2
           3 3 4 4
           3 3 4 4
Max Unpooling
Remember the position of the max element during pooling
Use that position when unpooling: the pooled value goes back to it, and all other values are 0
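A sketch of both variants (NumPy; argmax_idx is assumed to hold the flat indices of the max elements recorded during the max-pool forward pass):

```python
import numpy as np

def unpool(x, size=2):
    # Plain unpooling: copy each value into a size x size block,
    # exactly as in the 2*2 -> 4*4 example above.
    return np.kron(x, np.ones((size, size)))

def max_unpool(pooled, argmax_idx, orig_shape):
    # Max unpooling: put each pooled value back at the remembered
    # position of the original max; every other value stays 0.
    out = np.zeros(orig_shape)
    out.flat[argmax_idx.ravel()] = pooled.ravel()
    return out
```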
https://medium.com/cubo-ai/%E7%89%A9%E9%AB%94%E5%81%B5%E6%B8%AC-object-detection-740096ec4540
R-CNN
1. Use Selective Search to generate Region Proposals
2. Run a CNN on each Region Proposal to extract features
3. Use an SVM for classification
4. Refine (regress) the bounding box
Improved versions -> Fast R-CNN, Faster R-CNN
Fast R-CNN
Use RoI Pooling (Region of Interest Pooling) to pool each region from a shared feature map, so the CNN only needs to run once per image
Faster R-CNN
Use a CNN (the Region Proposal Network) to generate the proposals
YOLO
Divide the image into a grid
For each grid cell, output class probabilities and bounding box offsets
Non-Maximum Suppression (NMS)
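A minimal NMS sketch (NumPy; boxes given as [x1, y1, x2, y2], and the IoU threshold is an assumed parameter):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    # Keep the highest-scoring boxes, dropping any box whose IoU
    # with an already-kept box exceeds the threshold.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]
    return keep
```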
Harris corner detection
https://en.wikipedia.org/wiki/Harris_corner_detector
https://senitco.github.io/2017/06/18/image-feature-harris/
Ix = dI / dx
Iy = dI / dy
E(Δx, Δy) is squared differences (SSD) error with shifting window W by (Δx, Δy)
E(Δx, Δy) = Σ (x, y) in W [I(x, y) - I(x + Δx, y + Δy)]^2
By Taylor series,
I(x + Δx, y + Δy) ≈ I(x, y) + Ix(x, y)Δx + Iy(x, y)Δy
E(Δx, Δy) = Σ (x, y) in W [Ix(x, y)Δx + Iy(x, y)Δy]^2
= A Δx^2 + 2B Δx Δy + C Δy^2
where
A = Σw Ix^2
B = Σw IxIy
C = Σw Iy^2
E(Δx, Δy) = [Δx Δy] M [Δx]
                      [Δy]
M(x, y) = Σw [Ix^2  IxIy]
             [IxIy  Iy^2]
        = [A  B]
          [B  C]
A = Σw Ix^2
B = Σw IxIy
C = Σw Iy^2
Me = λe
(M - λI)e = 0
1. Compute the determinant of M - λI
2. Find the roots of polynomial det(M - λI) = 0
3. For each eigenvalue, solve (M - λI)e = 0
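A worked instance of steps 1-3 on made-up values, with NumPy solving the characteristic polynomial:

```python
import numpy as np

# An example structure matrix [A B; B C] with A = C = 2, B = 1.
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
# det(M - λI) = λ^2 - 4λ + 3 = 0  ->  λ = 1 and λ = 3
eigvals, eigvecs = np.linalg.eigh(M)  # eigh: M is symmetric
print(eigvals)                        # [1. 3.]
```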
R = det(M) - k trace(M)^2
det(M) = λ1λ2 = AC - B^2
trace(M) = λ1 + λ2 = A + C
It is an Edge if λ1 >> λ2 or λ2 >> λ1
It is a Corner if λ1 and λ2 are both large, and λ1 ~ λ2
It is Flat if λ1 and λ2 are both small, and λ1 ~ λ2
R = det(M) - k trace(M)^2 Harris & Stephens (1988)
R = min(λ1, λ2) Shi & Tomasi (1994)
R = det(M) / trace(M) Noble (1998)
Threshold with value T
Non-maximum suppression with a window (e.g. 5 * 5 window)
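A sketch of the whole pipeline up to the response map R (SciPy's uniform_filter stands in for the box-window sums Σw, and np.gradient stands in for a proper Sobel derivative):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def harris_response(img, k=0.04, window=5):
    # Image gradients Ix = dI/dx, Iy = dI/dy.
    Iy, Ix = np.gradient(img.astype(float))
    # Windowed sums A, B, C over a window x window box.
    A = uniform_filter(Ix * Ix, window)
    B = uniform_filter(Ix * Iy, window)
    C = uniform_filter(Iy * Iy, window)
    det = A * C - B * B    # det(M) = λ1 λ2
    trace = A + C          # trace(M) = λ1 + λ2
    # Harris & Stephens measure; afterwards threshold with T and apply NMS.
    return det - k * trace**2
```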
Harris detector: Invariance properties
Shift invariant
Rotation invariant
Partially invariant to affine intensity change
Non-invariant to image scale
How to make the corner detection robust to scale change?
– Compute the Harris matrix for each level of the Gaussian pyramid using the same window size
Laplacian of Gaussian (LoG)
https://zhuanlan.zhihu.com/p/92143464
LoG(x, y) = - 1 / (pi σ^4) [1 - (x^2 + y^2) / (2σ^2)] e^(- (x^2 + y^2) / (2σ^2))
Difference of Gaussian (DoG)
https://en.wikipedia.org/wiki/Difference_of_Gaussians
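A sketch of both kernels (NumPy; the 9 * 9 size and the sigmas are made-up values; a DoG of two nearby sigmas approximates the LoG):

```python
import numpy as np

def log_kernel(size, sigma):
    # Evaluate the LoG formula above on a size x size grid.
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    r2 = x**2 + y**2
    return (-1 / (np.pi * sigma**4)) * (1 - r2 / (2 * sigma**2)) \
        * np.exp(-r2 / (2 * sigma**2))

def gaussian_kernel(size, sigma):
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return g / g.sum()

# DoG: difference of two Gaussians with nearby sigmas (here 1.6σ and σ).
dog = gaussian_kernel(9, 1.6) - gaussian_kernel(9, 1.0)
```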
Scale Invariant Feature Transform (SIFT)
https://zhuanlan.zhihu.com/p/261697473
https://zhuanlan.zhihu.com/p/43543527
https://medium.com/data-breach/introduction-to-sift-scale-invariant-feature-transform-65d7f3a72d40
Scale-space peak Selection
-> Gaussian pyramid
-> subtract adjacent Gaussian-blurred images
-> DoG
Finding keypoints
-> Compare each pixel with its 8 neighbors, plus 9 pixels in the next scale level and 9 pixels in the previous scale level
-> 8 + 9 + 9 = 26 comparisons
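A sketch of this check on a stacked DoG pyramid (simplified: ties with neighbors count as extrema, which real SIFT excludes):

```python
import numpy as np

def is_scale_space_extremum(dog, s, i, j):
    # dog: DoG images stacked as (scales, H, W).
    # The 3*3*3 cube holds the pixel and its 26 neighbors across
    # the previous, current, and next scale levels.
    cube = dog[s-1:s+2, i-1:i+2, j-1:j+2]
    v = dog[s, i, j]
    return v == cube.max() or v == cube.min()
```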
Keypoint Localization
-> Discard low-contrast keypoints whose DoG value is < 0.03
Orientation Assignment
-> Take a neighborhood around the keypoint, sized according to its scale
-> Compute the gradient magnitude and direction over that region
-> Build an orientation histogram with 36 bins covering 360 degrees
-> Take the peak of the histogram (any peak above 80% of the max is also kept as an orientation)
Keypoint descriptor
-> each keypoint has a location, scale, and orientation
-> with the keypoint as center, take a 16 * 16 window and divide it into a 4 * 4 grid of sub-blocks
-> for each sub-block, an 8-bin orientation histogram is created
-> 16 * 8 = 128 dimensional descriptor
Keypoint Matching
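The notes stop at this heading. As a sketch of the usual approach: match the 128-dimensional descriptors by nearest neighbor, keeping a match only when it passes Lowe's ratio test (the 0.8 threshold comes from the SIFT paper):

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.8):
    # For each descriptor in desc1, find its two nearest neighbors in
    # desc2; keep the match only if the best is clearly better than
    # the second best (Lowe's ratio test).
    matches = []
    for i, d in enumerate(desc1):
        dist = np.linalg.norm(desc2 - d, axis=1)
        j1, j2 = np.argsort(dist)[:2]
        if dist[j1] < ratio * dist[j2]:
            matches.append((i, j1))
    return matches
```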