Object Detection Progress

Part I Computer Vision Field Leaders:

http://mp.weixin.qq.com/s?__biz=MzAwMTA3MzM4Nw==&mid=2649439592&idx=1&sn=fdb687300e4a930fdd08c23d8816bbd8&chksm=82c0d4ecb5b75dfae69dd0a219916ab8533da9a6d02d3c7cfbcc5323579c16033bba7407f2b5&scene=21#wechat_redirect

David Marr (1945-1980)

He laid the groundwork for computer vision.

Three major levels of CV: Representation (express the problem mathematically), Algorithm (solve the problem), Implementation (realize the algorithm on a CPU, DSP, or neural network).
What CV computes: the primal sketch, the 2.5D sketch, and the 3D model representation, covering texture, 3D vision, motion analysis, and surface shape.
CV is the process of learning from an image, not the result. The longer you observe an image, the more information you get. Vision is task-driven, and the task changes from moment to moment. Solving a vision problem is not shooting at a fixed target but at a moving one.

King-Sun Fu 傅京孫 (1930-1985)

Syntactic Pattern Recognition
Bottom-up and Top-down

Ulf Grenander (1923-2016)

Pattern Theory (uses mathematics and statistics)

Proposed analysis-by-synthesis: let the model generate an image, then measure the difference between the generated image and the real-world image. The size of that difference tells you how good the model is.
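
A toy sketch of the analysis-by-synthesis loop, under strong assumptions: the "image" is a 1D list, and the generative model (the hypothetical synthesize function below) can only draw a single bright bar. Neither comes from Grenander's work; they only illustrate the generate-then-compare idea.

```python
def synthesize(center, width, length=8):
    """Hypothetical generative model: render a 1D 'image' containing one bright bar."""
    return [1.0 if abs(i - center) <= width else 0.0 for i in range(length)]


def discrepancy(generated, observed):
    """Mean squared difference between a synthesized and an observed image."""
    return sum((g - o) ** 2 for g, o in zip(generated, observed)) / len(observed)


def best_explanation(observed, candidates):
    """Return the (center, width) whose synthesized image is closest to the observation."""
    return min(candidates, key=lambda p: discrepancy(synthesize(*p), observed))


observed = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
candidates = [(c, w) for c in range(8) for w in range(3)]
best = best_explanation(observed, candidates)  # (3, 1): a bar covering indices 2..4
```

A low discrepancy for the best candidate suggests the model can explain the observation; a high one suggests the model family is too weak.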

Part II Time Line

1999

Scale Invariant Feature Transform (SIFT)

(improved in 2004)

detector
descriptor

Describes the image through local features at keypoints.

Feature Based Descriptor (1995~2010)

1. Shape Context 2002

Applied to handwritten digit recognition on MNIST.

2. HOG 2005

Describes a whole image patch.
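
A minimal sketch of the core idea behind a HOG-style patch descriptor: accumulate gradient orientations over the patch, weighted by gradient magnitude. It is a simplification, not the full 2005 pipeline: there are no cells, overlapping blocks, or bin interpolation, and the function name and the 9-bin default are assumptions made for illustration.

```python
import math


def orientation_histogram(patch, bins=9):
    """Histogram of unsigned gradient orientations (0-180 degrees) over one patch,
    weighted by gradient magnitude. Border pixels are skipped for simplicity."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]  # centered horizontal difference
            gy = patch[y + 1][x] - patch[y - 1][x]  # centered vertical difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]  # L2-normalize the descriptor


vertical_edge = [[0, 0, 1, 1]] * 4
hist = orientation_histogram(vertical_edge)  # all the weight lands in bin 0
```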

3. Spin Image 1997->1999

A descriptor for 3D meshes, used in surface matching.

4. STIP (space-time interest points) 2005; HOF (histogram of oriented optical flow) 2009; MBH (motion boundary histogram) 2013

Object Recognition 2005~2010

1. LDA (Latent Dirichlet Allocation) 2003

Unsupervised topic modeling, applied to images through the BoW (bag of visual words) representation.
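
A small sketch of how a BoW histogram can be built: each local descriptor is assigned to its nearest visual word and the assignments are counted. The vocabulary here is hand-picked for illustration (in practice it is usually learned by clustering, for example with k-means), and the function names are hypothetical. A topic model such as LDA would then take these word counts as its input.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster center) closest to the descriptor."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)), key=lambda k: sq_dist(descriptor, vocabulary[k]))


def bow_histogram(descriptors, vocabulary):
    """Count how many descriptors fall on each visual word, then L1-normalize."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


vocabulary = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # assumed, hand-picked visual words
descriptors = [(0.1, 0.0), (0.9, 1.1), (0.2, 0.1), (0.0, 0.8)]
histogram = bow_histogram(descriptors, vocabulary)  # [0.5, 0.25, 0.25]
```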

2. SPM (Spatial Pyramid Matching)

Use a spatial grid to split the image into patches, compute a BoW histogram for each patch, then concatenate the histograms, so that the encoded descriptor vector carries spatial information.
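
The grid-then-concatenate step can be sketched as below. This is a simplified illustration: the per-level weights used in the published SPM kernel are left out, feature positions are assumed to be normalized to [0, 1), and the visual-word index of each feature is assumed to be known already. The function name is hypothetical.

```python
def spatial_pyramid(features, vocab_size, levels=2):
    """Concatenate per-cell BoW count histograms over grids of 1x1, 2x2, 4x4, ...

    features: iterable of (x, y, word), with x and y normalized to [0, 1)
    and word an integer visual-word index below vocab_size.
    """
    pyramid = []
    for level in range(levels + 1):
        cells = 2 ** level  # the grid is cells x cells at this level
        grid = [[0] * vocab_size for _ in range(cells * cells)]
        for x, y, word in features:
            col = min(int(x * cells), cells - 1)
            row = min(int(y * cells), cells - 1)
            grid[row * cells + col][word] += 1
        for hist in grid:
            pyramid.extend(hist)
    return pyramid


features = [(0.1, 0.1, 0), (0.9, 0.1, 1), (0.9, 0.9, 1)]
vector = spatial_pyramid(features, vocab_size=2, levels=1)
# 1x1 level: [1, 2]; 2x2 level: [1, 0], [0, 1], [0, 0], [0, 1]
```

Two images with the same global word counts but different layouts get different vectors, which is the spatial information a plain BoW histogram lacks.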

3. Image Encoding Methods Based on BoW 2006~2009

Sparse coding and Fisher vectors improve on BoW by using better image encodings.

4. PMK (pyramid match kernel)

5. DPM (deformable part models) 2010

Deep Learning 2010~2015

Does not need explicit structural information about the object; uses multiple layers.

n × (convolution layer + pooling layer) + several fully connected layers
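
A pure-Python sketch of that layer pattern, assuming a single-channel image, one kernel per convolution layer, a ReLU nonlinearity, and a single fully connected layer. Real networks use many channels and learned weights; the helper names and the all-ones weights are illustrative only.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation (the operation CNN libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)] for y in range(out_h)]


def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows or columns that do not fill a window are dropped."""
    return [[max(fmap[y + i][x + j] for i in range(size) for j in range(size))
             for x in range(0, len(fmap[0]) - size + 1, size)]
            for y in range(0, len(fmap) - size + 1, size)]


def fully_connected(vector, weights, biases):
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def forward(image, kernels, fc_weights, fc_biases):
    """n x (convolution + pooling), then flatten and apply a fully connected layer."""
    fmap = image
    for kernel in kernels:
        fmap = max_pool(relu(conv2d(fmap, kernel)))
    flat = [v for row in fmap for v in row]
    return fully_connected(flat, fc_weights, fc_biases)


image = [[1.0] * 10 for _ in range(10)]
ones3 = [[1.0] * 3 for _ in range(3)]
output = forward(image, [ones3, ones3], fc_weights=[[2.0]], fc_biases=[1.0])
# 10x10 -> conv 8x8 -> pool 4x4 -> conv 2x2 -> pool 1x1 -> fully connected
```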

1. OverFeat

Step 1: Use a sliding window at multiple scales to get ROIs. Classify each region with a CNN.
Step 2: Use a regression model to estimate the location of the object and draw a bounding box around it.
Step 3: Combine the bounding boxes.
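
A rough sketch of the window-generation and box-combining parts of these steps. The CNN classifier and the box regressor are omitted, and the merge rule (greedily averaging boxes whose IoU exceeds a threshold) is a simplified stand-in, not OverFeat's actual merging procedure. All function names, scales, and thresholds are assumptions.

```python
def sliding_windows(width, height, window, stride, scales=(1.0, 0.5)):
    """Yield (x1, y1, x2, y2) regions in original image coordinates.
    Shrinking the image by a scale s is treated as growing the window by 1/s."""
    for s in scales:
        size = max(1, int(round(window / s)))
        step = max(1, int(round(stride / s)))
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                yield (x, y, x + size, y + size)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def merge_boxes(boxes, threshold=0.5):
    """Repeatedly average the first pair of boxes whose IoU exceeds the threshold."""
    boxes = [tuple(float(v) for v in b) for b in boxes]
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if iou(boxes[i], boxes[j]) > threshold:
                    a, b = boxes[i], boxes.pop(j)  # j > i, so index i stays valid
                    boxes[i] = tuple((p + q) / 2 for p, q in zip(a, b))
                    merged = True
                    break
            if merged:
                break
    return boxes


windows = list(sliding_windows(8, 8, window=4, stride=4))
combined = merge_boxes([(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)])
# combined == [(0.5, 0.5, 10.5, 10.5), (50.0, 50.0, 60.0, 60.0)]
```

In a full pipeline each window would be scored by the classifier, and only the boxes predicted for confident windows would be passed to the merge step.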