This article gives a high-level overview of how a neural network recognizes an object in an image.
Step 1
To be more clever, we could run this algorithm multiple times with different sets of weights, each capturing different edge cases.
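As a minimal sketch of this idea, the snippet below scores the same input with several weight sets. The input values, the weights, and the function name `weighted_sum` are made up for illustration and do not come from the referenced article.

```python
def weighted_sum(inputs, weights):
    """Multiply each input by its weight and add up the results."""
    return sum(x * w for x, w in zip(inputs, weights))

# Hypothetical input features and four hypothetical weight sets.
inputs = [3.0, 2.0, 1.0]
weight_sets = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 2.0],
]

# One guess per weight set, all computed from the same inputs.
guesses = [weighted_sum(inputs, ws) for ws in weight_sets]
print(guesses)
```

Each weight set emphasizes a different part of the input, so each guess reflects a different case.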
Step 2
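Let’s combine our four attempts to guess into one big diagram. The result is a neural network: each attempt is a node, and a final node combines their guesses into one answer.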
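A rough sketch of that combination is shown below, assuming three inputs, four hidden nodes, and one output node. All values and the name `tiny_network` are hypothetical. Real networks also apply a nonlinear activation at each node, which is left out here to keep the structure visible.

```python
def weighted_sum(inputs, weights):
    """Multiply each input by its weight and add up the results."""
    return sum(x * w for x, w in zip(inputs, weights))

def tiny_network(inputs, hidden_weights, output_weights):
    """Compute one guess per hidden node, then combine the guesses into one answer."""
    hidden = [weighted_sum(inputs, ws) for ws in hidden_weights]
    return weighted_sum(hidden, output_weights)

# Hypothetical values: 3 inputs, 4 hidden nodes, 1 output node.
inputs = [3.0, 2.0, 1.0]
hidden_weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 2.0],
]
output_weights = [0.25, 0.25, 0.25, 0.25]  # here the output node simply averages the guesses

result = tiny_network(inputs, hidden_weights, output_weights)
print(result)
```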
Step 3
Train the network with backpropagation, using the gradient descent algorithm to adjust the weights so that the cost function is minimized.
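To make the gradient descent part concrete, here is a minimal sketch that fits a single weight by repeatedly stepping against the gradient of a mean squared error cost. The data, the learning rate, and the function names are assumptions chosen for illustration. Backpropagation itself is not shown: it applies the chain rule to obtain this kind of gradient for every weight in a multi-layer network.

```python
def cost(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cost_gradient(w, xs, ys):
    """Derivative of the cost with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

# Hypothetical training data generated by y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.0               # start from an arbitrary weight
learning_rate = 0.05  # assumed step size
for _ in range(200):
    # Move the weight a small step in the direction that lowers the cost.
    w -= learning_rate * cost_gradient(w, xs, ys)

print(w, cost(w, xs, ys))
```

With this data the weight settles close to 2, the value that makes the cost essentially zero.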
Step 4
Convolution (Make the result translation invariant)
1. Break the image into overlapping image tiles
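Why do we split the image into tiles? Because connecting each unit to a small tile instead of the whole image needs far fewer weights. Assuming a 1000 × 1000 image and 10^6 hidden units:
Fully connected NN: 1000 × 1000 × 10^6 = 10^12 weights
Locally connected NN (10 × 10 tiles): 10 × 10 × 10^6 = 10^8 weights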
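The tiling in step 1 can be sketched as a sliding window over the image. The function name `extract_tiles`, the tile size, and the stride below are illustrative choices, not values from the referenced article.

```python
def extract_tiles(image, tile_size, stride):
    """Return overlapping square tiles from a 2D list, scanning left to right, top to bottom."""
    rows, cols = len(image), len(image[0])
    tiles = []
    for top in range(0, rows - tile_size + 1, stride):
        for left in range(0, cols - tile_size + 1, stride):
            tile = [row[left:left + tile_size] for row in image[top:top + tile_size]]
            tiles.append(tile)
    return tiles

# Hypothetical 4 x 4 "image" whose pixel values are 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]

# A stride smaller than the tile size makes neighbouring tiles overlap.
tiles = extract_tiles(image, tile_size=2, stride=1)
print(len(tiles), tiles[0], tiles[-1])
```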
2. Feed each image tile into a small neural network
We’ll keep the same neural network weights for every single tile in the same original image. In other words, we treat every image tile equally (we share the weights). If something interesting appears in any given tile, we’ll mark that tile as interesting.
Why do we share the weights? Because it reduces the number of weights even further:
Convolutional NN (100 shared 10 × 10 filters): 10 × 10 × 100 = 10^4 (10k) weights
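The three weight counts can be checked with a few lines of arithmetic. The reading of each factor (a 1000 × 1000 image, one million hidden units, 10 × 10 tiles, 100 shared filters) is an assumption about what the numbers stand for.

```python
image_pixels = 1000 * 1000  # assumed 1000 x 1000 input image
hidden_units = 10 ** 6      # assumed one million hidden units
patch_pixels = 10 * 10      # each unit or filter looks at a 10 x 10 tile
num_filters = 100           # assumed number of shared filters

# Every hidden unit is connected to every pixel.
fully_connected = image_pixels * hidden_units
# Every hidden unit is connected only to its own tile.
locally_connected = patch_pixels * hidden_units
# All tiles reuse the same filters, so only the filters hold weights.
convolutional = patch_pixels * num_filters

print(fully_connected, locally_connected, convolutional)
```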
3. Save the results from each tile into a new array
Convolution process:
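Steps 2 and 3 can be sketched together: the same weights score every tile, and the scores are stored in a new 2D array. In this sketch a single weighted sum (a kernel) stands in for the small neural network, and the image, the kernel, and the name `convolve` are all hypothetical.

```python
def convolve(image, kernel, stride=1):
    """Score every tile with the same kernel weights and keep the scores in a 2D array."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    output = []
    for top in range(0, rows - k + 1, stride):
        out_row = []
        for left in range(0, cols - k + 1, stride):
            # The same kernel weights are reused for every tile position.
            score = sum(
                image[top + i][left + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            out_row.append(score)
        output.append(out_row)
    return output

# Hypothetical 4 x 4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2 x 2 kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve(image, kernel)
print(feature_map)
```

The highest scores appear in the middle column of the result, which is where the edge sits in the image.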
4. Downsampling
Use max pooling: look at each 2 × 2 square of the array and keep the biggest number, that is, the most interesting bit.
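A minimal sketch of 2 × 2 max pooling on a small array follows. The input values and the function name `max_pool` are made up for illustration.

```python
def max_pool(array, size=2):
    """Keep the largest value in each non-overlapping size x size square."""
    rows, cols = len(array), len(array[0])
    pooled = []
    # Leftover rows or columns that do not fill a whole square are dropped.
    for top in range(0, rows - size + 1, size):
        pooled_row = []
        for left in range(0, cols - size + 1, size):
            window = [array[top + i][left + j] for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled

# Hypothetical 4 x 4 array of convolution results.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]

pooled = max_pool(feature_map)
print(pooled)
```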
5. Do prediction
We can use that small array as input to another neural network. This final neural network decides whether the image is or isn’t a match.
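As a rough sketch of this last step, the snippet below flattens the pooled array and passes it through a single logistic unit, which stands in for the final neural network. The weights, the bias, and the 0.5 decision threshold are assumptions; a trained network would learn the weights and bias.

```python
import math

def predict(pooled, weights, bias):
    """Flatten the pooled array and squash a weighted sum into a 0..1 score."""
    flat = [value for row in pooled for value in row]
    score = sum(x * w for x, w in zip(flat, weights)) + bias
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical pooled array and hypothetical weights.
pooled = [[4, 2], [2, 8]]
weights = [0.5, -0.25, -0.25, 0.25]
bias = -2.0

probability = predict(pooled, weights, bias)
is_match = probability > 0.5  # assumed decision threshold
print(probability, is_match)
```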
Tips
When solving problems in the real world, these steps can be combined and stacked as many times as you want! You can have two, three or even ten convolution layers. You can throw in max pooling wherever you want to reduce the size of your data.
The basic idea is to start with a large image and continually boil it down, step-by-step, until you finally have a single result. The more convolution steps you have, the more complicated features your network will be able to learn to recognize.
For example, the first convolution step might learn to recognize sharp edges, the second might recognize beaks using its knowledge of sharp edges, the third might recognize entire birds using its knowledge of beaks, and so on.
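To see how stacking boils a large image down, the sketch below tracks only the side length of a square image through repeated convolution and pooling. The 64-pixel starting size, the 3 × 3 kernels without padding, and the 2 × 2 pooling windows are hypothetical choices.

```python
def conv_output_size(size, kernel, stride=1):
    """Side length after a convolution with no padding."""
    return (size - kernel) // stride + 1

def pool_output_size(size, window=2):
    """Side length after non-overlapping pooling."""
    return size // window

# Hypothetical stack: three rounds of 3 x 3 convolution followed by 2 x 2 max pooling.
size = 64
sizes = [size]
for _ in range(3):
    size = conv_output_size(size, kernel=3)
    size = pool_output_size(size)
    sizes.append(size)

print(sizes)
```

Each round shrinks the data, so later layers work on a much smaller array than the original image.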
Reference
https://medium.com/@ageitgey/machine-learning-is-fun-part-2-a26a10b68df3