Goal: find a template (here, an eye patch) in an image
Main challenge: What is a good similarity or distance measure between two patches?
Method 0: filter the image with eye patch
\[h[m,n] = \sum_{k,l} g[k,l] f[m+k,n+l]\]
\(f\) is the image, \(g\) is the filter (the eye patch)
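A minimal NumPy sketch of Method 0. It assumes \((m,n)\) indexes the top-left corner of the window, so \(k,l\) start at 0. It also computes only the positions where the whole patch fits inside the image. The helper name `filter_with_patch` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def filter_with_patch(f, g):
    """Method 0: h[m, n] = sum_{k,l} g[k, l] * f[m + k, n + l].

    Assumption: (m, n) is the top-left corner of the window, and only
    "valid" positions (patch fully inside the image) are returned.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    windows = sliding_window_view(f, g.shape)  # shape (M, N, K, L)
    return np.einsum("mnkl,kl->mn", windows, g)
```

The raw patch weights are used as-is, so a uniformly bright region can outscore the true eye. The zero-mean filter addresses this weakness.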
Method 1: filter the image with zero-mean eye
\[h[m,n] = \sum_{k,l} (g[k,l]-\overline{g})\, f[m+k,n+l]\]
\(\overline{g}\) is the mean of \(g\)
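A sketch of Method 1 under the same assumptions: \((m,n)\) is the top-left corner, and only valid positions are computed. The name `zero_mean_filter` is hypothetical. Subtracting \(\overline{g}\) makes the template sum to zero, so any constant region of the image gets a response of 0.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def zero_mean_filter(f, g):
    """Method 1: h[m, n] = sum_{k,l} (g[k, l] - mean(g)) * f[m + k, n + l].

    Assumption: (m, n) is the top-left corner; only valid positions are returned.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    g0 = g - g.mean()                          # zero-mean template
    windows = sliding_window_view(f, g.shape)  # shape (M, N, K, L)
    return np.einsum("mnkl,kl->mn", windows, g0)
```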
Method 2: SSD (sum of squared differences)
\[h[m,n] = \sum_{k,l} (g[k,l]-f[m+k,n+l])^2\]
The best match is the location with the smallest \(h[m,n]\)
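A sketch of SSD under the same top-left-corner and valid-positions assumptions. The name `ssd` is hypothetical. Unlike Methods 0 and 1, a lower score is better here, so the match is the argmin.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ssd(f, g):
    """Method 2: h[m, n] = sum_{k,l} (g[k, l] - f[m + k, n + l])^2.

    Assumption: (m, n) is the top-left corner; only valid positions are returned.
    A perfect match scores 0.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    windows = sliding_window_view(f, g.shape)         # shape (M, N, K, L)
    return np.sum((windows - g) ** 2, axis=(-2, -1))  # g broadcasts over (M, N)
```

Usage: `np.unravel_index(np.argmin(h), h.shape)` gives the top-left corner of the best match.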
What is the potential downside?
Method 3: Normalized cross-correlation
\[h[m,n] = \frac{\sum_{k,l} (g[k,l]-\overline{g})(f[m+k,n+l]-\overline{f}_{m,n})}{\left( \sum_{k,l}(g[k,l] - \overline{g})^2 \sum_{k,l}(f[m+k,n+l]-\overline{f}_{m,n})^2 \right)^{0.5}}\]
\(\overline{f}_{m,n}\) is the mean of the image patch under the template at \((m,n)\)
MATLAB: normxcorr2(template, im)
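A NumPy sketch of the NCC formula above, computed only at valid (top-left) positions. The name `ncc` is hypothetical. The `1e-12` floor on the denominator is an assumption that avoids dividing by zero on flat patches. MATLAB's `normxcorr2` returns a larger "full" output, so its peak indices are shifted relative to this sketch.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ncc(f, g):
    """Method 3: normalized cross-correlation at every valid (top-left) position."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    windows = sliding_window_view(f, g.shape)                   # (M, N, K, L)
    w0 = windows - windows.mean(axis=(-2, -1), keepdims=True)  # f minus each patch's mean
    g0 = g - g.mean()                                           # g minus mean of g
    num = np.einsum("mnkl,kl->mn", w0, g0)
    den = np.sqrt(np.sum(w0 ** 2, axis=(-2, -1)) * np.sum(g0 ** 2))
    return num / np.maximum(den, 1e-12)  # assumed guard for flat patches (den == 0)
```

Scores lie in \([-1, 1]\), and 1 means a perfect match. Scaling the image's contrast and shifting its brightness (\(a f + b\) with \(a > 0\)) leaves every score unchanged.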
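Q: Which method is best?
A: It depends.
Q: What if the eye in the image is larger or smaller than the template?
A: Use an image pyramid to find it at other scales.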
input: image, template
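A minimal sketch of one way to use a pyramid for scale: run NCC on every level of a Gaussian pyramid and keep the best score. These assumptions are not from the slides:

- A 5-tap binomial kernel stands in for the Gaussian.
- Each level is blurred and then subsampled by 2.
- A match at level \(i\) is mapped back to the original image by multiplying its position by \(2^i\).

All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Assumption: 5-tap binomial kernel as a cheap approximation of a Gaussian.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur(img):
    """Separable 5x5 binomial blur; reflect padding keeps the input size."""
    h, w = img.shape
    padded = np.pad(img, 2, mode="reflect")
    rows = sum(KERNEL[i] * padded[:, i:i + w] for i in range(5))  # blur along x
    return sum(KERNEL[i] * rows[i:i + h, :] for i in range(5))   # blur along y


def gaussian_pyramid(img, min_size):
    """Blur and subsample by 2 until the next level would be smaller than min_size."""
    levels = [np.asarray(img, dtype=float)]
    while min(levels[-1].shape) // 2 >= min_size:
        levels.append(blur(levels[-1])[::2, ::2])
    return levels


def ncc(f, g):
    """Normalized cross-correlation at every valid (top-left) position."""
    windows = sliding_window_view(f, g.shape)
    w0 = windows - windows.mean(axis=(-2, -1), keepdims=True)
    g0 = g - g.mean()
    num = np.einsum("mnkl,kl->mn", w0, g0)
    den = np.sqrt(np.sum(w0 ** 2, axis=(-2, -1)) * np.sum(g0 ** 2))
    return num / np.maximum(den, 1e-12)  # flat windows score (near) 0


def multiscale_match(image, template):
    """Return (score, level, (row, col) in level-0 pixels) of the best NCC match."""
    template = np.asarray(template, dtype=float)
    best = (-np.inf, None, None)
    for level, img in enumerate(gaussian_pyramid(image, max(template.shape))):
        scores = ncc(img, template)
        r, c = np.unravel_index(np.argmax(scores), scores.shape)
        if scores[r, c] > best[0]:
            scale = 2 ** level  # assumed mapping back to the original resolution
            best = (float(scores[r, c]), level, (int(r) * scale, int(c) * scale))
    return best
```

A match at level \(i\) corresponds to an eye roughly \(2^i\) times larger than the template. To find smaller eyes, one would instead shrink the template or upsample the image.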
\[h_\sigma(u,v) = \frac{1}{2 \pi \sigma^2} e^{-\frac{u^2+v^2}{2 \sigma^2}},\quad \frac{\partial}{\partial x} h_\sigma(u,v),\quad \nabla^2 h_\sigma(u,v)\]
\(\nabla^2\) is the Laplacian operator:
\[\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}\]
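To make the three filters above concrete, here is a sketch that samples \(h_\sigma\), \(\partial h_\sigma/\partial x\), and \(\nabla^2 h_\sigma\) on a grid. It uses the closed forms \(\partial h_\sigma/\partial u = -\frac{u}{\sigma^2}h_\sigma\) and \(\nabla^2 h_\sigma = \frac{u^2+v^2-2\sigma^2}{\sigma^4}h_\sigma\), taking \(u\) along \(x\). The truncation radius \(\lceil 3\sigma \rceil\) and the name `gaussian_family` are assumptions.

```python
import numpy as np


def gaussian_family(sigma, radius=None):
    """Sample h_sigma, d/dx h_sigma and the Laplacian of Gaussian on a square grid.

    u runs along x (columns), v along y (rows).
    Assumption: radius defaults to ceil(3 * sigma).
    """
    if radius is None:
        radius = int(np.ceil(3 * sigma))
    ax = np.arange(-radius, radius + 1, dtype=float)
    u, v = np.meshgrid(ax, ax)  # u varies along columns, v along rows
    r2 = u ** 2 + v ** 2
    h = np.exp(-r2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    dh_dx = -u / sigma ** 2 * h                   # first derivative in x
    log = (r2 - 2 * sigma ** 2) / sigma ** 4 * h  # d2h/dx2 + d2h/dy2
    return h, dh_dx, log
```

As a sanity check, applying the discrete 5-point Laplacian to the sampled \(h_\sigma\) should approximately reproduce the sampled \(\nabla^2 h_\sigma\).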
Can we reconstruct the original from the Laplacian pyramid?
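Yes, as long as the coarsest Gaussian level is kept alongside the band-pass levels. Each level stores exactly what was lost by downsampling, so adding it back undoes that step.

Below is a minimal sketch. It assumes a 5-tap binomial blur and an `expand` that repeats pixels and then blurs. Reconstruction is exact (up to floating-point rounding) for any `expand`, provided the same one is used both to build and to reconstruct. Function names are hypothetical.

```python
import numpy as np

# Assumption: 5-tap binomial kernel approximating a Gaussian.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur(img):
    """Separable 5x5 binomial blur; reflect padding keeps the input size."""
    h, w = img.shape
    padded = np.pad(img, 2, mode="reflect")
    rows = sum(KERNEL[i] * padded[:, i:i + w] for i in range(5))
    return sum(KERNEL[i] * rows[i:i + h, :] for i in range(5))


def expand(img, shape):
    """Assumed upsampler: repeat each pixel 2x2, crop to `shape`, then smooth."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)[:shape[0], :shape[1]]
    return blur(up)


def laplacian_pyramid(img, levels):
    """Band-pass levels, finest (high frequency) first, then the coarsest Gaussian level."""
    g = np.asarray(img, dtype=float)
    pyr = []
    for _ in range(levels - 1):
        smaller = blur(g)[::2, ::2]               # next Gaussian level
        pyr.append(g - expand(smaller, g.shape))  # detail lost by downsampling
        g = smaller
    pyr.append(g)                                 # low-pass residual
    return pyr


def reconstruct(pyr):
    """Invert laplacian_pyramid: add each band back onto the expanded coarser image."""
    img = pyr[-1]
    for band in reversed(pyr[:-1]):
        img = band + expand(img, band.shape)
    return img
```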
Laplacian pyramid levels run from high frequency (finest level) to low frequency (coarsest level)
Image representations:
Pixels
Fourier transform
Pyramid/filter banks
Why is this faster?
Are we guaranteed to get the same result?
How is it that a 4MP image can be compressed to a few hundred KB without a noticeable change?
4MP = 4 million pixels = 2000 pixels × 2000 pixels
If we store 8 bits per channel (RGB), each pixel uses 24 bits = 3 bytes.
\[4\,\text{MP} \times 3\,\text{B/pixel} = 12{,}000{,}000\,\text{B} \approx 11{,}719\,\text{KB} \approx 11.4\,\text{MB} \quad (1\,\text{KB} = 1024\,\text{B})\]
Block-based Discrete Cosine Transform (DCT)
\(B(0,0)\) is the DC component, proportional to the average intensity of the block.
Figure caption: “Left: a final image is built up from a series of basis functions. Right: each of the DCT basis functions that comprise the image, and the corresponding weighting coefficient. Middle: the basis function, after multiplication by the coefficient: this component is added to the final image. For clarity, the 8x8 macroblock in this example is magnified by 10x using bilinear interpolation.”
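A sketch of the 8×8 block DCT, assuming the orthonormal DCT-II. With this scaling, \(B(0,0)\) equals 8 times the block mean. `keep_low_frequencies` is a hypothetical toy. Real JPEG does not simply zero the high frequencies: it quantizes each coefficient using a quantization table and then entropy-codes the result.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis function."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row is constant
    return c


def dct2(block):
    """2D DCT of a square block: B = C f C^T."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T


def idct2(coeffs):
    """Inverse 2D DCT; the basis is orthonormal, so the inverse uses the transpose."""
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c


def keep_low_frequencies(coeffs, k):
    """Hypothetical toy compression: zero every coefficient B(u, v) with u + v >= k."""
    n = coeffs.shape[0]
    mask = np.add.outer(np.arange(n), np.arange(n)) < k
    return np.where(mask, coeffs, 0.0)
```

Keeping only \(B(0,0)\) (`k=1`) reconstructs a flat block at the block's mean intensity. Larger `k` adds progressively finer detail.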