Mark S. Drew and James Au
School of Computing Science, Simon Fraser University,
Vancouver, B.C. Canada V5A 1S6
Full text [.pdf]
Motivated by problems in video summarization caused by changes in lighting, we develop a new video frame feature that is less sensitive to such changes than previous methods. The new low-dimensional feature has the further advantage of capturing image information more expressively, and as a result produces a very succinct set of keyframes for any video. Because we effectively reduce any video to the same lighting conditions, we can produce a universal basis on which to project video frame features. We set out a new multi-stage hierarchical clustering method that merges clusters based on variance ratio, the ratio of intra-cluster to total variance, and merges only adjacent frames. A second stage merges clusters incorrectly split in the first round by the greedy hierarchical algorithm, and finally merges non-adjacent clusters to fuse near-repeat shots. Results are very encouraging.
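As a rough illustration of the first clustering stage described above, the following Python sketch greedily merges temporally adjacent clusters while the variance ratio stays small. It is a minimal sketch under stated assumptions, not the implementation used for the results on this page: the function names, the `max_ratio` stopping threshold, and the use of sums of squared deviations as the variance measure are all hypothetical, and the frame features are taken to be plain lists of floats.

```python
def sum_sq_dev(vectors):
    """Sum of squared deviations of the vectors from their mean."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    return sum((v[d] - mean[d]) ** 2 for v in vectors for d in range(dim))


def merge_adjacent(features, max_ratio=0.05):
    """Greedily merge temporally adjacent clusters of frame features.

    features:  list of equal-length feature vectors, one per frame.
    max_ratio: hypothetical threshold on intra-cluster / total variance.
    Returns a list of (start, end) frame ranges, with end exclusive.
    """
    if not features:
        return []
    total = sum_sq_dev(features)
    if total == 0:
        # All frames are identical, so they form a single cluster.
        return [(0, len(features))]
    # Start with one cluster per frame.
    clusters = [(i, i + 1) for i in range(len(features))]
    while len(clusters) > 1:
        # Score each adjacent pair by the variance ratio of the merged cluster.
        best_i, best_ratio = None, None
        for i in range(len(clusters) - 1):
            start, end = clusters[i][0], clusters[i + 1][1]
            ratio = sum_sq_dev(features[start:end]) / total
            if best_ratio is None or ratio < best_ratio:
                best_i, best_ratio = i, ratio
        # Stop once even the cheapest merge would exceed the threshold.
        if best_ratio > max_ratio:
            break
        merged = (clusters[best_i][0], clusters[best_i + 1][1])
        clusters[best_i:best_i + 2] = [merged]
    return clusters
```

Because only neighbouring clusters are ever candidates, each resulting cluster is a contiguous run of frames. The later stages mentioned above (repairing incorrect splits and fusing non-adjacent near-repeat shots) are not covered by this sketch.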
In this document, we present results and compare our method with another algorithm, designed by Ferman and Tekalp, that is based on the average color histogram and the intersection histogram.
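For readers unfamiliar with the two histogram descriptors named above, the sketch below shows one common way to compute them for a group of frames: the average histogram as the bin-wise mean and the intersection histogram as the bin-wise minimum. These definitions are an assumption here and may differ in detail from the formulation used by Ferman and Tekalp; the function names are hypothetical.

```python
def average_histogram(histograms):
    """Bin-wise mean of a group of equal-length frame histograms."""
    if not histograms:
        raise ValueError("need at least one histogram")
    n = len(histograms)
    return [sum(bins) / n for bins in zip(*histograms)]


def intersection_histogram(histograms):
    """Bin-wise minimum: the colour content shared by every frame."""
    if not histograms:
        raise ValueError("need at least one histogram")
    return [min(bins) for bins in zip(*histograms)]
```

For two three-bin histograms `[4, 0, 2]` and `[2, 2, 2]`, the average histogram is `[3.0, 1.0, 2.0]` and the intersection histogram is `[2, 0, 2]`.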
For the following generated keyframes, "Correct" indicates the correct (human-generated) results, "Signatures" indicates our method, and "HistInt" indicates the method of Ferman and Tekalp, with k = 3 and Tc = 3000. Perfect (human-performed) transition detection was applied as a preprocessing step for the HistInt method.
Notice that the Signatures method generates a much more succinct summarization while missing very few keyframes. It also requires no transition detection.
Basketball video (basketball.mpg) - 897 frames 
Note: The HistInt method fails very badly because of sudden lighting changes and a changing background.
Football video (football.mpg) - 560 frames 
Simpsons video (simpson.mpg) - 2004 frames
Note: Most of the misses here occur because several scenes have very similar colors.
Child video (child.mpg) - 30 frames
Aba video (aba.mpg) - 175 frames  
Nitobmov video (nitobmov.mpg) - 580 frames
Ant video (ant.mpg) - 64 frames