VIDEO SUMMARIZATION BY EFFICIENT CLUSTERING OF COMPRESSED CHROMATICITY SIGNATURES

Mark S. Drew and James Au
School of Computing Science, Simon Fraser University,
Vancouver, B.C. Canada V5A 1S6
{mark, ksau}@cs.sfu.ca




TABLE OF CONTENTS

Abstract
Results
References


ABSTRACT

Motivated by problems in video summarization caused by changes in lighting, we develop a new video frame feature that is less sensitive to such changes than previous methods. The new low-dimensional feature has the further advantage of capturing image information more expressively and, as a result, produces a very succinct set of keyframes for any video. Because we effectively reduce any video to the same lighting conditions, we can produce a universal basis on which to project video frame features. We set out a new multi-stage hierarchical clustering method that merges clusters based on a variance ratio (the ratio of intra-cluster to total variance) and merges only adjacent frames. A second stage merges clusters incorrectly split in the first round by the greedy hierarchical algorithm, and finally merges non-adjacent clusters to fuse near-repeat shots. Results are very encouraging.
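As a rough illustration of the first clustering stage only, the following Python sketch greedily merges temporally adjacent clusters while a variance ratio stays small. It is not the implementation used for the results on this page. The names `sse` and `cluster_adjacent`, the stopping threshold `max_ratio`, and the use of summed squared deviations (scatter) in place of variance are all assumptions made for the example; the feature vectors stand in for whatever low-dimensional frame feature is used.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sse(features: Sequence[Vector], idx: Sequence[int]) -> float:
    """Sum of squared deviations of the selected frames from their mean."""
    n = len(idx)
    dim = len(features[0])
    total = 0.0
    for d in range(dim):
        mean = sum(features[i][d] for i in idx) / n
        total += sum((features[i][d] - mean) ** 2 for i in idx)
    return total


def cluster_adjacent(features: Sequence[Vector],
                     max_ratio: float = 0.05) -> List[List[int]]:
    """Greedy merging of temporally adjacent clusters (illustrative only).

    max_ratio is a hypothetical threshold on intra-cluster scatter divided
    by total scatter; it is not a value taken from the paper.
    """
    n = len(features)
    if n == 0:
        return []
    total = sse(features, range(n))
    if total == 0.0:
        # All frames identical: one cluster suffices.
        return [list(range(n))]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best_j, best_intra = 0, float("inf")
        # Only neighbouring clusters (j, j + 1) are merge candidates.
        for j in range(len(clusters) - 1):
            trial = (clusters[:j]
                     + [clusters[j] + clusters[j + 1]]
                     + clusters[j + 2:])
            intra = sum(sse(features, c) for c in trial)
            if intra < best_intra:
                best_j, best_intra = j, intra
        if best_intra / total > max_ratio:
            break  # the cheapest merge would make clusters too loose
        clusters[best_j:best_j + 2] = [clusters[best_j] + clusters[best_j + 1]]
    return clusters


# Six one-dimensional toy "frames" forming three runs of similar values.
frames = [[0.0], [0.1], [0.05], [5.0], [5.1], [9.0]]
print(cluster_adjacent(frames))
```

On this toy input the sketch groups the frames into three runs, indices 0-2, 3-4 and 5. The later stages described above (repairing incorrect splits and fusing non-adjacent near-repeat shots) are not covered by this example.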


RESULTS

In this document, we present results and compare our method with another algorithm, designed by Ferman and Tekalp [1], that is based on average color histograms and histogram intersection.
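For readers unfamiliar with histogram intersection, the sketch below shows the standard normalized form of the measure: the sum of bin-wise minima divided by the mass of the second histogram. It illustrates the general similarity measure only and does not reproduce the filtering and clustering procedure of [1]. The function name `histogram_intersection` and the four-bin toy histograms are assumptions made for the example.

```python
from typing import Sequence


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Normalized histogram intersection: sum of bin-wise minima over sum(h2)."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    mass = sum(h2)
    if mass == 0:
        return 0.0  # empty reference histogram: define similarity as zero
    return sum(min(a, b) for a, b in zip(h1, h2)) / mass


# Toy 4-bin histograms (hypothetical values, not taken from any video).
same = histogram_intersection([4, 2, 0, 2], [4, 2, 0, 2])      # identical
half = histogram_intersection([4, 4, 0, 0], [4, 0, 4, 0])      # partial overlap
disjoint = histogram_intersection([8, 0, 0, 0], [0, 0, 0, 8])  # no overlap
print(same, half, disjoint)
```

Because the score depends on pixels falling into the same bins, a lighting change that shifts colors across bin boundaries can lower it sharply even when scene content is unchanged.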

For the following generated keyframes, "Correct" indicates the correct (human-generated) results, "Signatures" indicates our method, and "HistInt" indicates the method in [1], with k = 3 and Tc = 3000. Perfect (human-performed) transition detection was applied as a preprocessing step for the HistInt method.

Notice that the Signatures method generates a much more succinct summarization while missing very few keyframes. As well, it requires no transition detection.

Basketball video (basketball.mpg) - 897 frames [2]

Method        #Keyframes
Correct       10
Signatures    12
HistInt       132

Note: The HistInt method fails very badly because of sudden lighting changes and a changing background.

Football video (football.mpg) - 560 frames [3]

Method        #Keyframes
Correct       5
Signatures    4
HistInt       18

Simpsons video (simpson.mpg) - 2004 frames

Method        #Keyframes
Correct       20
Signatures    16
HistInt       139

Note: Most of the misses by the Signatures method here are due to the fact that several scenes have very similar colors.

Child video (child.mpg) - 30 frames

Method        #Keyframes
Correct       1
Signatures    1
HistInt       4

Aba video (aba.mpg) - 175 frames [4] [5]

Method        #Keyframes
Correct       3
Signatures    3
HistInt       24

Nitobmov video (nitobmov.mpg) - 580 frames

Method        #Keyframes
Correct       4
Signatures    6
HistInt       18

Ant video (ant.mpg) - 64 frames [6]

Method        #Keyframes
Correct       2
Signatures    1
HistInt       6
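The keyframe counts reported in the tables above can be tallied with a few lines of Python. This is only a convenience sketch: the dictionary `keyframe_counts` and the helper `column_totals` are names introduced for the example, and the numbers are copied directly from the tables. Totals compare counts only; they do not show which individual keyframes match the human-generated set.

```python
from typing import Dict, Tuple

# (Correct, Signatures, HistInt) keyframe counts, copied from the tables above.
keyframe_counts: Dict[str, Tuple[int, int, int]] = {
    "basketball": (10, 12, 132),
    "football": (5, 4, 18),
    "simpson": (20, 16, 139),
    "child": (1, 1, 4),
    "aba": (3, 3, 24),
    "nitobmov": (4, 6, 18),
    "ant": (2, 1, 6),
}


def column_totals(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[int, int, int]:
    """Sum each method's keyframe count over all videos."""
    correct = sum(c for c, _, _ in counts.values())
    signatures = sum(s for _, s, _ in counts.values())
    histint = sum(h for _, _, h in counts.values())
    return correct, signatures, histint


correct, signatures, histint = column_totals(keyframe_counts)
print(f"Correct: {correct}, Signatures: {signatures}, HistInt: {histint}")
```

Over the seven videos this gives 45 human-selected keyframes, 43 from the Signatures method, and 341 from HistInt.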

REFERENCES

  1. A.M. Ferman and A.M. Tekalp.  Efficient filtering and clustering methods for temporal video segmentation and visual summarization. J. Vis. Commun. & Image Rep., 9:336-351, 1998.
  2. Video obtained from University of Kansas. http://busboy.sped.ukans.edu/campus/movies/basketball/basketball.mpg
  3. Video obtained from University of Kansas. http://busboy.sped.ukans.edu/campus/movies/football/football.mpg
  4. Video obtained from VideoQ. http://ives.ctr.columbia.edu:8888/VideoQ/
  5. Video obtained from IBM. http://www.almaden.ibm.com/cs/video
  6. Video obtained from VideoQ. http://ives.ctr.columbia.edu:8888/VideoQ/