Video Diver: Generic Video Indexing with Diverse Features

Semantic video indexing is critical for practical video retrieval systems, and a generic, scalable indexing framework is essential for indexing a large semantic lexicon of over 1000 concepts. This paper fully explores the idea of incorporating many kinds of diverse features into a single framework and combining them to obtain a larger degree of invariance than any component feature provides alone, thereby achieving generality and scalability. We reduce the formidable computational expense through the design of the classification and fusion schemes. Specifically, about 20 kinds of diverse features are extracted to capture limited yet complementary variance in color, texture, and edge, with spatial constraints implicitly integrated; over 100 classifiers are then built and fused to produce a generic detector. Extensive experiments on a total of 310 hours of TRECVID news videos show that the proposed framework significantly outperforms the best single feature across a variety of concepts. Moreover, a benchmark comparison demonstrates that this approach is state-of-the-art. The proposed approach also generalizes well to previously unseen programs and stations and scales well to a lexicon of over 300 concepts in the LSCOM ontology.


author = {Wang, Dong and Liu, Xiaobing and Luo, Linjie and Li, Jianmin and Zhang, Bo},
title = {Video Diver: Generic Video Indexing with Diverse Features},
booktitle = {Proceedings of the International Workshop on Multimedia
Information Retrieval},
series = {MIR '07},
year = {2007},
pages = {61--70},
numpages = {10},
publisher = {ACM},
address = {New York, NY, USA},