The performance of audio, face, and multi-modal person recognition systems deployed in uncontrolled scenarios is typically lower than that of systems developed in highly controlled environments. To facilitate the development of robust audio, face, and multi-modal person recognition systems, the large and realistic multi-modal (audio-visual) VALID database was acquired in a noisy “real world” office scenario with no control over illumination or acoustic noise.

The database consists of five recording sessions of 106 subjects over a period of one month. One session was recorded in a studio with controlled lighting and no background noise; the other four sessions were recorded in office-type scenarios.

Speaker identification experiments using visual speech features extracted from the mouth region were carried out. They are intended as a baseline for comparison with similar experiments. Performance on the uncontrolled VALID database is compared with that on the controlled XM2VTS database: example lip-region identification accuracies for VALID and XM2VTS are 63.21% and 97.17%, respectively. The results highlight the degrading effect of uncontrolled illumination and the importance of this database in providing real-world deployment metrics.
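To make the gap between the two reported accuracies concrete, the following minimal sketch restates them as error rates and compares them. It assumes identification accuracy means the percentage of test trials in which the predicted identity matches the true identity; the function name `identification_accuracy` and the toy labels are hypothetical, and only the two percentages come from the text above.

```python
def identification_accuracy(predicted, actual):
    """Percentage of trials where the predicted identity equals the true one.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Toy labels (made up) showing how the metric is computed: 3 of 4 correct.
toy_accuracy = identification_accuracy(["s1", "s2", "s3", "s4"],
                                       ["s1", "s2", "s9", "s4"])

# Lip-region identification accuracies reported in the text (percent).
VALID_ACCURACY = 63.21
XM2VTS_ACCURACY = 97.17

# Restate each accuracy as an identification error rate.
valid_error = round(100.0 - VALID_ACCURACY, 2)
xm2vts_error = round(100.0 - XM2VTS_ACCURACY, 2)

# Absolute accuracy drop (percentage points) and relative error increase.
absolute_drop = round(XM2VTS_ACCURACY - VALID_ACCURACY, 2)
error_ratio = valid_error / xm2vts_error

print(f"toy accuracy: {toy_accuracy:.2f}%")
print(f"VALID error: {valid_error}%, XM2VTS error: {xm2vts_error}%")
print(f"absolute drop: {absolute_drop} points, error ratio: {error_ratio:.1f}x")
```

Under these assumptions, the identification error rate on VALID is roughly thirteen times that on XM2VTS, which is one way to quantify the degrading effect described above.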



This work is part of the VALID biometric project, funded by Enterprise Ireland. Andrea Osl and Rosalyn Moran are thanked for the collection/preparation of the VALID database. Finally, a special thank you is given to all the database participants.

[1] "The Realistic Multi-modal VALID database and Visual Speaker Identification Comparison Experiments," N. A. Fox, B. A. O'Mullane, and R. B. Reilly, to appear in the Proc. of the 5th International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA-2005), New York, July 20-22, 2005.






