Content Modelling for Human Action Detection via Multidimensional Approach
Lili Nurliyana Abdullah, Fatimah Khalid
Pages - 17 - 30 | Revised - 20-02-2009 | Published - 15-03-2009
Published in International Journal of Image Processing (IJIP)
KEYWORDS
audiovisual, semantic, multidimensional, multimodal, hidden markov model
ABSTRACT
Video content analysis is an active research domain owing to the growing
availability of audiovisual data in digital format. There is a need to
automatically extract video content for efficient access, understanding,
browsing and retrieval of videos. To obtain the information that is of interest and
to provide better entertainment, tools are needed to help users extract relevant
content and navigate effectively through the large amount of available video
information. Existing methods do not seem to attempt to model and estimate the
semantic content of the video. Detecting and interpreting human presence,
actions and activities is one of the most valuable functions of the proposed
framework. The general objective of this research is to analyze and process
audio-video streams to build a robust audiovisual action recognition system by
integrating, structuring and accessing multimodal information via a
multidimensional retrieval and extraction model. The proposed technique
characterizes action scenes by integrating cues obtained from both the audio
and video tracks. Visual features (motion, edges, and visual characteristics of
objects) are combined with audio features to recognize actions. The model uses
hidden Markov models (HMM) and Gaussian mixture models (GMM) to provide a
framework for fusing these features and to represent the multidimensional
structure of the framework. The action-related visual cues are obtained by
computing the spatiotemporal dynamic activity from the video shots and by
abstracting specific visual events. Simultaneously, the audio features are
analyzed by locating and computing several sound effects of action events that
are embedded in the video. Finally, these audio and visual cues are combined to
identify the action scenes. Compared with using either the visual or the audio
track alone, such combined audiovisual information provides more reliable
performance and allows us to understand the story content of movies in more
detail. To evaluate the usefulness of the proposed framework, several
experiments were conducted using visual features only (77.89% precision;
72.10% recall), audio features only (62.52% precision; 48.93% recall)
and combined audiovisual features (90.35% precision; 90.65% recall).
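The abstract reports precision and recall separately for each configuration. As an illustrative aid only, the sketch below folds each pair into a single F1 score (the harmonic mean of precision and recall). The F1 measure, the function name `f1_score`, and the dictionary labels are assumptions introduced here for comparison. They are not reported by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) exactly as reported in the abstract.
# The dictionary keys are labels chosen for this sketch.
reported = {
    "visual only": (77.89, 72.10),
    "audio only": (62.52, 48.93),
    "combined audiovisual": (90.35, 90.65),
}

# Derived F1 values; these are computed here, not taken from the paper.
f1_by_setup = {name: f1_score(p, r) for name, (p, r) in reported.items()}

for name, f1 in f1_by_setup.items():
    print(f"{name}: F1 = {f1:.2f}%")
```

Under this derived measure the three configurations come out at roughly 74.9%, 54.9% and 90.5%, which is consistent with the abstract's claim that the combined audiovisual cues outperform either track alone.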
Dr. Lili Nurliyana Abdullah
- Malaysia
liyana@fsktm.upm.edu.my
Dr. Fatimah Khalid
UPM - Malaysia