Periodic Report Summary - OBJECTGUIDEDACTIONS (Recognizing Object Guided Human Actions)
Project context and objectives
In the second period, our research continued to develop methods for consistent image segmentation and for capturing context in human motion analysis. We also made progress on methods for fast search in large feature databases and on compact representations of image features. Each topic is described below.
Consistent image segmentation
We have proposed a probabilistic framework for carrying out segmentation and recognition simultaneously. The framework combines an LDA (latent Dirichlet allocation) model for recognition with a hybrid parametric-nonparametric model for segmentation. When applied to a collection of images, the framework simultaneously discovers the segments of each image and the correspondence between those segments, which may be thought of as the 'parts' of corresponding objects. The model may be used for learning new categories, detecting and classifying objects, and segmenting images. This work was published in:
- M. Andreetto, L. Zelnik-Manor and P. Perona, 'Unsupervised Learning of Categorical Segments in Image Collections', accepted for publication in PAMI, 2012.
Human motion analysis and incorporating context
In the second period, we have worked on advanced methods for human action recognition. We adopt the popular bag-of-words approach, whose underlying assumption is that every video clip can be viewed as an unordered collection of 'words'. These words are typically features capturing local appearance and motion patterns of pixels in the video. Interactions with objects, however, typically consist of an ordered set of atomic motions, and an unordered collection discards that order. To extend bag-of-words methods to the recognition of object-guided actions, we therefore incorporate temporal order into the model. We have developed a model that combines the underlying ideas of bag-of-words models with temporal context and captures the temporal order of sub-actions at multiple temporal scales. Our experiments show that this leads to improved action recognition results. This work was published in:
- T. Glaser and L. Zelnik-Manor, 'Incorporating Temporal Context in Bag-of-Words Models', The Third IEEE Workshop on Video Event Categorization, Tagging and Retrieval for Real-World Applications, 2011.
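The idea of adding temporal order to an unordered bag of words can be illustrated with a minimal sketch. This is not the model from the paper above; it is a simple temporal-pyramid histogram, chosen here only as an assumption to show how order can be captured at several temporal scales. The function name `temporal_bow` and its parameters are hypothetical.

```python
def temporal_bow(words, vocab_size, duration, levels=2):
    """Concatenate word histograms over a temporal pyramid.

    words: list of (time, word_id) pairs with 0 <= time < duration.
    At level l the clip is split into 2**l equal segments and one
    histogram is built per segment. Level 0 is the plain bag of words.
    """
    descriptor = []
    for level in range(levels):
        n_seg = 2 ** level
        hists = [[0] * vocab_size for _ in range(n_seg)]
        for t, w in words:
            # Index of the segment containing time t (clamped to the last one).
            seg = min(int(t / duration * n_seg), n_seg - 1)
            hists[seg][w] += 1
        for h in hists:
            descriptor.extend(h)
    return descriptor


# Two toy clips containing the same two words in opposite order.
clip_a = [(0.1, 0), (0.9, 1)]
clip_b = [(0.1, 1), (0.9, 0)]
desc_a = temporal_bow(clip_a, vocab_size=2, duration=1.0, levels=2)
desc_b = temporal_bow(clip_b, vocab_size=2, duration=1.0, levels=2)
```

In this toy example the level-0 histograms of the two clips are identical, since a plain bag of words cannot tell them apart, while the level-1 histograms differ because the words occur in different halves of the clip.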
Feature databases
Subspaces offer a convenient means of representing information in many pattern recognition and machine vision applications, and the problem of efficiently searching through large subspace databases is becoming important. We have therefore presented a general solution to the Approximate Nearest Subspace search problem. Our solution uniformly handles cases where query and database elements may differ in dimensionality, where the database contains subspaces of different dimensions, and where the queries themselves may be subspaces. It relies on a simple mapping from subspaces to points, which reduces the problem to the well-studied approximate-nearest-neighbour problem on points. Our tests indicate that an approximate nearest subspace can be located significantly faster than the exact nearest subspace, with little loss of accuracy. This work was published in:
- R. Basri, T. Hassner and L. Zelnik-Manor, 'Approximate Nearest Subspace Search', IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), Vol. 33, No. 2, pp. 266 - 278, 2011.
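The reduction from subspaces to points can be sketched under simplifying assumptions. The code below is not the mapping from the paper; it uses the standard identity that, for a subspace with orthogonal projector P of rank k and a query point q, ||P - q q^T||_F^2 = k + ||q||^4 - 2||q||^2 + 2 dist^2(q, S). If all database subspaces share the same dimension k (an assumption made here; the handling of mixed dimensions is not reproduced), the nearest flattened projector is therefore the nearest subspace. All function names are hypothetical, and the brute-force search stands in for an approximate-nearest-neighbour index.

```python
def projector_point(basis):
    """Flatten P = sum_i b_i b_i^T, given an orthonormal basis (list of d-vectors)."""
    d = len(basis[0])
    return [sum(b[r] * b[c] for b in basis) for r in range(d) for c in range(d)]


def query_point(q):
    """Flatten the outer product q q^T."""
    return [qr * qc for qr in q for qc in q]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def subspace_sq_dist(q, basis):
    """Squared distance from q to span(basis): ||q||^2 - ||P q||^2."""
    proj = sum(sum(bi * qi for bi, qi in zip(b, q)) ** 2 for b in basis)
    return sum(x * x for x in q) - proj


def nearest_subspace(q, database):
    """Index of the subspace whose mapped point is closest to the mapped query."""
    # In practice the database points would be computed and indexed once.
    points = [projector_point(basis) for basis in database]
    qp = query_point(q)
    return min(range(len(points)), key=lambda i: sq_dist(points[i], qp))


# Toy database: the three coordinate axes of R^3, each a 1-D subspace.
database = [[[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]], [[0.0, 0.0, 1.0]]]
q = [0.1, 0.9, 0.2]
best = nearest_subspace(q, database)
direct = min(range(len(database)), key=lambda i: subspace_sq_dist(q, database[i]))
```

Here `best` and `direct` select the same subspace, so in this toy setting the point search reproduces the direct subspace search.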
In addition, we have further developed novel methods for learning a dictionary that leads to sparse representations of signals known to reside on a union of subspaces. Two separate algorithms have been proposed: one trains a sparsifying dictionary, and the other learns a sensing matrix for obtaining compact sparse representations. This work was published in:
- K. Rosenblum, L. Zelnik-Manor and Y. Eldar, 'Dictionary Optimization for Block-Sparse Representations', AAAI Fall 2010 Symposium on Manifold Learning;
- L. Zelnik-Manor, K. Rosenblum and Y. C. Eldar, 'Sensing Matrix Optimization for Block-Sparse Decoding', IEEE Transactions on Signal Processing, Vol. 59, No. 9, pp. 4300 - 4312, September 2011.
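The notion of block sparsity used in the titles above can be illustrated briefly. The sketch below is not taken from either paper; it assumes, for illustration only, that the dictionary atoms are grouped into consecutive blocks of equal size, and it counts how many blocks carry non-zero coefficients. The function name `block_sparsity` is hypothetical.

```python
def block_sparsity(coeffs, block_size, tol=1e-12):
    """Number of blocks whose coefficients are not all (numerically) zero."""
    blocks = [coeffs[i:i + block_size] for i in range(0, len(coeffs), block_size)]
    # A block is active when its squared Euclidean norm exceeds the tolerance.
    return sum(1 for b in blocks if sum(x * x for x in b) > tol)


# Two non-zero coefficients, both inside the second block of size 2.
one_block = block_sparsity([0.0, 0.0, 1.5, -0.3, 0.0, 0.0], block_size=2)
# Two non-zero coefficients spread over two different blocks.
two_blocks = block_sparsity([1.0, 0.0, 0.0, 2.0, 0.0, 0.0], block_size=2)
```

Both coefficient vectors have two non-zero entries, but the first uses a single block and the second uses two, which is the distinction a block-sparse representation exploits.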