Abstract: Most traditional sketch-based image retrieval systems compare sketches and images using morphological features. Since these features belong to two different modalities, they are compared either by reducing the image to a sparse, sketch-like form or by transforming the sketches into a denser, image-like representation. However, this cross-modal transformation either loses information or adds undesirable noise to the system. We propose a method in which, instead of comparing the two modalities directly, a cross-modal correspondence is established between the images and sketches. Using an extended version of Canonical Correlation Analysis (CCA), the samples are projected onto a lower-dimensional subspace where the images and sketches of the same class are maximally correlated. We test the efficiency of our method on images from the Caltech and PASCAL datasets and sketches from the TU-BERLIN dataset. Our results show a significant improvement in retrieval performance with the cross-modal correspondence.
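Once such a shared subspace is learned, retrieval reduces to nearest-neighbour search in that subspace. A minimal sketch of this step, assuming projection matrices `Wsk` and `Wim` (hypothetical names for the learned sketch and image projections) are already available:

```python
import numpy as np

def cross_modal_retrieve(sketch_feat, image_feats, Wsk, Wim, k=5):
    """Illustrative retrieval in a shared subspace: project the sketch
    query and the image gallery with learned projections (Wsk, Wim are
    assumed inputs), then rank images by cosine similarity."""
    q = sketch_feat @ Wsk                      # project query sketch
    Z = image_feats @ Wim                      # project image gallery
    q = q / (np.linalg.norm(q) + 1e-12)        # normalize for cosine
    Z = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)
    sims = Z @ q                               # cosine similarities
    return np.argsort(-sims)[:k]               # top-k gallery indices
```

Here the quality of retrieval depends entirely on the learned projections; the function itself is modality-agnostic.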
Abstract: Motion trajectories extracted from certain videos contain sufficient spatio-temporal information to characterize those videos effectively. However, framing text-based queries for such videos in content-based video retrieval systems is very difficult. A sketch-based query is an efficient way to construct motion-based queries, but perceptual differences such as spatial and temporal variability pose serious challenges to query modelling.
In this work we propose a new method of modelling sketch-based queries that attempts to extract the qualitative features of motion while minimizing the perceptual variability. We also develop a multilevel filter for indexing a query, in which the search results are refined at each stage using a cumulative scoring mechanism. Finally, we show the effectiveness of our algorithm on a dataset of real pool videos and on a synthetic dataset of simulated videos with highly complex motion trajectories.
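The multilevel filter can be sketched as a simple pipeline: each stage scores the surviving candidates, the scores accumulate, and the weakest candidates are pruned before the next (typically more expensive) stage runs. This is an illustrative reading of the mechanism, with a hypothetical `stages` interface rather than the thesis implementation:

```python
def multilevel_filter(candidates, stages, keep_frac=0.5):
    """Multilevel filtering sketch: `stages` is a list of scoring
    functions (hypothetical interface), applied in order. Scores are
    accumulated across stages, and after each stage only the top
    `keep_frac` fraction of candidates survives."""
    scores = {c: 0.0 for c in candidates}
    alive = list(candidates)
    for stage in stages:
        for c in alive:
            scores[c] += stage(c)                 # cumulative scoring
        alive.sort(key=lambda c: scores[c], reverse=True)
        alive = alive[:max(1, int(len(alive) * keep_frac))]  # prune
    return sorted(alive, key=lambda c: scores[c], reverse=True), scores
```

Ordering the stages from cheap to expensive means the costly comparisons only run on a small, pre-filtered candidate set.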
Abstract: We aimed to investigate the relationship between modes of learning and audio-spatial working memory. Modes of learning were manipulated as unsupervised (no error feedback) and supervised (with error feedback); supervised learning was further divided into audio and visual error feedback. An experiment was conducted in three phases, consisting of a working-memory task before and after the various modes of learning. The working-memory set size was varied from 2 to 8. The pilot results show a trend that learning enhances performance on the working-memory task.
Abstract: A hand-drawn sketch is a convenient way to search for an image or a video in a database when examples are unavailable or textual queries are too difficult to articulate. In this thesis, we propose solutions to several problems in sketch-based multimedia retrieval. In the case of image search, the queries may be approximate binary outlines of the actual objects. In the case of videos, we consider the setting where the user specifies the motion trajectory with a sketch, which is provided as the query.
However, this paradigm poses multiple problems. First, different users sketch the same query differently, according to their own perception of reality. Second, sketches are sparse and abstract representations of images, so the two modalities cannot be compared directly. Third, compared to images, datasets of sketches are rare; it is very difficult, if not impossible, to train a system with sketches of every possible category, so the features should be robust enough to retrieve classes that were not part of the training set.
The work in this thesis can be broadly divided into three parts. First, we develop a motion-trajectory-based video retrieval strategy and propose a representation for sketches that aims to reduce the perceptual variability among different users. We also propose a novel retrieval strategy that combines multiple feature representations into a final result using a cumulative scoring mechanism.
Second, to tackle the problem of multiple modalities, we propose a sketch-based image retrieval strategy that maps the two modalities into a lower-dimensional subspace where they are maximally correlated. For this mapping we use Cluster Canonical Correlation Analysis (c-CCA), a modified version of standard CCA.
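One common reading of cluster CCA is that, instead of requiring one-to-one paired samples as standard CCA does, every image of a class is paired with every sketch of the same class, and standard CCA is then solved on the enlarged pair set. The following is a minimal sketch under that assumption (a hypothetical helper, not the thesis implementation), using the whitened-SVD solution of CCA:

```python
import numpy as np

def cluster_cca(X, Y, labels_x, labels_y, reg=1e-3, dim=2):
    """c-CCA sketch: pair every sample of view X with every same-class
    sample of view Y, then solve standard CCA on the paired data via
    whitening + SVD. Returns projection matrices (A for X, B for Y)."""
    pairs_x, pairs_y = [], []
    for c in np.unique(labels_x):
        for xi in X[labels_x == c]:
            for yj in Y[labels_y == c]:
                pairs_x.append(xi)
                pairs_y.append(yj)
    Px = np.asarray(pairs_x) - np.mean(pairs_x, axis=0)
    Py = np.asarray(pairs_y) - np.mean(pairs_y, axis=0)
    n = len(Px)
    # Regularized covariances and cross-covariance of the paired data.
    Cxx = Px.T @ Px / n + reg * np.eye(Px.shape[1])
    Cyy = Py.T @ Py / n + reg * np.eye(Py.shape[1])
    Cxy = Px.T @ Py / n
    # Whiten each view, then SVD the whitened cross-covariance.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)
    A = Wx.T @ U[:, :dim]     # projection for view X (e.g. images)
    B = Wy.T @ Vt[:dim].T     # projection for view Y (e.g. sketches)
    return A, B
```

The pairing step is what injects class supervision: correlation is maximized between same-class samples rather than between individually paired ones, so same-class images and sketches land close together in the subspace.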
Finally, we investigate the use of semantic features derived from a Convolutional Neural Network and extend the idea of sketch-based image retrieval to the task of zero-shot learning, i.e. retrieval of unknown classes. We define an objective function for the network such that, during training, a close miss is penalized less than a distant miss; the training thus encodes semantic similarity among the different classes. We evaluate our algorithms on well-known datasets, and our results show that our features perform reasonably well in challenging scenarios.
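One simple way to realize an objective where a close miss costs less than a distant one is to replace one-hot targets with soft targets derived from a class-similarity matrix, then train with cross-entropy against them. The sketch below illustrates that idea; the similarity matrix, temperature, and exact loss form are assumptions for illustration, not necessarily the thesis objective:

```python
import numpy as np

def semantic_soft_targets(sim, temp=1.0):
    """Turn a class-similarity matrix (sim[i, j] = similarity of class i
    to class j, an assumed input, e.g. from word embeddings) into soft
    target distributions via a row-wise softmax."""
    logits = sim / temp
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def semantic_cross_entropy(pred_logits, labels, soft_targets):
    """Cross-entropy against similarity-derived soft targets: confidently
    predicting a semantically close class incurs a smaller loss than
    confidently predicting a distant one."""
    z = pred_logits - pred_logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax
    return -(soft_targets[labels] * logp).sum(axis=1).mean()
```

Because unseen classes can be placed in the same semantic similarity space, features trained this way remain meaningful for zero-shot retrieval.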