In this paper, we propose a novel scheme for scalable image coding based on the concept of epitome. An epitome can be seen as a factorized representation of an image. Focusing on spatial scalability, the enhancement layer of the proposed scheme contains only the epitome of the input image. The pixels of the enhancement layer not contained in the epitome are then restored using two approaches inspired by local learning-based super-resolution methods. In the first method, a locally linear embedding model is learned on base layer patches and then applied to the corresponding epitome patches to reconstruct the enhancement layer. The second approach learns linear mappings between pairs of co-located base layer and epitome patches. Experiments show that significant improvements in rate-distortion performance can be achieved compared with the scalable extension of HEVC (SHVC).
The concept of epitome was first introduced by Jojic et al. as a condensed representation of an image (meaning its size is only a fraction of the original size) containing the essence of its textural properties. This original epitomic model is based on a patch-based probabilistic approach, and has found applications in segmentation, denoising, recognition, indexing, and texture synthesis.
Several epitomic models have since been proposed, such as the factorized representation of Wang et al. dedicated to texture mapping, or its extension designed for image coding purposes by Chérigui et al. In this case, the epitome is the union of epitome charts, which are pieces of repeatable textures found in the image. The search for self-similar or repeatable texture patterns, based on the KLT or a block matching (BM) algorithm, is known to be memory- and time-consuming.
In this work, we propose a clustering-based technique to reduce the self-similarities search complexity.
The main steps of the proposed scheme for scalable image coding are depicted in Fig. 1.
In the proposed scheme, the enhancement layer (EL) consists of an epitome of the input image. Consequently, at the decoder side, the EL patches not contained in the epitome are missing, but the corresponding base layer (BL) patches are known. We thus propose to restore the full enhancement layer by taking advantage of the known representative texture patches available in the EL epitome charts. (More explanations on the epitome generation are available [here]() or in the papers listed at the end of this page.)
The epitomes are encoded with a scalable scheme as an enhancement layer. The blocks not belonging to the epitome are directly copied from the decoded base layer, so their rate cost is negligible.
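The initial EL reconstruction described above can be sketched as follows. This is an illustrative sketch only: the function name, the block-grid mask, and the fixed block size `b` are assumptions, not the actual codec implementation, and the BL is taken as already upsampled to EL resolution.

```python
import numpy as np

def assemble_el(epitome, bl_upsampled, block_mask, b=8):
    """Form the initial EL reconstruction: keep the decoded epitome blocks,
    and copy every other block from the upsampled decoded base layer.
    block_mask[i, j] is True when block (i, j) belongs to the epitome.
    (Hypothetical names and b x b block grid, for illustration.)"""
    el = bl_upsampled.copy()
    for i, j in zip(*np.nonzero(block_mask)):
        el[i * b:(i + 1) * b, j * b:(j + 1) * b] = \
            epitome[i * b:(i + 1) * b, j * b:(j + 1) * b]
    return el
```

The non-epitome blocks copied from the BL are then refined by the restoration step described next.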
The non-epitome part of the enhancement layer is restored using methods derived from local learning-based super-resolution methods, which can be summarized in the following three steps: K-NN search, learning step, and processing step. These steps are shown in Fig. 2. The first method, denoted E-LLE, relies on Locally Linear Embedding. A second technique, called E-LLM and based on Local Linear Mapping, is also studied.
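The three steps above can be sketched for a single missing EL patch as follows. This is a minimal sketch under assumed data layouts (flattened patches stacked row-wise, BL patches upsampled to EL resolution); the function names, `k`, and the regularization constant are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def knn(query, dictionary, k):
    """K-NN search step: indices of the k nearest dictionary patches."""
    d = np.linalg.norm(dictionary - query, axis=1)
    return np.argsort(d)[:k]

def e_lle_restore(bl_query, bl_dict, el_dict, k=8, reg=1e-3):
    """E-LLE-style restoration: learn LLE weights on the BL neighbours,
    then apply them to the co-located EL epitome patches."""
    idx = knn(bl_query, bl_dict, k)
    diffs = bl_dict[idx] - bl_query          # (k, d) local differences
    G = diffs @ diffs.T                      # local Gram matrix
    G = G + reg * np.trace(G) * np.eye(k)    # regularise (G can be singular)
    w = np.linalg.solve(G, np.ones(k))
    w = w / w.sum()                          # LLE constraint: weights sum to 1
    return w @ el_dict[idx]

def e_llm_restore(bl_query, bl_dict, el_dict, k=8):
    """E-LLM-style restoration: learn a least-squares linear map from the
    BL neighbours to their EL patches, then apply it to the query patch."""
    idx = knn(bl_query, bl_dict, k)
    M, *_ = np.linalg.lstsq(bl_dict[idx], el_dict[idx], rcond=None)
    return bl_query @ M
```

In both variants the model is learned on BL patches only, so the decoder can reproduce it without extra signalling; only the epitome itself is transmitted in the EL.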
The experiments are performed on the test images listed in Table I, obtained from the HEVC test sequences. The base layer images are obtained by down-sampling the input image by a factor of 2 in each dimension, using the SHVC down-sampling filter available with the SHM software (ver. 9.0). The BL images are encoded with HEVC, using the HM software (ver. 15.0). We then use the SHM software (ver. 9.0) to encode the corresponding enhancement layers. Both layers are encoded with the following quantization parameters: QP = 22, 27, 32, 37.
For each input image, 3 to 4 epitomes of different sizes are generated, ranging from 30% to 90% of the input image size.
We show in Fig. 3 the Bjontegaard rate gains averaged over all sequences as a function of the epitome size. The complete results are given in Table II. We show in Fig. 4 the RD curves of the City image, whose behavior is representative of the set of test images. We first show (left) the RD curves for both the E-LLE and E-LLM methods with the largest epitome size (best RD performance). We then show (right) the RD curves for E-LLE with different epitome sizes.
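For reference, the Bjontegaard delta-rate reported in these results can be computed as sketched below. This follows the standard Bjontegaard metric (cubic fit of log-rate as a function of PSNR over the overlapping quality range, here with the four QP points per curve used in the experiments); it is a generic sketch, not the authors' evaluation script.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta-rate (%): average rate difference between two RD
    curves, from cubic fits of log10(rate) as a function of PSNR.
    A negative value means the test codec saves bitrate vs. the anchor."""
    lr_a, lr_t = np.log10(rate_anchor), np.log10(rate_test)
    pa = np.polyfit(psnr_anchor, lr_a, 3)        # anchor curve fit
    pt = np.polyfit(psnr_test, lr_t, 3)          # test curve fit
    lo = max(min(psnr_anchor), min(psnr_test))   # overlapping PSNR interval
    hi = min(max(psnr_anchor), max(psnr_test))
    ia, it = np.polyint(pa), np.polyint(pt)
    avg_diff = (np.polyval(it, hi) - np.polyval(it, lo)
                - np.polyval(ia, hi) + np.polyval(ia, lo)) / (hi - lo)
    return (10 ** avg_diff - 1) * 100
```

For example, a test curve reaching the same PSNR at half the anchor's rate yields a BD-rate of -50%.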
We show in Fig. 5 the running time of the different methods for each image class (i.e. size) depending on the epitome size. On the left, we show the running time of the epitome generation at the encoder side. On the right, we show the running time of the restoration step at the decoder side. Note that the epitome generation algorithm was implemented in C++ while the restoration methods were implemented in Matlab.
The table below lists the full Bjontegaard rate gains obtained with the proposed methods against SHVC.
| Image | Epitome size (% of input image) | BD rate gains (%) |
| --- | --- | --- |