This article has been accepted for publication in a future issue of this journal, but has not beenfully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2017.2660244, IEEE                                                                                               Transactions on Image Processing     IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. X, NO. XX, MONTH YEAR                                                                                                                                   1     Robust ImageGraph: Rank-Level Feature Fusion for                              Image Search                                  Ziqiong Liu, Shengjin Wang, Member, IEEE, Liang Zheng, and Qi Tian, Fellow, IEEE     Abstract—Recently, feature fusion has demonstrated its effec-             tiveness in image search. However, bad features and inappropri-             ate parameters usually bring about false positive images, i.e.,             outliers, leading to inferior performance. Therefore, a major             challenge of fusion scheme is how to be robust to outliers.             Towards this goal, this paper proposes a rank-level framework             for robust feature fusion. First, we define Rank Distance to             measure the relevance of images at rank level. Based on it,             Bayes similarity is introduced to evaluate retrieval quality of             individual features, through which true matches tend to obtain             higher weight than outliers. Then, we construct the directed             ImageGraph to encode the relationship of images. Each image is             connected to its K nearest neighbors with an edge, and the edge             is weighted by Bayes similarity. Multiple rank lists resulted from                                                                                                                  Fig. 1. Examples of good feature and bad feature. For each query, the             different methods are merged via ImageGraph. Furthermore, on                                         top-5 ranked images in the search result of good feature (the first row) and             the fused ImageGraph, local ranking is performed to re-order                                         bad feature (the second row) are demonstrated. Relevant images are marked             the initial rank lists. It aims at local optimization, and thus is                                   with green dot, and irrelevant ones red. The good features work well in that             more robust to global outliers. Extensive experiments on four                                        true match images are retrieved, but bad features rank outliers ahead of true             benchmark datesets validate the effectiveness of our method.                                         matches.             Besides, the proposed method outperforms two popular fusion             schemes, and the results are competitive to the state-of-the-art.                 Index Terms—Image search, feature fusion, ImageGraph.                                            adopted feature is a good feature and also complementary to                                                                                                                  existing ones, a higher performance is expected. Nevertheless,                                              I. I NTRODUCTION                                                    many irrelevant images have high ranks due to the low                                                                                                                  discriminability of bad features. If the to-be-fused feature is                This paper considers the task of content-based image search.                                                                                                                  a bad feature, the fusion performance may not be guaranteed,             Given a query image, our goal is to retrieve all the appearance                                                                                                                  and accuracy may get even lower after fusion. In essence,             similar images in a database. Recently, multiple features are                                                                                                                  failure in predicting features’ effectiveness results in undesir-             employed to boost the overall performance. To take advan-                                                                                                                  able search quality [16]. Multiple cues are directly integrated             tage of complementary properties of distinct features, various                                                                                                                  without considering their effectiveness in [11, 14, 29, 30].             fusion methods are investigated, ranging from straightforward                                                                                                                  Once outliers are introduced by bad features, it is difficult to             combination at feature level [30] to integration at indexing                                                                                                                  filter them out. To evaluate the retrieval quality of individual             level [11, 14, 29] and merging graphs of different rank results                                                                                                                  method, consensus degree among the top candidates, i.e.,             [15, 17]. It is demonstrated that fusion of multiple features                                                                                                                  Jaccard similarity, is utilized [17] at rank level. However,             has been pushing the state-of-the-art forward. However, false                                                                                                                  when a bad feature is adopted, outliers may be included             positive images, i.e., outliers, are inevitably introduced in the                                                                                                                  into the graph. Usually there are many edges linked between             fusion, leading to inferior accuracy.                                                                                                                  the outliers, called “Tightly-Knit Community Effect”. In this                On one hand, outliers are often brought in by bad features.                                                                                                                  scenario, outliers may obtain higher consensus degree among             For a specific query, a good feature means its search accuracy                                                                                                                  neighbors than true matched ones, yielding unsatisfactory             is high by itself. By comparison, the feature yielding low                                                                                                                  performance.             search quality is called bad feature (see Fig. 1). When the                                                                                                                      On the other hand, inappropriate parameter also introduces               Copyright (c) 2010 IEEE. Personal use of this material is permitted.                               outliers. In [17, 29], K-reciprocal nearest images are treated             However, permission to use this material for any other purposes must be                              as pseudo positive instances, and thus K shall be equal to the             obtained from the IEEE by sending a request to 
[email protected].               Z. Liu and S. Wang are with State Key Laboratory of Intelligent Technology                         number of ground truths. However, it is hard to pre-define K             and Systems, Tsinghua National Laboratory for Information Science and Tech-                          because database images commonly have various numbers of             nology, Department of Electronic Engineering, Tsinghua University, Beijing                           ground truths. If K is inappropriate, the performance may be             100084, China (E-mail: 
[email protected], 
[email protected]).               Liang Zheng is with the Centre for Quantum Computation and Intelligent                             affected, especially in [17]. The retrieval quality measurement,             Systems, University of Technology Sydney, Ultimo, NSW 2007, Australia.                               Jaccard similarity, always varies with K, and gradually loses             (E-mail: 
[email protected]).                                                                    its effectiveness when K gets larger than ground truths number.               Q. Tian is with the University of Texas at San Antonio, 78256, USA (E-             mail: qitian@ cs.utsa.edu).                                                                              Therefore, choosing an effective measurement to evaluate               Corresponding authors: Shengjin Wang and Qi Tian.                                                  retrieval quality of individual features is the key issue in robust   1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.  and give more detailed discussions.ieee. VOL. A toy example of our fusion system is             is used as query to search for the other. we introduce the Bayes                                               •   We propose the directed ImageGraph structure to encode             similarity. a good                                       insensitive to parameter changes. the local densities of                                                 proposed to evaluate the retrieval quality of individual             vectors. Through this mea. Our method uses the top-K ranked                                              and reformulate the edge weight of ImageGraph. 54] that reciprocal neighborhood relationship                                        •   We propose an effective measurement for robust fusion.             of individual features effectively. Extensive experiments on                In light of the above analysis. ImageGraph 1 and ImageGraph 2 are fused by appending new nodes or re-calculating edge weights of existing nodes. The query image is marked with yellow bounding box. and relevant ones green. other two true matches are connected at the second layer of the graph. Different features may produce scores diverse in                                        ageGraph as Bayes similarity.             Since not only top-ranked images in initial search results                                               •   We propose the local ranking to rerank the initial search             but also their neighborhood are included into the graph. so the evaluation scheme should measure                                            relevant/irrelevant images than Jaccard similarity [17]. so that it is more robust to global outliers. so that more candidates (high recall) can be included                                        conduct more experiments to better validate the effectiveness             in the graph. X.             surement. Besides. it is             at rank level. to evaluate the retrieval quality                                               t/irrelevant images and insensitive to parameter changes. but has not been fully edited. which is based on their ranks when each one                                           robust to the outliers. The edge is weighted by Bayes similarity. Given a query image. Beyond the conference paper. but republication/redistribution requires IEEE permission. Built on the Rank Distance. further enhancing the robustness of our method. Besides. Based on the             fused graph. which means there are 1 true match in the top-3 ranked images of initial rank list             of Feature 1. It aims at local optimization. improving the recall. See http://www.org/publications_standards/publications/rights/index. Then. Then. the edge                Our approach adopts the graph-based framework of [17].             denoted as ImageGraph. Although there are many outliers in the graph. It is defined as the posterior probability of two                                                image-level relationships. 2. Besides.e.                                                    The proposed ranking algorithm aims at local optimiza-             true matched images not directly connected to query can be                                                   tion. Through these relevant images.                                                    weight of ImageGraph is measured by Bayes similarity. Bayes similarity is             reciprocal ranks of two images. the corresponding ImageGraph is built. local ranking is conducted and the images are reranked. thus being comparable. Content may change prior to final publication.This article has been accepted for publication in a future issue of this journal. In contrast with [17]. Consequently. In ImageGraph 2. and thus more candidates can be included             estimate the Bayes similarity through empirical study.                                        Rank Distance and Bayes similarity for robust evaluation.2660244.2017. ImageGraph builds on K near-             images being true match. the undirected graph proposed in [17]                                         This paper is an extension of our previous conference             builds on K-reciprocal neighbors that may result in low search                                       publication [51]. it is more reliable to represent the relevance of images                                            features. Since Rank Distance considers the                                                 of images on rank level. we propose             recall. 2. Toy example of the proposed method. we define the edge weight of Im. local ranking is proposed to             assigning higher weight to relevant images under good features                                       re-order the initial result.                                                       in the graph.                                       is more robust to global outliers. this paper first proposes                                         four image retrieval datasets confirm that the proposed method             the Rank Distance to measure the relevance of two images                                             significantly improves baseline performance. we construct a directed graph. NO. and thus             and lower weight to highly-ranked outliers under bad features. i. MONTH YEAR                 Fig. two             features are used to obtain search results. We also             images. XX.html for more information. we                                                      est neighbors. The rest   1057-7149 (c) 2016 IEEE.             retrieved. Citation information: DOI 10.                                         fected by the outliers in reranking. it is                                          follows:             illustrated in [12. and is             the importance of images in the unified scale. a better discriminator between             numerical values. Besides.             similarity can be propagated through graph. In the ImageGraph. Personal use is permitted. Nevertheless. reflecting the retrieval quality. Based on it. for each feature. we observe that only 1 relevant image is directly connected by query. to avoid being af-             evaluation should measure features’ effectiveness correctly. similarity scores of different features are mapped                                            The main contributions of this paper are summarized as             to the unified scale.             is a stronger indicator of similarity than unidirectional nearest                                            Rank Distance is first introduced to measure the relevance             neighborhood relationship.                                         of our method. Moreover. each vertex points to its 3             nearest neighbors and the graph is expanded to the second layer. In ImageGraph             1.1109/TIP. query points to two             relevant images directly. IEEE                                                                                               Transactions on Image Processing                 2                                                                                                 IEEE TRANSACTIONS ON IMAGE PROCESSING. the                                                 result. all the true match images are retrieved.                fusion task. which is a good discriminator between relevan-             than similarity score. . In addition.                                         illustrated in Fig..                                                                                                                   signature is embedded in the inverted index to filter out false             Additionally.             reranking based on complementary cues [15–17]. Then. Instead. In [28]. which are usually combined with                                                                                                                  a multi-IDF scheme is introduced in [11].html for more information. From             treated as visual words of the codebook.: ROBUST IMAGEGRAPH: RANK-LEVEL FEATURE FUSION FOR IMAGE SEARCH                                                                                                                   3                of the paper is organized as follows. Section IV describes the datasets                                         graph. we adopt the directed graph model. 32. outliers can be removed from the graph             salient local regions are detected from an image with operators                                                                                                                  by spectral filtering. we introduce the proposed robust                                         ciently model similarity of Google image search results with             ImageGraph in Section III. attribute vector             to post-processing method. [17] propose a undirected graph-based query specific             and TF-IDF [23. This method improves performance for particular object             [2. approximate k-means (AKM) [22]                                                                                                                  matched features between pairwise images. a graph theoretical             in the last decade. Graph-based Ranking                                                                               the multi-index. integrating                                        indexing algorithm [29] leverages global semantic attributes             both initial ranking and visual consistency between images. The edge weight is measured             the initial results using spatial cues.                                                                                                                  fusion approach. conclusions are given in Section                                      search reranking is also formulated as a random walk problem             VI. The initial rank information is                                               For the late fusion. and edge weight is computed as the count of             clustering method. . a safe strategy is used for ranking. in             automatically using the K nearest neighbors. 4. A directed graph is             codebook. Feature Fusion             improved and the system is able to find quite challenging                                                                                                                     It is indicated that combination of multiple features obtains             occurrences of the query. the                                                                                                                     Graph-based method has also received increased attention             extracted regions are represented as high-dimensional feature                                                                                                                  recently in content-based image search. Specifically. there are also efforts to represent the image using                                                                                                                  feature is compressed into small codes by product quantiza-             their global properties. and cluster centers are                                                                                                                  [50] is employed to rank images using the affinity values. a myriad of methods have been proposed                                                                                                                  to the web image search. [14] propose a multi-dimensional inverted index. through which multiple retrieval sets are                It is verified in many works that post-processing can further                                                                                                                  merged. The edge between videos is weighted                                                                                                                  by the linear combination of text score and visual duplicated                                             II.             et al. a graph based semi-supervised learning [47] is applied                In image search. Then.             with relatively smaller bits. In addition. R ELATED W ORK                                                   score. Each descriptor                                                                                                                  the ImageWeb to discover the nature of image relationships             is quantized to its nearest visual word in the pre-trained                                                                                                                  for refining similar image search result. and re-             and baselines used in the experiments.2017. Another promising strategy             Such holistic features demonstrate their advantages in image                                                                                                                  performs feature fusion on indexing level. After a brief review of                                        Baluja [5] have proposed a VisualRank framework to effi-             related work in Section II.                                                                                                                  retrieval as well as categories. a weakly supervised multi-             the recall. retrieval process votes for images in both                Graph-based visual reranking has been proven effective to                                         SIFT and other feature spaces. Bag-of-Words model [23]                                                                                                                  framework amenable to noise resistant ranking is proposed in             based on local descriptor is the most popular one. i. 7] and deep learning features [31. 34.                                                                                                                  and fisher vector are combined at feature level. Jaccard             query expansion [41] uses highly ranked images to learn a                                                                                                                  similarity.                                                                                                                  with the consistency of their neighborhoods.                                          to update inverted indexes of local features. They also serve as good complements to local ones. 27] Besides. visual attributes                                                                                                                  tion. A number of                                                                                                                  [45]. improving                                                                                                                  method. The fused                Besides. A quite few works refine                                                                                                                  neighbor relation are connected. Recent study of reranking adopts image-level cues. 33] weights.                                                                                                                  Zheng et al.1109/TIP. Jing and                                             based query specific fusion approach at rank level.                                                                                                                  the graph-based perspective. [42] employ             vectors using SIFT [19] or its variants [21]. color             search. It uses the random walk on an affinity graph. 37]. In this   1057-7149 (c) 2016 IEEE. To some extend. [12] take advantage of K-reciprocal nearest neighbors to                                                                                                                  Specifically. Furthermore. Through combining the rank             lists or scores of multiple features. Zhang             words. which have             shown promising performance. Qin                                                                                                                  which an image is connected to its top-K ranked images.. Among them. through which             dimensionality reduction and approximate nearest search [38–                                                                                                                  different binary features are coupled into the inverted file. In [46]. global features are effective to encode images                                                                                                                  positive SIFT matches. IEEE                                                                                               Transactions on Image Processing                 ZIQIONG LIU et al. In this approach. but republication/redistribution requires IEEE permission. semantic-aware co-             refine text-based video and image search results. encouraging             It constructs a graph where pairs of visually similar images                                         semantic consensus among local similar images. With             B. but has not been fully edited. 57.ieee.g. See http://www. Zhang et al. such as GIST [36. Citation information: DOI 10. The codebook is obtained through unsupervised                                                                                                                  constructed. e. Personal use is permitted. Xie et al. fast search is achieved using inverted file [35]                                                                                                                  et al. 58]. images satisfying the reciprocal             enhance the quality of search results.             40]. [17] propose the graph-             propagated through the graph until convergence. Image Search Pipeline                                                                             with visual feature. K-NN reranking [8] refines the initial rank list                                                                                                                  performance.                                                                                                                  graph learning is proposed in [15] for enhancing the reranking             For example. Here.org/publications_standards/publications/rights/index.                                                                                                                  and each dimension corresponds to one kind of feature.                                                                                                  along the context graph.2660244.This article has been accepted for publication in a future issue of this journal. the recall is significantly                                    C. Subsequently. Then images are re-ordered through link analysis             latent feature model to expand the original query. In this method.e. Section V presents the                                        orders images according to the visual hyperlinks. visual duplicated score is the similarity calculated             A. video             experimental results. Based on this framework. In addition.             such as DoG [19] and Hessian Affine [20]. the edge is weighted by Bayes similarity. Finally. incremental query expansion and             each image is represented as a sparse histogram of visual                                                                                                                  image-feature voting are developed in [43]. such as [25. Content may change prior to final publication. Further.             are connected by an edge.. To handle the errors in the initial labeled                                                                                                                  set. 24. the HITS             and hierarchical k-means (HKM) [18]. To model correlation between features. Through quantization. In [30]. Alternatively. our method belongs                                                                                                                  superior performance in image search. many works conduct the                                                                                                                  to be robust to outliers.             identify the image set.  Moreover.                                                        Nk (Im )         K nearest neighbors of Im . we find its K nearest neighbors                                        is used as query to search the other. especially when retrieval quality is             the fused ImageGraph. NO. Personal use is permitted. X. i. and NK (Im )                                                                                                                  denote the K nearest neighbors of Im .. To be resistent to the noise. the set of edges. which can be defined as:                                                    not imply Im ∈ NK (In )... images being reciprocal K-nearest neighbors are                                                 For method i. Firstly. our method uses the K nearest             neighbors. Moreover. to be robust in the fusion. Rank Distance can be             in the database. D). denoted as ImageGraph.                                                                                                                  similarity in Section III-A and Section III-B.. MONTH YEAR                                                                                                   TABLE I                                                                                        NOTATIONS AND DEFINITIONS                                                         Notation         Definition                                             I=(I1 . Instead. and             level fusion method without supervision. incorporating             intra-graph and inter-graph constraints in a supervised way.. we formulate our                                         N is the number of dataset images. we propose a rank-                                                                                                                  laborate construction of ImageGraph in Section III-C. and Ii indicates the i-th image. The                                                                                                                  throughout the paper in Table I.                                             Gs = (Vs . Di ). Finally.2017. Then.. and thus similarity score is not reliable to represent the             improves the robustness of our method. D represents the relevance among                                                                                     R(Im .                                          (1)      reciprocal neighborhood. To address this issue. we do not require             resulted from M different methods.             work of [17]. w)          Subgraph of ImageGraph G induced by the vertex set Vs ∈ V ..ieee. a more effective measurement to evaluate                                           importance of them..             lows. In ∈ NK (Im ) ∪ Im ∈ NK (In ). Then we e-                Differently. G2 .                                       (2)             is achieved by PageRank or Maximizing Weighted Density.                                                          F (Im )        False match image set of Im .. we                                        reciprocal neighbor relation.e. Content may change prior to final publication. Through a reference codebook                                                                       r = g(G)..                                                               relevance between images. Citation information: DOI 10. . IN )      I indicates the image set. The edge is weighted as Jaccard                                              is constructed based on rank result ri and the pre-computed             similarity for evaluating retrieval quality of individual feature. r2 . E and w indicate the set of vertices.             images is denoted as D. Secondly. In )       Rank Distance between image Im and In . where i = 1.1109/TIP.                method. Since local densities of             problem here.html for more information. but republication/redistribution requires IEEE permission. . G can be written as the combination of multiple             with the parameter K. Im )             the database images.                                                               P         The depth of ImageGraph..                                                                                                                  is a much stronger indicator of two images being relevant than             where R = {r1 . we calculate the distance             take each image in the database as query and get the search                                          of two images based on their ranks obtained when each one             result. thus more candidates can be included. but has not been fully edited. In )         Rank of In in the rank list of Im being query. its ImageGraph Gi             connected with an edge.                         (5)                                                                                                                                                                     2N  1057-7149 (c) 2016 IEEE. we propose                                                                                                                  the Rank Distance to serve as a rank-level measurement. . local ranking is performed on                                        tains false positive images.                                                               K         The breadth of ImageGraph.                                           A. it is difficult to compare or weight the             on Rank Distance. E. we first present Rank Distance and Bayes             the-fly in a query-adaptive manner.. which aims at local optimization and                                           bad.             However.             former [17] builds on K-reciprocal neighbors that may result             in low search recall. and V .. GM :             features. . and reranking                                                                                                                                                      Gi = ψ(ri . Our target is to obtain a new rank according to                                        vectors round Im and In are different. The pre-computed search result of database                                          defined as below. For clarity. respectively. features’ effectiveness are estimated on-                                                                                                                     In this section. In ) + R(In . It is demonstrated in [12. 54] that                                                   r = h(R. In the offline process. I2 . In ∈ NK (Im ) does             multiple search results. IEEE                                                                                               Transactions on Image Processing                 4                                                                                                 IEEE TRANSACTIONS ON IMAGE PROCESSING. We adopt the frame-                                                                                                                  introduce fusion via ImageGraph in Section III-D.                                                                N        Total image number of dataset                                                     R(Im .. IN } denote the image dataset. N .. the new rank list is calculated through ranking on             Furthermore. edge weight                                        Since different features may produce scores diverse in             between pairwise images is defined as Bayes similarity built                                         numerical values. .                         (4)             constructed off-line. M .. See http://www.                                           Finally.             level is proposed in [16]. in contrast with the undirected graph used in [17]. where m = 1. . XX. Es .2660244. a Co-Regularized Multi-                                                                                                                                               G = φ(G1 .                                                                                                    d(Im . VOL. Let                                              III. .                                                                         Es contains every edge between the vertices in Vs                                                       d(Im .                                               G = (V. w)             G indicates a graph.. 2.                                   relevance among the database images Di . I2 . 2. Thirdly.org/publications_standards/publications/rights/index.                                                                         and the corresponding edge weight.. a simple and effective fusion method at score                                           the ImageGraph G. and our work departs from the prior arts as fol-                                                                                                                  ranking algorithm is described in Section III-E.This article has been accepted for publication in a future issue of this journal.                                                                                                                  we illustrate several important notations and their definitions             we construct a directed graph.               Before describing our approach in detail. GM ). O UR M ETHOD                                                   I = {I1 .                                                          T (Im )        True match image set of Im . In this paper. Instead. initial search list usually con-             the retrieval quality.. In ) =                                  . in our approach.                                     (3)             Graph Learning framework [15] is proposed. G2 .... rM } denotes the set of rank lists                                         unidirectional neighborhood. Rank Distance             instead of Jaccard similarity [17]. the effectiveness of this method varies dramatically                                          Specifically. which is written as:             Multiple rank lists are merged through graph. . for each image. it also suffers from bad                                             individual graphs G1 .  Here. It also implies             Bayes similarity between image Im and In can be formulated                                           that Rank Distance can reflect relevance between images to             as p(In ∈ T (Im )|d(Im . while                             Fig.1109/TIP. we set the ratio of                                                    p(dn |In ∈ Tm ) × p(In ∈ Tm )                                 p(In ∈ Fm ) to p(In ∈ Tm ) as 500. Content may change prior to final publication. For simplicity. Here. Moreover. 5.html for more information. both R(Im . In ) presents the rank of In in the search result of                                    contains 6385 images from Flickr by searching for particular             Im . Citation information: DOI 10. p(dn |In ∈ Tm ) and p(dn |In ∈ Fm )                                      are connected with an undirected edge. recall may             images. 3 illustrates the effectiveness of Rank Distance. we get                                                                 As α is a constant. If and only                                            To obtain the rank list. but has not been fully edited. In )). It is featured by 55 queries of 12 different             In this scenario. distinguishing true matches from outliers. the ratio of p(In ∈ Fm ) and p(In ∈ Tm ) is                                           not be guaranteed. σ). We find in our preliminary                      p(In ∈ Tm |dn ) =                                           . It is clear that Cosine distance pushes outliers in top ranks. we use an independent                                        determined by parameter K. thus we             match [8].This article has been accepted for publication in a future issue of this journal. the percentage decreases rapidly with Rank Dis-                 To evaluate the retrieval quality effectively.                                     is measured by neighborhood consistency. and the choice of feature             R(In . 5. respectively. each image is taken as query             if In is ranked among the top search results of Im and Im is                                         using BoW feature. In ). It demonstrates that Rank Distance can evaluate the similarity                        row show false matches of “Eiffel”. false matches follow a normal distribution             the Bayes similarity. 1] using N . Sample images in the Paris dataset for empirical study. we can easily find out two distributions have a                                                                                                                  clear separation.2017. the top-5 ranked             images under baseline Cosine distance are illustrated. 4. the Paris 6K. the neighborhood consistency is             generally a very large term. its percentage value is usually below 3%. 6 and Eq. we propose                                        tance. and vice vasa. 6 . We can observe that view             Rank Distance corrects this artifact by increasing the distance between outliers                     point and illumination vary a lot among true matches. according to Eq. 4.. to perform empirical study. smaller Rank Distance denotes that the two                                         landmarks.                We denote the true matches and false matches of Im as                                             more than 50% true matches have the Rank Distance smaller             T (Im ) and F (Im ). Personal use is permitted. It is observe             As In either belongs to Tm or Fm . Some sample images are shown in Fig.                 (9)                                          + p(dn |In ∈ Fm ) × p(In ∈ Fm ). For the Rank Distance distribution of true             B. In experiments.e. True match images are marked with green dot. This dataset             where R(Im . The third and bottom             and the query. the two images are considered to be a true                                    does not have a obvious impact on the esimation. As a result. Im ) are small. we define dn =                                       some extend. Tm = T (Im ) and Fm = F (Im ). but                     second row demonstrate true matches of “Eiffel”. the estimated Bayes             Bayes Theorem. IEEE                                                                                               Transactions on Image Processing                 ZIQIONG LIU et al.                                            distributions of prior probability density are drawn in Fig. based on which.                                                                                                                   dataset.                                        Paris landmarks. which is defined as the probability of In                                      N (u. this measurement is robust to outliers. See http://www. compared to 4% of false matches. The edge weight             are the prior probability distributions of dn . we compute its Rank Distance to the             which helps find true match images. we have                                                           that the distribution can be approximated by the following:                            p(dn ) =p(dn |In ∈ Tm ) × p(In ∈ Tm )                                                                                          α                                                                                                         (7)                           p(In ∈ Tm |dn ) =     . but republication/redistribution requires IEEE permission.                                        C.                                        p(In ∈ Fm ) p(dn |In ∈ Fm ) −1               p(In ∈ Tm |dn ) = (1 +                ×                 ) . images being K-reciprocal neighborhood relation                In the Bayes similarity. K-             tions can be estimated through empirical study. In ) and                                 features follow similar distributions. n) to [0. the two             outliers.             images are more visually similar. In contrast.                                       than 0.                                    use BoW here. Typically.                                            reciprocal neighborhood relation may filter out some potential             the number of true matches is far less than the false match                                          candidates in the construction of graph.: ROBUST IMAGEGRAPH: RANK-LEVEL FEATURE FUSION FOR IMAGE SEARCH                                                                                                                   5                 Fig. n) + R(n. we set α as 1 for convenience.      Examples of Rank Distance. The numbers under the images denote their Rank Distances             (10−5 ) to the query. Then. According to the                                            Consequently. where many outliers             are introduced. Hence. The top and             outliers red. and thus the fusion performance   1057-7149 (c) 2016 IEEE.04. These distribu.5. i. p(In ∈ Tm |dn ) can be rewritten as:                                                  similarity is shown in Fig. In specific.org/publications_standards/publications/rights/index. For each query.                      (6)      experiments that the ratio of p(In ∈ Fm ) to p(In ∈ Tm ) does                                                                p(dn )                                                                                                                  not have a significant impact on search accuracy.                                                                                dn              By combining Eq.                                                                                                                  From Fig. Based on the Rank Distance. in which u is about 0.ieee. Namely. 7. 8. meanwhile down ranking                                           true matches and false matches. Moreover. we found that different             also ranked high in the rank list of In .2660244. we normalize R(m. Bayes Similarity                                                                                                                  matches. Fig. 3. However.             effectively.             d(Im . Construction of ImageGraph                                       p(In ∈ Tm )     p(dn |In ∈ Tm )                                                                         (8)                                        In [17]. . As the variance σ is relative             being the true match of Im                                                                           large.  10. It sorts the nodes by their degrees.                                                                                                                   E. In ) ∈ E linking from vertex vm to vn . V = {v1 .3             G = (V. .6      0.02             The ImageGraph centering at query q can be represented as                                                                                                                       Percentage                                                                                                                                                                                                Percentage                                                                                                                                 0.02                                                                                                                                          0.2             E is the set of edges.                                          not be retrieved in both feature spaces.04              Algorithm 1. The linking edge is weighted as Bayes similarity.                                                        0. However. add a direct-               ed edge from its vertex vm to each vertex corresponding to               the top-K ranked image in the pre-computed search result                                                                                               w(vm . 5. IN }.06                                                                                                                                m                 between starting vertex q and terminal vertex. take each image               as query and get the search result.                                                                Fig. vn ).1                                                                              Experimental Distribution             vertices. vn ) =                                  0                     otherwise                                                 Fig.05                                                                                                                                n                 of ImageGraph construction using one feature is illustrated in                                                              0. we take into account the K nearest neighbors and                                                            0.8         1                                          2N                                                                                                                Rank Distance                                                 Rank Distance                                  R(Im .             graph is denoted as G = (V.                     Probability distribution of Bayes similarity.                For query q.9         1                    1 Given a dataset I = {I1 . vn ) = Σi wi (vm .01              directed edge (Im . 5 in Eq. Large K or bad             as:                                                                                                  features may bring a lot of outliers into the graph.2017.07             threshold. K denotes the breadth of ImageGraph. 9. . The edge is weighted as Bayes similarity according to               Eq.025              build a directed graph. XX.               corresponding to the top-K ranked images. where m = 1. but has not been fully edited. On the contrary. 3 until the depth of graph achieves P. Thus.                                                                                         weight after fusion.html for more information. Therefore. and the weight is the sum of wi . E = ∪i Ei . w). ..2660244. The vertices in the first                                                                                                         Bayes similarity             layer then continue to link their K nearest neighbors as child                                                                                                                                             0.6      0.6    0. vN } indicates the set of                                                       0.             forming the first layer of ImageGraph. The fused                                          which is the sum of weights from connected edges.. Substituting Eq. Since                   1 For query q. E. The edges are computed as Bayes similarity according                                            searched in multiple feature spaces.. Rank Distance distribution of (a) True match images and (b) False                                                                         (10)                                     match images.03                                                                                                                                         0.. Fusion of Multiple ImageGraphs                                                                       PageRank [49] is a query independent link analysis method. N .015             vertices. where vm is the corresponding vertex of image Im ..1      0. depth of ImageGraph means the shortest path                                                                0. more comprehensive relationships between im-                    2 Add a directed edge from its vertex to each vertex                                          ages are represented in the fused ImageGraph. 10.                                                                                                                                                      p = 8*10−4/dn                                                                                                                                         0. Usually.2      0. and ranks a subset of graph             we combine the multiple graphs Gi = (Vi .                As rank result is encoded in ImageGraph. Personal use is permitted.1                                                                                                                                                                                                     0. 6.ieee.3    0.03             structure to encode the image-level relationships. E. their edge weight would                   5 Output the ImageGraph G = (V.8      0.                  2 For each image Im . 2.4      0.09             through which the retrieval quality is evaluated.1109/TIP. we have. w).. as negative images can                   4 Repeat step.   1057-7149 (c) 2016 IEEE. Citation information: DOI 10. . more                  3 For each new added vertex. ImageGraph is                                                                                                                                         0.. In our ap. The new added vertices                                            which are challenging to be searched using one feature may               form the first layer of graph. we propose the ImageGraph                                                                                         (a) True match                                                (b) False match                                                                                                                                 0. which are assigned larger               to Eq. MONTH YEAR                is sensitive to K.                                                         be relatively smaller. Here.                                                                                               The vertices and edges of fused ImageGraph are the union               On-line:                                                                                           set of individual graphs. To this end.15                                                                                                                                                                                                         0. positive images with high similarity are easily to be               D.2      0.4    0. Content may change prior to final publication. Local Ranking             D. improving the recall.35                                                                                                                                                                                                         0. compute its search result with the given                                        each graph illustrates the image-level relevance in different               feature.                                 V = ∪i Vi .                                     density [17] starts from query.Im ) if In ∈ NK (Im )                 w(vm .08             expanded in this manner until the depth of graph P achieves a                                                                                                                      p(I ∈ T |d )                                                                                                                                n                                                                                                                                            0.This article has been accepted for publication in a future issue of this journal. Calculate the                                            due to complementary nature of multiple features. w).org/publications_standards/publications/rights/index. E.                 (11)                                     there are many edges linked among these irrelevant images. In this way.8     1                       0    0. wi ) obtained                                         related to the specific query. which can be written                                              these ranking methods suffer from outliers.45             proach. candidates               edge weight according to Eq.25                                                                 0.4                                                                                                                                                                                                   dn                                                                                                                                                                                                          0. X. Ei .5                                                                     0. v2 . IEEE                                                                                               Transactions on Image Processing                 6                                                                                                 IEEE TRANSACTIONS ON IMAGE PROCESSING. On one hand.                                                                           0                                                                     0                                                                                                                                         0             0.5       0. 10. VOL. NO.                                               (12)               D.                                                                                                                0.                                                     0.In )+R(In . add directed edges from it to                                      candidates are included.01             Algorithm 1 Construction of ImageGraph                                                                                                                                              0               Off-line:                                                                                                                           0          0. See http://www..                                                                     be easier to be retrieved in another feature space..2     0. On the other               its top-K ranked images in the pre-computed search result                                          hand.05             images.             with different features without supervision [17]. we can fuse                                              ranking on the whole graph. Ranking by maximizing weighted             multiple rank results efficiently via graph fusion.                                                                                           feature space.                                                       0. its top-K ranked images are connected by q. The                                                     0. If In belongs to NK (Im ). I2 .7     0. but republication/redistribution requires IEEE permission.005             edge weight w is defined as the Bayes similarity of connected                                                      0. the precision is improved. there is a                                                       0. The algorithm                                                                0.4                                                                                                                                                                                                     0.  w). Each object has 4 images with different the retained feature CNN* enhances the original performance             viewpoints and illuminations. i. V components                                                                                           respectively. 2 and step.vn )∈Esi             256 × 256 following [16].                                                                                           rootSIFT [21].                1 Initialize subgraph G0s as ({q}. It is calculated as the area under that global features. For             local optimum instead of global maximum. Most queries have 34.200 images consistently on the three datasets. HSV. Features and Baselines             which introduces the maximum weighted edges is included                          In this paper. Datasets                                                                                              Search results on three datasets are presented in Table II.             subgraph G0s = ({q}.31% in mAP             and Flickr 1M [25]. HSV             into Gi+1                     s    . on the three datasets. The subgraph Gs = (Vs . At the i + 1th iteration. 13. and 75. the performance of CNN is improved                UKBench The UKBench dataset contains 10.                                  histogram using 20×10×5 bins for H.                                                        images by searching for particular Oxford landmarks from                To tackle this problem. See http://www. which is the recall of top-4 candidate images.                     tures.vn )∈Esi+1                (vm . w) and C 0 as the                       CNN For an input image. a 200K codebook is                Gs satisfies user’s requirement. we compute a 1000-dim HSV color             local ranking is illustrated in Algorithm 2. we exploit four features: GIST [36].                                           trained on Flickr60K [25] dataset. burstiness strategy [10]. and 12. It is because most images in Oxford contain             are averaged.Besides. vertex in C i which could intro.. This dataset has a comprehensive ground truth for 11             local-based ranking.96% in mAP             less than 4 ground truth images undergoing various changes.2017. multiple assignment                               IV.             vsi+1 = arg max                        w(vm .: ROBUST IMAGEGRAPH: RANK-LEVEL FEATURE FUSION FOR IMAGE SEARCH                                                                                                                   7                which is called “Tightly-Knit Community Effect”. we resize the images to                       vsi+1 ∈C i                                  (vm . IEEE                                                                                               Transactions on Image Processing                 ZIQIONG LIU et al. Retrieval accuracy is measured by mean Average             we aim to find the subgraph Gs starting with q. on the Oxford.             link analysis method [17] may lead to deviation from query                       Oxford The Oxford Buildings dataset consists of 5.491 personal GIST leads to poor performance on these datasets. In this dataset. In this serves as a query. and C 0 contains the vertices             connected by q.   1057-7149 (c) 2016 IEEE.05% in mAP. cosine distance is defined as the                duce the maximum weighted edges is introduced into Gi+1            s                                                                                           similarity function of images. Hamming threshold and weighting                                                                                           parameter are set to 52 and 26. respectively. 128-bit Hamming signature                5 Output Gs . vn ). Personal use is permitted.Network [48].                                                       CNN feature following [53]. Citation information: DOI 10. Similarly. each image by about 10% in mAP. we also fine-tine the                according to Eq.                Holidays The Holidays dataset consists of 1.             the maximum weighted. Es .14% in mAP. ∅. By contrast.                To evaluate the effectiveness of our approach. Since a higher from different viewpoints. Based on cosine distance. The algorithm of                  HSV For each image. UKBench [18].583 in N-S score. Es contains every edge arbitrarily collected from Flickr. w) is                         Flickr 1M The Flickr 1M dataset includes 1 millon images             induced by the vertex set Vs ∈ V .856 in N-S score. denoted   as  vs                                            i+1                                                :                                          Histogram.             to their order of being incorporated into Gs . respectively. Ranking by maximizing weighted density [17] and (maximum 4).2660244. which is Precision (mAP).    Convolutional Neural Network (CNN) and Bag-                                        X                            X                     of-Words   (BoW). their APs well on Oxford. nearest neighbors             satisfies user’s requirement. yielding mean Average Precision (mAP). Moreover. Oxford [27] 80. 3. . The l2 -normalized histogram is used for nearest             Algorithm 2 Local Ranking                                                     neighbors search with cosine distance. The nodes are ranked according search is performed. The performance is measured by N-S score             situation. each containing 5 possible queries. Content may change prior to final publication. For all the query images. The vertices are ranked according to their [25] of each SIFT descriptor is embedded in the inverted file                order of being incorporated into Gs .This article has been accepted for publication in a future issue of this journal.html for more information. we adopt a safe strategy to perform Flickr. but republication/redistribution requires IEEE permission. Specifically.             of 2.e.                                        as CNN*. it has a lot of true match images. The proposed ranking only considers different landmarks. These dataset can be added             between the vertices in Vs . do not work             the Precision-Recall curve.                                                 4096-dim CNN descritpor from the 6-th layer in the Caffe                2 At the i + 1th iteration. mAP buildings. Specifically. We also define the candidate set into the above datasets as distractors for large scale experi-             C as vertices that Vs points to. After fine-tuning. 0.062             and up-rank the noise.The re-trained feature is denoted                3 Update Gi+1  s    and   C  i+1                                                  . vn )−               w(vm .             A.ieee. GIST and CNN. Note             performance of each query. 1. obtaining             experiments on Holidays [25].1109/TIP.                                                           on Holidays. which are taken             confused by the tightly connected outliers.550 objects. DATASETS AND BASELINES                                  [9] and pIDF [24] are employed on both dataset to enhance                                                                                           the performance. An l2 -normalized 512-dim GIST                                                                                  (13) descriptor is extracted for each image using 4 scales and                This procedure continues until the number of nodes in Gs 8 orientations. but has not been fully edited. the vertex in C i B. HSV and CNN             The Average Precision (AP) is used to evaluate the retrieval result in moderate accuracy on Holidays and UKBench. we extract the l2 -normalized                vertices that q points to. or distortion. respectively. UKBench and Oxford. 3 until the number of nodes in                     BoW For Holidays and UKBench.                                      to filter out false matches. we conduct It shows that BoW achieves good performance. GIST To compute GIST descriptor.                                                                                           1M codebook is trained on Paris6k dataset [26]. avoiding being each query.                4 Repeat step. It yields             holiday images and 500 of them are queries.org/publications_standards/publications/rights/index. naturally. Moreover. Some of them have partial occlusion             edge weight reflects a higher relevance to the query. which are difficult to be described using global fea-             is employed to measure retrieval accuracy of the dataset. S. we initialize ments. For Oxford 5K.  Moveover.008 in N-S score.                                                                   baseline by 4. by combining GIST.                                                               enhanced to 90. the performance of K = 4 is not                                                                  0.28%.55%. IEEE                                                                                               Transactions on Image Processing                 8                                                                                                                       IEEE TRANSACTIONS ON IMAGE PROCESSING.2017.195               3. score fusion is superior to graph fusion for             performance is no longer increased. with re-trained CNN feature.1109/TIP.856         3.27%. graph fusion. Parameter Tuning                                                                                                                                         Moreover.14         61. the true match                                                                    graph fusion and score fusion. the performance of BoW.96% in mAP.                                                                                                                                         score fusion. By 0. When BoW is fused with             are included into ImageGraph.                                                                   C. respectively.34        80. the mAP is                                                                 each feature combination.56       54.                                                                    in the same scenario. which is a significant improvement compared to             K limits the property of depth P. 87. K implies the                                                                  in mAP. 3.This article has been accepted for publication in a future issue of this journal. On                                                            K= 4                                                       K   =   20                                                                                                                                         Oxford. Personal use is permitted. its performance is decreased by 5% in mAP. However. we set K = 10                                                             CNN. Similar phenomenon can be             images which are filtered out by small K.907 in N-S score when                       81                                                            K= 6                                                                                   72                                                                                                                       K   =   30        fused with CNN. UKBench and Oxford. 8.             P is set as 8 in our experiments. We speculate that breadth                                                               by 5. and 8.14        75.48% in                       UKBench. In our method. When fused with             be retrieved using a large P. The mAP first increases with P.             and GIST are fused on (a) Holidays and BoW and CNN are fused on (b)             Oxford.88% and 9. we compare             number of ground truths of Holidays dataset is very small. then                                                               features are fused with BoW on Holidays and UKBench.01% and 1.397       3. 7.                                                                On Holidays. and 88. their combination achieves an mAP of                                                                                                                                         79. When all potential candidates are retrieved. The N-S score              mAP(%)                                                                              mAP(%)                           83                                                          76                                                    of BoW is enhanced from 3. we set different K for them. In addition. the combination of BoW                                                                                                                                         with GIST.   1057-7149 (c) 2016 IEEE. mAP(%)                  12. Content may change prior to final publication. 3.08%. BoW                                                  80. we first evaluate                                                              than score fusion. mAP(%)                34.48% in mAP. the performance of BoW is not affected in                                              P                                                          P                               fusion with these features. 9.06%.org/publications_standards/publications/rights/index. For HSV and CNN which have                                                                                                                                         moderate performance. 6. By comparison. HSV.                                                                                                                                         the performance is further boosted.92%             Graph: breadth K and depth P.98%. BoW                       82                                                          74                                                            K = 10                                                            K= 8                                                                                                                       K                                                                                                                       K                                                                                                                           =                                                                                                                           =                                                                                                                               50                                                                                                                               40                                                                                                                                         performance is further improved to 3. but republication/redistribution requires IEEE permission. Comparison with Other Fusion Approaches             The experimental results are demonstrated in Fig.             B. and 81. the results of multiple features fusion are                                                                                                                                         demonstrated in Table III. and CNN obtains the mAP of 79. As the                                                                    To further illustrate the strength of our method. BoW                       Holidays.html for more information.                                                                                           Furthermore. but has not been fully edited. since more top-ranked candidates                                                                from bad features on this dataset. respectively.87%. X.05%                                                                                                                                         in mAP on Holidays.29               44. 7. are more difficult to                                                             observed when BoW is combined with HSV. In particular. our results are further                Two parameters are involved in the construction of Image. It is evident that the fusion brings consistent                                                                                                                                         benefit to various feature combinations. respectively. We use their released code                It is notable that increase of K and R jointly helps to                                                                  and default parameter in the following experiments. NO. The reason lies                                                                result comparisons are presented in Fig.                                                                                                                        In addition.46%.                                                           shown in Fig.ieee.             number of candidates connected with a vertex. See http://www.2660244.             Fig. P means the             distance that affinity are propagated on the graph. HSV.                                                                                                                                         87. thus the                                                                fusion. Instead.                       Datasets                        GIST          HSV                 CNN         CNN*         BoW                       On Holidays.502        3. Note that fusion of two global features also                       Oxford.916 in N-S score and 82.28% in mAP.920 in N-S score and 84.05                  performance is boosted to 85. respectively. The fusion of four features                                                       V. We test             different combinations of K and R on Holidays and Oxford. It improves the individual baseline of HSV and CNN                                       (a) Holidays                                                  (b) Oxford                       86                                                          82                                                    by 17.582                  mAP. closing to the maximum N-S score 4.46%.22%. respectively.                                                                       fusion [17] and score fusion [16]. 7. XX. score fusion improves BoW baseline by             Specifically. and CNN.             saturation when K = 10 on Holidays and K = 40 on Oxford.71%.             recall is boosted. and K = 40 on Oxford. When multiple features are fused. by taking use of four features.22       72. and                                                              our results with two state-of-the-art fusion approaches: graph             Oxford relatively large. It shows that on             in that the true matches which are not directly retrieved by                                                                both datasets our method outperforms graph fusion and score             query can be sufficiently exploited with a large P.83%. After                       85                                                          80                                                                                                                                         fused with GIST. Fusion results on three datasets are                                                            GIST is slightly higher than BoW baseline. Noticing that graph fusion suffers             enhanced when K gets large. score fusion with             the fusion of two features. HSV. although global features have poor discrimination for                       80                         2   4     6      8       10   12      14    16                                                                                   70                                                                                     2           4       6         8                10                                                                                                                                         building images.                                                                                                                                            Similar results can be observed on UKBench. respectively. on Holidays.26%. N-S                    1. and CNN are                       84                                                          78                                                                                                                                         increased by 5.51%. score fusion and our method boost BoW             for Holidays and UKBench. and 90. Then.21               69. our method enhances the performance             improved even if P increases to 16.582 to 3.703 and 3. When bad feature GIST                                                                                                                                         is merged. VOL. respectively.31                  boosts the overall performance. E XPERIMENTS                                                                   achieves 90. .09%.             A. and our method gain the mAP of 83. graph fusion. Namely. the performance of graph fusion is better               To verify the effectiveness of our method.96         13.834 through the                                                            K = 12                                                     K   =   60        fusion with GIST and HSV. the fusion still yields stable improvement.89% in mAP. MONTH YEAR                                                     TABLE II                                 P ERFORMANCE OF BASELINES ON THREE DATASETS . Citation information: DOI 10. Fusion Results                                                                                                              On UKBench. Multiple             improve the performance. the performance reaches                                                                 GIST. respectively. and             generally keeps stable when P becomes large. respectively. mAP results against different values of breadth K and depth P. In particular. From these results. org/publications_standards/publications/rights/index. “B+G”. natural noise is more difficult to tackle. the retrieved results are replaced with randomly                                                                                                                  assigned values. 10.                                            In comparison. “BoW +GIST”. our method keeps the performance of 3.. due to the higher recall                                                                                                                  brought by ImageGraph. Comparison with graph fusion ([17]) and score fusion ([16]). Thus. 9. 10 that graph fusion is very sensitive                                                                                                                  to parameter K. .328                In summary. but has not been fully edited. while graph fusion decreases to 48. when K is larger than the number of ground                                                                                                                  truths. 0.                                                                                                                     The outliers in ImageGraph are introduced from two ways.This article has been accepted for publication in a future issue of this journal. “BoW + CNN”.                                                                                                                  Specifically. respectively. The green bar and blue bar represent result of the first feature and the second             feature. On Holidays.                                      and 74. “B+G+H” and “B+G+H+C”. Content may change prior to final publication. It implies that the             gray bar show the results by graph fusion. Personal use is permitted. and then keeps stable. On UKBench. However. the outliers refer to the natural                                                                                                                  noise. 88.325 in N-S score through graph fusion. Compared to the random                                                                                                                  noise.: ROBUST IMAGEGRAPH: RANK-LEVEL FEATURE FUSION FOR IMAGE SEARCH                                                                                                                   9                 Fig.. respectively. In the experiments of                                                                                                                  [52]. and our method. compared to graph fusion. fusion with CNN brings the benefit of 0.228. IEEE                                                                                               Transactions on Image Processing                 ZIQIONG LIU et al.2660244. our method not only                                         and 3. the                                                                                                                  performance drops significantly.                                                                                                                  On one hand. we                                                                                                                  compare our results with graph fusion [17]. but republication/redistribution requires IEEE permission.58%. while the blue bar. respectively. When combined with HSV. “BoW +             HSV”.                                                                                                                  we first evaluate the fusion results when K varies. In our method.581 of graph fusion. i. which is                                                                                                                  illustrated in Fig.301.10%             our method. Good features                                         to the combination of “B+G+H” and “B+G+H+C”. Six feature combinations are presented. i. 3. a lot of outliers would be included into the graph. respectively. but also brings about superior                                                                                                                  improvement.69%. and                                       respectively.                              graph fusion method achieves the best performance when K             respectively.                                                                                                                   D.                                                                                                                     is resistant to bad features.18%             Similarly.746.             bring further benefit in the fusion. Fusion results of two features on (a) Holidays. our                                                                                                                  method yields better performance. which exists in the original rank result.1109/TIP. “HSV + GIST” and “CNN + GIST”. compared             by 0. See http://www. 8.96%.904 in N-S score at K = 20. the performance of our method increases             BoW is increased by 0. 3.                                                                                                                     It is shown in Fig.173. compared to 2. 67.916. orange bar and                           then drops after reaching a peak at K = 4.html for more information.836             3. for the three com-             the three methods gain the N-S score of 3. compared to score fusion.ieee. score fusion. (b) UKBench and (c) Oxford. The                              K gets large. score fusion and our method. score fusion. its performance decreases when             Fig.             and 0. respectively. Additionally. It is showed in [52] that the graph fusion                                                                                                                  approach is robust to random noise.678. On UKBench. In order to validate our method.                                                                                                                  when K becomes large and more outliers are introduced. and                                            binations. fusion with bad             graph fusion and our approach enhance the BoW baseline                                               feature. Citation information: DOI 10.04% and 90.                                      our method yields the mAP of 84.                                                                                 and 3. random noises are added to the rank results of features. Evaluation of Robustness                                                                                                                     In this section. “HSV + CNN”.22.121 in N-S score. When all features are fused together. On Holidays. Five             feature combinations are presented on (a) Holidays and (b) UKBench.e. Natural noise is                                                                                                                  usually caused by the feature itself.             using graph fusion.089 and 0.841. 0.2017. when K = 20. while yellow bar shows fusion result. and 0. leads to more rapid descend. we demonstrate the robustness of our ap-                                                                                                                  proach to outliers.e.252 in N-S score                                             with K.894. In addition. respectively. It illustrates the robustness of our   1057-7149 (c) 2016 IEEE.                                        using combination of “B+G”. N-S score first rises with K and             yellow bar represents the BoW baseline. 3.                                                                                                                  is about the ground truths number of the dataset. 92% on Holidays. Content may change prior to final publication.920. On Oxford. Fusion results with various K on (a) Holidays and (b) UKBench. 11. More-             over.14%. UKBench. where L is the expected retrieved image number. the             refined by 0.   1057-7149 (c) 2016 IEEE. The reranking results on Holidays and UKBench             are shown in Fig. We can see that our method outperforms             cue. respectively. the baselines             of GIST.             With re-trained CNN feature. but republication/redistribution requires IEEE permission.             refine the initial rank result. On Holidays+Flickr 1M. Time and Cost             baselines greatly and consistently.91% to 75. and Oxford. and CNN. “G”. the             BoW is enhanced by 5% in mAP through the combination             with GIST. The examples of retrieval results on three             directly searched by query. Reranking performance reflects the                                            small since it only considers the top candidates.3%. When GIST is fused with other three features. and 4. On Holidays. and CNN are                                                Because this operation is offline and required only once. VOL.html for more information.177.08 in N-S score. such as RANSAC [27].91%.              F.               On the other hand. The on-line computational cost is             to 2.21%             in mAP. and BoW are improved by 4. Moreover. respectively. Similarly. Comparison with the State-of-the-art                                                              experiments are performed on a server with 3. Specially.24 % in mAP. we perform             large scale experiments on Holidays. Holidays + Flickr1M. Reranking with Individual Feature                For individual feature. 8 and Table III. Furthermore. HSV and CNN obtains 77. Citation information: DOI 10. our result is also competitive             affinity values can be propagated to images which are not                                            to other methods. “H”.                                             ture. but has not been fully edited.89%. the combination of BoW. due to the             images could be promoted and outliers lowered down.066 in N-S score. bad features would also result in outliers.920.             E. we obtain             reranking methods. 15].28% in             mAP. Reranking results for single features on (a) Holidays and (b)                                                                                                                  UKBench.224 in N-S score. Large Scale Experiments                                                                           HSV. the average             Our method achieves mAP = 90.             From Fig. and mAP                                               query time of BoW and global features are 396ms and 189ms. which can still promote true match images when             there are a lot of outliers in the graph. the same situation are also             observed. Then we compute and store their relevant relationships. Personal use is permitted.2017. When the four features are fused.                                      and 128GB memory. HSV.138. One category is based on feature-level                                            an mAP of 77. thus the                                         effectiveness of CNN*. X. our results are comparable to [13. 10. is also enhanced from 1. On UKBench.                                                                                              Fig. The space complexity is O(KN)             Moreover.22% on Holidays + Flickr 1M dataset. Noticing that our method refines the                                           H. we also             Our method belongs to the latter one. the performances of BoW. which exceeds             other category employs the image-level cue for reranking.46 GHz CPU               We compare our results with the state-of-the-art in Table IV.2660244. HSV. N-S = 3.                                                        In our method. respectively.65%. the performance             achieves an mAP of 77.                                                                                         Fig. There are two kinds of popular                                       For the large scale dataset. true match                                        result is slightly higher than [16]. 11. our method can still exploit its comple-             mentary cues. The                                             the state-of-the-art approaches on Holidays.37% in mAP.                                       time complexity is affordable. GIST. it requires O(L2 K) for ranking on the                                                                                                                  graph.This article has been accepted for publication in a future issue of this journal. Besides. the bad feature. and 0. GIST. query expansion [41].                                         For the large scale experiment. 12.             precision is enhanced. See http://www. when the             bad feature is fused.             be retrieved. Our             G. It shows that fusion of multiple features can             consistently improve the search accuracy in large scale dataset.82%. through the graph structure. In this experiment.                                                                         time complexity for constructing ImageGraph of each query                                                                                                                  is O(K P ).                To test the scalability of the proposed method. The solid line represents our method and dashed                                                                                                                  line the graph fusion ([17]). improving the performance.             the dimension of global features is reduced to 128D by             Principal Component Analysis (PCA). MONTH YEAR                approach. XX.1109/TIP. Taking use of the                                              achieve the best N-S score 3. On UKBench and Oxford.856                                         for storing connectivity. . our             image-level relationships reflected by ImageGraph. CNN. each image is used as query with given fea-             2. 4. On Holidays. the performance is boosted to             77. Abbreviations “B”. Flickr 1M images are             added into Holidays dataset as distractors. respectively.                                            the result reported in [16] by 2. 0. respectively.                                                                                                                  Three feature combinations are tested and the performance are compared with                                                                                                                  graph fusion.02% to 90.82%. it             can still improve the performance from 90. thus challenging candidates can                                          datasets are shown in Fig. The results are shown             in Table III.             on UKBench.ieee. NO.org/publications_standards/publications/rights/index. the             effectiveness of ImageGraph. By 0. IEEE                                                                                               Transactions on Image Processing                 10                                                                                                IEEE TRANSACTIONS ON IMAGE PROCESSING. our approach can be applied to                                            = 84. It reveals the robustness of our method from another             perspective. etc. we can see that the performance of             our method is resistant to bad feature.             HSV and CNN boost the performance of “BoW + GIST” from             69. and “C” represent BoW.26% and 76.              time.9%.                                                                                                                  and the results are competitive to the state-of-the-art. 20141081253.             select 2000 images from dataset as queries.This article has been accepted for publication in a future issue of this journal.             for ranking.36GB. cost 5.                                                                                                                   [8] X.6% and 42. based on it.                                                                                                                  and the edges are weighted with Bayes similarity. . This work was supported in part by             stores the reference book.                                     [6] J. 52]. the                                                                                                                  by bad features or inappropriate parameters. Ji.                To further validate our method. Wang.                                                                                                                   [3] F. This work was supported                The memory cost of the proposed method is about 0. In large scale                                                                                                                  an image is connected to its K nearest neighbors with edges. Moreover. In Proceedings of the IEEE Conference             obtain the performance of 32. HSV and GIST                                                    with relative attribute feedback. Fused with                                               ranking.87.                                                                                                                  under Grant No. It is because                                                                                                                     Acknowledgements This work was supported by the Ini-             [15] uses a supervised framework. The fusion of four                                                  scale image search.2ms for ImageGraph construction and 0.. 2010.ieee. which costs 0.1%.16ms                                                                                                                  local ranking.             processing steps of our method.             in Table V cost a few milliseconds. Schmid.92%             is dependent on many factors. Wang and G.                                            [4] D. respectively.             image search.1877-1890. For each feature. respectively.             we fuse them with our proposed method.3. images are re-ordered by             usually takes 5. N-S score of 3. M. thus it is not directly                                                                                                                  the large scale experiments. 17. Lin. Szummer. which are relatively small compared to the query                                                                                                                  Through extensive experiments on three benchmark datasets. Besides. we will investigate how to efficiently             method with other post-processing methods considered in                                                                                                                  update the ImageGraph structure when new images are added             Table IV. and A. We can see that the CNN feature achieves                                               localization with spatially-constrained similarity measure and k-nn re-             the best performance for concept detection task. Then. Personal use is permitted. Baluja. CNN. Most of the post-processing methods                                                                                                                  strategies in the fusion. we store 4 nearest neighbors of each image. and C. our method adopts the same framework                                                 and attributes. 9. In Proceedings of the IEEE Conference                                                                                                                       on Computer Vision and Pattern Recognition. It                                                                                                                  Graph.1109/TIP. 2011. Parikh and K.4% in                                                    on Computer Vision and Pattern Recognition. HSV and GIST.             similar to ours in theory. but republication/redistribution requires IEEE permission. graph fusion [17] and [16]. In Proceedings of the IEEE Conference on Computer Vision             BoW. J. Visualrank: Applying pagerank to large-scale                                                                                                                       image search. Grauman. The memory                                                                                                                  ranklists resulted from different methods are fused via Image-             cost of 1 million image of single feature is about 0. Multiple             then 105 bits are needed per image per feature. Weak attributes for             I.5%. For each query.2%. But it can roughly indicate the time efficiency of                                                                                                                  Holidays + Flickr 1M.             We compare the time of post-processing steps of the proposed                                                                                                                     In the future work. our method belongs to the                                                                                                                  two popular fusion schemes. In Proceedings of the IEEE European Conference on             with [15.             Since both [14] and [11] store binary signatures of features in             the inverted file. but has not been fully edited. Y. A discriminative latent model of object classes             is 22. Object retrieval and             mAP. which works on the given rank list.316-336.09GB. Y and S-F. Brandt. Citation information: DOI 10. respectively.             post-processing algorithm.                                                                            [2] L. 2010.8% in mAP. A lot                                                                                                                                                        R EFERENCES             of image-level information is stored in [12] and the cost of it                                                                                                                   [1] Y. which are usually brought in             methods. 2012. we perform concept de. the CNN result is improved to 49. and K. Wu.076GB extra memory.35GB. we calculate                                             Intelligence. It shows that our method outperforms             the proposed approach. 14. such as the machine used                                                                                                                  on Holidays.e. the ImageGraph is                                                                                                                  constructed to model the relationship among images. In addtion. Label diagnosis through itself             tance [55].36ms. i.org/publications_standards/publications/rights/index. Torresani. We first define Rank Distance to measure                        Time (ms)          5.             features obtains the mAP of 50. On the fused ImageGraph.html for more information. Jiang. After obtaining the rank lists of different features. 2008. Efficient object category                                                                                                                       recognition using classemes. Fitzgibbon. we yield an mAP of 77. 2008. pp. Mori. Moreover. Kovashka. ImageGraph construction                                                                                                                  more efforts will be made to explore the feature selection             and ranking.                                                   and Pattern Recognition.                                                                                                                  we show that significant improvement can be achieved when                Table IV shows the comparison of average query time and                                                                                                                  multiple features are fused. rather than the neighborhood relationship. R. Qi Tian by ARO grant W911NF-15-1-0290             The approach of [16] evaluates the retrieval quality online with                                     and Faculty Research Gift Awards by NEC Laboratories of             score curve. Chang. Douze.   1057-7149 (c) 2016 IEEE. Note that query time                                                                                                                  obtained an mAP of 90. Here we still use                                              [7] A. Jégou. except [15]. Chang. S. vol. WhittleSearch: Image search             mAP to measure the performance. we demonstrate that             memory cost on 1 million dataset with the state-of-the-art                                                                                                                  our method is robust to outliers.868s.1GB.                                         National Science Foundation of China (NSFC) 61429201. i.                                         tuning for web image search. C ONCLUSIONS                                                                                                                     This paper proposes a graph-based method for robust feature                        Methods           Ours       [17]      [52]     [16]      [15]       [12]                 fusion at rank level. the memory costs of them are 6.. In             and number of features of dataset. IEEE                                                                                               Transactions on Image Processing                 ZIQIONG LIU et al. which costs a lot of time                                                                                                                  tiative Scientific Research Program of Ministry of Education             to build the anchors. thus the memory costs of these methods are                                             Computer Vision. See http://www. In Proceedings of the IEEE Conference on                                                                                                                       Computer Vision and Pattern Recognition.: ROBUST IMAGEGRAPH: RANK-LEVEL FEATURE FUSION FOR IMAGE SEARCH                                                                                                                  11                                                 TABLE V                          P OST. Since we use four features in our experiments. The post-                                                                                                                  into the database or old images are deleted from it. Content may change prior to final publication. vol.920 and 84. Table V shows the result of comparison. Our method has             average query time is about 0. Shen.                                                                                                                   [9] H. Concept detection                                                                                      large-scale image retrieval.             its distance to each concept class. 42. 2012. pp. Yu. M. G. The rest images                                           [5] F. Jing and S.6%. Avidan. M-H Tsai. Improving bag-of-features for large             42.7. BoW. 2012. It only                                      America and Blippar. UKBench.2017. In Proceedings of the             tection experiments on Flickr25000 dataset [56].36        1        1        10       2210        30                  the relevance of images on rank level.We randomly                                               IEEE International Conference on Computer Vision.82% on             comparable.89%. Each image ID costs about 21 bits.30. we                                                                                                                  introduce the Bayes similarity to evaluate the retrieval quality                                                                                                                  of individual features. which further protects the fusion from outliers. In Proceedings of the IEEE European                                                                                                                       Conference on Computer Vision. IEEE Transaction on Pattern Analysis and Machine             are seen as the database images.-F. using image-to-category dis.2660244. International Journal of Computer Vision.                                           in part to Dr. no.PROCESSING TIME ON H OLIDAYS + 1M D ATASET                                                                        VI. and Oxford datasets. D. Relative attributes. Z. Parikh. 2009. It is similar on UKBench and Oxford. and S.e.-G. and Y.                                                                        no. in which             respectively. Grauman.  True matched images are marked with green dot.1109/TIP. XX. and false matched ones red.2017.html for more information. respectively. 12.This article has been accepted for publication in a future issue of this journal. MONTH YEAR                 Fig. but has not been fully edited. but republication/redistribution requires IEEE permission. See http://www. its top-10 ranked             images resulted from GIST (the first row). respectively. X. . NO.ieee. BoW (the fourth row) and ImageGraph feature fusion (the fifth row)             are shown.     1057-7149 (c) 2016 IEEE. UKBench (middle) and Oxford (bottom) datasets. Citation information: DOI 10. Personal use is permitted. Content may change prior to final publication. IEEE                                                                                               Transactions on Image Processing                 12                                                                                                IEEE TRANSACTIONS ON IMAGE PROCESSING. Examples of retrieval results from Holidays (top). HSV (the second row). CNN (the third row). VOL. For each query.2660244.org/publications_standards/publications/rights/index.  Gammeter. 2011. N-S                             3. Z.            . vol. Z. T.                                                         of the IEEE Conference on Computer Vision and Pattern Recognition.905                         81. Schmid. Yang.          1. Chum. Tian.60.920                         84. C.76                          3. Razavian.23.80                                  77. Content may change prior to final publication.5         74. and C.          6. Packing and padding: Coupled                                       quantization: Improving particular object retrieval in large scale image                  multi-index for accurate image retrieval.1109/TIP.27                                  76.                             [28]        C. Zheng. and A. pp.8         84. Ji. In Proceedings of IEEE International                  no. but republication/redistribution requires IEEE permission. Douze. and C.                      Feature Combinations                             Holidays.3368-3380. Jégou.          77. Stewenius.65                  Memory cost (GB)                          0. Yang.24                     BoW + GIST + CNN*                                          89.                                                            [30]        M. Zheng. Azizpour.             [15] C. 2004.             .                                                                              [31]        A. L. Distinctive image features from scale-invariant keypoints.                                      off-the-shelf: an astounding baseline for recognition. M. On the burstiness of visual                                            retrieval with large vocabularies and fast spatial matching. In Proceedings of the IEEE                                       databases. Query                                                Co-indexing for Near-duplicate Image Retrieval. J. Tian. Z. In Proceedings of the IEEE Conference on Computer Vision                                         of the IEEE Conference on Computer Vision and Pattern Recognition. 2014. J. and Q. and Q. Wang.          0. Sullivan. Accurate image search with                  86. Schmid. R.          84.Gao. Zheng. Wang.64          87.THE . F.2017. T.82                     BoW + GIST + HSV + CNN                                     90.97                                  75.23.                  Conference on Computer Vision. Tao.920                         84. D.4            69.           -                [10] H. Deng. Lp-norm Idf for Large Scale                  neighbor: accurate object retrieval with k-reciprocal nearest neighbors. pp. O. Carlsson.916                         82. 2011. Scale affine invariant interest point                                           Workshops.                                                                                  Conference on Computer Vision.1. He.                                      to object matching in videos.                  Adaptive Late Fusion for Image Search and Person Re-identification.22                     BoW + GIST + HSV + CNN*                                    90. mAP (%)                Holidays+Flickr1M.2660244. Object                                          pp. Tian.91                     BoW + HSV                                                  87. and Q. Combining attributes and             [18] D.21                     BoW + HSV + CNN*                                           90.8.org/publications_standards/publications/rights/index. Zhang. and X.855                         79. M.120.48                          3.64         84. S.            .40                          3.843                         80.05                                  77.06         79.3                  Query time (s)                           0.             [11] L.                                                  [26]        J. Sivic. mAP (%)                 UKBench.868          0. IEEE Transactions on Image Processing. In Proceedings                  International Conference on Computer Vision. and A. Lin. H. 2013. vol.96                                  76. Mikolajczyk.                                   Image Search.92                                  77. and Q.82            . Qin. Bag-of-colors for improved image                  In Proceedings of the IEEE Conference on Computer Vision and Pattern                                       search. A. Arandjelovic. Sivic.             . S.749         0. mAP(%)                           84. Van Gool. pp. no. Sivic.8. In Proceedings                  International journal of computer vision. no. L. and Q.             .920          3. Liu. Qin.1.63. Query adaptive similarity for                                         geometric consistency for large scale image search.1         85. Lowe.413         . Wengert. M.             . and Zisserman. See http://www.                                            2007.89                          3. In IEEE Transactions on Image Processing. Wengert. mAP (%)                     BoW + GIST                                                 85.28                          3. Hello                               [24]        L. Liu.08                     BoW + GIST + HSV                                           88. and L. and C. H. N-S                   Oxford. Wang.89          84. no.36            .    1057-7149 (c) 2016 IEEE. Van Gool. In Proceedings                  elements. no.26                     BoW + GIST + CNN                                           88.             . J. Metaxas.60.                                                                              [25]        H. 2009. Isard.2             . Lp-Norm IDF for Scalable Image                  Computer Vision and Pattern Recognition.                                                                              [29]        S. S. Philbin. In Proceedings of the                  specific fusion for image retrieval. S.                  detectors. Coupled Binary Embedding for Large.25                     BoW + CNN                                                  88.75            3. Zheng. C. Zisserman.71                          3.                                                             Retrieval.841         3.html for more information. Object                  Weakly Supervised Multi-Graph Learning.703                         79.913                         84.0         42.06                                  75.89                          3. Schmid. mAP(%)                    77.09                     BoW + HSV + CNN                                            90. Douze. In Proceedings of the IEEE European                                   International Conference on Computer Vision. 2014. In Proceedings of the IEEE                  In Proceedings of the IEEE Conference on Computer Vision and Pattern                                       Conference on Computer Vision and Pattern Recognition. K. W.7                  Holidays + 1M.8        80.                  and Pattern Recognition. Quack.                  Recognition.ART.          81.4          68. In Proceedings of the IEEE Conference on Computer Vision                  Conference on Computer Vision and Pattern Recognition. In Proceedings of the IEEE Conference on                           [33]        L. Wang. 2012. and L.7            85.01                                  77. CNN features             [19] D. Wang. Semantic-aware             [17] S. 2012. Douze. Nister.           6.This article has been accepted for publication in a future issue of this journal.3604-3617.82                                                                                             TABLE IV                                                                         P ERFORMANCE COMPARISON WITH THE STATE .92            .                                                                             2007.                  to improve object retrieval.98         84. A.85          . Liu. Scalable recognition with a vocabulary tree.145            . vol.64         3. and A. Tian. Zheng.749             . Personal use is permitted.             [16] L. . 2015.ieee.                  Recognition. Isard. Three things everyone should know                                       Vision.         84. 2004. 2008.1          22. 2011. In Proceedings of the IEEE Conference on                                     the IEEE European Conference on Computer Vision. M. Zheng.                   Methods                                  Ours           [17]           [52]           [16]           [15]          [14]        [13]          [11]          [12]          [10]          [9]                   Holidays. Y.         3. Tian. Citation information: DOI 10. and S. Query.                  Recognition.80                                  77.79           3. Zisserman. Isard. O. and Zisserman. In Proceedings of ACM Multimedia. O. Visual Reranking through                             [27]        J. 2013.02                          3.             [12] D.35            .67          3. H. 2003.914                         84.31                          3. Video Google: a text retrieval approach                  Scale Image Retrieval.         0. M. Yu.            .1          . vol.08                                  69. In International Journal of Computer             [21] R.52                          3. Hamming embedding and weak             [13] D. 2008.51                          3.907                         81.91-110. Chum.0            .916                         82.                                                                                                  multi-scale contextual evidences. Bossard.: ROBUST IMAGEGRAPH: RANK-LEVEL FEATURE FUSION FOR IMAGE SEARCH                                                                                                                  13                                                                                             TABLE III                                                              F USION RESULTS OF DIFFERENT FEATURE COMBINATIONS ON BENCHMARKS . pp. X. In Proceedings of IEEE                                             retrieval with large vocabularies and fast spatial matching. Chum.                  Computer Vision and Pattern Recognition.                                               and Pattern Recognition.                                      Fisher vectors for efficient image retrieval. S. 2014. and D.8                  UKBench. J. and Q.         0.91                                  77. vol.83          3. Wang. 2016. Tian. Zhang. Douze. J. and Q. Tian.          75. M.2. 2013. 2013.                                   of the IEEE Conference on Computer Vision and Pattern Recognition             [20] K. S. Lost in             [14] L. Sivic. 2015. Ramisa. In Proceedings of the IEEE Conference on Computer                  In Proceedings of the IEEE Conference on Computer Vision and Pattern                                       Vision and Pattern Recognition.55                  Oxford. Cour. M. Wang.        85. International journal of computer vision. Schmid. A. M.G. In Proceedings of                  large scale object retrieval.076             . IEEE                                                                                               Transactions on Image Processing                 ZIQIONG LIU et al.1-13.0          . S. Liu. Zisserman.3              .OF . Wang. S. but has not been fully edited. 2006.             [22] J. Jégou. mAP(%)                         90. N.77           3. Jégou.                            [23]        J.                      [32]        L. Tian.17                     BoW + CNN*                                                 89. Philbin. Philbin.  Japan. Liu. and M. but has not been fully edited. Q. no. Dr. His current research interests include image processing. degree in Life Science from                  learning for collaborative image retrieval. In Proceedings of the IEEE                  Conference on Computer Vision and Pattern Recognition. T. Wang. and J. In Proceedings of the IEEE European                                                           currently a Professor in the Department of Computer                  Conference on Computer Vision. Chum.42. Australia. Computer                                                           Corporation. Li.. L. M. Hung. and Q. 1999. He was a                  Conference on Computer Vision and Pattern Recognition. but republication/redistribution requires IEEE permission. vol.3.31-41. Lew. In Proceedings                                                           Liang Zheng received the Ph. China.Zhang. Fast image retrieval: query                                  Supervised Object Localization with Progressive Domain Adaptation. . Chigorin. Yang. See http://www. 2015.             [40] L. pp. L.11. and A. IEEE Transactions on Pattern Analysis and                                                    Qi Tian (SM’04) received the Ph. in 2011.                  of Instance Retrieval.31.                                                          Tsinghua University. vol. pp. and Q. 2008.                                                                                Guest co-Editors of IEEE Transactions on Multimedia.648-659. Tian. computer                  Image Processing . Li. The MIR Flickr Retrieval Evaluation. IEEE Transaction on Pattern Analysis and Machine                  Intelligence. Yang. Huang. Brin. 2016. Kennedy. Motwani.                                                        and large scale multimedia retrieval. Personal use is permitted.124.                                               Nanjing.                  envelope. Chang.1348-1358. 2014. J. Sivic. vol. 1.                              and is the Associate Editor of IEEE Transactions on Circuits and Systems for                  Unsupervised Visual Representation Learning by Graph-based Consistent                           Video Technology and in the Editorial Board of Journal of Multimedia. Hsu. Image classification and                                                         120 IEEE and ACM Conferences including ACM                  retrieval are one. Her current                  Evaluation of gist descriptors for web-scale image search. 2014. Zhao. and Q. Huang.             [49] L. In ArXiv:1608. Zhang.Dong.D degree in Electronic                  of ACM International conference on Multimedia. N. Tian. Since September 2003.html for more information.                                                                                                                    Organization Committee Members and TPC for over             [55] L. The PageRank citation                                                                 University of Technology Sydney. Heterogeneous Graph                                                                 Engineering. Chang. Li. Babenko. N. He is the                  Retrieval. Y. no.                  2008.5.This article has been accepted for publication in a future issue of this journal.46. Yu. Zisserman.             [39] H. Spectral hashing In NIPS. and R. 2010. Y. W.                  Journal of the ACM. She is currently pursu-                  175.L. Lett. Huiskes. Jégou. Zheng. J.     1057-7149 (c) 2016 IEEE. and B.                  retrieval. 2-11. K. and M. X. L.33. Jégou. Sandhawalia.                                                              tion.117-128. Paulevé. Pattern                  Recognit. Ahuja. Tian. He is the holder of             [44] C. A. 2016. Fast and accurate near-duplicate                                                    the Internet System Research Laboratories. Isard. Weiss. He                  on Pattern Analysis and Machine Intelligence . Fergus. Zhang. M.                                               [58] D.                                                                                          Ziqiong Liu received the bachelor degree in In-             [36] A. G.                                                                   been a Professor with the Department of Electronic             [43] L. A. in             [47] S. Neural                                                        versity of Illinois. R. and person re-identification. J. Schmid. Weakly             [35] L. C. Douze. VOL. Caffe: An open source convolutional architecture for fast feature                                                       Antonio. F. Huang. Xie.                                                                    postdoc researcher in University of Texas at San             [48] Y. Hong. International journal of computer vision. Zheng.                                  vision. 37. A. Semi-supervised distance metric                                                             2015. B. Locality sensitive hashing: A                  comparison of hash function types and querying mechanisms. pp. R. no.E. he has                  Vision and Image Understanding. IEEE Transactions on                                                         more than 80 papers on image processing. Jégou and M. Tian. Zhou.                                   IEEE Conference on Computer Vision and Pattern Recognition. A holistic representation of the spatial                                                       formation Engineering from Southeast University. H. B. Tian’s research interests include multi-                  search using the contextual dissimilarity measure. Yang.                  vol. Liu.D. H. XX.                  2009. 2013. vol. ICCV. M. no. H. Session Chairs. Tian. Yang. http://caffe. 2015. vol. China. degree in Electronic Engineering of             [37] M. China. T. S. In Proceedings of IEEE International Conference on Computer                                                     Tokyo. Jiang. S. pp. Slesarev. F. 2011. His                  ranking: Bringing order to the web. Schmid.             [38] Y. Schmid. 4. Philbin. 1999.                                                                            Engineering from Tsinghua University. 2007. Wang. and S. Page. In IEEE ICASSP . and pattern recognition.D. S.1109/TIP. Urbana Champaign in 2002. C.                                                                                                              2003. Lempitsky.604-632. Tian. China.145. degree in                  Machine Intelligence. 2013. SIGIR. From May 1997 to August                  Vision. no.11. 2014. A. L. D. Luo. S.ieee. W. Y. Beijing. B. no. H. Content may change prior to final publication. pp. 2007. He is currently a postdoc researcher                  embedding. Torralba. NEC                  image search with affinity propagation on the ImageWeb.Zhang. F.berkeleyvision. In                  pruning and early termination. 2016. Xie. Amsaleg. vol. Wang.                                                                 electrical and computer engineering from the Uni-             [53] A. and EURASIP Journal on Advances in Signal Processing             [57] D. IEEE Transactions                                                    media information retrieval and computer vision.Winograd. and the B. Product quantization for nearest                  neighbor search. Video search reranking                  through random walk over document-level context graph. and S. Liu. he was a member of the research staff in             [42] L. video surveillance. no.2660244. 2008. Wang. in 2010. H. no. S. SIFT Meets CNN: A Decade Survey                                      Constraints. Y. Total recall:                                                   inghua University. NO. Hoi. Zheng. He is                  codes for image retrieval. pp. and S. and V.             [51] Z. Noise resistant graph                  ranking for improved web image search. S. 2001. W.degree from Ts-             [41] O. In Proceedings of the IEEE                                                          Tsinghua University. Cour. Jegou. Amsaleg. Visual reranking with improved                  image graph. pp. Liu.D. IEEE                                                                                               Transactions on Image Processing                 14                                                                                                IEEE TRANSACTIONS ON IMAGE PROCESSING.             [46] W. Accurate image                                                        (UTSA). J.5. S. C. W. and C. Douze and C. vol. in1997. Torralba.. classifica-             [50] J M. and J.4287-4298. Tsinghua University. Speech and Signal Processing .                  2010.E.24. Z.org/. MONTH YEAR                [34] L. S. pp. Kleinberg Authoritative sources in a hyperlinked environment.1. ACM Transactions on Intelligent Systems                  International Conference on Multimedia Information Retrieval.                                                                                     Science at the University of Texas at San Antonio             [54] H. Jia. computer                  efficient graph-based visual reranking. USA. Journal of Computer             [56] M. pp. Wang.org/publications_standards/publications/rights/index. H. H. Oliva and A. Chang. 32. 803-815.                                                                                                               ing the Ph. 2011. Bai.             [52] S. L. In Proceedings of IEEE International Conference on                  Acoustics. In IEEE Transactions on Multimedia. China.                                                 has been serving as Program Chairs. 2015. J. An                             ten patents.                  Automatic query expansion with a generative feature model for object                                                       degree from the Tokyo Institute of Technology. ICASSP. Xie. In ACM International Conference on Multimedia                                                         Multimedia.                                                                              in Quantum Computation and Intelligent Systems. and B.                                                                      vision. In European Conference on Computer Vision. In ACM                           Vision and Image Understanding. Q.                             and Technology.2017. Zhou. and pattern recognition. etc. in 1985 and the Ph.01807. Citation information: DOI 10.                                                                        Shengjin Wang received the B. He has published                  Propagation for Large-Scale Web Image Search. and Q. Harzallah. Metaxas Query Specific Rank                  Fusion for Image Retrieval. Zhao. N. B. Japan. In Proceedings                                                research interests include image/video processing                  of the ACM International Conference on Image and Video Retrieval. H.                                                                                       research interests include image retrieval.17. vol.             [45] W. Cen. Verbeek. J. 2015.