Robust ImageGraph Rank-Level Feature Fusion For

This article has been accepted for publication in a future issue of this journal, but has not beenfully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2017.2660244, IEEE Transactions on Image Processing IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. X, NO. XX, MONTH YEAR 1 Robust ImageGraph: Rank-Level Feature Fusion for Image Search Ziqiong Liu, Shengjin Wang, Member, IEEE, Liang Zheng, and Qi Tian, Fellow, IEEE Abstract—Recently, feature fusion has demonstrated its effectiveness in image search. However, bad features and inappropriate parameters usually bring about false positive images, i.e., outliers, leading to inferior performance. Therefore, a major challenge of fusion scheme is how to be robust to outliers. Towards this goal, this paper proposes a rank-level framework for robust feature fusion. First, we define Rank Distance to measure the relevance of images at rank level. Based on it, Bayes similarity is introduced to evaluate retrieval quality of individual features, through which true matches tend to obtain higher weight than outliers. Then, we construct the directed ImageGraph to encode the relationship of images. Each image is connected to its K nearest neighbors with an edge, and the edge is weighted by Bayes similarity. Multiple rank lists resulted from Fig. 1. Examples of good feature and bad feature. For each query, the different methods are merged via ImageGraph. Furthermore, on top-5 ranked images in the search result of good feature (the first row) and the fused ImageGraph, local ranking is performed to re-order bad feature (the second row) are demonstrated. Relevant images are marked the initial rank lists. It aims at local optimization, and thus is with green dot, and irrelevant ones red. The good features work well in that more robust to global outliers. Extensive experiments on four true match images are retrieved, but bad features rank outliers ahead of true benchmark datesets validate the effectiveness of our method. matches. Besides, the proposed method outperforms two popular fusion schemes, and the results are competitive to the state-of-the-art. Index Terms—Image search, feature fusion, ImageGraph. adopted feature is a good feature and also complementary to existing ones, a higher performance is expected. Nevertheless, I. I NTRODUCTION many irrelevant images have high ranks due to the low discriminability of bad features. If the to-be-fused feature is This paper considers the task of content-based image search. a bad feature, the fusion performance may not be guaranteed, Given a query image, our goal is to retrieve all the appearance and accuracy may get even lower after fusion. In essence, similar images in a database. Recently, multiple features are failure in predicting features’ effectiveness results in undesir- employed to boost the overall performance. To take advan- able search quality [16]. Multiple cues are directly integrated tage of complementary properties of distinct features, various without considering their effectiveness in [11, 14, 29, 30]. fusion methods are investigated, ranging from straightforward Once outliers are introduced by bad features, it is difficult to combination at feature level [30] to integration at indexing filter them out. To evaluate the retrieval quality of individual level [11, 14, 29] and merging graphs of different rank results method, consensus degree among the top candidates, i.e., [15, 17]. It is demonstrated that fusion of multiple features Jaccard similarity, is utilized [17] at rank level. However, has been pushing the state-of-the-art forward. However, false when a bad feature is adopted, outliers may be included positive images, i.e., outliers, are inevitably introduced in the into the graph. Usually there are many edges linked between fusion, leading to inferior accuracy. the outliers, called “Tightly-Knit Community Effect”. In this On one hand, outliers are often brought in by bad features. scenario, outliers may obtain higher consensus degree among For a specific query, a good feature means its search accuracy neighbors than true matched ones, yielding unsatisfactory is high by itself. By comparison, the feature yielding low performance. search quality is called bad feature (see Fig. 1). When the On the other hand, inappropriate parameter also introduces Copyright (c) 2010 IEEE. Personal use of this material is permitted. outliers. In [17, 29], K-reciprocal nearest images are treated However, permission to use this material for any other purposes must be as pseudo positive instances, and thus K shall be equal to the obtained from the IEEE by sending a request to [email protected]. Z. Liu and S. Wang are with State Key Laboratory of Intelligent Technology number of ground truths. However, it is hard to pre-define K and Systems, Tsinghua National Laboratory for Information Science and Tech- because database images commonly have various numbers of nology, Department of Electronic Engineering, Tsinghua University, Beijing ground truths. If K is inappropriate, the performance may be 100084, China (E-mail: [email protected], [email protected]). Liang Zheng is with the Centre for Quantum Computation and Intelligent affected, especially in [17]. The retrieval quality measurement, Systems, University of Technology Sydney, Ultimo, NSW 2007, Australia. Jaccard similarity, always varies with K, and gradually loses (E-mail: [email protected]). its effectiveness when K gets larger than ground truths number. Q. Tian is with the University of Texas at San Antonio, 78256, USA (E- mail: qitian@ cs.utsa.edu). Therefore, choosing an effective measurement to evaluate Corresponding authors: Shengjin Wang and Qi Tian. retrieval quality of individual features is the key issue in robust 1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. and give more detailed discussions.ieee. VOL. A toy example of our fusion system is is used as query to search for the other. we introduce the Bayes • We propose the directed ImageGraph structure to encode similarity. a good insensitive to parameter changes. the local densities of proposed to evaluate the retrieval quality of individual vectors. Through this mea. Our method uses the top-K ranked and reformulate the edge weight of ImageGraph. 54] that reciprocal neighborhood relationship • We propose an effective measurement for robust fusion. of individual features effectively. Extensive experiments on In light of the above analysis. ImageGraph 1 and ImageGraph 2 are fused by appending new nodes or re-calculating edge weights of existing nodes. The query image is marked with yellow bounding box. and relevant ones green. other two true matches are connected at the second layer of the graph. Different features may produce scores diverse in ageGraph as Bayes similarity. Since not only top-ranked images in initial search results • We propose the local ranking to rerank the initial search but also their neighborhood are included into the graph. so the evaluation scheme should measure relevant/irrelevant images than Jaccard similarity [17]. so that it is more robust to global outliers. so that more candidates (high recall) can be included conduct more experiments to better validate the effectiveness in the graph. X. surement. Besides. it is at rank level. to evaluate the retrieval quality t/irrelevant images and insensitive to parameter changes. but has not been fully edited. which is based on their ranks when each one robust to the outliers. The edge is weighted by Bayes similarity. Given a query image. Beyond the conference paper. but republication/redistribution requires IEEE permission. Built on the Rank Distance. further enhancing the robustness of our method. Besides. Based on the fused graph. which means there are 1 true match in the top-3 ranked images of initial rank list of Feature 1. It aims at local optimization. improving the recall. See http://www.org/publications_standards/publications/rights/index. Then. Then. the edge Our approach adopts the graph-based framework of [17]. denoted as ImageGraph. Although there are many outliers in the graph. It is defined as the posterior probability of two image-level relationships. 2. Besides.e. The proposed ranking algorithm aims at local optimiza- true matched images not directly connected to query can be tion. Through these relevant images. weight of ImageGraph is measured by Bayes similarity. Bayes similarity is reciprocal ranks of two images. the corresponding ImageGraph is built. local ranking is conducted and the images are reranked. thus being comparable. Content may change prior to final publication.This article has been accepted for publication in a future issue of this journal. In contrast with [17]. Consequently. In ImageGraph 2. and thus more candidates can be included estimate the Bayes similarity through empirical study. Rank Distance and Bayes similarity for robust evaluation.2660244.2017. ImageGraph builds on K near- images being true match. the undirected graph proposed in [17] This paper is an extension of our previous conference builds on K-reciprocal neighbors that may result in low search publication [51]. it is more reliable to represent the relevance of images features. Since Rank Distance considers the of images on rank level. we propose recall. 2. Toy example of the proposed method. we define the edge weight of Im. local ranking is proposed to assigning higher weight to relevant images under good features re-order the initial result. in the graph. is more robust to global outliers. this paper first proposes four image retrieval datasets confirm that the proposed method the Rank Distance to measure the relevance of two images significantly improves baseline performance. we construct a directed graph. NO. and thus and lower weight to highly-ranked outliers under bad features. i. MONTH YEAR Fig. two features are used to obtain search results. We also images. XX.html for more information. we est neighbors. The rest 1057-7149 (c) 2016 IEEE. retrieved. Citation information: DOI 10. fected by the outliers in reranking. it is follows: illustrated in [12. and is the importance of images in the unified scale. a better discriminator between numerical values. Besides. similarity can be propagated through graph. In the ImageGraph. Personal use is permitted. Nevertheless. reflecting the retrieval quality. Based on it. for each feature. we observe that only 1 relevant image is directly connected by query. to avoid being af- evaluation should measure features’ effectiveness correctly. similarity scores of different features are mapped The main contributions of this paper are summarized as to the unified scale. is a stronger indicator of similarity than unidirectional nearest Rank Distance is first introduced to measure the relevance neighborhood relationship. of our method. Moreover. each vertex points to its 3 nearest neighbors and the graph is expanded to the second layer. In ImageGraph 1.1109/TIP. query points to two relevant images directly. IEEE Transactions on Image Processing 2 IEEE TRANSACTIONS ON IMAGE PROCESSING. the result. all the true match images are retrieved. fusion task. which is a good discriminator between relevan- than similarity score. . In addition. illustrated in Fig.. signature is embedded in the inverted index to filter out false Additionally. reranking based on complementary cues [15–17]. Then. Instead. In [28]. which are usually combined with a multi-IDF scheme is introduced in [11].html for more information. From treated as visual words of the codebook.: ROBUST IMAGEGRAPH: RANK-LEVEL FEATURE FUSION FOR IMAGE SEARCH 3 of the paper is organized as follows. Section IV describes the datasets graph. we adopt the directed graph model. 32. outliers can be removed from the graph salient local regions are detected from an image with operators by spectral filtering. we introduce the proposed robust ciently model similarity of Google image search results with ImageGraph in Section III. attribute vector to post-processing method. [17] propose a undirected graph-based query specific and TF-IDF [23. This method improves performance for particular object [2. approximate k-means (AKM) [22] matched features between pairwise images. a graph theoretical in the last decade. Graph-based Ranking the multi-index. integrating indexing algorithm [29] leverages global semantic attributes both initial ranking and visual consistency between images. The edge weight is measured the initial results using spatial cues. fusion approach. conclusions are given in Section search reranking is also formulated as a random walk problem VI. The initial rank information is For the late fusion. and edge weight is computed as the count of clustering method. . a safe strategy is used for ranking. in automatically using the K nearest neighbors. 4. A directed graph is codebook. Feature Fusion improved and the system is able to find quite challenging It is indicated that combination of multiple features obtains occurrences of the query. the Graph-based method has also received increased attention extracted regions are represented as high-dimensional feature recently in content-based image search. Specifically. there are also efforts to represent the image using feature is compressed into small codes by product quantiza- their global properties. and cluster centers are [50] is employed to rank images using the affinity values. a myriad of methods have been proposed to the web image search. [14] propose a multi-dimensional inverted index. through which multiple retrieval sets are It is verified in many works that post-processing can further merged. The edge between videos is weighted by the linear combination of text score and visual duplicated II. et al. a graph based semi-supervised learning [47] is applied In image search. Then. with relatively smaller bits. In addition. R ELATED W ORK score. Each descriptor the ImageWeb to discover the nature of image relationships is quantized to its nearest visual word in the pre-trained for refining similar image search result. and re- and baselines used in the experiments.2017. Another promising strategy Such holistic features demonstrate their advantages in image performs feature fusion on indexing level. After a brief review of Baluja [5] have proposed a VisualRank framework to effi- related work in Section II. retrieval as well as categories. a weakly supervised multi- the recall. retrieval process votes for images in both Graph-based visual reranking has been proven effective to SIFT and other feature spaces. Bag-of-Words model [23] framework amenable to noise resistant ranking is proposed in based on local descriptor is the most popular one. i. 7] and deep learning features [31. 34. and fisher vector are combined at feature level. Jaccard query expansion [41] uses highly ranked images to learn a similarity. with the consistency of their neighborhoods. to update inverted indexes of local features. They also serve as good complements to local ones. 27] Besides. visual attributes tion. A number of [45]. improving method. The fused Besides. A quite few works refine neighbor relation are connected. Recent study of reranking adopts image-level cues. 33] weights. Zheng et al.1109/TIP. Jing and based query specific fusion approach at rank level. the graph-based perspective. [42] employ vectors using SIFT [19] or its variants [21]. color search. It uses the random walk on an affinity graph. 37]. In this 1057-7149 (c) 2016 IEEE. To some extend. [12] take advantage of K-reciprocal nearest neighbors to Specifically. Furthermore. Through combining the rank lists or scores of multiple features. Zhang words. which have shown promising performance. Qin which an image is connected to its top-K ranked images.. Among them. through which dimensionality reduction and approximate nearest search [38– different binary features are coupled into the inverted file. In [46]. global features are effective to encode images positive SIFT matches. IEEE Transactions on Image Processing ZIQIONG LIU et al. In this approach. but republication/redistribution requires IEEE permission. semantic-aware co- refine text-based video and image search results. encouraging It constructs a graph where pairs of visually similar images semantic consensus among local similar images. With B. but has not been fully edited. 57.ieee.g. See http://www. Zhang et al. such as GIST [36. Citation information: DOI 10. The codebook is obtained through unsupervised constructed. e. Personal use is permitted. Xie et al. fast search is achieved using inverted file [35] et al. 58]. images satisfying the reciprocal enhance the quality of search results. 40]. [17] propose the graph- propagated through the graph until convergence. Image Search Pipeline with visual feature. K-NN reranking [8] refines the initial rank list performance. graph learning is proposed in [15] for enhancing the reranking For example. Here.org/publications_standards/publications/rights/index. and each dimension corresponds to one kind of feature. along the context graph.2660244.This article has been accepted for publication in a future issue of this journal. the recall is significantly C. Subsequently. Then images are re-ordered through link analysis latent feature model to expand the original query. In this method.e. Section V presents the orders images according to the visual hyperlinks. visual duplicated score is the similarity calculated A. video experimental results. Based on this framework. In addition. such as DoG [19] and Hessian Affine [20]. the edge is weighted by Bayes similarity. Finally. incremental query expansion and each image is represented as a sparse histogram of visual image-feature voting are developed in [43]. such as [25. Content may change prior to final publication. Further. are connected by an edge.. To handle the errors in the initial labeled set. 24. the HITS and hierarchical k-means (HKM) [18]. To model correlation between features. Through quantization. In [30]. Alternatively. our method belongs superior performance in image search. many works conduct the to be robust to outliers. identify the image set. Moreover. Nk (Im ) K nearest neighbors of Im . we find its K nearest neighbors is used as query to search the other. especially when retrieval quality is the fused ImageGraph. NO. Personal use is permitted. X. i. and NK (Im ) denote the K nearest neighbors of Im .. To be resistent to the noise. the set of edges. which can be defined as: not imply Im ∈ NK (In )... images being reciprocal K-nearest neighbors are For method i. Firstly. our method uses the K nearest neighbors. Moreover. to be robust in the fusion. Rank Distance can be in the database. D). denoted as ImageGraph. similarity in Section III-A and Section III-B.. MONTH YEAR TABLE I NOTATIONS AND DEFINITIONS Notation Definition I=(I1 . Instead. and level fusion method without supervision. incorporating intra-graph and inter-graph constraints in a supervised way.. we formulate our N is the number of dataset images. we propose a rank- laborate construction of ImageGraph in Section III-C. and Ii indicates the i-th image. The throughout the paper in Table I. Gs = (Vs . Di ). Finally.2017. Then.. and thus similarity score is not reliable to represent the improves the robustness of our method. D represents the relevance among R(Im . (1) reciprocal neighborhood. To address this issue. we do not require resulted from M different methods. work of [17]. w) Subgraph of ImageGraph G induced by the vertex set Vs ∈ V ..ieee. a more effective measurement to evaluate importance of them.. lows. In ∈ NK (Im ) ∪ Im ∈ NK (In ). Then we e- Differently. G2 . (2) is achieved by PageRank or Maximizing Weighted Density. F (Im ) False match image set of Im .. we reciprocal neighbor relation.e. Content may change prior to final publication. Through a reference codebook r = g(G).. relevance between images. Citation information: DOI 10. . IN ) I indicates the image set. The edge is weighted as Jaccard is constructed based on rank result ri and the pre-computed similarity for evaluating retrieval quality of individual feature. r2 . E and w indicate the set of vertices. images is denoted as D. Secondly. In ) Rank Distance between image Im and In . where i = 1.1109/TIP. method. Since local densities of problem here.html for more information. but republication/redistribution requires IEEE permission. . G can be written as the combination of multiple with the parameter K. Im ) the database images. P The depth of ImageGraph.. is a much stronger indicator of two images being relevant than where R = {r1 . we calculate the distance take each image in the database as query and get the search of two images based on their ranks obtained when each one result. thus more candidates can be included. but has not been fully edited. In ) Rank of In in the rank list of Im being query. its ImageGraph Gi connected with an edge. (5) 2N 1057-7149 (c) 2016 IEEE. we propose the Rank Distance to serve as a rank-level measurement. . local ranking is performed on tains false positive images. K The breadth of ImageGraph. A. it is difficult to compare or weight the on Rank Distance. E. we first present Rank Distance and Bayes the-fly in a query-adaptive manner.. which aims at local optimization and bad. However. former [17] builds on K-reciprocal neighbors that may result in low search recall. and V .. GM : features. . and reranking Gi = ψ(ri . Our target is to obtain a new rank according to vectors round Im and In are different. The pre-computed search result of database defined as below. For clarity. respectively. features’ effectiveness are estimated on- In this section. In ) + R(In . It is demonstrated in [12. 54] that r = h(R. In the offline process. I2 . In ∈ NK (Im ) does multiple search results. IEEE Transactions on Image Processing 4 IEEE TRANSACTIONS ON IMAGE PROCESSING. We adopt the frame- introduce fusion via ImageGraph in Section III-D. N Total image number of dataset R(Im .. IN } denote the image dataset. N .. the new rank list is calculated through ranking on Furthermore. edge weight Since different features may produce scores diverse in between pairwise images is defined as Bayes similarity built numerical values. . (4) constructed off-line. M .. See http://www. Finally. level is proposed in [16]. in contrast with the undirected graph used in [17]. where m = 1. . XX. Es .2660244. a Co-Regularized Multi- G = φ(G1 . d(Im . VOL. Let III. . Es contains every edge between the vertices in Vs d(Im . G = (V. w) G indicates a graph.. 2. relevance among the database images Di . I2 . 2. Thirdly.org/publications_standards/publications/rights/index. and the corresponding edge weight.. a simple and effective fusion method at score the ImageGraph G. and our work departs from the prior arts as fol- ranking algorithm is described in Section III-E.This article has been accepted for publication in a future issue of this journal. we illustrate several important notations and their definitions we construct a directed graph. Before describing our approach in detail. GM ). O UR M ETHOD I = {I1 . T (Im ) True match image set of Im . In this paper. Instead. initial search list usually con- the retrieval quality.. In ) = . in our approach. (3) Graph Learning framework [15] is proposed. G2 .... rM } denotes the set of rank lists unidirectional neighborhood. Rank Distance instead of Jaccard similarity [17]. the effectiveness of this method varies dramatically Specifically. which is written as: Multiple rank lists are merged through graph. . for each image. it also suffers from bad individual graphs G1 . Here. It also implies Bayes similarity between image Im and In can be formulated that Rank Distance can reflect relevance between images to as p(In ∈ T (Im )|d(Im . while Fig.1109/TIP. we set the ratio of p(dn |In ∈ Tm ) × p(In ∈ Tm ) p(In ∈ Fm ) to p(In ∈ Tm ) as 500. Content may change prior to final publication. For simplicity. Here. Moreover. 5.html for more information. both R(Im . In ) presents the rank of In in the search result of contains 6385 images from Flickr by searching for particular Im . Citation information: DOI 10. p(dn |In ∈ Tm ) and p(dn |In ∈ Fm ) are connected with an undirected edge. recall may images. 3 illustrates the effectiveness of Rank Distance. we get As α is a constant. If and only To obtain the rank list. but has not been fully edited. In )). It is featured by 55 queries of 12 different In this scenario. distinguishing true matches from outliers. the ratio of p(In ∈ Fm ) and p(In ∈ Tm ) is not be guaranteed. σ). We find in our preliminary p(In ∈ Tm |dn ) = . It is clear that Cosine distance pushes outliers in top ranks. we use an independent determined by parameter K. thus we match [8].This article has been accepted for publication in a future issue of this journal. the percentage decreases rapidly with Rank Dis- To evaluate the retrieval quality effectively. is measured by neighborhood consistency. and the choice of feature R(In . 5. respectively. each image is taken as query if In is ranked among the top search results of Im and Im is using BoW feature. In ). It demonstrates that Rank Distance can evaluate the similarity row show false matches of “Eiffel”. false matches follow a normal distribution the Bayes similarity. 1] using N . Sample images in the Paris dataset for empirical study. we can easily find out two distributions have a clear separation.2017. the top-5 ranked images under baseline Cosine distance are illustrated. 4. the Paris 6K. the neighborhood consistency is generally a very large term. its percentage value is usually below 3%. 6 and Eq. we propose tance. and vice vasa. 6 . We can observe that view Rank Distance corrects this artifact by increasing the distance between outliers point and illumination vary a lot among true matches. according to Eq. 4.. to perform empirical study. smaller Rank Distance denotes that the two landmarks. We denote the true matches and false matches of Im as more than 50% true matches have the Rank Distance smaller T (Im ) and F (Im ). Personal use is permitted. It is observe As In either belongs to Tm or Fm . Some sample images are shown in Fig. (9) + p(dn |In ∈ Fm ) × p(In ∈ Fm ). For the Rank Distance distribution of true B. In experiments.e. True match images are marked with green dot. This dataset where R(Im . The third and bottom and the query. the two images are considered to be a true does not have a obvious impact on the esimation. As a result. Im ) are small. we define dn = some extend. Tm = T (Im ) and Fm = F (Im ). but second row demonstrate true matches of “Eiffel”. the estimated Bayes Bayes Theorem. IEEE Transactions on Image Processing ZIQIONG LIU et al. distributions of prior probability density are drawn in Fig. based on which. dataset. Paris landmarks. which is defined as the probability of In N (u. this measurement is robust to outliers. See http://www. compared to 4% of false matches. The edge weight are the prior probability distributions of dn . we compute its Rank Distance to the which helps find true match images. we have that the distribution can be approximated by the following: p(dn ) =p(dn |In ∈ Tm ) × p(In ∈ Tm ) α (7) p(In ∈ Tm |dn ) = . but republication/redistribution requires IEEE permission. C. p(In ∈ Fm ) p(dn |In ∈ Fm ) −1 p(In ∈ Tm |dn ) = (1 + × ) . images being K-reciprocal neighborhood relation In the Bayes similarity. K- tions can be estimated through empirical study. In ) and features follow similar distributions. n) to [0. the two outliers. images are more visually similar. In contrast. than 0. use BoW here. Typically. reciprocal neighborhood relation may filter out some potential the number of true matches is far less than the false match candidates in the construction of graph.: ROBUST IMAGEGRAPH: RANK-LEVEL FEATURE FUSION FOR IMAGE SEARCH 5 Fig. n) + R(n. we set α as 1 for convenience. Examples of Rank Distance. The numbers under the images denote their Rank Distances (10−5 ) to the query. Then. According to the Consequently. where many outliers are introduced. Hence. The top and outliers red. and thus the fusion performance 1057-7149 (c) 2016 IEEE.04. These distribu.5. i. p(In ∈ Tm |dn ) can be rewritten as: similarity is shown in Fig. In specific.org/publications_standards/publications/rights/index. For each query. (6) experiments that the ratio of p(In ∈ Fm ) to p(In ∈ Tm ) does p(dn ) not have a significant impact on search accuracy. dn By combining Eq. From Fig. Based on the Rank Distance. in which u is about 0.ieee. Namely. 7. 8. meanwhile down ranking true matches and false matches. Moreover. we found that different also ranked high in the rank list of In .2660244. we normalize R(m. Bayes Similarity matches. Fig. 3. However. effectively. d(Im . Construction of ImageGraph p(In ∈ Tm ) p(dn |In ∈ Tm ) (8) In [17]. . As the variance σ is relative being the true match of Im large. 10. It sorts the nodes by their degrees. E. In ) ∈ E linking from vertex vm to vn . V = {v1 .3 G = (V. .6 0.02 The ImageGraph centering at query q can be represented as Percentage Percentage 0.02 0.2 E is the set of edges. not be retrieved in both feature spaces.04 Algorithm 1. The linking edge is weighted as Bayes similarity. 0. However. add a directed edge from its vertex vm to each vertex corresponding to the top-K ranked image in the pre-computed search result w(vm . 5. IN }.06 m between starting vertex q and terminal vertex. take each image as query and get the search result. Fig. vn ).1 Experimental Distribution vertices. vn ) = 0 otherwise Fig.05 n of ImageGraph construction using one feature is illustrated in 0. we take into account the K nearest neighbors and 0.8 1 2N Rank Distance Rank Distance R(Im . graph is denoted as G = (V. Probability distribution of Bayes similarity. For query q.9 1 1 Given a dataset I = {I1 . vn ) = Σi wi (vm .01 directed edge (Im . 5 in Eq. Large K or bad as: features may bring a lot of outliers into the graph.2017.07 threshold. K denotes the breadth of ImageGraph. 9. . The edge is weighted as Bayes similarity according to Eq.025 build a directed graph. XX. corresponding to the top-K ranked images. where m = 1. but has not been fully edited. On the contrary. 3 until the depth of graph achieves P. Thus. weight after fusion.html for more information. Therefore. and the weight is the sum of wi . E = ∪i Ei . w). ..2660244. The vertices in the first Bayes similarity layer then continue to link their K nearest neighbors as child 0.6 0.6 0. vN } indicates the set of 0. forming the first layer of ImageGraph. The fused which is the sum of weights from connected edges.. Substituting Eq. Since 1 For query q. E. The edges are computed as Bayes similarity according searched in multiple feature spaces.. Rank Distance distribution of (a) True match images and (b) False (10) match images.03 0.. Fusion of Multiple ImageGraphs PageRank [49] is a query independent link analysis method. N .015 vertices. where vm is the corresponding vertex of image Im ..1 0. depth of ImageGraph means the shortest path 0. more comprehensive relationships between im- 2 Add a directed edge from its vertex to each vertex ages are represented in the fused ImageGraph. 10. p = 8*10−4/dn 0. Usually.2 0. and ranks a subset of graph we combine the multiple graphs Gi = (Vi . As rank result is encoded in ImageGraph. Personal use is permitted.1 0. 6.ieee.3 0.03 structure to encode the image-level relationships. E. their edge weight would 5 Output the ImageGraph G = (V.8 0. 2 For each image Im . 2.4 0.09 through which the retrieval quality is evaluated.1109/TIP. we have. w).. as negative images can 4 Repeat step. 1057-7149 (c) 2016 IEEE. Citation information: DOI 10. . more 3 For each new added vertex. ImageGraph is 0.. In our ap. The new added vertices which are challenging to be searched using one feature may form the first layer of graph. we propose the ImageGraph (a) True match (b) False match 0. which are assigned larger to Eq. MONTH YEAR is sensitive to K. be relatively smaller. Here. The vertices and edges of fused ImageGraph are the union On-line: set of individual graphs. To this end.15 0. positive images with high similarity are easily to be D.2 0.4 0. Content may change prior to final publication. Local Ranking D. improving the recall.35 0. compute its search result with the given each graph illustrates the image-level relevance in different feature. V = ∪i Vi . density [17] starts from query.Im ) if In ∈ NK (Im ) w(vm .08 expanded in this manner until the depth of graph P achieves a p(I ∈ T |d ) n 0.This article has been accepted for publication in a future issue of this journal. Calculate the due to complementary nature of multiple features. w).org/publications_standards/publications/rights/index. E. (11) there are many edges linked among these irrelevant images. In this way.8 1 0 0. wi ) obtained related to the specific query. which can be written these ranking methods suffer from outliers.45 proach. candidates edge weight according to Eq.25 0.4 dn 0. X. Ei .5 0. v2 . IEEE Transactions on Image Processing 6 IEEE TRANSACTIONS ON IMAGE PROCESSING. On one hand. 0 0 0 0.5 0. 10. VOL. NO. (12) D. 0. 0.In )+R(In . add directed edges from it to candidates are included.01 Algorithm 1 Construction of ImageGraph 0 Off-line: 0 0. See http://www.. be easier to be retrieved in another feature space..2 0. On the other its top-K ranked images in the pre-computed search result hand.05 images. with different features without supervision [17]. we can fuse ranking on the whole graph. Ranking by maximizing weighted multiple rank results efficiently via graph fusion. feature space. 0. its top-K ranked images are connected by q. The 0. If In belongs to NK (Im ). I2 .7 0. but republication/redistribution requires IEEE permission.005 edge weight w is defined as the Bayes similarity of connected 0. the precision is improved. there is a 0. The algorithm 0.4 0. w). Each object has 4 images with different the retained feature CNN* enhances the original performance viewpoints and illuminations. i. V components respectively. 2 and step.vn )∈Esi 256 × 256 following [16]. rootSIFT [21]. 1 Initialize subgraph G0s as ({q}. It is calculated as the area under that global features. For local optimum instead of global maximum. Most queries have 34.200 images consistently on the three datasets. HSV. Features and Baselines which introduces the maximum weighted edges is included In this paper. Datasets Search results on three datasets are presented in Table II. subgraph G0s = ({q}.31% in mAP and Flickr 1M [25]. HSV into Gi+1 s . on the three datasets. The subgraph Gs = (Vs . At the i + 1th iteration. 13. and 75. the performance of CNN is improved UKBench The UKBench dataset contains 10. histogram using 20×10×5 bins for H. images by searching for particular Oxford landmarks from To tackle this problem. See http://www. which is the recall of top-4 candidate images. tures.vn )∈Esi+1 (vm . w) and C 0 as the CNN For an input image. a 200K codebook is Gs satisfies user’s requirement. we compute a 1000-dim HSV color local ranking is illustrated in Algorithm 2. we exploit four features: GIST [36]. trained on Flickr60K [25] dataset. burstiness strategy [10]. and 12. It is because most images in Oxford contain are averaged.Besides. vertex in C i which could intro.. This dataset has a comprehensive ground truth for 11 local-based ranking.96% in mAP less than 4 ground truth images undergoing various changes.2017. multiple assignment IV. vsi+1 = arg max w(vm .: ROBUST IMAGEGRAPH: RANK-LEVEL FEATURE FUSION FOR IMAGE SEARCH 7 which is called “Tightly-Knit Community Effect”. we resize the images to vsi+1 ∈C i (vm . IEEE Transactions on Image Processing ZIQIONG LIU et al. Retrieval accuracy is measured by mean Average we aim to find the subgraph Gs starting with q. on the Oxford. link analysis method [17] may lead to deviation from query Oxford The Oxford Buildings dataset consists of 5.491 personal GIST leads to poor performance on these datasets. In this dataset. In this serves as a query. and C 0 contains the vertices connected by q. 1057-7149 (c) 2016 IEEE.05% in mAP. cosine distance is defined as the duce the maximum weighted edges is introduced into Gi+1 s similarity function of images. Hamming threshold and weighting parameter are set to 52 and 26. respectively. 128-bit Hamming signature 5 Output Gs . vn ). Personal use is permitted.Network [48]. CNN feature following [53]. Citation information: DOI 10. Similarly. each image by about 10% in mAP. we also fine-tine the according to Eq. Holidays The Holidays dataset consists of 1. the maximum weighted. Es .14% in mAP. ∅. By contrast. To evaluate the effectiveness of our approach. Since a higher from different viewpoints. Based on cosine distance. The algorithm of HSV For each image. UKBench [18].583 in N-S score. Es contains every edge arbitrarily collected from Flickr. w) is Flickr 1M The Flickr 1M dataset includes 1 millon images induced by the vertex set Vs ∈ V .856 in N-S score. denoted as vs i+1 : Histogram. to their order of being incorporated into Gs . respectively. Ranking by maximizing weighted density [17] and (maximum 4).2660244. which is Precision (mAP). Convolutional Neural Network (CNN) and Bag- X X of-Words (BoW). their APs well on Oxford. nearest neighbors satisfies user’s requirement. yielding mean Average Precision (mAP). Moreover. Oxford [27] 80. 3. . The l2 -normalized histogram is used for nearest Algorithm 2 Local Ranking neighbors search with cosine distance. The nodes are ranked according search is performed. The performance is measured by N-S score situation. each containing 5 possible queries. Content may change prior to final publication. For all the query images. The vertices are ranked according to their [25] of each SIFT descriptor is embedded in the inverted file order of being incorporated into Gs .This article has been accepted for publication in a future issue of this journal.html for more information. we adopt a safe strategy to perform Flickr. but republication/redistribution requires IEEE permission. Specifically. of 2.e. as CNN*. it has a lot of true match images. The proposed ranking only considers different landmarks. These dataset can be added between the vertices in Vs . do not work the Precision-Recall curve. 4096-dim CNN descritpor from the 6-th layer in the Caffe 2 At the i + 1th iteration. mAP buildings. Specifically. We also define the candidate set into the above datasets as distractors for large scale experi- C as vertices that Vs points to. After fine-tuning. 0.062 and up-rank the noise.The re-trained feature is denoted 3 Update Gi+1 s and C i+1 . vn )− w(vm . A.ieee. GIST and CNN. Note performance of each query. 1. obtaining experiments on Holidays [25].1109/TIP. on Holidays. which are taken confused by the tightly connected outliers.550 objects. DATASETS AND BASELINES [9] and pIDF [24] are employed on both dataset to enhance the performance. An l2 -normalized 512-dim GIST (13) descriptor is extracted for each image using 4 scales and This procedure continues until the number of nodes in Gs 8 orientations. but has not been fully edited. the vertex in C i B. HSV and CNN The Average Precision (AP) is used to evaluate the retrieval result in moderate accuracy on Holidays and UKBench. we extract the l2 -normalized vertices that q points to. or distortion. respectively. UKBench and Oxford. 3 until the number of nodes in BoW For Holidays and UKBench. to filter out false matches. we conduct It shows that BoW achieves good performance. GIST To compute GIST descriptor. 1M codebook is trained on Paris6k dataset [26]. avoiding being each query. 4 Repeat step. It yields holiday images and 500 of them are queries.org/publications_standards/publications/rights/index. naturally. Moreover. Some of them have partial occlusion edge weight reflects a higher relevance to the query. which are difficult to be described using global fea- is employed to measure retrieval accuracy of the dataset. S. we initialize ments. For Oxford 5K. Moveover.008 in N-S score. baseline by 4. by combining GIST. enhanced to 90. the performance of K = 4 is not 0.28%.55%. IEEE Transactions on Image Processing 8 IEEE TRANSACTIONS ON IMAGE PROCESSING.2017.195 3. score fusion is superior to graph fusion for performance is no longer increased. with re-trained CNN feature.1109/TIP.856 3.27%. graph fusion. Parameter Tuning Moreover.14 61. the true match graph fusion and score fusion. the performance of BoW.96% in mAP. score fusion. By 0. When BoW is fused with are included into ImageGraph. C. respectively.34 80. the mAP is each feature combination.56 54. in the same scenario. which is a significant improvement compared to K limits the property of depth P. 87. K implies the in mAP. 3.This article has been accepted for publication in a future issue of this journal. On K= 4 K = 20 Oxford. Personal use is permitted. its performance is decreased by 5% in mAP. However. we set K = 10 CNN. Similar phenomenon can be images which are filtered out by small K.907 in N-S score when 81 K= 6 72 K = 30 fused with CNN. UKBench and Oxford. 8. P is set as 8 in our experiments. We speculate that breadth by 5. and 8.14 75.48% in UKBench. In our method. When fused with be retrieved using a large P. The mAP first increases with P. and GIST are fused on (a) Holidays and BoW and CNN are fused on (b) Oxford.88% and 9. we compare number of ground truths of Holidays dataset is very small. then features are fused with BoW on Holidays and UKBench.01% and 1.397 3. 7. On Holidays. and 88. their combination achieves an mAP of 79. When all potential candidates are retrieved. The N-S score mAP(%) mAP(%) 83 76 of BoW is enhanced from 3. we set different K for them. In addition. the combination of BoW with GIST. 1057-7149 (c) 2016 IEEE. mAP(%) 12. Content may change prior to final publication. 3.08%. BoW 80. we first evaluate than score fusion. mAP(%) 34.48% in mAP. the performance of BoW is not affected in P P fusion with these features. 9.06%.org/publications_standards/publications/rights/index. For HSV and CNN which have moderate performance. 6. By comparison. HSV. the performance is further boosted.92% Graph: breadth K and depth P.98%. BoW 82 74 K = 10 K= 8 K K = = 50 40 performance is further improved to 3. but republication/redistribution requires IEEE permission. Comparison with Other Fusion Approaches The experimental results are demonstrated in Fig. B. and 81. the results of multiple features fusion are demonstrated in Table III. and CNN obtains the mAP of 79. As the To further illustrate the strength of our method. BoW Holidays.html for more information. Furthermore. but has not been fully edited. since more top-ranked candidates from bad features on this dataset. respectively.87%. X.05% in mAP on Holidays.29 44. 7. are more difficult to observed when BoW is combined with HSV. In particular. our results are further Two parameters are involved in the construction of Image. It is evident that the fusion brings consistent benefit to various feature combinations. respectively. We use their released code It is notable that increase of K and R jointly helps to and default parameter in the following experiments. NO. The reason lies result comparisons are presented in Fig. In addition.46%. shown in Fig.ieee. number of candidates connected with a vertex. See http://www.2660244. Fig. P means the distance that affinity are propagated on the graph. HSV. 87. thus the fusion. Instead. Datasets GIST HSV CNN CNN* BoW On Holidays.502 3. Note that fusion of two global features also Oxford.916 in N-S score and 82.28% in mAP.920 in N-S score and 84.05 performance is boosted to 85. respectively. The fusion of four features V. We test different combinations of K and R on Holidays and Oxford. It improves the individual baseline of HSV and CNN (a) Holidays (b) Oxford 86 82 by 17.582 mAP. closing to the maximum N-S score 4.46%.22%. respectively. fusion [17] and score fusion [16]. 7. XX. score fusion improves BoW baseline by Specifically. and CNN. saturation when K = 10 on Holidays and K = 40 on Oxford.71%. recall is boosted. and K = 40 on Oxford. When multiple features are fused. by taking use of four features.22 72. and our results with two state-of-the-art fusion approaches: graph Oxford relatively large. It shows that on in that the true matches which are not directly retrieved by both datasets our method outperforms graph fusion and score query can be sufficiently exploited with a large P.83%. After 85 80 fused with GIST. Fusion results on three datasets are GIST is slightly higher than BoW baseline. Noticing that graph fusion suffers enhanced when K gets large. score fusion with the fusion of two features. HSV. although global features have poor discrimination for 80 2 4 6 8 10 12 14 16 70 2 4 6 8 10 building images. Similar results can be observed on UKBench. respectively. on Holidays.26%. N-S 1. and CNN are 84 78 increased by 5.51%. score fusion and our method boost BoW for Holidays and UKBench. and 90. Then.21 69. our method enhances the performance improved even if P increases to 16.582 to 3.703 and 3. When bad feature GIST is merged. VOL. respectively.31 boosts the overall performance. E XPERIMENTS achieves 90. .09%. A. and our method gain the mAP of 83. graph fusion. Namely. the performance of graph fusion is better To verify the effectiveness of our method.96 13.834 through the K = 12 K = 60 fusion with GIST and HSV. the fusion still yields stable improvement.89% in mAP. MONTH YEAR TABLE II P ERFORMANCE OF BASELINES ON THREE DATASETS . Citation information: DOI 10. Fusion Results On UKBench. Multiple improve the performance. the performance reaches GIST. respectively. and generally keeps stable when P becomes large. respectively. mAP results against different values of breadth K and depth P. In particular. From these results. org/publications_standards/publications/rights/index. “B+G”. natural noise is more difficult to tackle. the retrieved results are replaced with randomly assigned values. 10. In comparison. “BoW +GIST”. our method keeps the performance of 3.. due to the higher recall brought by ImageGraph. Comparison with graph fusion ([17]) and score fusion ([16]). Thus. 9. 10 that graph fusion is very sensitive to parameter K. .328 In summary. but has not been fully edited. while graph fusion decreases to 48. when K is larger than the number of ground truths. 0. The outliers in ImageGraph are introduced from two ways.This article has been accepted for publication in a future issue of this journal. “BoW + CNN”. Specifically. respectively. The green bar and blue bar represent result of the first feature and the second feature. On Holidays. and 74. “B+G+H” and “B+G+H+C”. Content may change prior to final publication. It implies that the gray bar show the results by graph fusion. Personal use is permitted. and then keeps stable. On UKBench. However. the outliers refer to the natural noise. 88.325 in N-S score through graph fusion. Compared to the random noise.: ROBUST IMAGEGRAPH: RANK-LEVEL FEATURE FUSION FOR IMAGE SEARCH 9 Fig.. respectively. In the experiments of [52]. and our method. compared to graph fusion. fusion with CNN brings the benefit of 0.228. IEEE Transactions on Image Processing ZIQIONG LIU et al.2660244. our method not only and 3. the performance drops significantly. On one hand. we compare our results with graph fusion [17]. but republication/redistribution requires IEEE permission.58%. while the blue bar. respectively. When combined with HSV. “BoW + HSV”. we first evaluate the fusion results when K varies. In our method.581 of graph fusion. i. which is illustrated in Fig.301.10% our method. Good features to the combination of “B+G+H” and “B+G+H+C”. Six feature combinations are presented. i. 3. a lot of outliers would be included into the graph. respectively. but also brings about superior improvement.69%. and respectively. graph fusion method achieves the best performance when K respectively. D. is resistant to bad features.18% Similarly.746. bring further benefit in the fusion. Fusion results of two features on (a) Holidays. our method yields better performance. which exists in the original rank result.1109/TIP. “HSV + GIST” and “CNN + GIST”. compared by 0. See http://www. 8.96%.904 in N-S score at K = 20. the performance of our method increases BoW is increased by 0. 3. It is shown in Fig.173. compared to 2. 67.916. orange bar and then drops after reaching a peak at K = 4.html for more information.836 3. for the three com- the three methods gain the N-S score of 3. compared to score fusion.ieee. score fusion. (b) UKBench and (c) Oxford. The K gets large. score fusion and our method. score fusion. its performance decreases when Fig. and 0. respectively. Additionally. It is showed in [52] that the graph fusion approach is robust to random noise.678. On UKBench. In order to validate our method. when K becomes large and more outliers are introduced. and binations. fusion with bad graph fusion and our approach enhance the BoW baseline feature. Citation information: DOI 10.04% and 90. our method yields the mAP of 84. and 3. random noises are added to the rank results of features. Evaluation of Robustness In this section. “HSV + CNN”.22.121 in N-S score. When all features are fused together. On Holidays. Five feature combinations are presented on (a) Holidays and (b) UKBench.e. Natural noise is usually caused by the feature itself. using graph fusion.089 and 0.841. 0.2017. when K = 20. while yellow bar shows fusion result. and 0. leads to more rapid descend. we demonstrate the robustness of our approach to outliers.e.252 in N-S score with K.894. In addition. respectively. It illustrates the robustness of our 1057-7149 (c) 2016 IEEE. using combination of “B+G”. N-S score first rises with K and yellow bar represents the BoW baseline. 3. is about the ground truths number of the dataset. 92% on Holidays. Content may change prior to final publication.920. On Oxford. Fusion results with various K on (a) Holidays and (b) UKBench. 11. More- over.14%. UKBench. where L is the expected retrieved image number. the refined by 0. 1057-7149 (c) 2016 IEEE. The reranking results on Holidays and UKBench are shown in Fig. We can see that our method outperforms cue. respectively. the baselines of GIST. With re-trained CNN feature. but republication/redistribution requires IEEE permission. refine the initial rank result. On Holidays+Flickr 1M. Time and Cost baselines greatly and consistently.91% to 75. and Oxford. and CNN. “G”. the BoW is enhanced by 5% in mAP through the combination with GIST. The examples of retrieval results on three directly searched by query. Reranking performance reflects the small since it only considers the top candidates.3%. When GIST is fused with other three features. and 4. On Holidays. and CNN are Because this operation is offline and required only once. VOL.html for more information.177.08 in N-S score. such as RANSAC [27].91%. F. On the other hand. The on-line computational cost is to 2.21% in mAP. and BoW are improved by 4. Moreover. respectively. Similarly. Comparison with the State-of-the-art experiments are performed on a server with 3. Specially.24 % in mAP. we perform large scale experiments on Holidays. Holidays + Flickr1M. Reranking with Individual Feature For individual feature. 8 and Table III. Furthermore. HSV and CNN obtains 77. Citation information: DOI 10. our result is also competitive affinity values can be propagated to images which are not to other methods. “H”. ture. but has not been fully edited.89%. the combination of BoW. due to the images could be promoted and outliers lowered down.066 in N-S score. bad features would also result in outliers.920. E. we obtain reranking methods. 15].28% in mAP. Reranking results for single features on (a) Holidays and (b) UKBench.224 in N-S score. Large Scale Experiments HSV. the average Our method achieves mAP = 90. From Fig. and mAP query time of BoW and global features are 396ms and 189ms. which can still promote true match images when there are a lot of outliers in the graph. the same situation are also observed. Then we compute and store their relevant relationships. Personal use is permitted.2017. When the four features are fused. and 128GB memory. HSV.138. One category is based on feature-level an mAP of 77. thus the effectiveness of CNN*. X. our results are comparable to [13. 10. is also enhanced from 1. On UKBench. Fig. The space complexity is O(KN) Moreover.22% on Holidays + Flickr 1M dataset. Noticing that our method refines the H. we also Our method belongs to the latter one. the performances of BoW. which exceeds other category employs the image-level cue for reranking.46 GHz CPU We compare our results with the state-of-the-art in Table IV.2660244. HSV. N-S = 3. In our method. respectively.65%. the performance achieves an mAP of 77. Fig. There are two kinds of popular For the large scale dataset. true match result is slightly higher than [16]. 11. our method can still exploit its complementary cues. The the state-of-the-art approaches on Holidays.37% in mAP. time complexity is affordable. GIST. it requires O(L2 K) for ranking on the graph.This article has been accepted for publication in a future issue of this journal. Besides. the bad feature. and 0. GIST. query expansion [41]. For the large scale experiment. 12. precision is enhanced. See http://www. when the bad feature is fused. be retrieved. Our G. It shows that fusion of multiple features can consistently improve the search accuracy in large scale dataset.82%. through the graph structure. In this experiment. time complexity for constructing ImageGraph of each query is O(K P ). To test the scalability of the proposed method. The solid line represents our method and dashed line the graph fusion ([17]). improving the performance. the dimension of global features is reduced to 128D by Principal Component Analysis (PCA). MONTH YEAR approach. XX.1109/TIP. Taking use of the achieve the best N-S score 3. On UKBench and Oxford.856 for storing connectivity. . our image-level relationships reflected by ImageGraph. CNN. each image is used as query with given fea- 2. 4. On Holidays. the performance is boosted to 77. Abbreviations “B”. Flickr 1M images are added into Holidays dataset as distractors. respectively. the result reported in [16] by 2. 0. respectively. Three feature combinations are tested and the performance are compared with graph fusion.02% to 90.82%. it can still improve the performance from 90. thus challenging candidates can datasets are shown in Fig. The results are shown in Table III. on UKBench.ieee. NO.org/publications_standards/publications/rights/index. the effectiveness of ImageGraph. By 0. IEEE Transactions on Image Processing 10 IEEE TRANSACTIONS ON IMAGE PROCESSING. our approach can be applied to = 84. It reveals the robustness of our method from another perspective. etc. we can see that the performance of our method is resistant to bad feature. HSV and CNN boost the performance of “BoW + GIST” from 69. and “C” represent BoW.26% and 76. time.9%. and the results are competitive to the state-of-the-art. 20141081253. select 2000 images from dataset as queries.This article has been accepted for publication in a future issue of this journal. for ranking.36GB. cost 5. [8] X.6% and 42. based on it. and the edges are weighted with Bayes similarity. . This work was supported in part by stores the reference book. [6] J. 52]. the by bad features or inappropriate parameters. Ji. To further validate our method. Wang. [3] F. This work was supported The memory cost of the proposed method is about 0. In large scale an image is connected to its K nearest neighbors with edges. Moreover. In Proceedings of the IEEE Conference obtain the performance of 32. HSV and GIST with relative attribute feedback. Fused with ranking.87. under Grant No. It is because Acknowledgements This work was supported by the Ini- [15] uses a supervised framework. The fusion of four scale image search.2ms for ImageGraph construction and 0.. 2010.ieee. which costs 0.1%.16ms local ranking. processing steps of our method. in Table V cost a few milliseconds. Schmid.92% is dependent on many factors. Wang and G. [4] D. respectively. image search.1877-1890. For each feature. respectively. we fuse them with our proposed method.3. images are re-ordered by usually takes 5. N-S score of 3. M. thus it is not directly the large scale experiments. 17. Lin. Szummer. which are relatively small compared to the query Through extensive experiments on three benchmark datasets. Besides. we will investigate how to efficiently method with other post-processing methods considered in update the ImageGraph structure when new images are added Table IV. and A. We can see that the CNN feature achieves localization with spatially-constrained similarity measure and k-nn re- the best performance for concept detection task. Then. Personal use is permitted. Baluja. CNN. Most of the post-processing methods strategies in the fusion. we store 4 nearest neighbors of each image. and C. our method adopts the same framework and attributes. 9. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. It Graph.1109/TIP. 2011. Parikh and K.4% in on Computer Vision and Pattern Recognition. HSV and GIST. similar to ours in theory. but republication/redistribution requires IEEE permission. graph fusion [17] and [16]. In Proceedings of the IEEE Conference on Computer Vision BoW. J. Visualrank: Applying pagerank to large-scale image search. Grauman. The memory ranklists resulted from different methods are fused via Image- cost of 1 million image of single feature is about 0. Multiple then 105 bits are needed per image per feature. Weak attributes for I.5%. For each query.2%. But it can roughly indicate the time efficiency of Holidays + Flickr 1M. We compare the time of post-processing steps of the proposed In the future work. our method belongs to the two popular fusion schemes. In Proceedings of the IEEE European Conference on with [15. Since both [14] and [11] store binary signatures of features in the inverted file. but has not been fully edited. Y. A discriminative latent model of object classes is 22. Object retrieval and mAP. which works on the given rank list.316-336.09GB. Y and S-F. Brandt. Citation information: DOI 10. respectively. post-processing algorithm. [2] L. 2010.8% in mAP. A lot R EFERENCES of image-level information is stored in [12] and the cost of it [1] Y. which are usually brought in methods. 2012. we perform concept de. the CNN result is improved to 49. and K. Wu.076GB extra memory.35GB. we calculate Intelligence. It shows that our method outperforms the proposed approach. 14. such as the machine used on Holidays.e. the ImageGraph is constructed to model the relationship among images. In addtion. Label diagnosis through itself tance [55].36ms. i.org/publications_standards/publications/rights/index. Torresani. We first define Rank Distance to measure Time (ms) 5. features obtains the mAP of 50. On the fused ImageGraph.html for more information. Jiang. After obtaining the rank lists of different features. 2008. Efficient object category recognition using classemes. Fitzgibbon. we yield an mAP of 77. 2008. pp. Mori. Moreover. Kovashka. ImageGraph construction more efforts will be made to explore the feature selection and ranking. and Pattern Recognition. we show that significant improvement can be achieved when Table IV shows the comparison of average query time and multiple features are fused. rather than the neighborhood relationship. R. Qi Tian by ARO grant W911NF-15-1-0290 The approach of [16] evaluates the retrieval quality online with and Faculty Research Gift Awards by NEC Laboratories of score curve. Chang. Douze. 1057-7149 (c) 2016 IEEE. Note that query time obtained an mAP of 90. Here we still use [7] A. Jégou. except [15]. Chang. S. vol. WhittleSearch: Image search mAP to measure the performance. we demonstrate that memory cost on 1 million dataset with the state-of-the-art our method is robust to outliers.868s.1GB. National Science Foundation of China (NSFC) 61429201. i. tuning for web image search. C ONCLUSIONS This paper proposes a graph-based method for robust feature Methods Ours [17] [52] [16] [15] [12] fusion at rank level. the memory costs of them are 6.. In and number of features of dataset. IEEE Transactions on Image Processing ZIQIONG LIU et al. which costs a lot of time tiative Scientific Research Program of Ministry of Education to build the anchors. thus the memory costs of these methods are Computer Vision. See http://www. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.: ROBUST IMAGEGRAPH: RANK-LEVEL FEATURE FUSION FOR IMAGE SEARCH 11 TABLE V P OST. Since we use four features in our experiments. The post- into the database or old images are deleted from it. Content may change prior to final publication. vol.920 and 84. Table V shows the result of comparison. Our method has average query time is about 0. Shen. [9] H. Concept detection large-scale image retrieval. its distance to each concept class. 42. 2012. pp. Yu. M. G. The rest images [5] F. Jing and S.6%. Avidan. M-H Tsai. Improving bag-of-features for large 42.7. BoW. 2012. It only America and Blippar. UKBench.2017. In Proceedings of the tection experiments on Flickr25000 dataset [56].36 1 1 10 2210 30 the relevance of images on rank level.We randomly IEEE International Conference on Computer Vision.82% on comparable.89%. Each image ID costs about 21 bits.30. we introduce the Bayes similarity to evaluate the retrieval quality of individual features. which further protects the fusion from outliers. In Proceedings of the IEEE European Conference on Computer Vision. IEEE Transaction on Pattern Analysis and Machine are seen as the database images.-F. using image-to-category dis.2660244. International Journal of Computer Vision. in part to Dr. no.PROCESSING TIME ON H OLIDAYS + 1M D ATASET VI. and Oxford datasets. D. Relative attributes. Z. Parikh. 2009. It is similar on UKBench and Oxford. and S.e.-G. and Y. no. in which respectively. Grauman. True matched images are marked with green dot.1109/TIP. XX. and false matched ones red.2017.html for more information. respectively. 12.This article has been accepted for publication in a future issue of this journal. MONTH YEAR Fig. but has not been fully edited. but republication/redistribution requires IEEE permission. See http://www. its top-10 ranked images resulted from GIST (the first row). respectively. X. . NO.ieee. BoW (the fourth row) and ImageGraph feature fusion (the fifth row) are shown. 1057-7149 (c) 2016 IEEE. UKBench (middle) and Oxford (bottom) datasets. Citation information: DOI 10. Personal use is permitted. Content may change prior to final publication. IEEE Transactions on Image Processing 12 IEEE TRANSACTIONS ON IMAGE PROCESSING. Examples of retrieval results from Holidays (top). HSV (the second row). CNN (the third row). VOL. For each query.2660244.org/publications_standards/publications/rights/index. Gammeter. 2011. N-S 3. Z. . vol. Z. T. of the IEEE Conference on Computer Vision and Pattern Recognition.905 81. Schmid. Yang. 1. Chum. Tian.60.920 84. C.76 3. Razavian.23.80 77. Content may change prior to final publication.5 74. and C. 6. Packing and padding: Coupled quantization: Improving particular object retrieval in large scale image multi-index for accurate image retrieval.1109/TIP.27 76. [28] C. Zheng. and A. pp.8 84. Ji. In Proceedings of IEEE International no. but republication/redistribution requires IEEE permission. Douze. and C. Feature Combinations Holidays.3368-3380. Jégou. 77. Stewenius.65 Memory cost (GB) 0. Yang.24 BoW + GIST + CNN* 89. [30] M. Zheng. Azizpour. [15] C. 2004. . [31] A. L. Distinctive image features from scale-invariant keypoints. off-the-shelf: an astounding baseline for recognition. M. On the burstiness of visual retrieval with large vocabularies and fast spatial matching. In Proceedings of the IEEE databases. Query Co-indexing for Near-duplicate Image Retrieval. J. Tian. Z. In Proceedings of the IEEE Conference on Computer Vision of the IEEE Conference on Computer Vision and Pattern Recognition. 2014. J. and Q. and Q. Wang. 0. Sullivan. Accurate image search with 86. Schmid. R. 84.Gao. Zheng. Wang.64 87.THE . F.2017. T.82 BoW + GIST + HSV + CNN 90.97 75.23. Conference on Computer Vision. Tao.920 84. D.4 69. - [10] H. Deng. Lp-norm Idf for Large Scale neighbor: accurate object retrieval with k-reciprocal nearest neighbors. pp. O. Carlsson.916 82. 2011. Scale affine invariant interest point Workshops. Conference on Computer Vision.1. He. to object matching in videos. Adaptive Late Fusion for Image Search and Person Re-identification.22 BoW + GIST + HSV + CNN* 90. mAP (%) Holidays+Flickr1M.2660244. Object pp. Tian.91 BoW + HSV 87. and Q. Combining attributes and [18] D.21 BoW + HSV + CNN* 90.8.org/publications_standards/publications/rights/index. Zhang. and X.855 79. M.120.48 3.64 84. S. .40 3.843 80.05 77.06 79.3 Query time (s) 0. [11] L. [26] J. Sivic. mAP (%) UKBench.868 0. IEEE Transactions on Image Processing. In Proceedings International Conference on Computer Vision. and A. Lin. H. 2013. vol.96 76. Mikolajczyk. Image Search.92 77. and Q.82 . Qin. Bag-of-colors for improved image In Proceedings of the IEEE Conference on Computer Vision and Pattern search. A. Arandjelovic. Sivic. . S.749 0. mAP(%) 84. Van Gool. pp. no. Sivic.8. In Proceedings International journal of computer vision. no. L. and Q. .920 3. Liu. Qin.1.63. Query adaptive similarity for geometric consistency for large scale image search.1 85. Lowe.413 . Wengert. M. . and Zisserman. See http://www. 2007.89 3. In IEEE Transactions on Image Processing. Wengert. mAP (%) BoW + GIST 85.28 3. Hello [24] L. Liu.08 BoW + GIST + HSV 88. and L. and C. H. N-S Oxford. Wang.89 84. no.36 . 1057-7149 (c) 2016 IEEE. Van Gool. In Proceedings elements. no.26 BoW + GIST + CNN 88. . J. Metaxas.60. [25] H. 2009. Isard.2 . Lp-Norm IDF for Scalable Image Computer Vision and Pattern Recognition. [29] S. S. Philbin. In Proceedings of the specific fusion for image retrieval. S. detectors. Coupled Binary Embedding for Large.25 BoW + CNN 88.75 3. Zheng. C. Zisserman.71 3. Retrieval.841 3.html for more information. Object Weakly Supervised Multi-Graph Learning.703 79.913 84.0 42.06 75.89 3. Schmid. mAP(%) 77.09 BoW + HSV + CNN 90. Douze. In Proceedings of the IEEE European International Conference on Computer Vision. 2014. In Proceedings of the IEEE In Proceedings of the IEEE Conference on Computer Vision and Pattern Conference on Computer Vision and Pattern Recognition. K. W.7 Holidays + 1M.8 80. and Pattern Recognition. Quack. Recognition.ART. 81.4 68. In Proceedings of the IEEE Conference on Computer Vision Conference on Computer Vision and Pattern Recognition. In Proceedings of the IEEE Conference on [33] L. Wang. 2012. and L.7 85.01 77. CNN features [19] D. Wang. Semantic-aware [17] S. 2012. Douze. Nister. 6.This article has been accepted for publication in a future issue of this journal.3604-3617.82 TABLE IV P ERFORMANCE COMPARISON WITH THE STATE .92 . 2007. to improve object retrieval.98 84. A.85 . Liu. Scalable recognition with a vocabulary tree.145 . vol.64 3. and A. Tian. Zheng.749 . Personal use is permitted. [16] L. . 2015.ieee. Recognition. Isard. Three things everyone should know Vision. 84. 2004. 2008.1 22. 2011. In Proceedings of the IEEE Conference on the IEEE European Conference on Computer Vision. M. Zheng. Methods Ours [17] [52] [16] [15] [14] [13] [11] [12] [10] [9] Holidays. Y. 3. Tian. Citation information: DOI 10. and S. Query. Recognition.80 77.79 3. Zisserman. Isard. O. and Zisserman. In Proceedings of ACM Multimedia. O. Visual Reranking through [27] J. 2013.02 3. [12] D.35 .67 3. H. 2003.914 84.31 3. Video Google: a text retrieval approach Scale Image Retrieval. 0. M. Yu. .1 . vol.08 69. In International Journal of Computer [21] R.52 3. Hamming embedding and weak [13] D. 2008.51 3.907 81.91-110. Chum.0 .916 82. multi-scale contextual evidences. Bossard.: ROBUST IMAGEGRAPH: RANK-LEVEL FEATURE FUSION FOR IMAGE SEARCH 13 TABLE III F USION RESULTS OF DIFFERENT FEATURE COMBINATIONS ON BENCHMARKS . pp. X. In Proceedings of IEEE retrieval with large vocabularies and fast spatial matching. Chum. Computer Vision and Pattern Recognition. and Pattern Recognition. Fisher vectors for efficient image retrieval. S. 2014. and D.8 UKBench. J. and Q. 0.91 77. vol.83 3. Wang. 2016. Tian. Zhang. Douze. J. and Q. Tian. 75. M.2. 2013. 2013. of the IEEE Conference on Computer Vision and Pattern Recognition [20] K. S. Lost in [14] L. Sivic. 2015. Ramisa. In Proceedings of the IEEE Conference on Computer In Proceedings of the IEEE Conference on Computer Vision and Pattern Vision and Pattern Recognition.55 Oxford. Cour. M. Wang. 85. International journal of computer vision. Schmid. A. M.G. In Proceedings of large scale object retrieval.076 . IEEE Transactions on Image Processing ZIQIONG LIU et al.1-13.0 . S. Liu. Zisserman.3 .OF . Wang. S. but has not been fully edited. 2006. [22] J. Jégou. mAP(%) 90. N.77 3. Jégou. [23] J. [32] L. Tian.17 BoW + CNN* 89. Philbin. Philbin. Japan. Liu. and M. but has not been fully edited. Q. no. Dr. His current research interests include image processing. degree in Life Science from learning for collaborative image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. T. Wang. and J. In Proceedings of the IEEE European currently a Professor in the Department of Computer Conference on Computer Vision. Chum.42. Australia. Computer Corporation. Li.. L. M. Hung. and Q. 1999. He was a Conference on Computer Vision and Pattern Recognition. but republication/redistribution requires IEEE permission. vol.3.31-41. Lew. In Proceedings Liang Zheng received the Ph. China.Zhang. Fast image retrieval: query Supervised Object Localization with Progressive Domain Adaptation. . Chigorin. Yang. See http://www. 2015. [40] L. pp. L.11. and A. IEEE Transactions on Pattern Analysis and Qi Tian (SM’04) received the Ph. in 2011. of Instance Retrieval.31. Tsinghua University. vol. pp. and Q. 2008. Guest co-Editors of IEEE Transactions on Multimedia.648-659. Tian. computer Image Processing . Li. The MIR Flickr Retrieval Evaluation. IEEE Transaction on Pattern Analysis and Machine Intelligence. Yang. Huang. Brin. 2016. Kennedy. Motwani. and large scale multimedia retrieval. Personal use is permitted.124. Nanjing. envelope. Chang.1348-1358. 2014. J. Sivic. vol. 1. and is the Associate Editor of IEEE Transactions on Circuits and Systems for Unsupervised Visual Representation Learning by Graph-based Consistent Video Technology and in the Editorial Board of Journal of Multimedia. Hsu. Image classification and 120 IEEE and ACM Conferences including ACM retrieval are one. Her current Evaluation of gist descriptors for web-scale image search. 2014. Zhao. and Q. Huang. [49] L. In ArXiv:1608. Zhang.Dong.D degree in Electronic of ACM International conference on Multimedia. N. Tian. Since September 2003.html for more information. Organization Committee Members and TPC for over [55] L. The PageRank citation University of Technology Sydney. Heterogeneous Graph Engineering. Chang. Li. Babenko. N. He is the Retrieval. Y. no. 2008.5.This article has been accepted for publication in a future issue of this journal.46. Yu. Zisserman. [39] H. Spectral hashing In NIPS. and R. 2010. Y. W. Journal of the ACM. She is currently pursu- 175.L. Lett. Huiskes. Jégou. Zheng. J. 1057-7149 (c) 2016 IEEE. and B. retrieval. 2-11. K. and M. X. L.33. Jégou. Sandhawalia. tion.117-128. Paulevé. Pattern Recognit. Ahuja. Tian. He is the holder of [44] C. A. 2016. Fast and accurate near-duplicate the Internet System Research Laboratories. Isard. Weiss. He on Pattern Analysis and Machine Intelligence . Fergus. Zhang. M. [58] D. Ziqiong Liu received the bachelor degree in In- [36] A. G. been a Professor with the Department of Electronic [43] L. A. in [47] S. Neural versity of Illinois. R. and person re-identification. J. Schmid. Weakly [35] L. C. Douze. VOL. Caffe: An open source convolutional architecture for fast feature Antonio. F. Huang. Xie. postdoc researcher in University of Texas at San [48] Y. Hong. International journal of computer vision. Zheng. vision. 37. A. Semi-supervised distance metric 2015. B. Locality sensitive hashing: A comparison of hash function types and querying mechanisms. pp. R. no.E. he has Vision and Image Understanding. IEEE Transactions on more than 80 papers on image processing. Jégou and M. Tian. Zhou. IEEE Conference on Computer Vision and Pattern Recognition. A holistic representation of the spatial formation Engineering from Southeast University. H. B. Tian’s research interests include multi- search using the contextual dissimilarity measure. Yang. vol. Liu.D. H. XX. 2009. 2013. vol. ICCV. M. no. H. Session Chairs. Tian. Yang. http://caffe. 2015. vol. China. degree in Electronic Engineering of [37] M. China. T. S. In Proceedings of IEEE International Conference on Computer Tokyo. Jiang. S. pp. Slesarev. F. 2011. His ranking: Bringing order to the web. Schmid. [38] Y. Schmid. 4. Philbin. 1999. Engineering from Tsinghua University. 2007. Wang. and S. Page. In IEEE ICASSP . and pattern recognition.D. S.1109/TIP. Urbana Champaign in 2002. C. 2003. Lempitsky.604-632. Tian. China.145. degree in Machine Intelligence. 2013. SIGIR. From May 1997 to August Vision. no.11. 2014. A. L. D. Luo. S.ieee. W. Y. Beijing. B. no. H. Content may change prior to final publication. pp. 2007. He is currently a postdoc researcher embedding. Torralba. NEC image search with affinity propagation on the ImageWeb.Zhang. F.berkeleyvision. In pruning and early termination. 2016. Xie. Amsaleg. vol. Wang. electrical and computer engineering from the Uni- [53] A. and EURASIP Journal on Advances in Signal Processing [57] D. IEEE Transactions media information retrieval and computer vision.Winograd. and the B. Product quantization for nearest neighbor search. Video search reranking through random walk over document-level context graph. and S. Liu. he was a member of the research staff in [42] L. video surveillance. no.2660244. 2008. Wang. in 2010. H. no. S. SIFT Meets CNN: A Decade Survey Constraints. Y. Total recall: inghua University. NO. Hoi. Zheng. He is codes for image retrieval. pp. and S. and V. [51] Z. Noise resistant graph ranking for improved web image search. S. 2001. W.degree from Ts- [41] O. In Proceedings of the IEEE Tsinghua University. Cour. Jegou. Amsaleg. Visual reranking with improved image graph. pp. Liu.D. IEEE Transactions on Image Processing 14 IEEE TRANSACTIONS ON IMAGE PROCESSING. [46] W. Accurate image (UTSA). J.5. S. C. W. and C. Douze and C. vol. in1997. Torralba.. classifica- [50] J M. and J.4287-4298. Tsinghua University. Speech and Signal Processing . 2010.E.24. Z.org/. MONTH YEAR [34] L. S. pp. Kleinberg Authoritative sources in a hyperlinked environment.1. ACM Transactions on Intelligent Systems International Conference on Multimedia Information Retrieval. Science at the University of Texas at San Antonio [54] H. Jia. computer efficient graph-based visual reranking. USA. Journal of Computer [56] M. pp. Wang.org/publications_standards/publications/rights/index. H. H. Oliva and A. Chang. 32. 803-815. ing the Ph. 2011. Bai. [52] S. L. In Proceedings of IEEE International Conference on Acoustics. In IEEE Transactions on Multimedia. China. has been serving as Program Chairs. 2015. J. An ten patents. Automatic query expansion with a generative feature model for object degree from the Tokyo Institute of Technology. ICASSP. Xie. In ACM International Conference on Multimedia Multimedia. in Quantum Computation and Intelligent Systems. and B. vision. In European Conference on Computer Vision. In ACM Vision and Image Understanding. Q. and Technology.2017. Zhou. and pattern recognition. etc. in 1985 and the Ph.01807. Citation information: DOI 10. Shengjin Wang received the B. He has published Propagation for Large-Scale Web Image Search. and Q. Harzallah. Metaxas Query Specific Rank Fusion for Image Retrieval. Zhao. N. B. Japan. In Proceedings research interests include image/video processing of the ACM International Conference on Image and Video Retrieval. H. research interests include image retrieval.17. vol. [45] W. Cen. Verbeek. J. 2015.

Robust ImageGraph Rank-Level Feature Fusion For

Comments

Description