1 Introduction
A facet is, by definition, a particular aspect or feature of something. In the present work, this notion is applied to a set of resources that can be viewed under different aspects. Each aspect is called a facet and consists of several categories, the facet values, which can be used to filter the initial resource set. The number of resources associated with a certain facet value is called its value size.
Consider, as an example, a list of books viewed under the aspect of their genre. Choosing the facet value science fiction selects the books of this specific genre; the number of selected resources then corresponds to the value size of the facet value science fiction. The same list could be viewed under the aspect of publication year, with each sublist containing only books published in one particular year. These two aspects, genre and publication year, are just two of the many possible facets for books.
To obtain different facets, we assume each resource to have properties assigned, linking them either to other resources (genre, with, e.g., a description for itself) or plain literal values (publication year). While our method works on any resource set possessing such properties, we use semantic models as rigorous formulation. In particular, we consider knowledge graphs (KGs). They provide significant advantages for the creation of facets: First of all, assuming the resources are drawn from a rich KG, we automatically get a large amount of direct resource information from their properties. The values of those properties may be resources themselves and can be used to generate indirect facets over the initial resource set. For example, an indirect facet for books can be an author’s place of birth, where place of birth is linked to author, not to the book itself.
However, considering continuously changing and heterogeneous resources, manually predefining facets is often impractical. Using concepts from large KGs, e.g., the Linked Data Cloud, for semantic annotation induces a large number of possible facets. Hence, an automated method has to rank the large number of candidate facets to be able to pick the most suitable ones among them.
Nevertheless, determining the single best facet is not enough. Users generally expect a list of facets to choose from. Moreover, this list should not be excessively long, and its items should be “useful” both individually and as a collection. Were it not for the requirement of usefulness as a collection, simply choosing the top-k highest-ranked facets would be sufficient. However, avoiding facets that are semantically very close to each other is important as well. Once such facets are identified, criteria need to be defined to decide which of the candidates to drop to arrive at the final list of facets.
We propose an approach for dynamic facet generation and facet ranking over KGs. Our ranking is based on intra- and inter-facet metrics to determine the usefulness of a facet, also in the presence of others. A key aspect is exploiting indirect properties to find better categorizations. Since inter-facet metrics have not been satisfactorily addressed so far, we present semantic similarity as a usefulness criterion.
Based on our previously proposed workflow [1], we integrated all methods into an initial prototypical implementation [2]. While this leverages data from a specific KG, i.e., Wikidata [3], the methods we describe and use are generally applicable without or with only minimal changes to a wide range of KGs. Possible applications include exploratory browsing of a data catalog of semantically annotated datasets, or the reduction of a search result set using facets as filters.
In Sect. 2 we first revisit some of the related works in this direction. We then discuss methods we used for candidate facet generation and ranking in Sect. 3 and propose our workflow in Sect. 4. We present evaluation results in Sect. 5. Finally, we conclude and discuss future work in Sect. 6.
2 Related Work
Faceted browsing over KGs has been the subject of various research efforts, e.g., [4]. Prominent approaches such as Ontogator [5] or mSpace [6] use statically predefined facets for data navigation and do not consider continuously changing data sources. Moreover, their evaluation scenarios suppose data homogeneity and domain-dependent collections like cultural artifacts [5] or classical music [6].
Other projects include BrowseRDF, Parallax, gFacet, Faceted Wikipedia, VisiNav, Rhizomer, SPARKLIS, SemFacet, Grafa, MediaFaces, and Hippalus ([7–17], resp.). Facets are either dynamically selected from a precomputed set of facets or dynamically generated on the fly. The latter type of facets relies on building dynamic SPARQL queries and executing them on the respective SPARQL endpoints. Grafa [15] proposed a selection strategy to precompute only a subset of possible facets to avoid indexing of all data.
Some of these projects assume a homogeneous data source [7, 17], using very specific data sets from domains such as species [17]. Other contributions account for domain heterogeneity [8–16] and base their work on large-scale KGs such as Wikidata [3], DBpedia [18], or Freebase [19]. However, some projects [9, 10, 12, 13] require an initial interaction (resource type specification) before any facets are generated.
Various aspects of facet generation are discussed in these works. This includes facet ranking [7, 10–12, 15–17], entity type pivoting [8, 9, 11–14], visualization [8, 9, 11–13], indirect facet generation [6, 7, 9, 13, 14], and performance issues [10, 13, 15].
Facet ranking is of particular importance for dynamic facet generation in order to select from the considerable number of facet candidates. Frequency-based ranking was adopted by [10–12, 15]. In Faceted Wikipedia [10], facet values are ranked based on their value sizes. For facet ranking, the most frequent facets corresponding to the selected type are candidates; they are ranked based on their most frequent facet value. Note that this ranking is applied only in case of a resource type selection, otherwise generic facets are displayed. VisiNav [11] also adopts a frequency-based approach to rank facets and facet values, inspired by PageRank [20]; the respective scores are calculated based on the PageRank score of the data sources [21]. Rhizomer [12] defines relevant facets based on the properties' usage frequency in the resource type instances and the number of different facet values. In Grafa [15], facets are ranked according to the number of search result resources that have a value for the specific facet, and facet values are ordered by PageRank. BrowseRDF [7] proposes three metrics to measure the quality of facets: (1) predicate balance, viewing faceted browsing as the traversal of a decision tree that should be well balanced; (2) object cardinality, the number of facet values, as also considered in [12]; and (3) predicate frequency, similar to [10, 12, 15]. These metrics are combined into a final score that is used to rank facets. In MediaFaces [16], facets are ranked based on the analysis of image search query logs and user tags of public Flickr images. Hippalus [17] introduces a different ranking approach involving user interaction, where users rank facets and facet values according to their manually defined preferences.
We notice that all the previously described efforts concerning facet ranking only involve intra-facet metrics that rate facets individually, without taking into consideration the significance of facet co-occurrence, in other words inter-facet metrics. To the best of our knowledge, only Facetedpedia [22] includes a metric for measuring the collective usefulness of a facet collection. However, it does not take advantage of KGs or semantically annotated collections, but generates facets over Wikipedia pages based on the Wikipedia category system. As an intra-facet metric, it considers the navigational cost, which is based on the number of steps required to reach target articles and the number of choices at each step. Furthermore, facets are penalized if they have a low coverage, i.e., if not all articles can be reached using the considered facet. Besides the navigational cost, the average pairwise similarity is proposed as an inter-facet metric. However, this metric is specifically designed for the Wikipedia category system and is not generic enough to express semantic similarity over arbitrary KGs.
3 Methods
Before presenting our proposed workflow, this section provides details on the employed methods. This includes initial candidate facet generation, handling of literal facet values, and the metrics used to compare facets. The latter discussion is split into two parts: Intra-facet metrics evaluate a facet in isolation, whereas inter-facet metrics judge facets in relation to others.
3.1 Candidate Facet Generation
We aim to generate facets over a set of resources given by their respective Internationalized Resource Identifiers (IRIs) within the KG. In such a graph, we treat the relations of the given resources as their properties, and thus any applicable property path is equivalent to a candidate facet. To achieve a better categorization of resources, we consider not only direct properties (i.e., values that are connected to the resource by a single link), but also indirect properties (i.e., chained links are needed to connect a resource and a value). As an example, consider a set of resources referring to people. A direct property can be derived from a relation place of birth pointing to instances of a class city. An indirect property could then also exploit an existing link between city and country to arrange the connected cities into possibly fewer categories. Indirect properties are only possible if the range of the associated relation is not a literal, as literals cannot be the subject of further statements in the standard RDF model.
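The enumeration of candidate property paths can be sketched as follows. This is a minimal in-memory illustration, not the prototype's actual SPARQL implementation; the toy graph, resource names, and the `candidate_paths` helper are hypothetical.

```python
def candidate_paths(graph, resources, max_len=2):
    """Enumerate property paths of length <= max_len starting from the
    given resources; each distinct path is a candidate facet."""
    candidates = set()
    frontier = {(): set(resources)}          # path prefix -> reachable nodes
    for _ in range(max_len):
        next_frontier = {}
        for path, nodes in frontier.items():
            for node in nodes:
                for prop, values in graph.get(node, {}).items():
                    new_path = path + (prop,)
                    candidates.add(new_path)
                    # only values that are themselves resources (graph keys)
                    # can be extended; literals terminate the path
                    targets = {v for v in values if v in graph}
                    if targets:
                        next_frontier.setdefault(new_path, set()).update(targets)
        frontier = next_frontier
    return candidates

# toy data: books with direct (genre, year) and indirect (author -> birthplace) info
graph = {
    "book1": {"genre": ["scifi"], "author": ["alice"], "year": [1984]},
    "book2": {"genre": ["fantasy"], "author": ["bob"]},
    "alice": {"birthplace": ["london"]},
    "bob":   {"birthplace": ["paris"]},
}
paths = candidate_paths(graph, ["book1", "book2"], max_len=2)
# direct candidates: ('genre',), ('author',), ('year',); indirect: ('author', 'birthplace')
```

The breadth-first expansion mirrors how longer property paths are reached only through resources, never through literal values.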
A candidate facet is now given by a property path within the KG. In the case of direct properties this path is of length one, whereas for indirect properties any path length greater than one is possible. However, longer paths loosen the connection between resources and facet values. At some point this renders a facet useless for the given task, or at least makes it unclear to users how that facet is supposed to support them. Furthermore, longer paths increase the number of candidates and thus require more computations in later phases. For these reasons, we limit the path length of candidates to a fixed maximum.
We categorize candidate facets into two types: (1) categorical facets that result from property paths connecting exclusively to other resources, and (2) quantitative facets whose values are given by literals. While we allow quantitative candidates for numeric or date literals, we exclude string literals. The rationale is that those oftentimes contain labels or descriptions specific to single resources and, hence, are barely shared between different ones. As facets rely on common values to categorize the given input set, such properties will only rarely provide a suitable candidate facet. If a string value is common to multiple resources, there is a high chance that it should have been modeled as a distinct resource instead of a literal. Of course, resources are often not modeled perfectly; future work might need to include string literals to cope with this type of data.
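The categorical/quantitative distinction can be sketched as a simple value-type check. This helper is illustrative, not the prototype's code; in particular, detecting IRIs via a URL prefix is a stand-in assumption for proper RDF term typing.

```python
import datetime

def facet_type(sample_values):
    """Classify a candidate facet by its value types: numeric/date literals
    give a quantitative facet, IRIs a categorical one, and plain string
    literals (labels, descriptions) disqualify the candidate."""
    if all(isinstance(v, (int, float, datetime.date)) for v in sample_values):
        return "quantitative"
    if any(isinstance(v, str) and v.startswith("http") for v in sample_values):
        return "categorical"   # IRIs point to other resources
    return None                # string literals: excluded as facet candidates

kind = facet_type([1984, 2001])  # -> "quantitative"
```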
3.2 Clustering of Quantitative Facets
As mentioned before, facets can be created from numeric or date literals. Unlike categorical facets, it is highly unlikely that the number of distinct values is sufficiently small to generate a useful facet. However, these values can be clustered by dividing their continuous range into discrete subranges.
The clustering step is only applied to quantitative facets. It replaces the associated values with value ranges. The number of these clusters is determined by the optimum value cardinality as defined by the respective intra-facet metric (see Subsect. 3.3). The clustering technique itself is a consequence of the rationale behind another intra-facet metric, the value dispersion. It assembles approximately the same number of values in each cluster.
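Equal-frequency binning along these lines realizes the "approximately the same number of values per cluster" idea. This is a sketch: the function name and boundary handling are our own, and in the actual workflow the number of clusters k follows from the value cardinality optimum.

```python
def equal_frequency_clusters(values, k):
    """Split numeric values into k clusters of approximately equal size,
    returning the (min, max) value range of each cluster."""
    values = sorted(values)
    n = len(values)
    clusters = []
    for i in range(k):
        lo, hi = i * n // k, (i + 1) * n // k
        if lo < hi:                      # skip empty slices when k > n
            chunk = values[lo:hi]
            clusters.append((chunk[0], chunk[-1]))
    return clusters

ranges = equal_frequency_clusters([1990, 1991, 1995, 2000, 2003, 2010], 3)
# -> [(1990, 1991), (1995, 2000), (2003, 2010)]
```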
3.3 Intra-Facet Metrics
To select the most useful facets among the candidates, we define metrics to judge their usefulness. The first set of metrics presented here assigns scores to individual candidates independently of each other. Each metric is designed to reflect one intuition of what constitutes a useful facet.
The first requirement concerns the applicability of the facet. For each facet we also include an unknown value. This accumulates the resources that do not support the respective property path, i.e., at least one of the corresponding relations is missing for this resource. For heterogeneous resource sets, the unknown value size will be non-zero for most facets. However, for a facet to be useful, it should apply to as many resources as possible. So we strive for the value size of unknown to be small in comparison with the overall size of the resource set.
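As a rough sketch of this applicability intuition (the exact formula is not reproduced here), the resulting score amounts to the fraction of resources that support the facet's property path; names and the data structure below are illustrative.

```python
def predicate_probability(resources, facet_values):
    """Fraction of resources a facet applies to: resources without a value
    for the facet's property path fall into the 'unknown' bucket, whose
    size should stay small relative to the whole resource set."""
    known = sum(1 for r in resources if facet_values.get(r) is not None)
    return known / len(resources)

values = {"book1": "scifi", "book2": "fantasy", "book3": None}  # book3 lacks a genre
p = predicate_probability(["book1", "book2", "book3"], values)  # 2/3
```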
3.4 Inter-Facet Metrics
In contrast to their intra-facet counterparts, inter-facet metrics assess the relationship between different candidate facets. We use semantic similarity of facets as an inter-facet metric. The motivation is to prevent facets that are too close to one another and thus would provide about the same partitioning of the resource set. Moreover, semantically distant facets increase the chances of meeting users’ information need and/or mindset.
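As an illustration of taxonomy-based semantic similarity, a common choice is a Wu-Palmer-style measure over subclass links. The sketch below over a toy hierarchy shows the principle; it is not necessarily the exact metric used in our system.

```python
from collections import deque

def ancestors(taxonomy, node):
    """All ancestors of a node (including itself) via subclass-of links."""
    seen, queue = {node}, deque([node])
    while queue:
        for parent in taxonomy.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def depth(taxonomy, node, root):
    """Shortest subclass-of distance from node up to the taxonomy root."""
    dist, queue = {node: 0}, deque([node])
    while queue:
        cur = queue.popleft()
        if cur == root:
            return dist[cur]
        for parent in taxonomy.get(cur, []):
            if parent not in dist:
                dist[parent] = dist[cur] + 1
                queue.append(parent)
    return None

def wu_palmer(taxonomy, a, b, root):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    common = ancestors(taxonomy, a) & ancestors(taxonomy, b)
    lcs_depth = max(depth(taxonomy, c, root) for c in common)
    return 2 * lcs_depth / (depth(taxonomy, a, root) + depth(taxonomy, b, root))

# toy taxonomy: child -> list of parents
taxonomy = {
    "author": ["person"],
    "director": ["person"],
    "person": ["entity"],
    "genre": ["entity"],
}
sim = wu_palmer(taxonomy, "author", "director", "entity")  # 0.5
```

Concepts sharing a close common ancestor (author, director) score higher than unrelated ones (author, genre), which is exactly the behavior needed to filter near-duplicate facets.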
The previously defined semantic similarity metric takes a pair of concepts as input. Therefore, a mapping between properties and concepts needs to be available. For this purpose, we exploit a particular characteristic of Wikidata’s data model: Properties are annotated with a matching entity. For example, the property author (P50) is itself linked to the entity author (Q482980). This allows us to retrieve entities corresponding to the property path of a facet.
When comparing two facets, we first retrieve the respective entities for the first property in their property paths. We then calculate the semantic similarity between the entity pair. Two entities are considered similar if sim is larger than a defined threshold. Since we calculate the similarity over the Wikidata taxonomy, we only consider links using subclass of (P279) and instance of (P31) here.
4 Workflow

Fig. 1. Phases of the facet generation process.
Phase 1: Candidate Generation
This first phase enumerates possible facets by querying for a list of property paths associated with the input list of resources. As the predicate probability is a simple metric, we include it as part of the query. Candidates whose predicate probability falls below a predefined threshold, minPredProb, are already removed in this phase. This reduces the necessary data transfers and avoids calculating computationally expensive metrics for hopeless candidates. The result is a list of candidates, each comprising a basic graph pattern (BGP) that describes the facet and a score reflecting the fraction of resources it applies to.
Phase 2: Intra-Facet Scoring and Ranking
As a prerequisite for the remaining intra-facet metrics, the facet values along with their respective value sizes are now retrieved from the SPARQL endpoint. We distinguish between object and data properties at this point. The latter are subjected to the clustering described in Subsect. 3.2 to derive comparable characteristics with regard to the intra-facet metrics.
After augmenting the facets with their respective values, the remaining intra-facet metrics, value cardinality and value dispersion, are calculated for all candidates. This allows us to compute the final intra-facet score, score(f), and rank all facets in decreasing order accordingly.
Phase 3: Selection of Better Categorization
The number of necessary inter-facet metric calculations grows quadratically with the remaining number of candidates. To reduce the list of candidates before the next step, we exploit a key characteristic of the semantic similarity metric: the similarity only depends on the first direct property of each facet. Consequently, out of all candidates sharing the same direct property, only one will be chosen for the final result, as all others would be too similar to it. Leveraging this observation, we can group the candidates by their direct property and only choose the best-ranked one within each group.
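This grouping step can be sketched as follows; the names are illustrative and candidates are represented as (intra-facet score, property path) pairs.

```python
def best_per_direct_property(ranked_candidates):
    """Keep only the best-scoring candidate within each group that shares
    the same first (direct) property: since semantic similarity depends
    only on that property, at most one per group can survive anyway."""
    best = {}
    for score, path in ranked_candidates:
        direct = path[0]
        if direct not in best or score > best[direct][0]:
            best[direct] = (score, path)
    return sorted(best.values(), reverse=True)

candidates = [
    (0.9, ("author", "birthplace")),
    (0.8, ("author",)),
    (0.7, ("genre",)),
]
pruned = best_per_direct_property(candidates)
# -> [(0.9, ('author', 'birthplace')), (0.7, ('genre',))]
```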
Phase 4: Inter-Facet Scoring and Filtering
The final result is derived by consecutively applying inter-facet metrics to chosen pairs of candidates. Calculating semantic similarities is rather expensive. To minimize the comparisons required, facets are selected in a greedy fashion.
Let C be the list of candidates in decreasing order w.r.t. the intra-facet metric scoring of Phase 2 and S be the final collection of facets as returned by Phase 4.
- (i)
Initialize S with the best-ranked facet.
- (ii)
Take the next facet out of C and compare it with the facets in S.
- (iii)
If it is not closely semantically similar to any facet in S, add it to S.
- (iv)
Continue with Step (ii) until the desired number of facets is reached or there are no more candidates left.
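The four steps above can be sketched as a small greedy loop. The similarity predicate and data here are toys; the real system compares facets via the semantic similarity threshold.

```python
def greedy_select(ranked, similar, k):
    """Greedy facet selection: walk candidates in decreasing intra-facet
    score and keep a facet only if it is not too similar to any facet
    already selected, until k facets are chosen or candidates run out."""
    selected = []
    for facet in ranked:                                  # steps (i) and (ii)
        if all(not similar(facet, s) for s in selected):  # step (iii)
            selected.append(facet)
        if len(selected) == k:                            # step (iv)
            break
    return selected

# toy similarity: facets sharing the same first property count as similar
sim = lambda a, b: a[0] == b[0]
ranked = [("author", "birthplace"), ("author",), ("genre",), ("year",)]
top = greedy_select(ranked, sim, 2)
# -> [('author', 'birthplace'), ('genre',)]
```

Because candidates arrive pre-sorted, initializing with the best-ranked facet falls out of the first loop iteration for free.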
Finally, S will contain the subset of facets deemed most suitable for the given input set of resources. The suitability has been determined by employing both the intra- and inter-facet metrics, which can be extended or changed without affecting the overall workflow. S can now be presented to users. Note that selecting a specific facet value and thereby reducing the result set will trigger a new facet generation process, as the basis for our calculations (the input resource set) might have changed substantially.
5 Evaluation
The methods described in Sect. 3 were implemented in a prototype that issues dynamic SPARQL queries to the public SPARQL endpoint of Wikidata (WDQS). The source code is available online [2] under an MIT license.
5.1 Benchmarking
Number of candidates depending on path length and number of IRIs.

#IRIs | 100 | 1000 | 2000 | 3000 | 4000 |
---|---|---|---|---|---|
Path length 1 | 37 | 52 | 65 | 66 | 75 |
Path length 2 | 901 | 1643 | 2039 | 2342 | 2648 |
Path length 3 | 16076 | 31543 | 39318 | 44619 | 50843 |



Fig. 2. Benchmark results: average timings depending on the input IRI size.
Subsequently, we looked at the run-time of our prototype for varying input IRI set sizes. We fixed the semantic similarity threshold, the parameters for value cardinality scoring, and the predicate probability threshold minPredProb. Figure 2 shows a breakdown of the measured execution times, averaged over about 350 individual measurements over the course of a week. We observe less than linear growth of run-time with the input IRI size. The most expensive operations are (1) candidate generation, (2) facet value retrieval, and (3) semantic similarity. Other operations such as intra-facet metric calculation and selection of better categorization do not contribute significantly. A detailed analysis revealed that the execution times are largely dominated by querying the SPARQL endpoint.

Fig. 3. User evaluation: fictitious interface for the facet selection task.
5.2 User Evaluation
Setup. In a survey-based user evaluation, we examined whether facets generated by the proposed workflow match user expectations. Based on a fictitious scenario, we assumed an initial search with the keyword “film”.
After introducing users to the general concepts of faceted search and the given scenario, we asked for user preferences in a series of questions categorized into two kinds of situations: one for facet selection and one for facet ranking. In facet selection (cf. Fig. 3), users were presented with a static user interface that resembles a common search engine and includes three different facets, e.g., director of photography, production designer, and number of seasons. They were then given two more facets, e.g., genre and camera operator, and were asked which would be the better addition to the existing three facets. In facet ranking, we presented three to four different facets per question and asked users to rate their usefulness in the given scenario on a five-point Likert scale [24].


Fig. 4. Usage of facets. An option “never” was provided, but not chosen by any user.
Based on these two situation types, the survey posed its questions in the following order. Overall, we created a pool of 43 questions, out of which a random subset of 15 was chosen for each user. This sampling is intended to reduce bias that might arise from the particular terms used in individual questions.
In a first set of questions we focus on inter-facet comparisons using facet selection. In particular, this evolves around the selection of better categorization (Phase 3 in Sect. 4) and semantic similarity (Subsect. 3.4).
A second set of questions uses facet ranking with facets modeled after Wikidata. This compares multiple indirect facets with their respective direct counterparts. Here, the indirect facets also vary in their intra-facet scores, allowing us to evaluate our strategy in the selection of better categorization.
Finally, we used facet ranking, this time with abstract facets, i.e., replacing facet headers with “Facet 1” etc. and values with “Value 1” etc. The reason is again to reduce bias stemming from the actual semantics of the proposed facets. In this last part of the evaluation, we issued questions where the proposed facets differed only with respect to one intra-facet metric. In a similar fashion, we also examined combinations of two and of all three proposed intra-facet metrics.
For the survey, we recruited 26 volunteers differing in age (18–44) and educational background. In total, they performed 130 facet selections and 936 individual facet ratings. Most of the participants stated at least an occasional use of facets, if they are provided (cf. Fig. 4). Consequently, we assume that they are familiar with the general behavior of faceted browsing.
Results. For each question in facet selection, we derive the percentage of participant selections that match the system decision. Figure 5 shows the results of the first question set, with each dot representing the agreement for one particular question. For the selection of better categorization, we see an overall agreement between the survey users and our system.


Fig. 5. Agreement of participants and system in facet selection. One dot per question.
In facet ranking, we are not interested in the specific numerical values each metric provides, but focus on the ranking induced by those metrics. To compare the ranking determined by our system with the ranking induced by the survey responses, we encoded the latter using numerical values and calculated an average rating for each facet. For each question, we ranked the presented facets according to these ratings, which results in a survey ranking. We then chose Kendall’s Tau-B to compare our system ranking with this survey ranking.
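The rank comparison can be sketched with a self-contained Tau-B implementation. In practice a statistics library would be used; this direct O(n²) version merely illustrates the tie-corrected formula and is not the evaluation code itself.

```python
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's Tau-B rank correlation with tie correction:
    tau_b = (C - D) / sqrt((n0 - t_x) * (n0 - t_y))."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx != 0 and dy != 0:
                if dx * dy > 0:
                    concordant += 1
                else:
                    discordant += 1
    n0 = n * (n - 1) // 2
    return (concordant - discordant) / sqrt((n0 - ties_x) * (n0 - ties_y))

# system ranking vs. a survey ranking containing one tie
tau = kendall_tau_b([1, 2, 3, 4], [1, 2, 2, 3])  # ≈ 0.913
```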

Fig. 6. Rank correlation for facet ranking tasks. One dot per survey question. Value cardinality (Card), value dispersion (Disp), predicate probability (Prob).
The final question set verified our metrics independent of semantic biases induced by real-world facets. Results are shown in the lower parts of Fig. 6. In general, survey participants agree almost completely with our approach. The only exceptions are due to a tie (Card, Disp) or a different opinion about the order of one particular pair of facets (Disp, Prob and Card, Disp, Prob).
The user evaluation suggests that the technical criteria seem well suited in isolation. However, resulting facets not only have to be evaluated against each other, but also against the semantic context of the input IRIs. While in search tasks user input can be used to assess this intent, it remains open how this can automatically be approximated for arbitrary resource sets.
6 Conclusion
We have proposed methods to enable automatic facet generation and ranking over KGs. In particular, we provided an approach for dynamic candidate facet generation for arbitrary input sets of resources. We defined intra- and inter-facet metrics to rank the candidates and reduce the possible facet space by selecting the most useful ones. We explored indirect properties to find better categorizations and consequently enhance facets’ usefulness. We proposed semantic similarity as a criterion to select among multiple candidate facets. Finally, we developed a holistic workflow that integrates all proposed methods.
Initial survey results support the used metrics. While indirect facets show promise as a helpful addition, their relevancy for the initial resource set needs to be ensured. This latter issue is also the main focus of our future efforts: How can we estimate the relatedness to the initial input for indirect facets? Another prime direction is a performance improvement of our initial prototype, to make it applicable for real-world systems (e.g., caching and parallelization of queries).

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.