Special Issue: Multimodal Neuroimaging Computing: the Methods and Applications
 Open Access
The diversity rankscore function for combining human visual perception systems
Brain Informatics volume 3, pages 63–72 (2016)
Abstract
There are many situations in which a joint decision, based on the observations or decisions of multiple individuals, is desired. The challenge is determining when a combined decision is better than each of the individual systems, along with choosing the best way to perform the combination. It has been shown that the diversity between systems plays a role in the performance of their fusion. This study involved several pairs of people, each viewing an event and reporting an observation, along with their confidence level. Each observer is treated as a visual perception system, and hence an associated scoring system is created based on the observer’s confidence. A diversity rankscore function on a set of observation pairs is calculated using the notion of cognitive diversity between two scoring systems in the combinatorial fusion analysis framework. The resulting diversity rankscore function graph provides a powerful visualization tool for the diversity variation among a set of system pairs, helping to identify which system pairs are most likely to show improved performance with combination.
Introduction
The concept of multiple scoring systems has been applied to a variety of domains [1, 2]. In situations where multiple scoring systems are constructed, we are interested in conducting a meta-analysis to gain an understanding of the relationship between the systems, specifically the diversity between them. It has been shown that the combination of two scoring systems can outperform the individual systems when there is some diversity between the systems and each performs relatively well [1, 3]. To this end, quantitative measures of diversity can be used to generate diversity scores for pairs of systems, which can then be analyzed within the combinatorial fusion analysis (CFA) framework [1].
Human beings are constantly and naturally performing fusion of information within and among the senses. There is extensive research in this area on the neurological level pertaining to how fusion in the sensory system works [4–6], how visual information is combined with information from other senses [7–11], and how visual systems are combined [9, 12, 13]. In this study, however, we focus on the inter-human level and on the fusion of information at the decision level.
There are many situations in which two people’s observations are considered for a decision, such as referees in a football or tennis match, physicians examining a patient, copilots navigating a plane, and so on. For example, when two physicians are examining a new patient, each may observe different symptoms that can indicate different diseases; interactive consultation may lead to a final diagnosis. For decisions based on visual input, research by Bahrami et al. [12], Ernst and Banks [7], and Kepecs et al. [13] suggests that the decision is improved when two people make it interactively, rather than when an individual makes it alone. The question then becomes, if we have two people making visual observations of an event, how do we integrate these observations or decisions? Do we choose one of the observers’ results, or create a combination of the two? Koriat [14] emphasizes the importance of confidence, and that it may be a good option to take the decision of the more confident person. The approach taken in our study is to combine the observations or decisions made by two people in an attempt to outperform the individual decisions. The visual observations tested in this project involve pairs of volunteers who are asked to give the location of a small object they observe being tossed in a field.
In order to perform the desired combination, by score or by rank, a scoring system must first be constructed for each participant in a trial. Each participant’s observation, or perception system, is represented as a scoring system, which is made up of a score function and a rank function. Given this multiple scoring system scenario, we then analyze the cognitive diversity between the scoring systems of a trial. A quantitative diversity measure, the distance between two rankscore functions, is used to represent the cognitive diversity between two scoring systems [1, 2]. Examining the relative diversities between the system pairs, together with the performance of their combinations, can give us insight into how diversity variation may play a role in the performance of system combinations. The diversities between systems are analyzed using the diversity rankscore functions, which are then visualized in diversity rankscore graphs. This visualization of diversity variation is beneficial in situations where there are a large number of scoring system pairs (hundreds or thousands). Interactive data visualization [15–17] is a dynamic field in which data are visualized with the intent to facilitate an end user in a particular task. The diversity rankscore function graph is such a tool that has potential to be integrated into various data analytics and software systems.
Information fusion can be applied to many situations where there are multiple scoring systems, or multiple classifiers. For example, the CFA framework [1, 2, 18] has been applied to information retrieval [19], text categorization [20], target tracking [21], sensor feature selection and combination [22], and image skeleton pruning [23]. Combinatorial fusion has also been used for enhancing the analysis of various biomedical datasets including virtual screening for molecular compounds [3], protein structure prediction [24], and ChIP-seq peak detection [25]. When combining multiple models (performing information fusion), it would be useful to know in advance whether the fusion will outperform the best model. Ng and Kantor [26] identify system features that can help predict whether fusion will be beneficial. Combination of multiple classifiers has also been shown to improve results in the area of pattern recognition [27, 28].
The content of this paper is organized as follows: Sect. 2 describes the concept of multiple visual perception systems, along with the corresponding multiple scoring systems, which are considered a generalization of multiple classifier systems. The CFA framework, which establishes each visual perception system as a scoring system and combines two such systems, is also described. The diversity rankscore function can be used as a guiding light to combine pairs of visual perception systems based on the diversity variation across a set of trials. In Sect. 3, we describe the visual perception dataset, present the results of scoring system combinations, and examine the role of the diversity rankscore function graph in the context of diversity variation and visualization. Concluding remarks and discussion are included in Sect. 4.
Multiple visual perception systems
From multiple classifier systems to multiple scoring systems
In many domains, such as biomedical informatics, finance, security, and information retrieval, classification models are created in order to generate class predictions for new data. Binary classifiers attempt to categorize items into one of two classes (or labels). For example, a binary classifier may determine whether a webpage is relevant to a search term, or whether a patient tests positive or negative for a disease. Some binary classification problems are asymmetric, meaning one class occurs much less frequently than the other. Multiclass classifiers involve more than two classes.
The output of a classification system includes a class prediction, along with an associated probability. Treating these probabilities as scores, and sorting the results by score to generate rankings, enables us to consider classification systems as scoring systems, each with a score function and a rank function.
In an effort to improve classification accuracy, it is often desired to incorporate the results from multiple classifiers that are varied in terms of their approach or algorithm. The element of variety, or diversity, is essential since different classifiers may contribute various perspectives, results, or predictions, on the data. Generally, the results from multiple classifier systems are combined using ensemble methods such as majority voting (bagging) or weighted voting (boosting). Table 1a, b contains a snapshot from a classification example in which the class label of a sample document is predicted in each of the following two cases: (a) 3 class labels, and (b) 6 class labels. The document is analyzed by 4 different classifiers, each of which outputs the probability that the document belongs to class A, B, or C, in the case of 3 class labels. In Table 1b, each document belongs to one of 6 class labels: A, B, C, D, E, or F. For each classifier, the class label with the highest probability is considered the predicted class label and is assigned rank 1. Likewise, the next highest probability is assigned rank 2, and so on. The ensemble approach of majority voting is used to combine the results of the individual classifiers. For each class label, we count the number of times that class is ranked 1 (has the highest probability) by a classifier. Then, the class label with the highest number of votes is considered the predicted class for the document.
If we consider the classifiers as scoring systems (see Table 2), we can apply score and rank combinations as an alternative ensemble approach. Here, the probabilities are treated as scores, which are then ranked. Score combination (SC), in this example, is the average of the scores for a class label across the 4 classifiers. The class label with the highest average score is chosen as the result. The rank combination (RC) is computed as the average rank for a class label for all classifiers. The class label with the lowest average rank is then selected. Weighted averages can be used if the past performance of the classifiers is known. In this example, we can see that combining by score or rank may produce different results. Table 1b is a classification problem that involves more possible class labels. In this example, we see that classifiers can be viewed as scoring systems, where the scores are the class label probabilities. The concept of multiple classifier systems with multiple class labels (the case in Table 1b) is then generalized to multiple scoring systems with multiple choices (items or options) (as is the case in Table 2).
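The three ensemble rules described above can be compared on a small example. The following is a minimal sketch in Python; the probabilities are hypothetical (they are not the values of Table 1 or Table 2) and the function names are chosen only for illustration. It shows a case where majority voting selects one label while the score and rank combinations select another.

```python
from collections import Counter

# Hypothetical class probabilities from 4 classifiers for one document
# (illustrative values only; not taken from Table 1 or Table 2).
classifiers = [
    {"A": 0.50, "B": 0.45, "C": 0.05},
    {"A": 0.48, "B": 0.47, "C": 0.05},
    {"A": 0.10, "B": 0.60, "C": 0.30},
    {"A": 0.10, "B": 0.40, "C": 0.50},
]


def ranks_from_scores(scores):
    """Rank labels by descending score; rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {label: rank for rank, label in enumerate(ordered, start=1)}


def majority_vote(systems):
    """Label that is ranked 1 by the largest number of classifiers."""
    votes = Counter(max(s, key=s.get) for s in systems)
    return votes.most_common(1)[0][0]


def score_combination(systems):
    """Average score of each label across all classifiers."""
    labels = systems[0].keys()
    return {l: sum(s[l] for s in systems) / len(systems) for l in labels}


def rank_combination(systems):
    """Average rank of each label across all classifiers."""
    rank_tables = [ranks_from_scores(s) for s in systems]
    labels = systems[0].keys()
    return {l: sum(r[l] for r in rank_tables) / len(rank_tables) for l in labels}


majority_label = majority_vote(classifiers)
sc = score_combination(classifiers)
rc = rank_combination(classifiers)
sc_label = max(sc, key=sc.get)  # highest average score wins
rc_label = min(rc, key=rc.get)  # lowest average rank wins

print(majority_label, sc_label, rc_label)
```

With these made-up values, label A collects two first-place votes, but label B has both the highest average score and the lowest average rank, because B is never ranked below second.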
When constructing an ensemble, it is desired to have diversity among the component classifiers or scoring systems. Several techniques for measuring diversity have been proposed for regression and classification [29, 30]. It is more challenging to measure diversity between classifiers if we just consider the output class labels, without their associated probabilities [29].
Viewing classification systems as scoring systems enables us to apply the concept of diversity that has been defined for multiple scoring systems [1, 2, 26, 3].
The combinatorial fusion framework
Establishing each visual perception system as a scoring system
In situations where we have a set of documents (webpages, genes, customers, etc.) that are assigned scores or probabilities by an algorithm or classifier, creating a scoring system is straightforward. However, in cases where we do not have a set of scores to work with, a score function needs to be generated based on the value(s) given. In this experiment, when an observer is deciding on the proposed landing point of the object based on the visual input, he/she is selecting from several locations within a range. Intervals within this visual range will be considered as the items (or options) that will be scored and ranked. Since there are two subjects within each trial, the corresponding score functions must score the same set of intervals. To this end, a common visual space is created, as described in previous work [18]. First, the mean of the decisions (points) for the two observers P and Q is computed in three different versions, varying the weight given to the confidence radius σ. These three points, \(M_0\), \(M_1\), and \(M_2\), are weighted means of P and Q (formula (1)).
The scoring system analysis is performed for each version of \(M_i\). Specifically, the \(M_i\) values are used as a foundation point from which to create a common visual space. The \(M_i\) points are always located between the original points P and Q. The visual space is also extended on both sides of P and Q. The common visual space is divided into 63 intervals. The interval scores are computed using a normal distribution around \(M_i\), using the confidence radius (0.5r) for the standard deviation. The performance of each \(M_i\) is measured as the distance from \(M_i\) to the actual location of the object [31]. The scores, created for the intervals for P and Q, give us the score functions \(s_P\) and \(s_Q\). Given a set of intervals \(\{d_1, d_2, \ldots , d_n\}\), the scoring system P consists of a score function \(s_P\), rank function \(r_P\), and rankscore characteristic (RSC) function \(f_P\) (see Fig. 1). The rank functions for the scoring systems P and Q are obtained by sorting \(s_P\) and \(s_Q\) and assigning ranks, which gives \(r_P\) and \(r_Q\). The RSC function, as defined by Hsu et al. [1, 2], is the composite function of \(s_P\) and the inverse of \(r_P\). Rankscore functions map ranks to scores, and are independent of the data items. Here, the RSC function for the scoring system P, \(f_P : N \rightarrow R\), is computed as

\[f_P(i) = (s_P \circ r_P^{-1})(i) = s_P\bigl(r_P^{-1}(i)\bigr) \qquad (2)\]
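The exact weights of formula (1) are not shown in this text. As a minimal sketch only, the Python below assumes that each observer's point is weighted by the inverse i-th power of his/her confidence radius, so that \(M_0\) is the plain midpoint and a smaller radius (higher confidence) pulls \(M_1\) and \(M_2\) toward that observer. This weighting, the function name, and the sample coordinates are assumptions made for illustration.

```python
def weighted_mean_point(p, q, sigma_p, sigma_q, i):
    """Hypothetical M_i: mean of points p and q with weights 1 / sigma**i.

    ASSUMPTION: the inverse-power weighting is an illustrative choice,
    not a formula confirmed by the text.
    """
    w_p = 1.0 / sigma_p ** i
    w_q = 1.0 / sigma_q ** i
    total = w_p + w_q
    return tuple((w_p * a + w_q * b) / total for a, b in zip(p, q))


# Made-up x,y guesses (inches) and confidence radii for observers P and Q.
P, Q = (100.0, 50.0), (140.0, 90.0)
sigma_P, sigma_Q = 10.0, 30.0

M = [weighted_mean_point(P, Q, sigma_P, sigma_Q, i) for i in range(3)]
for i, point in enumerate(M):
    print(f"M_{i} = ({point[0]:.1f}, {point[1]:.1f})")
```

Under this assumption every \(M_i\) is a convex combination of P and Q, which agrees with the statement that the \(M_i\) points always lie between the two original points.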
Similarly, \(f_Q\) is computed for scoring system Q.
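The step from a score function to a rank function and then to an RSC function can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the one-dimensional grid of 63 interval midpoints, the center, and the standard deviation are hypothetical values, and ties are broken by item index, which is an assumption.

```python
import math


def normal_interval_scores(midpoints, center, sigma):
    """Score each interval by the normal density at its midpoint."""
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [coef * math.exp(-0.5 * ((x - center) / sigma) ** 2) for x in midpoints]


def rank_function(scores):
    """Return r where r[j] is the rank of item j (rank 1 = highest score).

    Ties are broken by item index (an assumption of this sketch).
    """
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    ranks = [0] * len(scores)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


def rsc_function(scores, ranks):
    """RSC function f(i) = s(r^{-1}(i)); list index 0 holds rank 1."""
    item_at_rank = [0] * len(ranks)
    for j, rank in enumerate(ranks):
        item_at_rank[rank - 1] = j
    return [scores[j] for j in item_at_rank]


# Hypothetical common visual space: 63 interval midpoints, 4 inches apart.
midpoints = [4.0 * k for k in range(63)]
s_P = normal_interval_scores(midpoints, center=120.0, sigma=15.0)
r_P = rank_function(s_P)
f_P = rsc_function(s_P, r_P)

print(r_P[30], round(f_P[0], 5))
```

Because the RSC function lists scores in rank order, it is non-increasing by construction and no longer depends on which interval received which score.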
Combining two visual perception systems
Within the CFA framework [1, 2], system combination is performed either by score or rank combination. A score combination is computed as the average of the score functions \(s_P\) and \(s_Q\) for each interval \(d_i\), giving us the score function of the score combination, \(s_{SC}\). The rank function of the score combination, \(r_{SC}\), is obtained by sorting \(s_{SC}\) in descending order and assigning ranks to each \(d_i\). In addition, we compute the rank combination by averaging the rank functions \(r_P\) and \(r_Q\), to give us the score function of the rank combination, \(s_{RC}\). We sort this function in ascending order and assign ranks to get its associated rank function, \(r_{RC}\) (see the example in Table 2). The performance of these combined results is measured by the distance of the newly computed points to the actual x,y coordinates where the object landed in the field.
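The two combinations can be sketched in Python on a toy example. The five interval scores below are hypothetical, the helper names are chosen for illustration, and ties are broken by interval index (an assumption of this sketch).

```python
def assign_ranks(values, descending):
    """Return ranks (1 = best) after sorting; ties broken by index."""
    sign = -1.0 if descending else 1.0
    order = sorted(range(len(values)), key=lambda j: (sign * values[j], j))
    ranks = [0] * len(values)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


def average(u, v):
    """Element-wise average of two equally long lists."""
    return [(a + b) / 2.0 for a, b in zip(u, v)]


# Hypothetical score functions of observers P and Q over 5 intervals.
s_P = [0.10, 0.40, 0.30, 0.15, 0.05]
s_Q = [0.02, 0.20, 0.35, 0.30, 0.13]
r_P = assign_ranks(s_P, descending=True)
r_Q = assign_ranks(s_Q, descending=True)

# Score combination: average the scores, then rank in descending order.
s_SC = average(s_P, s_Q)
r_SC = assign_ranks(s_SC, descending=True)

# Rank combination: average the ranks, then rank in ascending order.
s_RC = average(r_P, r_Q)
r_RC = assign_ranks(s_RC, descending=False)

print(r_SC, r_RC)
```

In this toy case both combinations place the third interval at rank 1, while they order the two lowest-ranked intervals differently, which illustrates that score and rank combination need not agree.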
Cognitive diversity between two scoring systems
In cases where multiple scoring systems, algorithms, or approaches exist, it is beneficial to know under what circumstances combining pairs of these systems could result in improved performance. Diversity between two scoring systems A and B can be measured in a few different ways, such as the distance between score functions using covariance (between \(s_A\) and \(s_B\)) or between rank functions using Kendall's tau (using \(r_A\) and \(r_B\)). Another method to measure the diversity between two scoring systems, which is used here and called cognitive diversity, is to measure the distance between the rankscore functions (\(f_A\) and \(f_B\)) of the two systems [1, 2] (see formula (2) and Fig. 1). Figure 2 illustrates two RSC functions, \(f_A\) and \(f_B\), for two arbitrary scoring systems A and B. One distance measurement is the area between the two RSC functions. We note that the cognitive diversity between scoring systems A and B, as seen in Fig. 2, provides a powerful visualization tool for assessing the similarity or dissimilarity between these two visual perception systems, A and B, in the context of the current study.
In this analysis, the concept of cognitive diversity is applied to the trials and scoring systems P and Q, which represent the two participants in a given trial pair. Therefore, the cognitive diversity of the two observers P and Q, d(P,Q), is defined as the distance between the rankscore functions \(f_P\) and \(f_Q\) of the two systems P and Q (formula (3)).
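The distance between two rankscore functions can be instantiated in more than one way, and the exact form of formula (3) is not shown in this text. As a minimal sketch, the Python below assumes a root-mean-square distance over the m ranks; the normalization by m and the sample RSC values are assumptions made for illustration.

```python
import math


def cognitive_diversity(f_p, f_q):
    """Distance between two RSC functions given as equally long lists.

    ASSUMPTION: root-mean-square difference over the m ranks; the text
    does not confirm this particular normalization.
    """
    if len(f_p) != len(f_q):
        raise ValueError("RSC functions must cover the same ranks")
    m = len(f_p)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_p, f_q)) / m)


# Hypothetical RSC functions (scores listed in rank order) for P and Q.
f_P = [0.40, 0.30, 0.15, 0.10, 0.05]
f_Q = [0.35, 0.30, 0.20, 0.13, 0.02]

d_PQ = cognitive_diversity(f_P, f_Q)
print(round(d_PQ, 4))
```

Since the comparison is made rank by rank, two observers who score different intervals highly but with the same score profile have zero cognitive diversity under this measure.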
Diversity rankscore function across a set of trials
Let \(T = \{(p_1, q_1), (p_2, q_2), \ldots , (p_n, q_n)\}\) represent a set of n trials, each consisting of an ordered pair of participants, and let \(R = \{d(p_1, q_1), d(p_2, q_2), \ldots , d(p_n, q_n)\}\) represent the diversity scores for each pair in T, where \(N = \{1, 2, \ldots , n\}\). The cognitive diversity between each pair of scoring systems, P and Q, is measured by the diversity function d(P,Q), as shown in equation (3), where m is the number of items (intervals) to be scored; in this case m is 63, the number of intervals in the common visual space. The set of diversity values itself can be treated as a scoring system, turning the diversity function into a diversity score function. For this purpose, the number d(P,Q), which is the diversity between scoring systems P and Q, is taken as the diversity score function value of the trial (p,q) and is denoted \(s_{(p,q)}\). The diversity rank function is obtained by sorting the score function and generating ranks, giving \(r_{(p,q)}\). A diversity rankscore function, \(f_{(p,q)}\), is computed as

\[f_{(p,q)}(i) = \bigl(s_{(p,q)} \circ r_{(p,q)}^{-1}\bigr)(i) = s_{(p,q)}\bigl(r_{(p,q)}^{-1}(i)\bigr), \quad i \in N \qquad (4)\]
The diversity rankscore function is a mapping from diversity ranks to diversity scores. The relationship between \(s_{(p,q)}\), \(r_{(p,q)}\), and \(f_{(p,q)}\) is shown in Fig. 3.
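Treating the per-trial diversity values as scores, the diversity rankscore function can be sketched in Python as below. The trial labels and diversity values are hypothetical, and the sketch assumes that rank 1 goes to the most diverse pair, in line with the convention that rank 1 is the highest score.

```python
def diversity_rankscore(diversity_scores):
    """Map diversity ranks to (trial, diversity score).

    diversity_scores: dict of trial label -> d(p, q).
    Returns a list whose index 0 holds rank 1 (highest diversity);
    ties are broken by trial label (an assumption of this sketch).
    """
    ordered = sorted(diversity_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, trial, score) for rank, (trial, score) in enumerate(ordered, start=1)]


# Hypothetical diversity scores d(p, q) for four trials.
scores = {"t1": 0.012, "t2": 0.047, "t3": 0.030, "t4": 0.005}

f_div = diversity_rankscore(scores)
for rank, trial, score in f_div:
    print(rank, trial, score)
```

Plotting the score against the rank gives the diversity rankscore graph; keeping the trial label next to each rank is what allows a position on the graph to be traced back to a specific pair of observers.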
Case analysis using diversity rankscore graph
Visual perception dataset
The data were collected in a grassy field in New York City's Central Park. A lab member was tasked with recruiting pairs of participants for the experiment. The pairs of subjects varied in terms of gender and the relationship between the individuals. The subject pairs were randomly chosen and could be friends, siblings, husband and wife, colleagues, or acquaintances. A small metal object, made of metal plates, nuts, and a bolt and measuring 1.5 by 1.5 inches, was used for the experiment, since it could be thrown a long distance, was small enough to be hidden in the grass, and would not roll from its position once it had landed. The subject pairs stood 40 feet from a marked square of size 250 by 250 inches, and the individuals stood 10 feet away from each other. A member of our group tossed the metal object into the designated square. Each participant was asked individually to walk and point to where he/she believed the object had landed, and a marker was placed at each of these locations. The participants were also asked to give a measure of confidence in their guess in the form of a confidence radius around the specified mark. Lab members helped the participants gauge their confidence radius by using a tool consisting of 2 poles (36 by 36 inches) that represent the x and y coordinates. Smaller radius values indicate higher confidence of the subject. A lab member measured the distance between the actual position where the object landed and the guess positions of the subjects.
The subjects were given feedback as to how far their guess was from the actual landing point of the object. The values collected are: the x,y coordinates for subjects P and Q from each experiment, a confidence radius for each participant, and the actual landing x,y coordinates of the object. All measurements are in inches. The values for the trials in this most recent experiment are shown in Table 3. Our group has conducted previous data collection activities of this type, the data from which can be found in [18].
The distribution functions for P and Q for a sample trial are shown in Fig. 4a. Sample rankscore functions for a trial are shown in Fig. 4b.
Analysis results of combinations
The experimental results are presented in Fig. 5. The performances of P and Q, shown in column (a), are the distances to the actual landing point of the object. The confidence radii are included in column (b), in which a shaded cell indicates that the more confident participant leads to the best performance. The performance of the weighted means \(M_0\), \(M_1\), and \(M_2\) is listed in column (c). C represents the score combination and D represents the rank combination. The last column, (d), presents information for the results using each of the weighted means, along with the score and rank combinations (C and D). For each \(i \in \{0, 1, 2\}\), P, Q, \(M_i\), C, and D are ranked in descending order of performance; repeated ranks indicate tied performance. Rank 1 showed the best performance, meaning the closest interval to the actual location of the object. Cases where the score (C) or rank (D) combinations either outperformed or tied the best individual system are highlighted.
The role of diversity rankscore graphs
After performing the score and rank combinations for the three different computations of M (\(M_0\), \(M_1\), and \(M_2\)), we can summarize the results as follows: Using \(M_0\), the score and/or rank combination for 14/16 trials showed either tied or improved performance compared to the best individual system; using \(M_1\), 9/16 trials; and using \(M_2\), 7/16 trials. The diversity rankscore functions for the scoring systems created according to the three different computations of the mean, \(M_0\), \(M_1\), and \(M_2\), are depicted in Fig. 6. Examination of these graphs, along with the performance of the corresponding system pair combinations, can help us understand the role of cognitive diversity in system combinations by score and rank. To make the connection with the trials, Table 4 is included to show the ranking of trials according to the diversity of their component scoring systems, for each case of \(M_0\), \(M_1\), and \(M_2\). When comparing with the performances of the system combinations, we detect a tendency for pairs of systems with relatively high diversity to have more improved performance. In this study, this observation is most strongly supported by the data in the \(M_1\) scenario. In new situations, where we may not be able to predict the performance, analyzing the relative diversities between scoring systems may give us insight into which pairs of systems are most likely to show improvement with combination.
We observe that the diversity rankscore graphs are good indicators for the combination outcome. For example, trials \(d_5\) and \(d_{16}\) appear at the very end of the graph in \(M_0\), \(M_1\), and \(M_2\) (see Fig. 6 and Table 4). In these two trials, neither rank nor score combination helps improve the outcome. However, even though trial \(d_9\) has a very low diversity (Table 4), its combination of scoring systems P and Q is better than or equal to the best of P and Q since P has a relatively high performance.
Conclusion and further work
In this paper, we studied the combination of multiple visual perception systems using the CFA framework and the diversity rankscore function. By establishing each visual perception system as a scoring system on a set of options (possible locations, in our context) in a common visual space, the problem of combining multiple visual perception systems is treated as a problem of combining multiple scoring systems. Using a dataset from an experiment with sixteen trials, where each trial consists of a pair of observers, we studied how the diversity between these two observers (and their individual perception systems) affects the performance of the combined system.
At the individual trial level, we illustrated that the rankscore characteristic (RSC) function graphs of the two scoring systems (perception systems) can provide a useful visualization tool for assessing the similarity or dissimilarity between these two visual perception systems (see Fig. 2 and Sect. 2.2.3). At the population level, the diversity rankscore graphs on the three common visual space definitions, \(M_0\), \(M_1\), and \(M_2\), provide a powerful visual comparison, not only among all sixteen trials in an experiment, but also among the three analytic methods based on \(M_0\), \(M_1\), and \(M_2\) (see Fig. 6 and Sect. 2.3). Our current study suggests a few issues that are worthy of further investigation. We list three here:

1. With the diversity rankscore function defined in formula (4) and the diversity rankscore graphs based on \(M_0\), \(M_1\), and \(M_2\), extend the study to include higher order of \(M_i\), i = 4, 5, and so on (refer to formula 1).

2. Establish a CFA framework to study the combination of more than two visual perception systems. In this regard, the notion of diversity among more than two systems would have to be defined differently.

3. Apply the visualization tool illustrated in the current work to the combination of multiple sensing systems, multiple robotics systems, and multimodal physiological imaging systems such as MRI, EEG, and EKG.
References
Hsu DF, Chung YS, Kristal BS (2006) Combinatorial fusion analysis: methods and practice of combining multiple scoring systems. In: Hsu HH (ed) Advanced data mining technologies in bioinformatics. Idea Group Inc., Calgary, pp 1157–1181
Hsu DF, Kristal BS, Schweikert C (2010) Rankscore characteristics (RSC) function and cognitive diversity. Brain Inform 8211:42–54
Yang JM et al (2005) Consensus scoring for improving enrichment in virtual screening. J Chem Inform Model 45:1134–1146
Gold JI, Shadlen N (2007) The neural basis of decision making. Ann Rev Neurosci 30:535–574
Hillis JM et al (2002) Combining sensory information: mandatory fusion within, but not between, senses. Science 298(5598):1627–1630
Tong F, Meng M, Blake R (2006) Neural basis of binocular rivalry. Trends Cognit Sci 10(11):502–511
Ernst MO, Banks MS (2002) Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415:429–433
Ernst MO (2007) Learning to integrate arbitrary signals from vision and touch. J Vis 7(5):1–14
Ernst MO (2010) Decisions made better. Science 329(5995):1022–1023
Gepshtein S et al (2009) The combination of vision and touch depends on spatial proximity. J Vis 5(11):1013–1023
Lunghi C, Binda P, Morrone C (2010) Touch disambiguates rivalrous perception at early stages of visual analysis. Curr Biol 20(4):R143–R144
Bahrami B et al (2010) Optimally interacting minds. Science 329(5995):1081–1085
Kepecs A et al (2008) Neural correlates, computation and behavioural impact of decision confidence. Nature 455:227–231
Koriat A (2012) When are two heads better than one. Science 336:360–362
Holzinger A, Bruschi M, Eder W (2013) On interactive data visualization of physiological low-cost-sensor data with focus on mental stress. In: Cuzzocrea A et al (eds) Multidisciplinary research and practice for information systems, Springer lecture notes in computer science LNCS 8127:469–480. Springer, Heidelberg
Turkay C et al (2014) On computationally-enhanced visual analysis of heterogeneous data and its application in biomedical informatics. In: Holzinger A, Jurisica I (eds) Interactive knowledge discovery and data mining: state-of-the-art and future challenges in biomedical informatics. Lecture notes in computer science. Springer, Berlin, pp 117–140
Wong BLW, Xu K, Holzinger A (2011) Interactive visualization for information analysis in medical diagnosis. In: Holzinger A, Simonic KM (eds) Information quality in e-health. Lecture notes in computer science. Springer, Berlin, pp 109–120
Batallones A et al (2015) On the combination of two visual cognition systems using combinatorial fusion. Brain Inform 2:21–32
Hsu DF, Taksa I (2005) Comparing rank and score combination methods for data fusion in information retrieval. Inform Retr 8(3):449–480
Li Y, Hsu DF, Chung SM (2013) Combination of multiple feature selection methods for text categorization by using combinatorial fusion analysis and rankscore characteristic. Int J Artif Intell Tools 22(2):1350001
Lyons DM, Hsu DF (2009) Combining multiple scoring systems for target tracking using rankscore characteristics. Inform Fus 10(2):124–136
Deng Y et al (2013) Sensor feature selection and combination for stress identification using combinatorial fusion. Int J Adv Robot Syst 10:306–313
Liu H et al (2013) A skeleton pruning algorithm based on information fusion. Pattern Recognit Lett 34(10):1138–1145
Lin KL et al (2007) Feature selection and combination criteria for improving accuracy in protein structure prediction. IEEE Trans NanoBiosci 6(2):186–196
Schweikert C et al (2012) Combining multiple ChIP-seq peak detection systems using combinatorial fusion. BMC Genomics 13(Suppl 8):S12
Ng KB, Kantor PB (2000) Predicting the effectiveness of naive data fusion on the basis of system characteristics. J Am Soc Inform Sci 51(12):1177–1189
Ho TK et al (1994) Decision combination in multiple classifier systems. IEEE Trans PAMI 16(1):66–75
Ho TK (1995) Random decision forests, Proceedings of the 3rd international conference on document analysis and recognition, Montreal, pp 278–282
Brown G, Wyatt J, Harris R, Yao X (2005) Diversity creation method: a survey and categorization. Inform Fusion 6:5–20
Kuncheva LI (2004) Combining pattern classifiers: methods and algorithms. Wiley-Interscience, Hoboken
Mulia D et al (2015) Joint decision making on two perception systems using diversity rankscore function graph. In: Guo Y et al (eds) Brain informatics and health, BIH 2015, LNAI 9250:337–346
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Schweikert, C., Mulia, D., Sanchez, K. et al. The diversity rankscore function for combining human visual perception systems. Brain Inf. 3, 63–72 (2016). https://doi.org/10.1007/s40708-016-0037-3
DOI: https://doi.org/10.1007/s40708-016-0037-3
Keywords
 Cognitive diversity
 Combinatorial fusion analysis
 Diversity rankscore function
 Multiple scoring systems
 Rankscore characteristic (RSC) function