Commonly, confusion denotes bewildering uncertainty, often associated with delirium or even dementia. From the confusion of languages in the biblical Genesis to Genesis the band, broader audiences mostly encounter the negative aspects of confusion. But confusion can be negative and positive, sometimes both at the same time. Moreover, it’s a subject of scientific interest: a phenomenon that can’t be ignored, one that requires scientific understanding and needs to be designed and moderated.
A convenient tool for measuring confusion in a system is the so-called confusion matrix. It is used in linguistics and computer science—in particular, machine learning. In principle, the confusion matrix is a table in which every category in the row dimension is compared with every category in the column dimension. A simple example is to compare all the letters of the alphabet spoken by a native English speaker with the letters actually perceived by a German listener. An English “e” will often be confused with the German “i,” resulting in a higher value in the cell where the “e” row crosses the “i” column. Ideally, of course, letters are confused only with themselves, resulting in high values exclusively along the matrix diagonal. Actual confusion, in other words, is characterized by patterns of higher values off the matrix diagonal.
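For readers who like to see the bookkeeping, here is a minimal sketch in Python of how such a matrix is tallied. The spoken and perceived letters below are invented stand-ins for the English-German example, not data from any actual perception study.

```python
# Minimal sketch of a confusion matrix: rows are letters as spoken,
# columns are letters as perceived. The data are invented for illustration.
from collections import defaultdict

spoken    = ["e", "e", "e", "a", "a", "i", "i", "i"]   # hypothetical stimuli
perceived = ["i", "i", "e", "a", "e", "i", "i", "e"]   # hypothetical responses

matrix = defaultdict(lambda: defaultdict(int))
for s, p in zip(spoken, perceived):
    matrix[s][p] += 1          # off-diagonal counts are the confusion

letters = sorted(set(spoken) | set(perceived))
for s in letters:
    row = [matrix[s][p] for p in letters]
    print(s, row)              # the "e" row shows how often "e" was heard as each letter
```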
Unfortunately, the use of the confusion matrix is still mostly governed by what Richard Dawkins calls “the tyranny of the discontinuous mind.” Processing the confusion matrix, scholars generally derive secondary measures to quantify Type I and Type II errors (false positives and false negatives), along with a number of similar aggregate measures. In short, the confusion matrix is used to make classification by humans and artificial intelligence less confusing. A typical (and useful) example is to compare a machine classification of images with the known ground truth. No doubt, quantifying the confusion of ducks with alligators, or of pedestrians with street signs, is a crucial application that can save lives. Likewise, it’s often useful to optimize classification systems in order to minimize the confusion of human curators. A good example is the effort of the Semantic Web community to simplify global classification systems, such as the Umbel ontology or the category system in Wikipedia, to allow for easy data collection and classification with minimal ambiguity. Nevertheless, the almost exclusive focus on optimization by minimizing confusion is unfortunate, because perfect discreteness of categories is undesirable in many real systems, from the functions of genes and proteins to individual roles in society. Too little confusion between categories or groups and the system is, in essence, dead. Too much confusion and the system is overwhelmed by chaos. In a social network, a total lack of confusion annihilates any basis for communication between groups, while complete confusion would be equivalent to a meaningless cacophony of everything meaning everything.
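To make those secondary measures concrete, here is a small Python sketch of the quantities typically read off a two-by-two confusion matrix. The cell counts for the hypothetical duck-versus-alligator classifier are invented for illustration, not results from any real system.

```python
# Sketch of the usual aggregate measures derived from a binary confusion
# matrix (classifier output vs. known ground truth). Counts are invented.
def summarize(tp, fp, fn, tn):
    """tp, fp, fn, tn are the four cells of a 2x2 confusion matrix."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall    = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy  = (tp + tn) / (tp + fp + fn + tn)
    return {
        "type_I_errors (false positives)":  fp,
        "type_II_errors (false negatives)": fn,
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
    }

# A hypothetical duck-vs.-alligator image classifier:
print(summarize(tp=90, fp=4, fn=6, tn=100))
```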
Network science is increasingly curious about this situation, dealing with confusion through the concept of overlap in community finding. Multifunctional molecules—genes and proteins, for example—act as drugs and drug targets, where confusion needs to be moderated in order to hit the target while minimizing unwanted side effects. Similar situations arise in social life. Only recently has it become possible in network science to deal with such phenomena in an efficient way. Network science initially focused mostly on identifying discrete communities, since finding them is much simpler computationally. In such a perfect world, one where all communities are discrete, there’s no confusion—or, more precisely, confusion is ignored. In such a world, the confusion or co-occurrence matrix can be sorted so that all communities form squares or rectangles along the matrix diagonal. In a more complicated case, neighboring communities overlap, forming subcommunities between two almost discrete communities—say, people belonging to the same company while also belonging to the same family. It’s easy to imagine still more complicated cases. At the other end of the spectrum we find all-out, complex overlap, which is hard to imagine or visualize in terms of sorting the matrix. It may well be true, however, that complex overlap is crucial to the survival of the system in question.
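As a sketch of what overlap in community finding looks like in practice, the following Python snippet uses the clique-percolation routine shipped with networkx on an invented company-and-family toy graph. One node deliberately belongs to both groups, so the two detected communities share it rather than splitting it.

```python
# Sketch of overlapping community detection via clique percolation (networkx).
# The tiny "company vs. family" graph is invented for illustration.
import networkx as nx
from networkx.algorithms.community import k_clique_communities

G = nx.Graph()
company = ["ann", "bob", "carl", "dora"]       # one near-discrete group
family  = ["dora", "emil", "fritz", "gina"]    # another group, sharing "dora"
G.add_edges_from((a, b) for group in (company, family)
                 for a in group for b in group if a < b)

# Communities found this way may share nodes, so "dora" belongs to both:
for community in k_clique_communities(G, 3):
    print(sorted(community))
```

Methods that force each node into exactly one community would have to assign the shared member to either the company or the family, erasing precisely the kind of overlap, or moderated confusion, described above.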
There’s a known case in which confusion by design is desirable: a highly cited concept in materials science, introduced in 1993 in an article in Nature. Lindsay Greer’s so-called principle of confusion applies to the formation of metallic glass. In short, the principle states that a glass forms more readily from a greater variety of metal atoms, because the resulting mixture of different atoms gives the material less chance to crystallize. This allows for larger glass objects with interesting material properties, such as being stronger than steel. The benefit of more confusion is counterintuitive: the greater the variety of metals involved, the harder it becomes to determine the material properties of the resulting glass. It wouldn’t be surprising to see something like Greer’s principle of confusion applied to other systems as well.
While such questions await solution, the take-home is that we should expect critical amounts of confusion in many real-life systems, with the optimum lying between, but not identical with, perfect discreteness and perfect homogeneity. Further identifying, understanding, and successfully moderating patterns of confusion in real systems is an ongoing challenge. Solving it is likely essential in many fields, from materials and medicine to social justice and the ethics of artificial intelligence. Science will help us clarify—if possible, embrace; if necessary, avoid—confusion. Of course, we should be cautious: The control of confusion can be used for peace or war, much like the rods in a nuclear reactor, with the difference that switching off confusion in a social system may be just as deadly as switching it on.