In science, there is an assumption that there is only one truth. For example, when measuring length or weight, the result is assumed to take the form “measured value = true value ± measurement error,” even though the true value itself is unknowable to humans. The true value is not measured directly but is “estimated” using scientific methods. This idea runs consistently through statistics and forms the basis of the scientific method.

However, applying this idea to classification and clustering can be tricky, because classifications are reflections of “human cognition” rather than objective facts. For example, classification problems in machine learning take human-assigned classifications as “measured values” and use them as labels to train algorithms. That is, they do not estimate the true value of a classification; they build a model that reproduces the values humans assigned. In that sense, they do not engage with scientific truth in the philosophy-of-science sense described above.

To put it another way, we are not asking what the correct classification is, but rather, “How do humans classify these objects?” So what about clustering? Does clustering infer the true classification? The answer is no.

Clustering is an exploratory method for discovering classifications (or segmentations, groupings) that are “useful” to humans. This is because clustering produces different results depending on which attributes are included in the calculation. In other words, whenever clustering is performed, a “context” is implicitly set by the choice of attributes.
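A minimal sketch of this point, using synthetic data and hypothetical attribute names (not taken from the article): clustering the same records under two different attribute selections yields two different partitions, and neither is “the” classification.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Hypothetical customer records: age, income, and monthly visits.
rng = np.random.default_rng(0)
n = 300
age    = rng.normal(40, 12, n)
income = rng.normal(500, 150, n)        # arbitrary units
visits = rng.poisson(3, n).astype(float)

demographic = np.column_stack([age, income])   # context: who the customers are
behavioural = visits.reshape(-1, 1)            # context: what the customers do

labels_demo  = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(demographic)
labels_behav = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(behavioural)

# Low agreement: each attribute selection defines its own "useful" grouping.
print("adjusted Rand index:", adjusted_rand_score(labels_demo, labels_behav))
```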

After all, any classification is context-dependent, and there is no absolute classification. This was proved using pure logic as the Ugly Duckling Theorem. Satoshi Watanabe, one of the pioneers of machine learning, asked around 1960 what would happen if every conceivable predicate were taken into account, and showed that in that case any two objects share exactly the same number of predicates.

The consequence of this thought experiment is that, to make classification possible at all, we are forced to admit that some predicates are more important than others. In other words, the root of what makes classification possible is attribute selection and weighting (prioritization).
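As a toy illustration (not Watanabe's original formalism): describe objects by d binary attributes and treat every Boolean function of those attributes as a predicate. Counting how many of the 2^(2^d) predicates are true of both members of a pair gives the same number for every distinct pair, so without weighting, no pair of objects is more similar than any other.

```python
from itertools import product

d = 3                                   # number of atomic binary attributes
rows = list(product([0, 1], repeat=d))  # all 2**d possible attribute combinations

# Each predicate is a Boolean function, i.e. a truth assignment over the
# 2**d rows; there are 2**(2**d) of them (256 when d = 3).
predicates = list(product([0, 1], repeat=len(rows)))

def shared_predicates(a, b):
    """Count the predicates that are true of both object a and object b."""
    ia, ib = rows.index(a), rows.index(b)
    return sum(1 for p in predicates if p[ia] == 1 and p[ib] == 1)

duckling      = (0, 0, 1)
swan          = (1, 1, 0)
ugly_duckling = (0, 1, 1)

print(shared_predicates(duckling, swan))           # 2**(2**d - 2) = 64
print(shared_predicates(duckling, ugly_duckling))  # also 64
```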

Boundary Effects in SOM

For the Ugly Duckling Theorem to make classification completely impossible, an infinite number of attributes would have to be taken into account. In reality, where only a finite number of attributes is considered, classification is therefore never completely impossible. However, with standardized, very high-dimensional data, the context setting becomes inadequate, which can lead to poor classification.

In a self-organizing map (SOM), this appears as a boundary effect: a phenomenon in which data records assigned to one edge of the map turn out to be similar to records on the opposite edge. In the SOM research community, boundary effects are believed to occur because nodes at the edge of the map have fewer neighbors than the others, which disrupts the equilibrium of competitive learning.
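The fewer-neighbors point is easy to verify on a rectangular grid; the sketch below (grid size and neighborhood radius chosen arbitrarily for illustration) counts how many nodes fall within a fixed radius of a corner, an edge, and an interior node.

```python
import numpy as np

rows, cols, radius = 10, 10, 1.5
grid = np.array([(r, c) for r in range(rows) for c in range(cols)])

def neighbour_count(node):
    """Number of other grid nodes within the neighborhood radius."""
    d = np.linalg.norm(grid - node, axis=1)
    return int(np.sum((d > 0) & (d <= radius)))

print("corner node (0, 0):", neighbour_count(np.array([0, 0])))  # 3
print("edge node   (0, 5):", neighbour_count(np.array([0, 5])))  # 5
print("interior    (5, 5):", neighbour_count(np.array([5, 5])))  # 8
```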

However, there is another way of looking at this: inconsistencies at the edges of the map are simply easy to notice, whereas inconsistencies in the center of the map are hard to detect.

Principal component scores are standardized to have a variance of 1, and in our experiments boundary effects were likely to appear when an SOM was trained on them as they are. One cause of the SOM boundary effect can therefore be thought of as follows: when all dimensional axes are treated too equally, dimensions that are irrelevant in the given context contribute too heavily to the classification and disturb the SOM's ordering.

By weighting each principal component according to its contribution rate, the SOM takes on the same appearance as one trained on the original data, and the boundary effect disappears. Extending this idea, changing the priorities assigned to the principal components lets the context of the analysis be set freely without distorting the data space.
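The following sketch illustrates the idea, not the article's exact procedure. It trains one SOM on whitened principal component scores (every component has variance 1) and another on scores re-weighted by each component's importance; here the weight is the square root of the explained variance, which is one plausible choice and makes distances match the original (centered) data. It uses the MiniSom package and the Iris data purely as stand-ins.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from minisom import MiniSom  # assumed available: pip install minisom

X = StandardScaler().fit_transform(load_iris().data)

pca = PCA(whiten=True).fit(X)
scores_whitened = pca.transform(X)              # variance 1 on every axis
weights = np.sqrt(pca.explained_variance_)      # one possible "priority" per component
scores_weighted = scores_whitened * weights     # restores the original relative scales

def train_som(data, seed=0):
    som = MiniSom(10, 10, data.shape[1], sigma=1.5, learning_rate=0.5,
                  random_seed=seed)
    som.random_weights_init(data)
    som.train_random(data, 5000)
    return som

som_flat     = train_som(scores_whitened)   # all axes treated equally
som_weighted = train_som(scores_weighted)   # components prioritized by contribution

print("quantization error, whitened :", som_flat.quantization_error(scores_whitened))
print("quantization error, weighted :", som_weighted.quantization_error(scores_weighted))
```

To set a different analysis context, the weight vector could be replaced by hand-chosen priorities for the components rather than their explained variance.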