How to identify SOMs that should not be used

SOM is a well-known technique that is available in many R and Python libraries. Frankly, however, most of those implementations are poorly made and are the source of widespread misconceptions about SOMs. Here are the hallmarks of an incorrect SOM implementation for data science:

  1. Normalize the maximum to 1 and the minimum to 0

    When presenting data to a SOM, attributes (variables) on different scales affect learning to different degrees, so the data must first be brought to a common scale. The academic literature on SOM often scales each attribute so that its maximum becomes 1 and its minimum 0. This is undesirable, because the influence of an attribute then depends heavily on whatever accidental maximum and minimum values happen to occur in the data. The statistical approach is the Z-score, the difference from the mean divided by the standard deviation, and in practice techniques derived from it are used.
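
    A minimal sketch of the two scalings in plain NumPy (the function names are ours, not from any particular library):

      import numpy as np

      def zscore_scale(X):
          """Scale each attribute (column) to zero mean and unit variance."""
          X = np.asarray(X, dtype=float)
          std = X.std(axis=0)
          std[std == 0] = 1.0                    # guard against constant columns
          return (X - X.mean(axis=0)) / std

      def minmax_scale(X):
          """Min-max scaling, shown for contrast: a single accidental extreme
          in a column squeezes every other value toward one end."""
          X = np.asarray(X, dtype=float)
          lo, hi = X.min(axis=0), X.max(axis=0)
          rng = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
          return (X - lo) / rng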

  2. Learning rate factor

    Professor Teuvo Kohonen (1934-2021) of Helsinki University of Technology published the first Self-Organizing Map algorithm in 1982. SOMs that expose a "learning rate factor" are this version. It was the product of academic research on associative memory, not a tool for data analysis. Also called sequential or online learning, it reads data records one by one and pulls the SOM node values toward each input in turn, so the resulting map depends on the order of the records. In 1992 the batch learning algorithm for SOM was published, which opened up the possibility of statistical applications of SOM.
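
    To make the contrast concrete, here is a sketch of one sequential update versus one batch epoch. This is illustrative NumPy, not any library's actual API, and the Gaussian neighborhood function is our assumption:

      import numpy as np

      def neighborhood(grid, bmu, sigma):
          """Gaussian weights over map coordinates `grid` (n_nodes, 2),
          centered on the best-matching unit (BMU)."""
          d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
          return np.exp(-d2 / (2.0 * sigma ** 2))

      def online_step(W, grid, x, alpha, sigma):
          """Sequential (1982-style) update: one record nudges all nodes.
          The final map depends on the order in which records arrive."""
          bmu = np.argmin(((W - x) ** 2).sum(axis=1))
          h = neighborhood(grid, bmu, sigma)
          return W + alpha * h[:, None] * (x - W)

      def batch_epoch(W, grid, X, sigma):
          """Batch (1992-style) update: assign every record to its BMU,
          then set each node to a neighborhood-weighted mean of all
          records. No learning rate, and the result is order-independent."""
          bmus = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)
          H = np.stack([neighborhood(grid, b, sigma) for b in bmus])
          return (H.T @ X) / np.maximum(H.sum(axis=0)[:, None], 1e-12)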

  3. Set initial node values with random numbers

    This does not mean the method should never be used, but it makes little sense for data analysis purposes. In academic research on SOM algorithms, initial values are set with random numbers in order to prove that order can emerge even from a random state, that is, from a randomly entangled lattice. That goal has nothing to do with analyzing the data at hand. From a data analysis perspective, a reasonable method is to first perform principal component analysis on the data and then start SOM learning with the node values laid out on the principal plane. In other words, PCA first summarizes the data on a linear plane, and SOM then extends that summary to a nonlinear, free-form surface.
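
    A sketch of such an initialization (illustrative only; real systems refine this further):

      import numpy as np

      def pca_initial_codebook(X, map_rows, map_cols):
          """Lay the initial node values out on the plane spanned by the
          first two principal components, scaled by the data's spread."""
          X = np.asarray(X, dtype=float)
          mean = X.mean(axis=0)
          # SVD of the centered data gives the principal axes directly.
          _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
          axes = Vt[:2]                         # first two principal directions
          spread = S[:2] / np.sqrt(len(X) - 1)  # std. dev. along each direction
          rows = np.linspace(-2, 2, map_rows)   # grid spans about +/-2 std. dev.
          cols = np.linspace(-2, 2, map_cols)
          rr, cc = np.meshgrid(rows, cols, indexing="ij")
          coords = np.stack([rr.ravel() * spread[0], cc.ravel() * spread[1]], axis=1)
          return mean + coords @ axes           # shape: (map_rows * map_cols, dim)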

  4. Only display the resulting map as a chart

    A typical SOM result is displayed as a two-dimensional chart with the labels of data records pasted onto it. This is one way to use a SOM, but the result is more than a chart. For one thing, with large-scale data many records map to a single node, so not all labels can even be displayed. What matters is that each SOM node gathers similar data records, and the real use of SOM in data analysis is to exploit this to strengthen statistical analysis.
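
    As an illustration of going beyond the picture (function names are ours): once every record has a best-matching node, the node index becomes a grouping key for ordinary statistics.

      import numpy as np

      def node_assignments(W, X):
          """Index of the best-matching node for every record."""
          return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)

      def node_profiles(W, X):
          """Per-node record counts and attribute means: the trained map
          acts as a grouping key for downstream statistics, not just a chart."""
          bmus = node_assignments(W, X)
          return {int(n): {"count": int((bmus == n).sum()),
                           "mean": X[bmus == n].mean(axis=0)}
                  for n in np.unique(bmus)}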

  5. There are no data records corresponding to the edges of the map

    This one is a bonus. Maps for which SOM learning never actually succeeded are sometimes circulated as SOMs or Kohonen nets. One often sees charts where the labels described in item 4 do not spread over the entire map but are concentrated at its center. Such maps were not trained successfully and amount to a degraded version of PCA.
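
    A rough diagnostic for this failure mode (our illustration, not a standard test): check what fraction of the map's border nodes are ever a best-matching unit.

      import numpy as np

      def border_coverage(bmus, map_rows, map_cols):
          """Fraction of border nodes that are the BMU of at least one
          record. A value near zero suggests the map collapsed toward
          its center, i.e. training did not succeed."""
          hit = np.zeros(map_rows * map_cols, dtype=bool)
          hit[np.unique(bmus)] = True
          grid = hit.reshape(map_rows, map_cols)
          border = np.ones_like(grid)
          border[1:-1, 1:-1] = False
          return grid[border].mean()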

In addition to the points above, practical SOM systems apply a variety of further refinements. Viscovery SOMine, a SOM-based data mining system in continuous development since 1994, is one of the most reliable SOMs.
