If you simply apply a machine learning library's algorithm to a real problem by the textbook, you will rarely get beyond 70% to 80% performance. Making the improvements from there is what earns the name data science. There are generally three ways to improve a machine learning method:
1. Devise a completely new learning algorithm
2. Improve part of an existing learning algorithm
3. Combine existing algorithms
If you can do 1, you are an authority. If you can do 2 or 3, you are probably at the doctoral level. Method 2 is demanding because it involves rewriting the library source itself. Method 3 can often be done by coding in Python, although there are limits to that approach. In some cases it may be more productive for most users to rely on commercial software that already incorporates these improvements.
Among the various techniques our predecessors have used to improve the performance of machine learning algorithms, the methods that use multiple models are collectively called ensemble models. Currently, there are three types of ensemble models:
- Bagging
- Boosting
- Stacking
I suspect that in reality these categories did not come first with methods developed to match them; rather, they emerged when the various improvements our predecessors had made were classified after the fact. Bagging is a method in which multiple models are run in parallel and their results are combined by majority vote or averaging. I think it is almost certain that the term "ensemble" comes from this image.
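As a rough sketch of the idea in Python (the function name and defaults are my own, and it assumes NumPy arrays, non-negative integer class labels, and scikit-learn's `DecisionTreeClassifier` as the base model):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X_train, y_train, X_test, n_models=10, seed=0):
    """Minimal Bagging sketch: train several trees on bootstrap samples
    in parallel, then take the majority vote of their predictions."""
    rng = np.random.default_rng(seed)
    predictions = []
    for _ in range(n_models):
        # Bootstrap sample: draw rows with replacement
        idx = rng.integers(0, len(X_train), size=len(X_train))
        model = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
        predictions.append(model.predict(X_test))
    # Majority vote across the parallel models
    # (assumes class labels are non-negative integers)
    votes = np.stack(predictions)
    return np.apply_along_axis(
        lambda col: np.bincount(col).argmax(), axis=0, arr=votes)
```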
Boosting, on the other hand, is a method of connecting models in series. One model is built first, and measures such as the residuals from its results are used to fit the next model. The picture is of the overall model improving through repeated model creation.
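Here is a minimal sketch of that series connection for regression, again with names and defaults of my own choosing and shallow scikit-learn trees as the base models; each new tree is fitted to the residuals left by the models built so far:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosting_fit_predict(X_train, y_train, X_test,
                         n_rounds=50, learning_rate=0.1):
    """Minimal Boosting sketch: fit each new model to the residuals
    of the current ensemble, then add its (damped) prediction."""
    # Start from a constant prediction: the mean of the targets
    pred_train = np.full(len(y_train), y_train.mean())
    pred_test = np.full(len(X_test), y_train.mean())
    for _ in range(n_rounds):
        residuals = y_train - pred_train          # what is still unexplained
        tree = DecisionTreeRegressor(max_depth=3).fit(X_train, residuals)
        pred_train += learning_rate * tree.predict(X_train)
        pred_test += learning_rate * tree.predict(X_test)
    return pred_test
```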
Stacking is a method in which the output of a model built with one method is added as a new feature to the input of a model built with another method. As mentioned in an earlier article, an example is including the distances to the cluster centroids obtained by K-means as new features for decision tree learning.
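A minimal sketch of that particular example, assuming NumPy arrays and scikit-learn's `KMeans` and `DecisionTreeClassifier` (the function name and the number of clusters are my own):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def stacked_tree(X_train, y_train, X_test, n_clusters=5, seed=0):
    """Minimal Stacking sketch: distances to the K-means centroids
    are appended as new features to the inputs of a decision tree."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    km.fit(X_train)
    # transform() returns each sample's distance to every centroid
    train_dist = km.transform(X_train)
    test_dist = km.transform(X_test)
    # Original attributes plus the new distance features
    X_train_aug = np.hstack([X_train, train_dist])
    X_test_aug = np.hstack([X_test, test_dist])
    tree = DecisionTreeClassifier().fit(X_train_aug, y_train)
    return tree.predict(X_test_aug)
```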
I would like to think a little more about Bagging here. Think of it as an ensemble or a chorus. A single professional singer is wonderful, but a chorus of average people has a quality that cannot be found in a solo performance, even though none of the members comes close to being a professional. For example, if you were the vocal director of a J-Pop idol group, what would you do? You would probably do something like this:
1. Gather a lot of idol candidates.
2. Give them tough lessons.
3. Appoint those who reach a certain level as regular members.
4. Carefully examine the characteristics of each person's voice and have them sing only the parts they are good at.
Bagging as it is widely used today performs steps 1 and 2 and then takes the majority vote or average. There may be examples that go as far as step 3, but unfortunately I don't think there are many that reach step 4. In more practical terms, general Bagging creates multiple models by changing the selection of attributes used in each model, but with Self-Organizing Maps you can also change the weighting of attributes. Moreover, SOMs make it possible to do step 4 by rejecting the results of nodes with high error rates, as sketched below.
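The sketch below illustrates only the attribute-selection variant of Bagging mentioned above, not the SOM-specific weighting or node rejection; each model sees a different random subset of the columns. Names, defaults, and the use of scikit-learn trees are my own assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def attribute_bagging(X_train, y_train, X_test, n_models=10,
                      n_attrs=None, seed=0):
    """Bagging by varying the attribute selection: each model is trained
    on a random subset of columns, and the final answer is the majority
    vote (steps 1 and 2 of the chorus analogy)."""
    rng = np.random.default_rng(seed)
    n_attrs = n_attrs or max(1, X_train.shape[1] // 2)
    predictions = []
    for _ in range(n_models):
        # Pick a random subset of attributes (columns) for this model
        cols = rng.choice(X_train.shape[1], size=n_attrs, replace=False)
        model = DecisionTreeClassifier().fit(X_train[:, cols], y_train)
        predictions.append(model.predict(X_test[:, cols]))
    # Majority vote (assumes non-negative integer class labels)
    votes = np.stack(predictions)
    return np.apply_along_axis(
        lambda col: np.bincount(col).argmax(), axis=0, arr=votes)
```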
Viscovery SOMine also uses techniques such as Boosting and Stacking in various places, even though they are not labeled as such.