Information Gain is used to determine which feature/attribute gives us the most information about a class. It is based on the concept of entropy, which measures the degree of uncertainty, impurity, or disorder in a set. The goal is to reduce entropy at each split, starting from the root node and proceeding down to the leaf nodes.
Formula for Entropy

E(S) = −Σᵢ pᵢ log₂(pᵢ)

Here pᵢ denotes the probability of class i in the set S, and E(S) denotes the entropy of S. Entropy is sometimes disfavored as a splitting criterion because the 'log' function increases the computational cost.
Information Entropy can be thought of as how unpredictable a dataset is.
- A set of only one class (say, blue) is extremely predictable: anything in it is blue. This would have low entropy.
- A set of many mixed classes is unpredictable: a given element could be any color! This would have high entropy.
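The two cases above can be checked numerically. Below is a minimal sketch of the entropy formula in Python (the `entropy` helper and the blue/red example sets are illustrative, not from any particular library):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (base 2): E(S) = -sum(p_i * log2(p_i)) over classes."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(entropy(["blue"] * 10))               # pure set: entropy is zero
print(entropy(["blue"] * 5 + ["red"] * 5))  # evenly mixed set: entropy is 1 bit
```

A set of only one class has probability 1 for that class, and log₂(1) = 0, so its entropy is zero; a 50/50 mix is maximally unpredictable for two classes, giving 1 bit of entropy.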
Information Gain is calculated for a split by subtracting the weighted entropies of each branch from the original entropy.
When training a Decision Tree using these metrics, the best split is chosen by maximizing Information Gain.
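As a sketch of how a split might be scored, the snippet below computes Information Gain as the parent's entropy minus the size-weighted entropies of the branches. The `information_gain` function and the perfectly separable blue/red split are hypothetical examples for illustration:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, branches):
    """Parent entropy minus the size-weighted entropy of each branch."""
    n = len(parent)
    weighted = sum(len(b) / n * entropy(b) for b in branches)
    return entropy(parent) - weighted

# Hypothetical split: 5 blue + 5 red, separated perfectly by some feature.
parent = ["blue"] * 5 + ["red"] * 5
left, right = ["blue"] * 5, ["red"] * 5
print(information_gain(parent, [left, right]))  # 1.0 for a perfect split
```

A split that leaves both branches as mixed as the parent would score an Information Gain of 0, so a tree learner would prefer the perfect split above.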