If you look at the documentation for the **DecisionTreeClassifier** class in **scikit-learn**, you’ll see a `criterion` parameter, described as the function used to measure the quality of a split.

The **RandomForestClassifier** documentation says the same thing. Both mention that the default criterion is “gini” for the **Gini Impurity**.
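As a quick check of that default, the following minimal sketch (assuming scikit-learn is installed) builds both estimators without arguments and inspects their `criterion` attribute. The variable names are only illustrative.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# With no arguments, both estimators keep their default split criterion.
tree = DecisionTreeClassifier()
forest = RandomForestClassifier()
print(tree.criterion)
print(forest.criterion)

# The criterion can also be set explicitly, for example to entropy.
entropy_tree = DecisionTreeClassifier(criterion="entropy")
print(entropy_tree.criterion)
```

Both of the first two lines should print `gini`, which is the value the rest of this article explains.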

What is that?!

## Gini Impurity

Gini impurity (also called the Gini index) measures how often a randomly chosen element from a set would be misclassified if it were labeled at random according to the distribution of classes in that set.

But what is actually meant by ‘impurity’? If all the elements in a node belong to a single class, the node is called pure. The more evenly the classes are mixed, the more impure it is.

Gini impurity is at least 0 and always less than 1. A value of 0 means that all elements belong to a single class (or that only one class exists). For a given number of classes *k*, the maximum is `1 - 1/k`, reached when the elements are spread evenly across the classes. That maximum is 0.5 for two classes, and it approaches 1 as the number of classes grows.

**Formula for Gini Index or Impurity**

$$Gini = 1 - \sum_{i=1}^{n} p_i^2$$

where $p_i$ is the probability of an object being classified to a particular class $i$, and $n$ is the number of classes.
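To make the definition concrete, here is a minimal sketch in plain Python, assuming the usual form of Gini impurity, one minus the sum of squared class proportions. The function name `gini_impurity` and the toy label lists are hypothetical, not part of scikit-learn.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i^2), where p_i is the share of each class in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: treat an empty node as pure
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


# Toy examples: a pure node, an even two-class mix, an even four-class mix.
pure = gini_impurity(["a", "a", "a", "a"])
two_way = gini_impurity(["a", "a", "b", "b"])
four_way = gini_impurity(["a", "b", "c", "d"])
print(pure, two_way, four_way)
```

The pure node scores 0, the even two-class mix scores 0.5, and the even four-class mix scores 0.75, which shows how the maximum rises with the number of classes.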

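While building the decision tree, each candidate split is scored by the weighted average Gini impurity of the child nodes it produces, and we prefer the attribute/feature with the lowest score. At the top of the tree, that feature becomes the root node.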
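The selection step can be sketched as follows. This is a simplified illustration, not scikit-learn's implementation: it assumes categorical features, splits each feature into one group per distinct value, and weights each group's impurity by its size. The helper names and the tiny dataset are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i^2) over the classes present in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: treat an empty node as pure
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(groups):
    """Average the impurity of each child group, weighted by group size."""
    total = sum(len(group) for group in groups)
    return sum(len(group) / total * gini_impurity(group) for group in groups)


def best_feature(rows, labels):
    """Score every feature and return the one with the lowest weighted Gini."""
    scores = {}
    for feature in rows[0]:
        groups = {}
        for row, label in zip(rows, labels):
            # One child group per distinct value of this feature.
            groups.setdefault(row[feature], []).append(label)
        scores[feature] = weighted_gini(list(groups.values()))
    return min(scores, key=scores.get), scores


# Hypothetical toy data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

best, scores = best_feature(rows, labels)
print(best, scores)
```

Here `outlook` yields two pure child nodes (weighted Gini 0), while `windy` leaves both children evenly mixed (weighted Gini 0.5), so `outlook` would be chosen for the root.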

To learn more about Gini Impurity, please visit this **link**.
