How does one-hot encoding work?
You are simply creating a vector for each transaction. Each distinct item becomes a feature (column), and the vector holds the count of that item in the transaction, or 0 if the item does not appear in it.
Example:
Before:
Transaction   Item
1)            Fruits, Clothes
2)            Vegetables
After:
Transaction   Fruits   Clothes   Vegetables
1)            1        1         0
2)            0        0         1
In the following line:
hot_encoded_bakery = bakery.groupby(['Transaction','Item'])['Item'].count().unstack().reset_index().fillna(0).set_index('Transaction')
hot_encoded_bakery.head()
You are performing chained operations as follows:
- You first group the rows by Transaction and Item, then select the "Item" column and count it, which gives the number of times each item appears in each transaction.
- You then unstack the result, which moves the Item values from the index to the columns (axis = 1), i.e. the feature side. Transaction/item combinations that never occur become NaN.
- Then you reset the index, which turns Transaction from the index into a regular column.
- Then you replace all the NaN cells with zeros.
- Then you set the Transaction column as the index again.
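The steps above can be traced one at a time on a small table. This is only a sketch: the toy `bakery` DataFrame below is an assumption that mirrors the Fruits/Clothes/Vegetables example, and the intermediate variable names (`counts`, `wide`, `flat`, `filled`) are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the real `bakery` data (assumed layout: one row per
# purchased item, with 'Transaction' and 'Item' columns).
bakery = pd.DataFrame({
    "Transaction": [1, 1, 2],
    "Item": ["Fruits", "Clothes", "Vegetables"],
})

# Step 1: count rows per (Transaction, Item) pair.
# The result is a Series with a two-level index.
counts = bakery.groupby(["Transaction", "Item"])["Item"].count()

# Step 2: move the 'Item' index level to the columns.
# Pairs that never occur show up as NaN.
wide = counts.unstack()

# Step 3: turn the 'Transaction' index into an ordinary column.
flat = wide.reset_index()

# Step 4: replace NaN with 0.
filled = flat.fillna(0)

# Step 5: make 'Transaction' the index again.
hot_encoded_bakery = filled.set_index("Transaction")

print(hot_encoded_bakery)
```

Printing `wide`, `flat` and `filled` in turn is a quick way to see what each link in the chain contributes.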
Since resetting the index and then setting Transaction as the index again cancel each other out, you could simply have written:
hot_encoded_bakery = bakery.groupby(['Transaction','Item'])['Item'].count().unstack().fillna(0)
hot_encoded_bakery.head()
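To check that the shorter chain gives the same table, the two versions can be compared on toy data. This is a minimal sketch with an assumed `bakery` DataFrame, not the original dataset. The toy data deliberately repeats an item within one transaction to show that the cells hold counts, which can exceed 1. If strictly binary 0/1 values are wanted, thresholding the counts is one possible extra step; that step is an addition here, not part of the original line.

```python
import pandas as pd

# Toy data (assumed layout); 'Fruits' appears twice in transaction 1.
bakery = pd.DataFrame({
    "Transaction": [1, 1, 1, 2],
    "Item": ["Fruits", "Fruits", "Clothes", "Vegetables"],
})

counts = bakery.groupby(["Transaction", "Item"])["Item"].count().unstack()

# Original chain: reset the index, fill NaN, set the index back.
long_form = counts.reset_index().fillna(0).set_index("Transaction")

# Shorter chain: fill NaN directly.
short_form = counts.fillna(0)

# True when both tables have the same labels, values and dtypes.
same = long_form.equals(short_form)
print(same)

# Optional extra step: collapse counts to 0/1 indicators.
binary = short_form.gt(0).astype(int)
print(binary)
```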