My house dataset consists of 80 columns in total:
43 categorical and 37 numerical.
I wanted to apply PCA to this dataset. Before that, I wanted to scale the data, so I had to convert the categorical values into numerical ones by dummification, using the following code.
import pandas as pd
from sklearn.preprocessing import StandardScaler

# One-hot encode every categorical column
housedata = pd.get_dummies(housedata)
housedata.head()

# Standardize all columns to zero mean and unit variance,
# keeping the column names so the dummies stay identifiable
scaler = StandardScaler()
scaler.fit(housedata)
housedata = pd.DataFrame(scaler.transform(housedata), columns=housedata.columns)
housedata.head()
When I did the dummification, the number of columns increased to 2770, so applying PCA to such a huge matrix seems pointless. How can I handle this?
Training on 2770 features is tedious and will tend to produce an overfitted model. What you can do instead: create dummy variables only for the features that have a small number of categories, then perform PCA and see whether model performance improves or not.
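A minimal sketch of that idea, assuming your data is in a DataFrame named housedata with missing values already handled; the cutoff of 10 categories is an arbitrary illustration, not a rule:

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Split the categorical columns by cardinality (10 is an arbitrary cutoff)
cat_cols = housedata.select_dtypes(include="object").columns
low_card = [c for c in cat_cols if housedata[c].nunique() <= 10]
high_card = [c for c in cat_cols if housedata[c].nunique() > 10]

# Dummify only the low-cardinality features; drop the high-cardinality ones for now
encoded = pd.get_dummies(housedata.drop(columns=high_card), columns=low_card)

# Scale, then keep enough principal components to explain 95% of the variance
scaled = StandardScaler().fit_transform(encoded)
reduced = PCA(n_components=0.95).fit_transform(scaled)
print(encoded.shape, reduced.shape)

Passing a float between 0 and 1 as n_components lets scikit-learn choose the number of components that reaches that fraction of explained variance, so you can compare the reduced width against the original 2770 columns directly.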