I am working on an insurance data set for my Term 2 project. I have followed the steps below, but I am not able to get the accuracy score above 57%:
Step 1: Applied Lasso and used its coefficients to down-select features, then prepared a subset of the master data set from the selected features.
Step 2: Trained a Random Forest Classifier on that subset.
Step 3: Used the feature_importances_ attribute of a Random Forest Classifier fitted on the master data set to prepare a second subset based on feature importance.
Step 4: Trained a Random Forest Classifier on the subset from Step 3.
Step 5: Trained XGBoost, Gradient Boosting Classifier and Decision Tree Classifier on the subset from Step 3.
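Steps 3 and 4 can be sketched roughly as follows. This is only a minimal illustration, not the notebook's actual code: the data is a synthetic stand-in generated with `make_classification`, the three-class target and the `threshold="median"` cut-off are assumptions, and the selector is fitted on the training split only so that the test split does not influence which features are kept.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the insurance data (assumption: the real data is not shown).
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=8, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 3: rank features with feature_importances_ and keep those at or above
# the median importance. The selector sees the training split only.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0),
    threshold="median",
)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Step 4: train a Random Forest on the reduced feature set.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train_sel, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train_sel))
test_acc = accuracy_score(y_test, model.predict(X_test_sel))
print(f"kept {X_train_sel.shape[1]} of {X_train.shape[1]} features")
print(f"train accuracy={train_acc:.3f}  test accuracy={test_acc:.3f}")
```

If the subset is instead prepared on the full master data set before splitting, the test score can look better than it really is, so it may be worth checking which order the notebook uses.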
After trying different models on the selected data sets, the test accuracy improved from about 37% to about 55%. The details are as follows:
| Feature Selection | Model | Accuracy Score Test | Accuracy Score Train |
|---|---|---|---|
| Lasso | Random Forest | 37.55% | 98.56% |
| Random Forest feature_importances_ | Random Forest Classifier | 48.57% | 99.03% |
| Random Forest feature_importances_ | XGBoost | 54.86% | 56.27% |
| Random Forest feature_importances_ | Gradient Boosting Classifier | 55.35% | 57.88% |
| Random Forest feature_importances_ | Decision Tree Classifier | 40.54% | 100% |
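A comparison of this kind can be produced with a short loop that records train and test accuracy side by side, which makes the gap between the two easy to see. The sketch below is an assumption-laden illustration: it uses synthetic data, leaves out XGBoost to avoid an extra dependency, and adds a depth-limited Decision Tree (`max_depth=4`, an arbitrary value) purely to show how constraining a tree changes the gap.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the insurance data (assumption).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Decision Tree (unlimited depth)": DecisionTreeClassifier(random_state=0),
    "Decision Tree (max_depth=4)": DecisionTreeClassifier(max_depth=4, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model and store (train accuracy, test accuracy).
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    results[name] = (train_acc, test_acc)
    gap = train_acc - test_acc
    print(f"{name:32s} train={train_acc:.3f}  test={test_acc:.3f}  gap={gap:.3f}")
```

A large positive gap, as in the Random Forest and Decision Tree rows of the table, is the usual sign that a model has memorised the training data.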
Q1: How can I further improve the accuracy score of this model?
Q2: I am also not able to decide which metric (Precision, Recall or F1 Score) should be checked and improved to get the best model for this problem, or whether the accuracy score alone is enough.
- First, create a baseline model. Then use feature engineering to build an improved model, and compare that model against the baseline.
- You can improve model accuracy by hyperparameter tuning with grid search, using either F1 score or ROC AUC as the scoring metric. Your model is probably overfitting the data: the Random Forest and Decision Tree reach about 99-100% train accuracy but stay below 50% on the test set. Make sure the feature engineering is done carefully and efficiently.
- Keep only those functions in the notebook that are actually used.
- Organise the modelling work into a clear, step-by-step flow.
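The baseline and grid search suggestions can be sketched together as below. This is a minimal example under stated assumptions, not a recipe tuned for the insurance data: the data is synthetic, the target is assumed to have three classes, the parameter grid values are arbitrary, and macro-averaged F1 (`scoring="f1_macro"`) is used because plain `"f1"` applies to binary targets only.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the insurance data (assumption).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline: always predict the most frequent class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_f1 = f1_score(
    y_test, baseline.predict(X_test), average="macro", zero_division=0
)

# Grid search over parameters that limit tree complexity, which is one
# common way to reduce overfitting in a Random Forest.
param_grid = {
    "max_depth": [3, 5, 8],
    "min_samples_leaf": [1, 5, 20],
}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid,
    scoring="f1_macro",
    cv=5,
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test split.
y_pred = search.predict(X_test)
tuned_f1 = f1_score(y_test, y_pred, average="macro", zero_division=0)

print("best parameters:", search.best_params_)
print(f"baseline macro F1={baseline_f1:.3f}  tuned macro F1={tuned_f1:.3f}")
print(classification_report(y_test, y_pred, zero_division=0))
```

The classification report lists precision, recall and F1 per class, which helps with Q2: if the classes are imbalanced or some errors are costlier than others, those per-class numbers are more informative than accuracy alone. For a multiclass target, ROC AUC can be requested with `scoring="roc_auc_ovr"` instead.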