Could you please explain all the parameters for GridSearchCV?
GridSearchCV(estimator, param_grid, scoring=None, n_jobs=None, iid='warn', refit=True, cv='warn', verbose=0, pre_dispatch='2*n_jobs', error_score='raise-deprecating', return_train_score=False)
- a) estimator: This takes in the ML model to tune. For example, if logreg = LogisticRegression(), then pass estimator=logreg.
- b) param_grid: This takes in a dictionary with parameter names (strings) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored. This enables searching over any sequence of parameter settings. There is no fixed range of values for param_grid; choosing it is an experimental process, so try different ranges.
- c) scoring: This takes in a single string (or a callable) that names the metric used to evaluate the predictions on the test set. For evaluating multiple metrics, either give a list of (unique) strings or a dict with names as keys and callables as values. e.g. scoring='roc_auc' or scoring='r2' or scoring='neg_mean_squared_error' or scoring='accuracy'. If left as None, the estimator's own score method is used.
- d) n_jobs: This parameter sets the number of jobs to run in parallel. None means 1. The useful upper limit depends on the number of cores in the CPU, since you can run about as many parallel processes as you have cores. To use all the cores of your CPU, set n_jobs=-1.
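- e) cv: This parameter determines the cross-validation splitting strategy. An integer sets the number of folds for K-fold cross-validation on the training data (stratified K-fold when the estimator is a classifier), so each parameter setting is scored on held-out folds rather than on the data it was fit on, which guards against picking an overfit setting. Common values of cv are 5 or 10 (either can be used depending on time constraints; 10 will take roughly twice as much time as 5).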
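To make the five parameters above concrete, here is a minimal sketch of how they fit together. The Iris dataset, the grid of C values, and the train/test split are illustrative assumptions, not part of the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Illustrative data: any (X, y) classification dataset would do.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

logreg = LogisticRegression(max_iter=1000)   # a) estimator
param_grid = {"C": [0.01, 0.1, 1, 10]}       # b) assumed grid, chosen only for the demo

search = GridSearchCV(
    estimator=logreg,
    param_grid=param_grid,
    scoring="accuracy",  # c) single metric given as a string
    n_jobs=1,            # d) one worker here; -1 would use all CPU cores
    cv=5,                # e) 5-fold cross-validation on the training data
)
search.fit(X_train, y_train)

print(search.best_params_)            # best setting found on the grid
print(search.best_score_)             # mean cross-validated accuracy of that setting
print(search.score(X_test, y_test))   # accuracy of the refit model on held-out data
```

Because refit=True by default, the search object is refit on the whole training set with the best setting, so it can be used directly for scoring and prediction.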
These are the key parameters of GridSearchCV and will be enough for most problems.
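The list-of-dictionaries form of param_grid from (b) and the multi-metric form of scoring from (c) can be sketched as follows. The synthetic dataset, the SVC estimator, and the grid values are assumptions made for the example. With more than one metric, refit has to name the metric that picks the best setting (or be False).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Small synthetic binary classification problem (illustrative only).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Two separate grids: gamma is only searched together with the RBF kernel.
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1]},
    {"kernel": ["rbf"], "C": [0.1, 1], "gamma": ["scale", 0.01]},
]

search = GridSearchCV(
    estimator=SVC(),
    param_grid=param_grid,
    scoring=["accuracy", "roc_auc"],  # list of unique metric names
    refit="roc_auc",                  # metric used to choose best_params_
    cv=5,
)
search.fit(X, y)

# 2 linear candidates + 4 RBF candidates are evaluated.
print(len(search.cv_results_["params"]))
print(search.best_params_)
# One column of mean test scores per metric.
print(search.cv_results_["mean_test_accuracy"])
print(search.cv_results_["mean_test_roc_auc"])
```

Splitting the grid this way avoids fitting combinations that make no difference, such as varying gamma for a linear kernel.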