I have a categorical feature called Country that contains around 50 unique values, but 70% of the rows have the value India. Should I encode it with only two values, 1 for India and 0 for the rest, or should I keep a separate value for each country? Which approach is better for model accuracy?
You can try different encodings for the Country column and compare them. First, create a binary column with 1 for India and 0 for the rest, build a model on that, and note down the evaluation metrics.
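A minimal pandas sketch of the India-vs-rest encoding, assuming the data is in a DataFrame with a `Country` column. The toy rows and the `target` column are hypothetical stand-ins for your own data; the encoded frame would then go into whatever model and metric you already use.

```python
import pandas as pd


def encode_india_vs_rest(df: pd.DataFrame, column: str = "Country") -> pd.DataFrame:
    """Replace the country column with a binary flag: 1 for India, 0 for every other country."""
    out = df.copy()
    out["is_india"] = (out[column] == "India").astype(int)
    return out.drop(columns=[column])


# Hypothetical toy data; 'target' stands in for whatever you are predicting.
df = pd.DataFrame({
    "Country": ["India", "India", "USA", "India", "Brazil", "Nepal", "India"],
    "target":  [1, 0, 1, 1, 0, 0, 1],
})

encoded = encode_india_vs_rest(df)
print(encoded)
```

This collapses all 49 non-India countries into a single value, so it is the baseline to beat: if a richer encoding does not improve the metric, the simpler one is preferable.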
Then try other combinations of values, such as grouping countries by how they affect the target variable. For example, use the pandas `groupby` function to compute the mean of the target variable for each country, and put countries with a similar mean in the same group. This lets you create as many groups as you want, which you can then one-hot encode and evaluate the same way.
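One possible sketch of the groupby-then-one-hot idea, assuming a `target` column (hypothetical name) holding a numeric or 0/1 target. "Similar mean" is decided here with equal-width bins via `pd.cut`, which is just one choice; quantile bins (`pd.qcut`) or manual thresholds would work too. The per-country means are computed on the training data only, so test rows do not leak into the encoding, and countries not seen in training go to a separate "unknown" group.

```python
import pandas as pd


def group_countries_by_target_mean(train, column="Country", target="target", n_groups=3):
    """Return {country: group_id}, grouping countries whose mean target value is similar."""
    country_means = train.groupby(column)[target].mean()
    # Equal-width bins over the range of per-country means: countries whose
    # means fall into the same bin end up in the same group.
    group_ids = pd.cut(country_means, bins=n_groups, labels=False)
    return {country: int(g) for country, g in group_ids.items()}


def one_hot_country_groups(df, mapping, column="Country", unknown_group=-1):
    """Replace the country column with one-hot columns for its group."""
    out = df.copy()
    # Countries missing from the mapping (unseen in training) get the unknown group.
    group = out[column].map(mapping).fillna(unknown_group).astype(int)
    # Fixed categories keep the same dummy columns for train and test data.
    categories = sorted(set(mapping.values())) + [unknown_group]
    group = pd.Series(pd.Categorical(group, categories=categories), index=out.index)
    dummies = pd.get_dummies(group, prefix="group", dtype=int)
    return pd.concat([out.drop(columns=[column]), dummies], axis=1)


# Hypothetical toy data.
train = pd.DataFrame({
    "Country": ["India", "India", "India", "India", "USA", "USA",
                "Brazil", "Brazil", "Nepal", "Nepal"],
    "target":  [1, 1, 0, 1, 1, 1, 0, 0, 0, 1],
})
test = pd.DataFrame({"Country": ["India", "Brazil", "Germany"]})

mapping = group_countries_by_target_mean(train)
print(mapping)  # {'Brazil': 0, 'India': 2, 'Nepal': 1, 'USA': 2}
encoded_test = one_hot_country_groups(test, mapping)
print(encoded_test)
```

Here India (mean 0.75) and USA (mean 1.0) land in the same group. Changing `n_groups` gives you the different combinations to compare against the India-vs-rest baseline.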