The model API's code performs preprocessing together with a train/test split. How do we avoid this split for real data? As I understand it, once a model is built it is deployed as an API, so the same code would contain the train/test split, which is not needed when the model runs in production.
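We don't include the train/test split code in a deployed model. The split belongs to the training script, where its only job is to hold out data for evaluating the model. The API ships just the trained model and the code files needed to run it.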
Before deploying a model, we create a separate module that preprocesses the incoming real data, applying the same transformations that were fitted during training, and then feeds the result to the model.
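
A minimal sketch of that separation, using only the Python standard library so it runs as-is. All function names here (split_rows, fit_scaler, train_and_export, load_model, predict) and the toy data are hypothetical, chosen for illustration. The split lives only in the training part; the serving part just loads the saved artifact, applies the stored preprocessing parameters, and predicts.

```python
import pickle
import random
import statistics


# ---------- shared by training and serving ----------

def scale(x, scaler):
    """Apply the learned preprocessing to a single raw value."""
    return (x - scaler["mean"]) / scaler["std"]


def predict(model, raw_x):
    """Preprocess one raw input and apply the fitted line."""
    return model["intercept"] + model["slope"] * scale(raw_x, model["scaler"])


# ---------- training script (runs offline, is NOT deployed) ----------

def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into train and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(xs):
    """Learn the preprocessing parameters from the training data only."""
    return {"mean": statistics.fmean(xs), "std": statistics.pstdev(xs)}


def train_and_export(rows):
    """Split, fit, evaluate, and return (serialized artifact, test error)."""
    train_rows, test_rows = split_rows(rows)
    scaler = fit_scaler([x for x, _ in train_rows])
    xs = [scale(x, scaler) for x, _ in train_rows]
    ys = [y for _, y in train_rows]
    # Least-squares line through the scaled training points.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    model = {"scaler": scaler, "slope": slope, "intercept": intercept}
    # The held-out rows are used here and nowhere else.
    test_mae = statistics.fmean([abs(predict(model, x) - y) for x, y in test_rows])
    return pickle.dumps(model), test_mae


# ---------- serving module (this is what the API ships) ----------

def load_model(artifact_bytes):
    """Restore the trained model and its preprocessing parameters."""
    return pickle.loads(artifact_bytes)


# Toy data following y = 2x + 1 (an assumption for the example).
rows = [(float(x), 2.0 * x + 1.0) for x in range(20)]
artifact_bytes, test_mae = train_and_export(rows)

model = load_model(artifact_bytes)
print(predict(model, 10.0))  # close to 21.0, since y = 2x + 1
```

In a real project the artifact would be written to a file by the training script and read by the API process at startup. With scikit-learn, a common way to get the same effect is to fit a Pipeline that bundles the preprocessing steps with the estimator and persist that single object, so the API code never needs to know about the split at all.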