Copyright restrictions are not the only possible restrictions.
If OpenAI says you're allowed to use their service under certain conditions, but you violate the conditions, then what's your legal basis for using the service? Forget about copyright, think about breach of contract or even computer fraud and abuse.
But let’s say you used the OpenAI GPT4 service to generate training data for a new model. You then train your model using that generated training data. In theory OpenAI can ban you from continuing to use their API and maybe even sue you for breach of terms of service, but that doesn’t mean the model you created based on that generated data is somehow now illegal to use or distribute. You can still sell or give away that trained model and there’s nothing OpenAI can do about that.
Let’s take specifically the case of Alpaca, the Stanford team generated a finetuning training set using GPT 3.5. Maybe OpenAI could sue them for doing that. But now that the training set exists and is freely available, I’m not using OpenAI if I finetune a new model with that existing training set. I have no contract with OpenAI, I’m not using their service, and OpenAI does not have any copyright claim on the generated dataset itself. They have no legal claim against me being able to use that dataset to fine tune and release a model.
If OpenAI says you're allowed to use their service under certain conditions, but you violate the conditions, then what's your legal basis for using the service? Forget about copyright, think about breach of contract or even computer fraud and abuse.