
Large language models such as ChatGPT are trained on vast amounts of text from books, websites, and other sources, and the data they're trained on is typically kept secret.
https://stackdiary.com/chatgpts-training-data-can-be-exposed-via-a-divergence-attack/
