Personally identifiable information has been found in DataComp CommonPool, one of the largest open-source data sets used to train image generation models. Millions of images of passports, credit cards ...
There’s been a “targeted, surgical removal of data sets, or elements of data sets, that are not aligned with the administration’s priorities,” said Denice Ross at the Federation of American Scientists ...