Microsoft AI researchers unintentionally exposed 38 terabytes of private data, including secret keys and passwords, while publishing open-source training data through a download link in a GitHub repository.
According to research that cloud security startup Wiz shared with TechCrunch, the company found the GitHub repository, which belonged to Microsoft’s AI research group, during its ongoing investigation into the accidental exposure of cloud-hosted data.
The repository offered open-source code and AI models for image recognition, and readers were instructed to download the models from an Azure Storage URL.
Wiz discovered that the URL carried a shared access signature (SAS) token that had mistakenly been scoped to the entire storage account rather than to the model files alone, so the link disclosed far more than intended. A SAS token is a bearer credential: anyone who obtains the URL holds the same rights, and an ad hoc token signed with the storage account key can only be revoked by rotating that key.
The storage account held 38 terabytes of private data, including personal-computer backups of two Microsoft employees. The data also contained further sensitive material: more than 30,000 internal Microsoft Teams messages from hundreds of Microsoft employees, secret keys, and passwords to Microsoft services.
Microsoft’s Security Response Center stated in a blog post shared with TechCrunch ahead of publication that “no customer data was exposed, and no other internal services were put at risk because of this issue.”
In response to Wiz’s research, Microsoft said it has expanded GitHub’s secret scanning service, which monitors all changes to public open-source code for credentials and other secrets committed in plaintext, to also flag SAS tokens with overly permissive expirations or privileges.
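The following Python script is an added example, not code from Microsoft, Wiz or GitHub. It shows what a secret-scanning rule for SAS tokens has to check: it finds Azure Storage SAS URLs in a block of text and flags the three weaknesses described above, namely whole-account scope (the `ss`/`srt` parameters of an account SAS), permissions beyond read (`sp`), and a missing or far-future expiry (`se`). The 90-day lifetime threshold is an assumption to be tuned to local policy, and the account name, signature values and dates in the sample README are constructed, not real.

```python
"""Lint Azure Storage SAS URLs found in text for whole-account scope,
permissions beyond read and far-future expiry. Added example, not
Microsoft's, Wiz's or GitHub's code. The account name, signatures and
dates in SAMPLE_README are constructed, not real."""
import re
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

# Assumption: a download link published in a public repo should not outlive
# this; tune to local policy.
MAX_LIFETIME = timedelta(days=90)

# Azure Storage endpoint followed by a query string carrying the SAS
# signature parameter "sig=". The character class stops at whitespace and
# the usual markdown/HTML delimiters so a URL inside a link or tag is caught.
SAS_URL_RE = re.compile(
    r"https://[a-z0-9]{3,24}\.(?:blob|file|queue|table|dfs)\.core\.windows\.net/"
    r"[^\s\"'<>)]*?[?&]sig=[^\s\"'<>)]+"
)

# SAS query parameters the linter reads:
#   ss / srt  -> present only on an *account* SAS (services / resource types)
#   sr        -> present on a *service* SAS scoped to one blob or container
#   sp        -> permissions, e.g. "r" read, "w" write, "d" delete, "l" list
#   se        -> expiry time in UTC


def _parse_expiry(se: str) -> datetime:
    """SAS expiry may be a full timestamp or a bare date; both are UTC."""
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%MZ", "%Y-%m-%d"):
        try:
            return datetime.strptime(se, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognised se value: {se!r}")


def audit_sas_url(url: str, now: datetime) -> list[tuple[str, str]]:
    """Return (code, detail) findings for one SAS URL; an empty list means it
    is scoped to a single resource, read-only and short-lived."""
    query = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    findings: list[tuple[str, str]] = []
    if "sig" not in query:
        return findings  # not a SAS URL at all

    # 1. Scope: an account SAS (ss/srt present) is valid for the whole storage
    #    account, not just the blob the link was meant to serve.
    if "ss" in query or "srt" in query:
        findings.append(("account-scope",
                         f"account SAS, services={query.get('ss', '?')} "
                         f"resource types={query.get('srt', '?')}"))
        if set(query.get("srt", "")) & {"s", "c"}:
            findings.append(("broad-resource-types",
                             "srt grants service/container access, so "
                             "containers and blobs can be enumerated"))

    # 2. Permissions: anything beyond read on a download link is excessive.
    extra = sorted(set(query.get("sp", "")) - {"r"})
    if extra:
        findings.append(("write-permissions",
                         f"permissions beyond read: {''.join(extra)}"))

    # 3. Lifetime: missing or far-future expiry.
    se = query.get("se")
    if se is None:
        # A service SAS tied to a stored access policy (si=) may carry its
        # expiry server-side; still flag it so a human checks the policy.
        findings.append(("no-expiry", "token has no se parameter"))
    else:
        remaining = _parse_expiry(se) - now
        if remaining > MAX_LIFETIME:
            findings.append(("long-expiry",
                             f"expires {se}, {remaining.days} days out"))
    return findings


def scan_text(text: str, now: datetime) -> list[tuple[str, list[tuple[str, str]]]]:
    """Find every SAS URL in a blob of text (README, commit, chat) and audit
    it, which is what a secret-scanning rule for SAS tokens has to do."""
    return [(m.group(0), audit_sas_url(m.group(0), now))
            for m in SAS_URL_RE.finditer(text)]


# Constructed sample: the first link is an account SAS with every permission
# and a lifetime of roughly 26 years; the second is a blob-scoped, read-only
# link that expires within a month.
SAMPLE_README = """
Download the checkpoints from
https://exampleresearchacct.blob.core.windows.net/models/ckpt.zip?sv=2021-06-08&ss=bfqt&srt=sco&sp=rwdlacupx&se=2049-12-31T23:59:59Z&spr=https&sig=FAKE0000000000000000000000000000000000000000%3D
A correctly scoped link for the same file would look like
https://exampleresearchacct.blob.core.windows.net/models/ckpt.zip?sv=2021-06-08&sr=b&sp=r&se=2023-10-15T00:00:00Z&spr=https&sig=FAKE1111111111111111111111111111111111111111%3D
"""
NOW = datetime(2023, 9, 18, tzinfo=timezone.utc)  # constructed reference time

if __name__ == "__main__":
    for url, findings in scan_text(SAMPLE_README, NOW):
        print(url.split("?")[0], "->", findings or "ok")
```

Run against the constructed README, the first link is reported with four findings (account scope, enumerable resource types, write permissions, long expiry) and the second with none. The linter only reads the token's declared claims; it does not verify the signature, so a forged or expired `sig` is flagged the same way as a live one, which is the right behaviour for a scanner that has no access to the account key.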