Microsoft has reportedly struck a deal with HarperCollins, a News Corp subsidiary, to use the publisher's nonfiction titles to train its AI models.
According to an unnamed source, Microsoft aims to use this data for a yet-to-be-announced model, though it does not plan to use the content for generating books without human authors.
Microsoft has declined to comment on the matter.
In a statement, HarperCollins confirmed it had reached an agreement with an unidentified AI technology company that would “allow limited use of select nonfiction backlist titles for training AI models to improve model quality and performance,” according to Bloomberg.
HarperCollins authors will have the option to participate or not, the company said.
“Part of our role is to present authors with opportunities for their consideration while simultaneously protecting the underlying value of their works and our shared revenue and royalty streams,” HarperCollins said. “This agreement, with its limited scope and clear guardrails around model output that respects author’s rights, does that.”
Technology companies increasingly rely on diverse, high-quality data sources, such as social media, news articles, and licensed content, to train AI models.
These datasets improve the models’ accuracy and subject-specific expertise.
News Corp entered a licensing agreement with OpenAI in May, allowing access to content from publications like The Wall Street Journal, Barron’s, and MarketWatch.
OpenAI has also partnered with publishers such as Axel Springer, The Atlantic, Vox Media, Dotdash Meredith, Hearst, and Time. Microsoft, meanwhile, has collaborated with Reuters, Hearst, and Axel Springer on various AI projects.