eWEEK content and product recommendations are editorially independent. We may earn money when you click on links to our partners.
Microsoft has signed a three-year deal with HarperCollins to train a yet-to-be-announced AI model on titles from the major publisher's catalog. According to Bloomberg, the deal pays $5,000 per nonfiction book, split equally between the author and HarperCollins, so each receives $2,500 per title. The payment is separate from other publishing agreements and does not count against authors' existing advances. The deal also covers only a selection of previously published nonfiction books; fiction is excluded.
404 Media first reported the news but did not name the technology company involved. Bloomberg followed up with more details, including that Microsoft is the company developing the AI model.
Microsoft-HarperCollins AI Agreement Terms
HarperCollins authors must opt in to the AI training program to authorize the use of their nonfiction books. Authors who decline will not have their books included in the training dataset and will not receive payment. Not all HarperCollins authors will be offered the deal; Microsoft selects the books it wants to include in the training set.
The agreement includes conditions intended to ease authors' concerns that generative AI could plagiarize their work or reduce demand for human writers. For example, it states that “no more than 200 consecutive words and/or five percent of the text of a book” will be used to train the AI model. It also commits Microsoft not to take text from piracy websites.
Why the Microsoft-HarperCollins AI Deal Matters
Large language models (LLMs) and other types of AI models require large datasets to train on, but only a limited amount of content is available in the public domain. By paying for access to HarperCollins' nonfiction list, Microsoft significantly expands the pool of data it can use to train its AI model.
While several tech companies have already struck deals with publishers to train AI models on existing content, this is the first time the specific terms of such an agreement have been made public. The HarperCollins deal therefore offers a benchmark for what Microsoft, and by extension other AI companies, are willing to pay to train their models.
A source also told Bloomberg that Microsoft’s AI model would not be used to generate books. The purpose of the new Microsoft AI model has not yet been announced.