A Manchester law firm has begun onboarding clients for a likely class action against Microsoft and Google, which it says illegally collect and use personal data to train their artificial intelligence (AI) models.
After a two-year investigation into the data practices of tech giants, Barings Law believes that the extensive information collected about users – including voice data, demographic data, application usage information, metadata, payment details and a range of other personal details – is potentially shared for the training and development of various large language models (LLMs).
Barings says all this is happening without proper permission or consent from users, because while they may understand that data is being collected, they may be unaware of the role that data plays in training AI LLMs.
“Both companies collect data such as the sports teams you follow, the programming languages you prefer, the stocks you follow, the local weather or traffic, the route you take to work and the sound of your voice,” said Adnan Malik, data breach director at Barings Law. “We are shocked and disgusted to learn of the level of data that has been and continues to be collected.”
Malik added that while the proliferation of AI is transforming the world as we know it, the development of technology should not come at the expense of people’s right to privacy.
“Individuals have the right to know what data about them is stored and what it is used for,” he said. “They also have the right to refuse to have their behaviors, voice, image, habits and knowledge used to train AI for the benefit of tech giants.
“As technologies continue to develop, individual data has become the world’s most valuable asset. We know that it is illegal to steal goods like money, gold and oil. As a society, we cannot accept that it is acceptable to steal personal data.”
Join the lawsuit
Barings is now inviting anyone with a Microsoft or Google account, or those who have used either company’s services, to join the lawsuit. This includes those who have used platforms and services such as YouTube, Gmail, Google Docs, Google Maps, LinkedIn, OneDrive, Outlook, Microsoft 365 and Xbox.
The company said it expected to be “inundated” with registrations and planned to formally begin legal proceedings in early 2025.
Microsoft and OpenAI, the company behind ChatGPT, are facing a separate class action in the United States from law firm Clarkson, following allegations that they violated the privacy of hundreds of millions of internet users by secretly harvesting large amounts of personal data to train AI chatbots. Filed in a federal court in San Francisco on June 28, the suit seeks damages of $3 billion.
Another lawsuit was also filed against Google, again by law firm Clarkson, which alleges that the tech giant accessed the data of millions of users for use in the development of its AI chatbot, Bard, which has since been renamed Gemini. The lawsuit claims that Google surreptitiously stole “everything created and shared on the internet by hundreds of millions of Americans”.
Malik said that although the cases are similar and, taken together, reflect growing international concern over data security, Barings is taking action against Microsoft and Google, rather than OpenAI.
“If you are shocked, upset, dismayed or annoyed that your data is being used without your knowledge or consent, my message is simple: do something by joining the fight,” he said. “Sign up today and let’s take charge of the future of our data and AI.”
Computer Weekly contacted Microsoft and Google about the lawsuit. Although Microsoft declined to comment, Google did not respond at the time of publication.
Other AI developers have already put forward various arguments to defend their use of people’s personal data and copyrighted material in training their models, including that the material falls within “fair use”, which permits the limited use of copyrighted material without permission for purposes such as criticism, news reporting, teaching and research.
For example, in a copyright lawsuit filed by music publishers in January 2024 against LLM developer Anthropic, the Amazon-backed company argued that “using works to train Claude is fair because it does not prevent the sale of the original works and, even when commercial, is still sufficiently transformative”.
Anthropic also argued that “today’s general-purpose AI tools simply could not exist” if AI companies had to pay for licences for the material, adding that it is not alone in using data “largely gathered from the publicly available internet”, and that “in practice, there is no other way to build a training corpus with the scale and diversity necessary to train a complex LLM with a broad understanding of human language and the world in general”.