Popular social media platform to sell user data to the company behind ChatGPT

Automattic, the company behind WordPress.com and Tumblr, is looking to cash in on user content.

Feb 29, 2024 - 02:30

A central ethical question that has surfaced amid the ongoing popularization and commercialization of generative artificial intelligence revolves around copyright infringement and what, exactly, constitutes "fair use."

As lawsuits alleging copyright infringement in both the input and output of generative AI models continue to stack up, the position of the bulk of the tech sector, perhaps represented best by OpenAI, boils down to a simple opinion: that it is fair for tech companies to train their commercially available, lucrative models — without permission, credit or compensation — on publicly available content. 

At the same time, OpenAI has been exploring content licensing deals with publishers, including Axel Springer, the parent company of Business Insider. And according to a new report from 404 Media, OpenAI is on the verge of closing a deal with a new partner: Automattic, the company behind Tumblr and WordPress.com.

It remains unclear exactly what kind of content the licensing deal will cover, when the deal might close, or what its price tag will be.

404 Media reviewed internal documents showing that an initial data dump, which compiled Tumblr's content between 2014 and 2023, contained material that should not have been included, such as private posts on public blogs, posts on deleted blogs, and explicit posts.

Automattic did not clarify whether this compilation of data was sent to OpenAI. 

The company did not respond to questions regarding the type of content included in the deal. TheStreet additionally asked whether self-hosted sites on WordPress (separate from WordPress.com) would be included in the data sale. 

Automattic did not respond. 

The company instead pointed TheStreet to a public statement that says that it currently blocks AI platform crawlers and will further allow users to opt out of sharing their content.

"We are also working directly with select AI companies as long as their plans align with what our community cares about: attribution, opt-outs, and control. Our partnerships will respect all opt-out settings," the company said. "We also plan to take that a step further and regularly update any partners about people who newly opt out and ask that their content be removed from past sources and future training."

Automattic will also reportedly be selling user data to Midjourney, an AI image generation company. 

Neither Midjourney nor OpenAI responded to a request for comment. 

Social media and artificial intelligence

Automattic is hardly the first platform to enter into a licensing deal with an AI company. 

The week before, Reddit went public with a deal it had signed with Google — worth around $60 million annually — to license its user content to, among other things, train Google's AI models. 

Meta (META) has admitted that it used public posts on its platforms to train parts of its own AI models.
