Unauthorized AI Training: South African Artists' Debate

Fans of South African music are realizing that tracks from some of the nation’s most prominent artists have been included in datasets associated with generative artificial intelligence systems, prompting renewed discussions about copyright and the issue of consent.

This surge in interest followed the release of a searchable database by The Atlantic that enables users to investigate works featured in various major AI training datasets. This tool allows individuals to search for musicians, writers, actors, and directors to see how their work is represented.

The Atlantic has published a searchable database of music used by tech companies to train generative AI models. Out of curiosity, I went on the platform and searched for a couple of South African artists and wow, this is a much bigger issue than many people realise.

— Lindiwe Dhlamini (@Lisher_Rayze) June 20, 2026

A search for South African DJ and producer Black Coffee returned 165 entries across four datasets, including 117 songs from the LAION-DISCO 12M dataset. A work's inclusion in a dataset does not confirm that it was used to train an AI model, but it does indicate that the work was collected for potential use by AI companies.

Screenshots documenting these searches have begun to spread on social media, surprising users with the amount of South African content identified in the databases.

The controversy has already made its way into legal discourse.

Author Zakes Mda is one of the writers whose works are associated with datasets involved in ongoing legal actions against the AI firm Anthropic. Reports have named titles such as The Heart of Redness, The Whale Caller, and The Madonna of Excelsior as having been used without authorization to train AI models.

This matter is part of a larger international copyright conflict involving authors, musicians, and publishers clashing with AI firms over the permissibility of using copyrighted works for training generative AI systems without explicit consent.

As searchable databases make it easier for the public to check which artistic works have been collected, questions of ownership, consent, and fair compensation are moving beyond legal circles and into public debate.