I recently discovered that some of my books are being used illegally to “train” AI models.
This story in Books + Publishing explains what’s at stake:
The Australian Society of Authors (ASA) has responded to the large-scale unauthorised use of books by Australian authors in a dataset used to train generative artificial intelligence.
The training dataset, known as the Books3 corpus, is understood to contain at least 183,000 books downloaded from pirate sources. After the Atlantic published a search tool that allows authors to search for their books in this dataset, the ASA has ‘heard from many dismayed authors who had no idea their works had been used without permission’. Australian authors who have posted on social media about finding their titles using the search tool include Clare Wright, Gabrielle Wang, Meg Mundell, Antony Loewenstein, Jodi McAlister, Laura Jean McKay, Elias Greig and Catriona Menzies-Pike.
While generative AI technology relies upon books, journals, essays, and scripts written by authors, as well as images and artwork created by artists, the ASA notes that ‘permission was not sought nor compensation granted’, while generative AI products ‘have already been released onto the market and monetised’. Said the ASA: ‘Authors appropriately feel outraged.’
ASA CEO Olivia Lanchester said: ‘This issue is one of basic fairness. […] Tech companies will charge the end user of their products but will not pay the labour that enabled it. It’s like paying the supermarket for your fruit and vegetables without any of that revenue going back to the farmers who grew the produce.’
Lanchester expressed concern at the ‘enrichment of a few powerful companies at the cost of thousands of individual creators’. ‘This is not how a fair market functions. Writers and artists are real people who bring us joy, give our lives meaning and deserve dignity and fair payment for their very real work.’
‘To be clear, we are not anti-tech and we support emerging technologies but feel there has been a missed opportunity to develop artificial intelligence ethically; with transparency, permission and payment, unlocking new opportunities for our creative industries. Instead, authors and artists are being locked out of the AI boom. It’s not too late to turn this around and move to appropriate licensing.’
The response comes as class-action lawsuits from authors are pending against OpenAI, Meta, and Google. The ASA shared plans to ‘write to AI companies to express our serious concerns and to request action, and continue advocating on behalf of authors to the Federal Government’. The organisation recommends that authors who find their books within the Books3 dataset consult the US Authors Guild’s advice.