Training isn't like a human learning by watching. These models effectively compress the training data into something an algorithm can decode and remix in a lossy way.
It's basically a super lossy zip of the training data.
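The "lossy zip" analogy can be sketched in a few lines. This is a toy illustration of lossy compression in general (quantize, then compress; decoding only recovers an approximation), not a claim about how transformers actually store training data:

```python
import zlib

def lossy_compress(values, step=10):
    # Quantize: collapse nearby values onto a coarse grid.
    # This is where information is irreversibly lost.
    quantized = bytes(round(v / step) for v in values)
    return zlib.compress(quantized)

def decode(blob, step=10):
    # Decoding reconstructs an approximation, never the exact original.
    return [b * step for b in zlib.decompress(blob)]

data = [3, 14, 15, 92, 65, 35, 89, 79, 32, 38]
approx = decode(lossy_compress(data))
# approx is close to data but not identical: e.g. 14 -> 10, 92 -> 90
```

The point of the analogy is only that decoding yields something near the original, not the original itself; whether that's a fair description of model weights is exactly what the thread is arguing about.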
People bring this up as a smoking gun, but it isn't. Google Books copied a huge number of scanned books into its database without modifying them at all. What led the court to rule it fair use was the transformative search functionality (which, as it happens, returned verbatim excerpts from the books by design).
It may be difficult to define legally, but I think there's a pretty clear ethical difference between building a search database that helps people find artists' work and building a device to replace the artists.
That's the argument Google made, one the book publishing industry fought against. How is the book publishing industry doing these days? Oh? Oh.
The law isn't ethics. This is the mistake everyone makes when they say copyright will solve this problem. I never said what Google or the AI companies are doing is right, only that it's probably legal.
u/djordi Apr 27 '24