
You mean the portfolio which, to be in that dataset at all, was allowed for indexing by robots.txt, and made available publicly to unauthenticated GET requests on the internet?


Amazingly enough, nobody made choices about how public they wanted their work to be that anticipated "someone scraping the entire internet and putting it into a giant neural network". This whole AI training thing may technically fall under fair use, but it is right on the edge, and this very hazy edge is begging to be made a lot sharper.


This seems no different than the controversy over GPL code being used to train Github Copilot. Just because it's publicly available and allows indexing doesn't say anything about the license it's released under.


When using GitHub, you grant GitHub a license to use your code for Copilot and other such things. This is not necessarily the same license that you might give others separately, such as MIT or GPL.


But that argument doesn't really work, since you can upload someone else's GPL'd code to GitHub. A random person uploading old code to GitHub can't grant GitHub rights that the original authors never gave.


You mean my intellectual property that they are now charging to buy credits to rip off with a plagiarism algorithm?

All rights reserved. I didn't agree for them to use my IP commercially.


What intellectual property? Where in the Stable Diffusion model weights does your intellectual property exist?


The intellectual property they train an algorithm with and charge money for. This is some real mental gymnastics.


Interesting, so you've given due credit and/or monetary compensation to every single photographer you've learned from along the way? Yea, thought so.


Computers aren't people, so thinking they "learn" the same way or for the same reason is childish. In addition, only one of those things is capable of and held accountable for following the law.

It's the law that matters here.


> It's the law that matters here.

So - keeping this argument purely in terms of IP law - can you explain what the case is for these images to have been illegally used?



I'm impressed: you never looked at another photo, just emerged from a blank room taking pictures, having never seen another photo before. You just knew photography from the womb and came out with camera in hand, knowing exactly how to work it. You are the one man who never had to learn a thing or rely on someone else or their prior works for training and/or inspiration.


I've been playing with DSLRs for twenty years. I just fuck around until I have some rudimentary understanding of it. Just like how I approach anything else, including playing instruments.


Yea, the only "mental gymnastics" here is the BS you're trying to pass off on us: that your personal neural network has absorbed exactly zero input or observation from other creators/creations that influences (consciously or otherwise) its ability to create and profit from works.


>your personal neural network

More bullshit equating the human mind to an AI. Newsflash, boyo - humans are not NNs, there is no reason legal precedent should treat humans the same as NNs (and it won't).


Strange ad hominem.

I capture what I see, more as a frame of reference for myself with location scouting and game dev. I take quite literally hundreds of thousands of photographs. It's 1 in every thousand that's worth sharing.

I have zero interest in other photographers, I can't even name one. Whatever other people do, neat. I don't even consider myself to be a "photographer", but I guess with hotels and airports buying prints, that somehow makes me a professional.

I capture what I like. There's literally zero other motivation behind it. Most of my income comes from elsewhere. I use a print-on-demand service.

If I take a photograph of a sculpture I've made and that is then incorporated into another work without permission, that's intellectual property theft. A lot of the images are of things I've made, not just street photography or landscapes that anybody could go and take.

If I write a riff in a microtonal scale and it turns up as a sample in a record, or is covered and released without permission, that's intellectual property theft too.


More than one person can come up with the same riff, right?

If someone is worried about an original image, why not watermark all uploaded versions?

If I walk about my neighborhood scattering paper copies of an image, and someone else picks one up to hang on their wall, did anyone break any laws (besides me littering)?


Who says they aren't watermarked? These watermarks are showing up in AI generated art, that's a documented fact.


So you're trying to assert copyright on something which isn't a copy of something? Again, where in the neural weights does your original work live?

EDIT: Let's consider a simpler scenario: I take an MD5 sum of one of your photos, and then hash-collide an image from my phone camera till I find a match. Which part of this process is "stealing"?
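A toy sketch of that scenario, truncated to a 2-hex-digit prefix since a real full MD5 collision search is computationally infeasible (the byte strings here are made-up stand-ins, not real images):

```python
import hashlib
import itertools

original = b"bytes of your copyrighted photo"          # hypothetical stand-in for the photo
target_prefix = hashlib.md5(original).hexdigest()[:2]  # toy target: match 2 hex chars, not a full collision

base = b"bytes from my own phone camera"               # hypothetical unrelated image
for n in itertools.count():
    candidate = base + str(n).encode()                 # perturb until the truncated hash matches
    if hashlib.md5(candidate).hexdigest()[:2] == target_prefix:
        break

# candidate now shares a hash prefix with the original
# while containing none of the original's content
print(hashlib.md5(candidate).hexdigest()[:2] == target_prefix)
```

Whatever one makes of the legal question, the matched file contains no bytes of the original, which is the point the hash analogy is driving at.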


> Again, where in the neural weights does your original work live?

That's not the standard for copyright infringement. The standard is (1) access to the original work and (2) producing a work that is "substantially similar" to the original. If the user of an AI produces an output that is substantially similar to one of the AI's training images, then it could potentially infringe.

That's the law. In an actual case, a judge or jury would look at the original and decide if the alleged infringer is "substantially similar" enough. That's a crap shoot, so cases usually settle before that.


> producing a work that is "substantially similar" to the original.

So the question “where in the model is your original?” is a reasonable and relevant question to ask.

If this person can induce the model to produce a work that is reasonably considered a copy of their original, then fair enough. All they have to do is give a prompt and a seed and they can prove copyright infringement very easily because anybody else with similar hardware and software can demonstrate the infringement on demand. If I understand correctly, this has happened with GitHub Copilot, with Copilot reproducing copyrighted works verbatim.
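The prompt-plus-seed point can be sketched with a toy deterministic sampler (a hypothetical stand-in: real diffusion pipelines are likewise a pure function of weights, prompt, seed, and sampler settings):

```python
import hashlib

WEIGHTS = b"stand-in for the model's weights"  # hypothetical; not real model data

def generate(prompt: str, seed: int) -> str:
    # Toy stand-in for a diffusion sampler: the output is a pure function
    # of (weights, prompt, seed), so anyone with the same inputs gets the
    # same result and can re-check a claimed infringement on demand.
    payload = WEIGHTS + prompt.encode() + seed.to_bytes(8, "big")
    return hashlib.sha256(payload).hexdigest()

a = generate("a lighthouse at dusk", seed=42)
b = generate("a lighthouse at dusk", seed=42)
c = generate("a lighthouse at dusk", seed=43)
print(a == b, a == c)  # same inputs reproduce; a different seed does not
```

This determinism is what would make a demonstrated reproduction so compelling as evidence: it can be replayed by anyone.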

But if they can’t do that… why should anybody take their claims of copyright infringement seriously? If nobody can point to copyright infringement having taken place, what basis is there for believing it has? As you say, the standard is access to the work and producing a work that is substantially similar to the other. The former has been demonstrated. People are asking about the latter.

So “where can we find the original in the model?” is possibly the single most relevant question there is. It’s a clear line that divides infringement from inspiration, and it can be proved definitively if copyright infringement has been observed to occur.


If there was an actual case, the question would be whether the defendant had access to the original work. If the original work was used to train the AI model, then the answer is yes. It's not necessary for the model to contain a copy of the original work.


That wouldn’t be the question because nobody is disputing the access to the original work. You mentioned two factors. That’s the first, which nobody is questioning. The thing people are questioning is the second factor, the reproduction of the original work.

A copy must be made for there to be copyright infringement. Who has shown a copy has been made? If a copy has been observed to occur, it’s trivial to demonstrate copyright infringement. Who has done this?


I may have misunderstood. I thought we were talking about an AI output that might be substantially similar to an original used to train the AI. If we're considering only the use of the original as a training image, then I don't know if that would infringe.


They’re asserting copyright over their work.

The part they’re concerned about is where somebody uses their work as a reference for derivative works.

We can acknowledge that this is an unsettled legal question without pretending like there’s no such question for a creator to raise. This community works better when we give each other fair understanding.


Copyright owners have the right to control when, where, how and by whom their content may be used. https://www.law.cornell.edu/uscode/text/17/106


This is very much not what your link says - quoting:

---

Subject to sections 107 through 122, the owner of copyright under this title has the exclusive rights to do and to authorize any of the following:

(1) to reproduce the copyrighted work in copies or phonorecords;

(2) to prepare derivative works based upon the copyrighted work;

(3) to distribute copies or phonorecords of the copyrighted work to the public by sale or other transfer of ownership, or by rental, lease, or lending;

(4) in the case of literary, musical, dramatic, and choreographic works, pantomimes, and motion pictures and other audiovisual works, to perform the copyrighted work publicly;

(5) in the case of literary, musical, dramatic, and choreographic works, pantomimes, and pictorial, graphic, or sculptural works, including the individual images of a motion picture or other audiovisual work, to display the copyrighted work publicly; and

(6) in the case of sound recordings, to perform the copyrighted work publicly by means of a digital audio transmission.

----

This is far from an unlimited grant of power. Of this list, the only plausible grounds is (2) - derivative works.

Which means we're then well into arguing about "Fair Use"[1], which I would encourage people to read the full description of carefully, because the answer isn't whether you can come up with a snippy "gotcha!"; it's whether, under careful consideration in a court of law, anyone would be likely to agree with you.

[1] https://www.copyright.gov/fair-use/


None of this stuff is being used as fair use (criticism, comment, news reporting, teaching, scholarship, research, etc.) nor is it used verbatim -- a fair use requirement.

I really don't know why people bother bringing it up other than to create an uninformed distraction.


The part where you used my work commercially.

Seeing as you like analogies: if you play music in a public building, you need a license even if you are selling coffee and not the music.


Assuming you're referring to Stable Diffusion, the training set photos are not used commercially. Using them in training is non-commercial. If they were to share the training set, that would be commercial. However, the final product (the neural weights) does not include any of the training data, and so there is no commercial use of your photos occurring.

It's the same as if I were to look at your photos and then take my own similar photos for the background of a web store. I would have "used" your photos in some sense. But I would not have used them commercially in any way whatsoever.

You have just as much grounds for complaint in Stability's case as you would in the case of me looking at your photos as reference for my own.


Commercial use isn't required for copyright infringement.


Regardless, there is no copyright infringement occurring when I use my memory of having seen someone else's copyrighted materials to produce my own wholly new but similar looking works.

Likewise there is no copyright infringement occurring when an ML model is trained on copyrighted works.

Stable Diffusion is a 5GB matrix of floating-point numbers trained on 240TB of data. It does not, and cannot, contain infringing data from the training set. It is physically impossible for it to contain such data. There is not, and cannot be, any infringement occurring.
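The arithmetic behind that claim, taking the comment's 5 GB / 240 TB figures at face value and assuming a LAION-scale image count of roughly 2.3 billion (the image count is an assumption, not stated in the thread):

```python
model_bytes = 5e9        # ~5 GB of weights (figure from the comment above)
dataset_bytes = 240e12   # ~240 TB of training data (figure from the comment above)
num_images = 2.3e9       # assumed LAION-scale image count; not from the thread

print(dataset_bytes / model_bytes)  # ~48,000:1 ratio of training data to weights
print(model_bytes / num_images)     # ~2.2 bytes of weight budget per training image
```

This is a capacity argument: averaged over the dataset, there is room for only a couple of bytes of weights per training image.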


> ... when I use my memory of having seen someone else's copyrighted materials to produce my own wholly new but similar looking works.

That's called "non-literal" copyright infringement. Plaintiffs win on the claims and appeals courts uphold the judgments. (See, e.g., the "Blurred Lines" case. https://www.nbcnews.com/pop-culture/music/robin-thicke-pharr...).

A person could infringe the copyright in an original image if they painted their own version by hand. The AI is basically irrelevant.


If I compress an image with JPEG, I won't get the exact same pixel values as the original image after decompressing, because the compression is lossy, but I'm sure you'll agree that the original image is "there". How are NN weights different from the DCT coefficients in a JPEG file?
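A minimal sketch of the lossy-but-recoverable point, using plain quantization in place of JPEG's actual DCT-plus-quantization pipeline (the numbers are made up):

```python
# Toy lossy codec: quantize sample values, much as JPEG quantizes DCT coefficients.
original = [12, 57, 200, 33, 148]  # made-up pixel-ish values
step = 16                          # coarser step = more loss, smaller output

compressed = [round(v / step) for v in original]  # information is discarded here
restored = [c * step for c in compressed]

errors = [abs(a - b) for a, b in zip(original, restored)]
print(restored)     # not bit-identical to the original...
print(max(errors))  # ...but every value lands within step/2 of it
```

The disagreement in the thread is whether NN training is like this, with the original recoverably "there", or whether only statistical regularities survive.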


The NN weights can't recreate anything without input values - specifically a 512x512 grid of random noise, and a transformed textual prompt.

So there's almost a kilobyte of missing data, without which nothing is produced.

Would a JPEG file with random noise in it be potentially any given original image? Of course not - even though the decompressor is perfectly capable of recreating one given suitable input data.

The "Grokking Stable Diffusion" colab [1] posted here a week or two back makes this particularly explicit: any given 512x512 image can be reversed into a latent-space encoding of that image, and reconstructed from it. The NN weights are necessary but not sufficient to do so; ultimately you end up with a 512-byte number mapping to the expression which reconstructs an image. And that includes images which aren't part of the original training set.

[1] https://colab.research.google.com/drive/1dlgggNa5Mz8sEAGU0wF...


You can pontificate all you want about what is contained in the model, the fact is a case related to infringement is going to hit an EU court soon enough and any sort of commercialization of those models will be banned. And good fucking riddance to the AI bros, the monkeys of engineering.


You seem to think it's all figured out, so go ahead and try it. This isn't theoretical; everyone seems to think they definitely know how this will go, and we sure could use some court clarity.

Slam dunk case right? What's stopping you?


I can't "try it" because I am not an injured party, not being an artist who has ever published an image. That being said, the existing "relationship" between Google and news publishers, in the context of EU court decisions, is already convincing enough about the direction this is going to go (and Google's usage of news didn't even have to cost a single person their job, unlike how this will).


> "I stole your money and several other people's money and then put it into a bank account. Tough luck! It's too late, you are unlikely to prove which of the dollar bills in account are yours!"

Sort of falls flat.


I don't think this analogy holds up.


If someone's copyrighted material was used to train a model, what analogy would you use?


You might need to explain the analogy in a bit more detail because currently I have absolutely no idea what you are getting at.



