It is better than nothing even if it is hard to enforce.
Wait, they changed the TOS on a site to say that you can’t scrape it, when the entirety of the site is available without agreeing to the TOS?
AI scrapers:
Well done Mastodon.social.
Even if it may not do much, it’s still better than doing nothing.
Just like when mastodon.social condemned Meta for their horrible moderation decisions and inability to act properly in the interest of its users, and said that the instance would be cutting ties with / not federating with Threads: they kept on federating like nothing had happened.
I don’t believe anything coming out of mastodon.social unless I can see action being taken with my own two eyes.
Also, blocking scrapers is very easy, and it has nothing to do with a robots.txt (which they ignore).
blocking scrapers is very easy
The entirety of the internet disagrees.
How is blocking scrapers easy?
This instance receives 500+ IPs with differing user agents, all connecting at once but staying within rate limits by distributing requests across the bots.
The only way I know it’s a scraper is if they do something dumb like using “google.com” as the referrer for every request or by eyeballing the logs and noticing multiple entries from the same /12.
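As a rough illustration of that eyeballing-the-logs heuristic, you can collapse client IPs to their /12 network and count hits per block; a single block with an outsized share of requests is a hint of a distributed scraper. The IPs in `log_ips` here are made-up sample data standing in for a parsed access log:

```python
import ipaddress
from collections import Counter

# Made-up access-log IPs; in practice you'd parse these out of your server logs.
log_ips = [
    "203.0.113.5", "203.0.113.77", "203.1.200.9",  # all inside the same /12
    "198.51.100.1",
]

def subnet12(ip: str) -> str:
    """Collapse an IPv4 address to its containing /12 network."""
    return str(ipaddress.ip_network(f"{ip}/12", strict=False))

counts = Counter(subnet12(ip) for ip in log_ips)
# Most-hit /12 blocks first; here 203.0.0.0/12 stands out with 3 hits.
print(counts.most_common())
```

This only catches scrapers that cluster in one allocation; a botnet spread across residential ISPs won't show up this way, which is the commenter's point.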
Exactly this, you can only stop scrapers that play by the rules.
Each one of those books powering GPT already had legal protection on it.
Mastodon dot SOCIAL did, the big public instance. Mastodon the software doesn’t have these restrictions.
It wouldn’t even make sense for the Mastodon software to have such a restriction… The article title is misleading.
That’s a really misleading headline; a Mastodon instance has done this, Mastodon as a whole can’t do this because it’s free software, it can be used for any purpose.
I’m wondering, is it possible to include that restriction in the public license for the Mastodon software?
It wouldn’t be a free software licence by the FSF definition (rule zero). Of interest the FSF rejects the original JSON licence because it contains the clause “The Software shall be used for Good, not Evil.” Since Mastodon uses AGPL, it wouldn’t be compatible.
This is why I hope to see rule zero get shit-canned. It’s a naive vestige from a time long before we hit late-stage capitalism. Corporate interests have slithered their way into every facet of our lives and we should be working to make software that we write hostile to their practices as much as we can.
If that means that the organizations that have a stranglehold on Open Source™️ don’t like it, so be it. We can follow in the spirit of open source without the naivety or captured interests of organizations that define the arbitrary terms by which we categorize software licenses.
It just means that the decision comes down to the instance owner not the software developer, which I think is right. Everyone should be able to decide what their computer does, that’s important to hold on to.
this reminds me of the Hippocratic License, which comes with a bunch of modules restricting the use of software based on ethical considerations (for example, there’s a module forbidding the use by police, and another one forbidding the use by any institution on the BDS list)
i think the FSF, in their eternal and unchallengeable wisdom (/s), also declared that it wasn’t foss
cool, didn’t know about this nuance. based JSON license, by the way.
Yeah this will do absolutely nothing.
It gives them legal standing against scraping, if that’s needed in the future.
I agree, but I’m glad they did it anyway.
Fair, there is no reason not to.
It does provide for the possibility of future legal action. This should have been done a year or two ago.
No it doesn’t because all mastodon data is public and does not require ToS agreement to be collected.
Mastodon could only argue damages, but that would be impossible to litigate to any extent due to the decentralized and free nature of Mastodon and the Fediverse. Except for some backward countries like China or Japan, where there are no information-freedom protections and any corporation can sue you for damages over any information infringement (even if it’s not yours).
This is a good thing. Mastodon shouldn’t control anything related to the legality of data flowing in the fediverse - that’s the entire point.
The way copyright law works, by default you don’t have any right to make use of anything, even if it’s posted publicly. Why do people allow Fediverse platforms to do the thing they do? Leniency on their part.
Gathering data from Mastodon for AI training is technically feasible, but that doesn’t mean it’s legally justified. Many people will object to that. Many already do!
No, that’s not how copyright works. Copyright prohibits distribution, not copying.
Er, yes, my point was copyright very much concerns what you’re allowed to do with data. But that goes beyond distribution. Derivative works are a complicated topic.
My point stands, whether you technically can copy stuff has no bearing on whether you’re allowed to use it and for what purpose.
Well, it depends on the use. If it’s a movie that I copied, then I can watch it; if it’s a picture, I can print it and put it on a wall at home. Even AI training is currently considered entirely legal on copyrighted data. You can even parse copyrighted data for analytics, which is entirely legal as well.
So you can do a lot with copyrighted data without breaching copyright, including AI training, which is the article’s topic.
Private use of the copyrighted works is pretty much a separate topic entirely.
And while the law isn’t settled on the topic, it’s wrong to argue AI training is something that happens entirely in a private setting, especially when that work is made available publicly in some form or another.
Sure, there’s a problem with current copyright law that needs to be addressed. It’s quite similar to the “TiVo loophole” in OSS licenses. That one was addressed, and certainly not in favour of the loophole exploiters; it could be fixed at the licence level because it was ultimately a licence question. The AI training question, however, needs to be taken to the legislative level. Internationally, too.
I don’t think this is true. While copying might fall under fair use if used for some purpose, you definitely can get in trouble for copying even without distributing those copies.
For example, you can’t borrow a library book and then photocopy the whole thing for yourself.
Those are entirely different laws you’re thinking of, like the DMCA, EUCA, database protection laws (yeah lol, it’s a real thing), etc. Copyright on its own is about distribution.
That being said, data law is really complex, and more often than not it turns on proof of damages rather than explicit protections. Basically it’s all lawyer-speak rather than an idealistic framework that aims to protect someone. This is the primary argument for why copyright is a failed framework: it’s always just a battle of lawyers and damages.
I still don’t think this is correct, for two reasons. 1: I believe the DMCA and friends count as copyright law. 2: just reading the text of the law (17 U.S. Code § 106):
Subject to sections 107 through 122, the owner of copyright under this title has the exclusive rights to do and to authorize any of the following:
(1) to reproduce the copyrighted work in copies or phonorecords;
(2) to prepare derivative works based upon the copyrighted work;
(3) to distribute copies or phonorecords of the copyrighted work to the public by sale or other transfer of ownership, or by rental, lease, or lending;
(4) in the case of literary, musical, dramatic, and choreographic works, pantomimes, and motion pictures and other audiovisual works, to perform the copyrighted work publicly;
(5) in the case of literary, musical, dramatic, and choreographic works, pantomimes, and pictorial, graphic, or sculptural works, including the individual images of a motion picture or other audiovisual work, to display the copyrighted work publicly; and
(6) in the case of sound recordings, to perform the copyrighted work publicly by means of a digital audio transmission
It seems pretty clear that only the copyright owner has the right to make copies, subject to a number of exemptions.
Now, IANAL, so I could be missing something pretty huge, but my understanding was that this right to make copies (especially physical ones, for physical media) is at the core of copyright law, not just the distribution of those copies (which is captured by right 3).
I think the point is that instances can choose their own rules. The article is about an instance, not about the entire platform.
That is not true. The terms of service cover “your access and use of Server Operator’s (“Administrator”, “we”, or “us”) instance”. Access includes reading data from the server.
No, there are several types of legal agreements on the web; in this particular case there’s:
- click wrap where the visitor must explicitly agree with terms of service by clicking a button - that’s what you see when you register an account.
- browse wrap where the visitor implicitly agrees with ToS by just browsing the web.
The former is enforceable, while the latter is almost impossible to enforce in free western countries, because you can’t be bound to an agreement just by browsing a public space; that would be crazy.
No it doesn’t because all mastodon data is public and does not require ToS agreement to be collected.
ToS are legalese bullshit. They mean next to nothing, since most of the stuff gets annulled if it comes to court.
ToS kind of do protect you, but whether or not the service is held hostage (as in: you can’t watch one little YouTube video without selling your soul to Google) doesn’t make a big difference. The reasonable expectation is that users own their content (as in YouTube’s case — YouTube doesn’t pounce on your videos afaik), although the service does own the rights to distribute it (obviously), and it uses sane technological measures to prevent what it doesn’t want. In YouTube’s case that’s watching e.g. private videos; in another case it can be AI scrapers.
Robots.txt is, just like a ToS, a contract. It just isn’t legalese, because it isn’t meant to scare people but to be useful to the programmers making the site and those running the scraper. They’re programmers, not marketers or lawyers; of course they won’t deal with legalese if they can avoid it.
Again, law is not legalese.
A robots.txt file is a contract by use, like when you park in a charge zone: by entering the zone, you accept the obligation to pay.
When you scrape a site, you first check for robots.txt in all the reasonable places it should be, look for its terms, and follow them… if you don’t want to risk getting sued.
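For what it’s worth, honoring robots.txt from a scraper is only a few lines with Python’s standard-library `urllib.robotparser`. The rules and URLs below are hypothetical; normally you’d load the live file with `rp.set_url(...)` followed by `rp.read()` instead of `parse()`:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, fed in directly for the sake of the example.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# Check each URL against the rules before fetching it.
print(rp.can_fetch("MyBot/1.0", "https://example.com/private/page"))  # False
print(rp.can_fetch("MyBot/1.0", "https://example.com/page"))          # True
```

Of course, as the reply below points out, nothing forces a scraper to run this check; it only governs the ones that choose to play by the rules.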
Similarly, entering a store, you are expected to pay for what you take. There is no entry gate like on a metro where, instead of swiping a card, you read the store’s T&Cs; but you know it’s common sense that security will come after you, if not the police. Yet you never clicked “I agree”, so how come you don’t just take what you want?
And robots.txt is a mature technology, easily a “standard”. Any competent lawyer will point that out to the jury and judge, who will most likely rule appropriately. The Internet is not the Wild West anymore.
Listen man, I’ve been working in web scraping for years (though now I do the exact opposite: anti-bot tech), and robots.txt is absolutely meaningless. There’s zero precedent in the US or elsewhere of it doing anything but providing web crawlers a map of your web site.
I can tell you the thing we tell all of our clients: the only way to sue bots is to sue for direct damages, not for automation. This has always been true and will continue to be true for the foreseeable future in the US, because it’s impossible to set a precedent here; there are just too many players involved that benefit from web automation.
You can actually check out:
- Meta v. Bright Data
- hiQ Labs v. LinkedIn
These cases are very recent and huge in the web automation community; they went all the way to the Ninth Circuit and were settled at the Supreme Court level in favor of bots.
I’m telling you man, copyright is so ruined that it’s really just a machine for feeding middle managers and lawyers. But hey, it gives me great job security, and I can afford to work on actual free software, which as you might know is incredibly hard to fund otherwise!
Why?
It potentially gives them grounds for a lawsuit. Probably not but potentially. There’s no reason not to explicitly deny permission. They have everything to gain and nothing to lose.
I wonder how that works with federation.
If a second instance does not have that restriction, is there any “legal” effect on the federated content?
I will create a masto instance where this is mandatory, to counterbalance:
Failing to train an AI model using your posts as part of the training data within 7 days of posting will result in a permanent ban.
This was one of the few ToS updates I was actually glad to read. ToS changes usually mean a company is slowly rephrasing them to fuck us over.