Page MenuHomePhabricator

{{subst:REVISIONUSER}} no longer substitutes into the current user name, but the username of the last revision
Closed, ResolvedPublic

Description

Observed in https://commons.wikimedia.org/wiki/User_talk:Zhuyifei1999#User%3AFlickreviewR_2

In this edit the edit used {{FlickreviewR|[...]|reviewer={{subst:REVISIONUSER}}}}. This substitutes into {{FlickreviewR|[...]|reviewer=Tyler ser Noche}} rather than {{FlickreviewR|[...]|reviewer=FlickreviewR 2}}

To reproduce:

Event Timeline

zhuyifei1999 triaged this task as High priority.

I'll try to figure out the cause.

This was a regression probably introduced in 1.32/wmf.20, considering it worked as expected a day ago.

Looking at the changes of wmf.20 I see gerrit #456595, but with no related ticket attached. @daniel @Tgr could you comment on what happened?

zhuyifei1999 raised the priority of this task from High to Needs Triage.
zhuyifei1999 updated the task description. (Show Details)

Thanks. I'm not familiar with that so unassigning.

Change 458251 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Provide new, unsaved revision to PST to fix magic words.

https://gerrit.wikimedia.org/r/458251

Change 458526 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Require a Title as context for rendering.

https://gerrit.wikimedia.org/r/458526

Krinkle triaged this task as Unbreak Now! priority.EditedSep 6 2018, 5:22 PM
Krinkle subscribed.

Due to substitution happening as part of pre-save expansion, this is permanently altering the content of a revision in the database. It is not a problem in the viewing of revisions, it is in the pre-save parsing/saving of revisions, which means fixing the bug will not restore the corruption in the database.

The impact is that bots and users substituting values about themselves are instead inserting unexpected values about other edits/people. For example, I typed Greetings, -- {{subst:REVISIONUSER}} on testwiki at https://test2.wikipedia.org/w/index.php?title=Target_page_name&type=revision&diff=380678&oldid=79439 but it got saved as Greetings, -- A915 instead of Greetings, -- Krinkle

For User-notice: This week, on group1 from Wednesday and on group2 as of Thursday (today), the subst: syntax in wikitext was misbehaving and causing edits from users (including bots and gadgets) to be saved with a different expansion than what was intended. This means for example that when you wrote {{subst:REVISIONUSER}}, the edit was saved in the history with the name of previous user who last edited it, instead of your name. This has likely caused hard to detect problems depending on the workflows local communities have. On Wikimedia Commons, it caused FlickreviewR actions to be attributed to an unrelated person instead of the administrator or bot that reviewed the file's copyright status.

Because the use of subst: happens before the edit is saved in the database, it is not possible to know from the history which edits used subst and which not. As such, it is also not possible to know which edits were affected.

Please carefully review the last two days of edits relating to workflows in your communities that you know involve the use of subst: syntax with magic words relating to revisions (REVISIONUSER, REVISIONYEAR etc.), and adjust the content in a follow-up edit where needed.

Change 458251 merged by jenkins-bot:
[mediawiki/core@master] Provide new, unsaved revision to PST to fix magic words.

https://gerrit.wikimedia.org/r/458251

Change 458545 had a related patch set uploaded (by Jforrester; owner: Daniel Kinzler):
[mediawiki/core@wmf/1.32.0-wmf.20] Provide new, unsaved revision to PST to fix magic words.

https://gerrit.wikimedia.org/r/458545

This is not a snarky comment, but really: has no one tested that things like this shouldn’t happen? Can we instruct people to thoroughly test major changes like these so that this wouldn’t happen again?

Mentioned in SAL (#wikimedia-operations) [2018-09-06T18:12:05Z] <krinkle@deploy1001> Synchronized php-1.32.0-wmf.20/includes/: I31a97d0168 - T203583 (duration: 01m 13s)

Change 458545 merged by jenkins-bot:
[mediawiki/core@wmf/1.32.0-wmf.20] Provide new, unsaved revision to PST to fix magic words.

https://gerrit.wikimedia.org/r/458545

This is not a snarky comment, but really:

How communications come across are for one's audience to judge, not the speaker.

has no one tested that things like this shouldn’t happen? Can we instruct people to thoroughly test major changes like these so that this wouldn’t happen again?

Most of this part of MediaWiki has no unit tests. How it is "meant" to work is held in the minds of a few thousand users from their experiences. When re-writing it from scratch, sometimes some ultra-edge-cases get missed, like this one (all the REVISION* magic words are dark magic which are abused significantly). There was a great deal of testing, but sadly this was missed. The good news is that this area is now extensively covered in tests.

As an aside its too bad we cant kill this behaviour. Its a privacy risk.

As an aside its too bad we cant kill this behaviour. Its a privacy risk.

In preview or on save or both? I doubt that removal of this behaviour in previews would’ve been noticed, but most people would expect and find useful to have this feature working when saving.

has no one tested that things like this shouldn’t happen? Can we instruct people to thoroughly test major changes like these so that this wouldn’t happen again?

Most of this part of MediaWiki has no unit tests. How it is "meant" to work is held in the minds of a few thousand users from their experiences. When re-writing it from scratch, sometimes some ultra-edge-cases get missed, like this one (all the REVISION* magic words are dark magic which are abused significantly). There was a great deal of testing, but sadly this was missed. The good news is that this area is now extensively covered in tests.

Yeah, just read https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/451025/. I've been following MediaWiki development for some 10 years, and i still can't follow half of what is discussed and changed there. No one is playing loose here. It is unfortunate that so many people still don't understand the years of quicksand that MediaWiki is built on and the enormous work that is being done to retroactively repair that. Things break at times, there is no perfection.

Yeah, just read https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/451025/. I've been following MediaWiki development for some 10 years, and i still can't follow half of what is discussed and changed there. No one is playing loose here. It is unfortunate that so many people still don't understand the years of quicksand that MediaWiki is built on and the enormous work that is being done to retroactively repair that. Things break at times, there is no perfection.

I was going about this knowing that there is Beta Cluster, that there are projects that get and get test new versions earlier than others. If this feature was meddling in major way (as I understand) with revision information, it’s not bad to expect (sorry for hindsight 20/20) to test it with some of hands-on functions that used that revision information to test if all of this still works as intended. I didn’t think that magic words were that undocumented as people are trying to say here.

Declaring this Resolved.

As an aside its too bad we cant kill this behaviour. Its a privacy risk.

In preview or on save or both? I doubt that removal of this behaviour in previews would’ve been noticed, but most people would expect and find useful to have this feature working when saving.

In preview. Once you save a page your name is in the history page anyways

has no one tested that things like this shouldn’t happen?

Not with an edit done by a different user from who did the previous edit, and that's what triggers the bug. (Or I guess it always gets triggered but as long as you use a single user for testing, you end up with the right username by accident.)

I didn’t think that magic words were that undocumented as people are trying to say here.

Substing REVISION* magic words shouldn't work in theory - they refer to the properties of a revision that doesn't even exist yet at the point in time when the substitution happens. As it turns out there is a hack in the parser to make some of them work. Poorly documented hacks tend to break during refactoring - we had 2-3 unexpected bugs for each one of the major MCR refactoring patches that went out, despite extensive testing. I don't think that's avoidable.

Maybe we could alter the deploy cadence somehow, to catch these things before they reach the big wikis - we did that for one big patch (PageUpdater IIRC) and that worked pretty well. Also more community members watching Regression tasks and pointing out the potential impact would help - this bug was noticed yesterday so probably things would have gone smoother if we realized in time it should be a train blocker for group 2.

As an aside its too bad we cant kill this behaviour. Its a privacy risk.

Replacing {{REVISIONUSER}} with something like <your username here> seems not too confusing for editors, and should not be hard to implement. Doing that for subst is more trouble (it means PST would have to be done twice for content that varies on the user) but not entirely outside the realm of possibility. If the risk is serious, worth filing a task about it - everything is getting refactored anyway so this is a good time to change behavior like that.

Privacy risk => because a cross-site request can do a GET to an action api endpoint with {{subst}} and get back the current logged in user? I'm guessing here. Maybe @Bawolff can elaborate.

This is not a snarky comment, but really:

How communications come across are for one's audience to judge, not the speaker.

I'm not saying this is necessarily better, but my usual habit for stuff like that is to finish with something along the lines of "apologies if this came off as snarky; that wasn't my intention": it makes no assumptions about how the comment might be received/interpreted, while recognizing that the interpretation could be one I don't want.

Privacy risk => because a cross-site request can do a GET to an action api endpoint with {{subst}} and get back the current logged in user? I'm guessing here. Maybe @Bawolff can elaborate.

You can't do logged-in requests without being on the CORS whitelist (or to be more precise you can but the calling code won't have access to the response). In any case, probably best not to discuss vulnerabilities in a public task.

@Tgr agreed, but I was assuming @Bawolff's issue was just a risk, not an actual vulnerability; I was just putting forward a strawman guess at what he was thinking. Maybe @Bawolff can create a new issue for the privacy risk (security-tagged if it's a vulnerability), and we can discuss whether or how we might mitigate them or deprecate the REVISION* magic words independent of this particular MCR regression.

@Tgr agreed, but I was assuming @Bawolff's issue was just a risk, not an actual vulnerability; I was just putting forward a strawman guess at what he was thinking. Maybe @Bawolff can create a new issue for the privacy risk (security-tagged if it's a vulnerability), and we can discuss whether or how we might mitigate them or deprecate the REVISION* magic words independent of this particular MCR regression.

Private task at T134713

So is this fixed or not? What is the outcome?

@Petrb: It's fixed as per T203583#4563844, and should work as before. We now have regression tests that should prevent this from breaking in the future.