CX2: <span class="Z3988"...
Open, Medium, Public

Description

Sometimes CX2 creates completely useless span tags that carry many useless parameters but contain only "&nbsp;" as text. Could you fix this?

Example with Franz Ludwig Güssefeld on frwiki:

<span class="Z3988" style="display:none" title="ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rfr_id=info:sid/de.wikipedia.org:Franz+Ludwig+G%C3%BCssefeld&rft.atitle=Biographische+Notiz+von+Franz+Ludwig+G%C3%BCssefeld&rft.au=Franz+Ludwig+G%C3%BCssefeld&rft.btitle=Allgemeine+geographische+Ephemeriden&rft.date=1808&rft.genre=book&rft.place=Weimar&rft.pub=Landes-Industrie-Comptoir&rft.volume=Band+26">&nbsp;</span>

In addition to this huge span tag, there are other completely useless span tags:

<span>(NDB).</span> <span>Band</span>&nbsp;<span>7, Duncker & Humblot, Berlin 1966,</span> ISBN 3-428-00188-5<span>, S.</span>&nbsp;<span>289</span> <span>(</span><span class="plainlinks-print">[http://daten.digitale-sammlungen.de/0001/bsb00016325/images/index.html?seite=303 Digitalisat]</span><span>).</span>
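For gnomes cleaning up affected pages, the two patterns reported above can be stripped with a small script. This is only a sketch of a manual cleanup aid, not the fix requested for CX2 itself. The function names `strip_coins_spans` and `unwrap_bare_spans` are hypothetical, and the regexes assume that the hidden span contains nothing but a non-breaking space and that attribute-less spans contain no nested tags.

```python
import re

# Hidden span with class "Z3988" whose only content is a non-breaking space
# (either &nbsp; or &#160;), as in the examples reported in this task.
COINS_SPAN = re.compile(
    r'<span\b[^>]*\bclass=(["\'])Z3988\1[^>]*>(?:&nbsp;|&#160;|\s)*</span>',
    re.IGNORECASE,
)

# Span with no attributes at all and no nested tags inside it.
BARE_SPAN = re.compile(r'<span>([^<]*)</span>')


def strip_coins_spans(wikitext: str) -> str:
    """Remove the hidden Z3988 spans entirely."""
    return COINS_SPAN.sub('', wikitext)


def unwrap_bare_spans(wikitext: str) -> str:
    """Replace <span>text</span> with just text; spans with attributes are kept."""
    return BARE_SPAN.sub(r'\1', wikitext)


if __name__ == "__main__":
    sample = (
        '<span>(NDB).</span> <span>Band</span>&nbsp;<span>7</span>'
        '<span class="Z3988" style="display:none" '
        'title="ctx_ver=Z39.88-2004&rft.genre=book">&nbsp;</span>'
    )
    print(unwrap_bare_spans(strip_coins_spans(sample)))
```

Spans that carry attributes, such as `class="plainlinks-print"`, are deliberately left alone, since they may be intentional.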

Event Timeline

This is still happening, requiring cleanup by en.WP gnomes. Please make it stop. Here's an example from a few days ago:

https://en.wikipedia.org/w/index.php?title=History_of_the_far-right_in_Spain&oldid=929127143

Yes, please fix it: when will CX stop creating articles that require gnomes to check and fix each one? This situation has lasted for years.

The history of the Pedro Laín Entralgo article you linked (https://en.wikipedia.org/w/index.php?title=Pedro_La%C3%ADn_Entralgo&action=history) shows that it was created in 2007. Why do you think that ContentTranslation created that article?

As far as I can see, the History of the far-right in Spain article was created from scratch using VisualEditor, not using translation. Articles created using ContentTranslation carry the edit tag "contenttranslation".

Please let us know how these two issues are related to ContentTranslation. Thanks

Really? Look at the span tags. They say

span title="ctx_ver=Z39.88-2004&rfr_id=info%3Asid%2Fes.wikipedia.org%3APedro+La%C3%ADn+Entralgo

and

span class="Z3988" title="ctx_ver=Z39.88-2004&rfr_id=info%3Asid%2Fes.wikipedia.org%3AHistoria+de+la+extrema+derecha+en+Espa%C3%B1a

It is my impression that "CTX" is the content translation tool, but I could be wrong. The rest of this post proceeds on that assumption.

For the former article, the edit I linked to was made on 1 July 2019, as shown in the diff I linked; it does not matter when the article was created, it matters when the edit in question was made.

Edits created with the content translation tool do not have to be tagged with "contenttranslation" if they were copy-pasted from another location, such as the editor's user space in en.WP or another language's WP.

I don't know how this is happening, but a search for the relevant span tag should help someone track down the source of this bug. I hope that someone is listening, and that I am not just typing into the void. On the slim hope that someone out there is willing to work on fixing this problem and not just posting objections like the frustrating one above without doing any research at all, here's some volunteer research for you:

A search for the relevant portion of the span tag (currently 72 hits in en.WP article space, plenty of hits to begin to figure out how this problem is created):
https://en.wikipedia.org/w/index.php?sort=relevance&search=insource%3A%2Ftitle%5C%3D%5C%22ctx_ver%2F&title=Special:Search&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D&ns0=1

(Hint: look in the history of each article to determine when and how the span tags were added.)
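The search link above can be rebuilt for any wiki. The sketch below only reconstructs the query string seen in that link (minus the empty `advancedSearch-current` parameter); the helper name `insource_search_url` is hypothetical, and the host passed in is whatever wiki you want to check.

```python
from urllib.parse import urlencode


def insource_search_url(host: str, pattern: str) -> str:
    """Build a Special:Search URL that runs an insource:/regex/ query
    against the article namespace (ns0) of the given wiki host."""
    params = {
        "search": f"insource:/{pattern}/",
        "title": "Special:Search",
        "profile": "advanced",
        "fulltext": "1",
        "ns0": "1",
    }
    return f"https://{host}/w/index.php?{urlencode(params)}"


if __name__ == "__main__":
    # Same regex as in the link above: title\=\"ctx_ver
    pattern = r'title\=\"ctx_ver'
    print(insource_search_url("en.wikipedia.org", pattern))
    print(insource_search_url("fr.wikipedia.org", pattern))
```

Passing a different host, for example `fr.wikipedia.org`, gives the equivalent search on that wiki.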

A page created by the content translation tool, with the tags ContentTranslation ContentTranslation2 PHP7, in March 2019:
https://en.wikipedia.org/w/index.php?title=Bombing_of_Nuremberg_in_World_War_II&action=history

In the above page, translation of the German Template:Stadtlexikon Nürnberg resulted in the offending span tag. When you go to Special:ExpandTemplates in de.WP and expand that template, you get:

[[Michael Diefenbacher]], [[Rudolf Endres]] (Hrsg.): <cite style="font-style:italic">[[Stadtlexikon Nürnberg]]</cite>. 2.,&nbsp;verbesserte Auflage. W.&nbsp;Tümmels Verlag, Nürnberg 2000, ISBN 3-921590-69-8 ([https://www.nuernberg.de/internet/stadtarchiv/publikationen_einzeln_stadtlexikon.html online]).<span class="Z3988" title="ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rfr_id=info:sid/de.wikipedia.org:Spezial%3AVorlagen+expandieren&rft.btitle=Stadtlexikon+N%C3%BCrnberg&rft.date=2000&rft.edition=2.%2C+verbesserte&rft.genre=book&rft.isbn=3921590698&rft.place=N%C3%BCrnberg&rft.pub=W.+T%C3%BCmmels+Verlag" style='display:none'>&#160;</span>

As you can see, there is one of the offending span tags in that code. I can't explain it, and I don't see exactly how or if it relates to the content translation tool, but maybe this research will help one of the perceptive analysts or coders at the WMF to figure out which project, if not content translation, this bug should be assigned to. That useless span tag should not be brought into en.WP, from any language, when an article is translated to en.WP using the content translation tool.

No. CTX is not Content Translation (https://www.mediawiki.org/wiki/Content_translation). Also, English Wikipedia has restricted use of this tool: https://en.wikipedia.org/wiki/Wikipedia:Content_translation_tool

Is it possible to stop posting objections and start looking into this bug?

This problem is not restricted to enwiki. I mostly work on frwiki, where this problem has been seen repeatedly for months or years, and nothing is done about it.
A quick search on frwiki (https://en.wikipedia.org/w/index.php?sort=relevance&search=insource%3A%2Ftitle%5C%3D%5C%22ctx_ver%2F&title=Special:Search&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D&ns0=1) yields more than 50 articles with this tag, even though I have already fixed hundreds of them.

Please keep in mind that we have a common goal here: improving our tools to help editors. I'd recommend not generalizing and instead focusing each report on the specific issue. That always helps to prevent side conversations and keeps people focused and motivated to solve the issue.

Templates are independent on each wiki, and they can produce many different problems. Even if those problems seem similar on the surface, they need specific investigation and resolution. Fortunately, once a specific problem is resolved, a test case is added to prevent future regressions. Your comments above may give the impression that problems persist without any progress, but I don't think that's the case. Looking only at the 32 bugs you reported about this tool, 15 have already been closed. We appreciate users reporting bugs because that helps the tool improve, but we have many requests, and we need to organize our efforts.

One thing that greatly reduces investigation time is a specific example of the source content that produces the issue. For this ticket I extracted the problematic case in this page, and it can now be tested with this quick link.
Translating the contents with Content translation (using Google Translate) resulted in the problematic tags when published:

  • {{Ouvrage|volume=Band 26}} <span>Dans:</span> [[Friedrich Justin Bertuch]] <span>(éd.</span> <span>):</span> <cite style="font-style:italic">Éphémérides géographiques générales</cite> <span>.</span> <span>Fabriqué par une société d'universitaires.</span> <span style="white-space:nowrap">bande <span style="display:inline-block;width:.2em">&nbsp;</span> 26</span> <span>.</span> <span>Landes-Industrie-Comptoir, Weimar 1808 (</span> [https://books.google.de/books?id=YN0BAAAAYAAJ texte intégral] &#x20; <span>dans Google Recherche de Livres).</span> <span class="Z3988" style="display:none" title="ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rfr_id=info:sid/de.wikipedia.org:Benutzer%3ACXTests%2FT218420&rft.atitle=Biographische+Notiz+von+Franz+Ludwig+G%C3%BCssefeld&rft.au=Franz+Ludwig+G%C3%BCssefeld&rft.btitle=Allgemeine+geographische+Ephemeriden&rft.date=1808&rft.genre=book&rft.place=Weimar&rft.pub=Landes-Industrie-Comptoir&rft.volume=Band+26">&nbsp;</span>

In Content translation, the template no longer seems to appear as a template; it is decomposed into its HTML parts in the source (maybe similar to T216812). This causes tags to be generated when the source elements are transferred into the translation:

Screenshot 2019-12-10 at 09.44.42.png

See above mention, where VE was doing something similar. Perhaps CX needs the same or similar fix.

Thanks for helping to connect the dots. This is worth investigating. Content Translation uses the same editing surface as Visual Editor so I would expect the issue to no longer happen in Content Translation if it was solved for Visual Editor. So maybe it is a slightly different issue or it was fixed in the VE specific codebase. Maybe @matmatex can comment on this.

Let me ping @matmarex since yours did not appear to go through.

I know nothing.

Although, some quick searching reveals that ctx_ver is something related to the OpenURL standard, and some further searching also reveals that https://en.wikipedia.org/wiki/Module:Citation/CS1 generates markup with class="Z3988" and title="ctx_ver=Z39.88..." and so on.
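Since the title attribute is an OpenURL-style query string, it can be decoded with a standard query-string parser to see which wiki the citation metadata came from. The sketch below is an illustration only: the helper names `parse_coins_title` and `source_wiki` are hypothetical, and it assumes the attribute value has already been HTML-unescaped (plain `&` separators, as quoted in this task) and that `rfr_id` has the `info:sid/<host>:<page>` shape seen in the examples here.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs


def parse_coins_title(title: str) -> Dict[str, str]:
    """Decode a Z3988 span's title attribute into key/value pairs.
    Only the first value is kept when a key repeats."""
    fields = parse_qs(title, keep_blank_values=True)
    return {key: values[0] for key, values in fields.items()}


def source_wiki(title: str) -> Optional[str]:
    """Return the host named in rfr_id (e.g. 'es.wikipedia.org'), if any."""
    rfr_id = parse_coins_title(title).get("rfr_id", "")
    prefix = "info:sid/"
    if not rfr_id.startswith(prefix):
        return None
    return rfr_id[len(prefix):].split(":", 1)[0]


if __name__ == "__main__":
    title = (
        "ctx_ver=Z39.88-2004"
        "&rfr_id=info%3Asid%2Fes.wikipedia.org%3APedro+La%C3%ADn+Entralgo"
    )
    print(parse_coins_title(title))
    print(source_wiki(title))
```

Run against the spans quoted earlier in this task, this reports `es.wikipedia.org` or `de.wikipedia.org` as the origin of the citation markup, which tells you where the text came from but not which tool copied it.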

If it appears in the wikitext, that is likely caused by users copy-pasting the citations from another article in read mode (T54091), or indeed bugs like T209493, but I can't guess whether that's the same issue as this. If it was, then it should be automatically fixed in CX too.

This could have also been caused by T273234.

Not likely, if that issue is about templates/modules that start with the tstyles tag: it is only recently (maybe January or October) that the module has put that tag first.

Same problem on bnwiki.

<cite class="citation web cs1" data-ve-ignore="true">[http://www.irna.ir/en/News/81827027/ "Princess of Rome released"]. ''www.irna.ir''. 5 November 2015<span class="reference-accessdate">. Retrieved <span class="nowrap">16 November</span> 2015</span>.</cite>

This article was created using content translation. VE didn't cause this.

Content translation uses the editing surface from Visual Editor (with translation-specific tools on top). This is useful to reuse the capabilities of VE, but at times makes it hard to know whether a bug originates in the VE or the Content Translation codebases.