Wiktionary:Beer parlour/2013/January

Vote on script names as language headers

I've created a vote for this proposal: Wiktionary:Votes/pl-2013-01/Allow script names as L2 section headings. Feedback is welcome! —CodeCat 01:48, 1 January 2013 (UTC)

Same goes for the favicon vote, which I didn't announce very well: Wiktionary:Votes/2012-12/New favicon. —Μετάknowledge (discuss/deeds) 02:53, 1 January 2013 (UTC)

This is a good thing. And the vote for the logo? I think that the logo change still awaits its vote here. Lmaltier (talk) 21:25, 3 January 2013 (UTC)

I'm planning to get to that. I just thought I'd see how divided we are on the favicon issue before I bring up the logo wars again. —Μετάknowledge (discuss/deeds) 21:26, 3 January 2013 (UTC)

Blend or Compound?

I am currently doing a German project, and am coming across many words formed from two others joined together (e.g. Gaswolke, "gas cloud", from Gas + Wolke). Should I be using the {{blend}} or {{compound}} templates in their etymology sections? SemperBlotto (talk) 12:08, 1 January 2013 (UTC)

I think that is pretty much a standard example of a compound. Why do you think it would be otherwise? —CodeCat 13:48, 1 January 2013 (UTC)

No overlapping letters, so it's a compound. Mglovesfun (talk) 15:26, 1 January 2013 (UTC)

But how exactly is Gaswolke idiomatic? Isn't it just SOP? —Μετάknowledge (discuss/deeds) 16:03, 1 January 2013 (UTC)

It's a word. "all words in all languages" SemperBlotto (talk) 16:04, 1 January 2013 (UTC)
Oh no. I'm pretty sure we've had discussions about this, and the conclusion went otherwise. I'll see if I can dig up something. —Μετάknowledge (discuss/deeds) 16:20, 1 January 2013 (UTC)
Slash that, we never reached anything approaching consensus, just more arguing. Maybe I'll RFD it to see what happens. —Μετάknowledge (discuss/deeds) 16:20, 1 January 2013 (UTC)

There are 1,873 words in Category:German compound words. Are you going to RfD all of them? SemperBlotto (talk) 16:35, 1 January 2013 (UTC)
Not the idiomatic ones. —Μετάknowledge (discuss/deeds) 16:49, 1 January 2013 (UTC)

There have been attempts to delete SoP single-word entries before (see Talk:Zirkusschule, for example). To the best of my knowledge, all such efforts have failed, so you won't see me nominating any of these. Mglovesfun (talk) 18:50, 1 January 2013 (UTC)

Talk:Plastikschwanz and Talk:Sportlerherz are two more examples, and the former led to a long discussion in the BP. Anyway, in answer to the original question: as the others have said, these are compounds; an example of a German blend ("Kontamination") is "Schiege", a mix of "Schaf" (sheep) and "Ziege" (goat). - -sche (discuss) 19:06, 1 January 2013 (UTC)

In German, such compounds are considered words. Therefore, there is no reason to delete them when somebody has found it useful to create the page and the use of the word is attested. For example, de:Wasser provides a large list of such compounds (blue links as well as red links). Lmaltier (talk) 21:20, 3 January 2013 (UTC)

Purpose of RFDO

In a recent discussion with Codecat, it came up that there are apparently different ideas on what the purpose of Wiktionary:Requests for deletion/Others is. Codecat's understanding of RFDO seems to be that templates or categories should be sent there after being deprecated and orphaned/emptied, and there can be a quick "making sure that nobody has a problem with deleting it" before the template is deleted, ~~and that a template still being in use is a good reason to vote "keep"~~. My own view is that templates and categories should be deprecated and orphaned/emptied as a result of an RFDO, and then deleted after that is complete, and that orphaning/emptying before deleting is implied in "delete" votes. I think we should try to get consensus about what exactly RFDO is for. --Yair rand (talk) 16:48, 1 January 2013 (UTC)

I'm sure she will clarify her own position, but I didn't read CodeCat's comments as thinking "keep, still in use" was a "good" rationale. - -sche (discuss) 17:27, 1 January 2013 (UTC)

Indeed, I never meant to say that it's a good reason. Just that I noticed it had been used in the past to successfully contest a deletion. I don't remember which, though. —CodeCat 17:34, 1 January 2013 (UTC)

Sorry about that mistake. :( --Yair rand (talk) 17:39, 1 January 2013 (UTC)

Yair's corrected interpretation makes sense to me, and CodeCat seems to largely agree. We need some forum short of BP to challenge templates, whether or not in use and whether or not deprecated. Obviously, implementation of the deletion of a template requires that it no longer be in use, at least in mainspace and probably any place other than user pages and archive pages.

In practice, aren't there three possible outcomes of RFDO for templates: keep, delete (i.e., orphan-and-delete), or deprecate? I thought "deprecate" meant the template should not be used anywhere other than userspace and can be replace-deleted except in archives and user pages. Is that correct? DCDuring TALK 17:56, 1 January 2013 (UTC)

But deprecation is really a delayed deletion, isn't it? In practice it means that usage of a template is actively discouraged so that usually means orphaning and deleting it anyway. —CodeCat 17:59, 1 January 2013 (UTC)

I think it can make sense to propose and discuss deprecation of widely-used 'unique' templates like the 'list' ones or—for example only!—{{context}} in the BP or GP, because deprecation of such templates changes the policy or structure/format of the site. I agreed with Ruakh, for example, that WT:RFDO#Template:Eror was in the wrong venue and should be a BP/GP discussion, because it (re)moves some functionality (and, by my reading, a PREF). I have seen other discussions shot down as "premature" or "keep, still in use"; people who call RFDs "premature" evidently share my feeling that consensus to stop or continue using unique templates is better ascertained before listing on RFDO, not by the listing on RFDO.

On the other hand, it is my experience that discussions of whether or not to delete little-used templates (like WT:RFD#Template:ö or WT:RFDO#Template:fact) or members of a class (e.g. the context template *{{US 2nd infantry division slang}}—sure, we want context templates, but do we want that one?) and discussions of categories (like WT:RFDO#Category:Countable_nouns_by_language) result in actual discussion of whether the specific template/category should be used or not. - -sche (discuss) 18:37, 1 January 2013 (UTC)

Asturbot

I started a new vote for getting Asturbot bot status. She's a new bot for Asturian verb forms, and will run just like User:SemperBlottoBot. The bot page is Wiktionary:Votes/bt-2013-01/User:Asturbot for bot status. --Wikt Twitterer (talk) 18:43, 1 January 2013 (UTC)

Standardized usage examples- why not?

There are two (possibly more) competing standards for how example sentences are to be formatted. The first, which is prescribed in WT:ELE:

definition
example sentence
transcription of example sentence

translation of example sentence

The other format, not explicitly stated anywhere AFAIK and with many minor variations by various editors:

definition
example sentence — translation / transcription

For editors using the second format, do you feel that taking up two or three lines for an example sentence is excessive for entries with many examples (braukt for example)? Should we document and officially support this format in WT:ELE? Or is there a consensus to use one or the other and convert entries to that format? DTLHS (talk) 22:13, 1 January 2013 (UTC)

The template {{usex}} makes it easier to standardize the formatting of example sentences, but it can be set either to put the translation on a separate line or on the same line as the example sentence, separated by a dash. I usually put the translation on the same line if the usex is short (especially if it's less than a full sentence, like just a noun phrase or verb phrase) and on a separate line if it's longer. In braukt, I would put the usex for sense 5 on two lines, and maybe the ones for senses 3 and 4 as well. —Angr 22:33, 1 January 2013 (UTC)

I prefer the first option, for the same reason people should use paragraphs: horizontally separating text into different sections makes it more readable and the page less packed. In addition, it makes it much easier to compare the foreign language and English sentences element by element, a useful technique for language learners. — Ungoliant (Falai) 23:21, 1 January 2013 (UTC)

Like Angr, I use the three-line format for longer examples and a one-line format (ככה למשל — kákha l'mashál — like this for example) for shorter ones. I'd be O.K. with standardizing on either one. (Except that for affixes, I use a completely different one-line format, of the form בָּרִיא (barí, “healthy”) → בְּרִיאוּת (b'ri'út, “health”), and I'd be loath to give that up.) —Ruakh_TALK 02:35, 2 January 2013 (UTC)

I also use that for affixes. We might want to templatise it. — Ungoliant (Falai) 02:49, 2 January 2013 (UTC)

I use and prefer the first format. What about linking of example sentences like this one? Is that OK? I'd also suggest using {{l}} rather than square brackets, or at least script formatting templates like {{Hani}} or {{Arab}}, etc., e.g. اللغة العربية جميلة جدًا looks much better than اللغة العربية جميلة جدًا and 日本語は美しい言語です is better than 日本語は美しい言語です. Should we make script conversion mandatory in example sentences? This also helps right-to-left languages, IMHO. --Anatoli (обсудить/вклад) 11:32, 2 January 2013 (UTC)

I don't like the idea of linking every word in a usex; it seems distracting. As for your Arabic and Japanese examples, on my computer at least the plain-linked Arabic looks better than the templated Arabic, and the plain-linked Japanese looks identical to the templated Japanese. —Angr 19:31, 2 January 2013 (UTC)

It's standard to link every term in a usex for most Mandarin entries, and I find it very helpful there. Also, on my computer the Arabic is only somewhat improved but the Japanese is vastly better. Without the template, my eyes are not able to pick out the individual lines of 語. —Μετάknowledge (discuss/deeds) 19:53, 2 January 2013 (UTC)

I seem to recall that we've agreed not to link any — let alone every — word in a usex.—msh210℠ (talk) 07:11, 4 January 2013 (UTC)

I'm with Angr and Ruakh re short/long format for (respectively) short/long usex. I think that the dash used in the short format should be the quotation dash ― (U+2015) (and that's what I use for it).—msh210℠ (talk) 07:11, 4 January 2013 (UTC)

Personally, I think it's overkill to link every single word. It is, however, very useful to occasionally link some words. --Wiki Tiki 89 07:45, 4 January 2013 (UTC)

About User:Fête

Hi. Sorry if this message is not in the correct place, but I couldn't find your admins' noticeboard. I just wanted to notify you that User:Fête was blocked for cross-wiki disruption after being blocked on four Wikimedia projects for various reasons (inserting false information in pages, harassing other contributors, ...). Ĉiuĵaŭde told me on my Meta user talk page that you were the only ones not having issues with him, so if you want him to come back please tell us and we'll do the necessary. Best regards, -- Quentinv57 10:39, 2 January 2013 (UTC)

Perhaps discussion will reveal otherwise, but my impression was that there wasn't the will to block him, but there's also not the will to ask for him to be unblocked. I think saying we were not having issues with him was overstating the issue; we did not as a whole find the issues we were having with him as being block worthy. All from my perspective, of course.--Prosfilaes (talk) 01:15, 3 January 2013 (UTC)

Seconded. Mglovesfun (talk) 18:02, 3 January 2013 (UTC)

Thanks for the info. In the future, here's fine, but the specific page for notifying us of vandals is WT:VIP.—msh210℠ (talk) 07:14, 4 January 2013 (UTC)

Clear widespread use

I would propose to drop the 'clear widespread use' criterion from WT:CFI#Attestation. It is sometimes cited on RFV as a reason to pass dubious terms like suicide cable where, since there is no definition of clear widespread use in the CFI, editors will often try and argue that rare terms are in clear widespread use to avoid having to find three citations. It has also been argued that if anything is in clear widespread use, there must be three durably archived citations for it.

If the idea of that rule is to stop bad faith nominations, like maliciously nominating erm, maliciously for verification, we don't need a specific rule in the CFI to deal with that, do we? My interpretation of the three citation rule is that there citations have to exist, whether or not they are copied up onto the page or the citations page or not. Mglovesfun (talk) 11:49, 3 January 2013 (UTC)

I would support ditching it. Citations for truly widespread terms should be trivial to find. If mass bad-faith nomination of everyday senses is a problem, block the user who is doing it. Equinox ◑ 13:18, 3 January 2013 (UTC)

I don't think that this will have much desirable effect on the number of nominations, but it will increase the labor required to cite senses of polysemic words and colloquial expressions in some cases. It may lead to the deletion of some that do not make it into print or bits.

I have interpreted "clear widespread use" as a consensus voting matter. Thus, all it should take is, say, one uncontested rejection of the claim to force attestation.

If someone is going to claim that citations exist for a term that has no support in other dictionaries, we need citations in the entry in all cases except those whose widespread colloquial use is colloquial. Otherwise laziness is going to push down the slippery slope to whimsy and contentious voting.

It only takes a little backbone on the part of someone closing the RfV to reject the claim of "widespread use". If the closer abuses the claim, they should be asked not to close out RfVs. DCDuring TALK 13:46, 3 January 2013 (UTC)

Regarding "it will increase the labor required to cite senses of polysemic words and colloquial expressions in some cases", surely not, as the whole point of clear widespread use is it replaces citations. Mglovesfun (talk) 18:01, 3 January 2013 (UTC)

But you "propose to drop the 'clear widespread use' criterion from WT:CFI#Attestation." That would presumably mean that one would need to have actual citations instead. Actual citations require more work. Ergo, presumably your proposal would require more work. DCDuring TALK 18:27, 3 January 2013 (UTC)

Well yes actual citations, but not for every definition. Unchallenged definitions won't need citations, which is already the case now. I suspect in practical terms, it would affect literally zero entries. Mglovesfun (talk) 20:41, 3 January 2013 (UTC)

Polysemic words often need citations to make it clear that the definition exists and how it is distinct from other definitions. I thought part of the point of CFI was to cause the deletion of words that do not make it into print or bits. Even now, most people will have to take the word of one or two contributors that they exist, and if they aren't recorded in any place, no one in the future will know that they existed.--Prosfilaes (talk) 00:41, 4 January 2013 (UTC)

I support ditching the criterion. I agree with you on this: "My interpretation of the three citation rule is that there citations have to exist, whether or not they are copied up onto the page or the citations page or not". --Dan Polansky (talk) 20:35, 3 January 2013 (UTC)

Small correction to myself then, they should demonstrably exist, that is to say, can be shown to exist using hyperlinks, ISBNs, ISSNs, etc. Mglovesfun (talk) 20:40, 3 January 2013 (UTC)

I agree that unchallenged definitions won't need citations. More generally, it should be mentioned in CFI that, when something is, very clearly, a word used in a language, other CFI criteria become irrelevant. CFI criteria are useful only when there is a doubt. Lmaltier (talk) 21:12, 3 January 2013 (UTC)

I don't think we need to say that, do we? Can you think of an example of a word which is very clearly a word but wouldn't meet any of the rules we have? Mglovesfun (talk) 01:18, 5 January 2013 (UTC)

This would prevent some words from being proposed for deletion (e.g. France, German compound words, etc.). I have not seen words such as birdlike proposed for deletion as SOP, but this might well happen. Lmaltier (talk) 06:01, 7 January 2013 (UTC)

Overflood of attestation time information

I for one do not like this revision of abusive. Visually, the definition lines are busy with information about time of attestation, followed by reference numbers that I suppose are there to refer to attestation dates, as we use actual quotations rather than secondary sources to source definitions. If someone else feels the same, let us do something about it. A quick fix would be to just drop all the date information and the reference numbers from the definition lines, but I am afraid that will not find much support. --Dan Polansky (talk) 20:43, 3 January 2013 (UTC)

Hmm. It's very OED, I guess, but it does look messy and clutters the integral definition section. I'd like to relegate that information to the etymology, but I can't think of a good way to format that. —Μετάknowledge (discuss/deeds) 20:49, 3 January 2013 (UTC)

(after edit conflict) I actually feel the same, but have done a poor job of articulating why to Speednat (talk • contribs). The intentions are clearly good, but I too feel like the definition lines should be saved as much as possible for definitions. I've removed his/her references before as 'unneeded' (since we try to use primary not secondary sources as much as possible). Something like 'tending to abuse, to commit abuse' doesn't need a reference as no reasonable person would contest it. Also on WT:FEED, the most common complaint is being unable to find definitions, so this sort of thing, in my opinion, harms usability. But frankly I don't think usability is high on most of our editors' lists. Mglovesfun (talk) 20:52, 3 January 2013 (UTC)

Don't we usually shorten that to "From 16^th c." rather than "First attested in the mid 16^th century."? --Wiki Tiki 89 21:01, 3 January 2013 (UTC)

I agree that this information is interesting but that it should be moved to the etymology section (considered as a section about the history of the word and its senses). Lmaltier (talk) 21:05, 3 January 2013 (UTC)

I like Wikitiki's suggestion for making this worthwhile information less intrusive. I would not want to see this information lost, as it is very useful in reading literature not from the 21st century. DCDuring TALK 22:40, 3 January 2013 (UTC)

Small tweak: "from mid 16th century". (Or "from mid 16th c."; we do tend to disfavour abbreviations, and I've changed many a "c." to "century" before accordingly, but my point here is just that we should keep the "mid", "early", "late" info.) - -sche (discuss) 23:29, 3 January 2013 (UTC)

The suggestion of moving the info entirely to the etymology section is unworkable for polysemous entries: would you (proponents of this idea) reproduce all the senses in the etymology section in order to note when each one was attested, "the sense 'to walk; to fare on one's feet' is first attested in the 11th century and went out of use in the 19th century, the sense 'foobar' in the 17th century, the sense 'bazoo' around 1900"? That would be foolish, unnecessary duplication that would quickly fall out of sync. Or would you say "the first sense is first attested in the 17th century, the second sense..."? That would fall apart if anyone re-ordered the senses, even by adding a sense, or be unübersichtlich (hard to take in at a glance) if an entry had many senses, like [[line]]. - -sche (discuss) 23:29, 3 January 2013 (UTC)

Or you could add the cites with {{timeline}}s to Citations:abusive. ~ Röbin Liönheart (talk) 04:28, 4 January 2013 (UTC)

That hides it away from the main definition page. It's tedious to add a lot of cites, and there's no way to distinguish detailed work from the normal "here's a splatting of cites that were at hand" that may in no way represent the actual temporal range of a word. It's just not a substitute, IMO. This is useful information to present with the definitions.--Prosfilaes (talk) 04:57, 4 January 2013 (UTC)

Can't we add a parameter to the quotation template that says "this citation is intended to serve as the earliest example of the sense in use"? That would move the dates out of the definition line and actually require evidence for a temporal claim. DTLHS (talk) 05:02, 4 January 2013 (UTC)

It's not necessary to follow the order or the number of definition lines in the etymology section. The appropriate order in this section is strictly chronological; it can logically group noun, adjective and verb senses, and the style is quite different: e.g. From ..., the word is sometimes used as a verb. Lmaltier (talk) 06:31, 4 January 2013 (UTC)

I look forward to your work on any English word with more than, say, six senses, to instantiate the model you suggest. Is such a model available for inspection at other wiktionaries? We already have numerous complaints about the length of material before the definitions, principally Pronunciation and Etymology. I find it absurd that we are complaining about one or two line inches at the far right hand side of a sense line, when we have line after line of Pronunciation material that very few "normal" users can take advantage of and line after tedious line of cognates which should be available under Descendants in etymons' entries. We normally don't even hide the material under {{rel-top}}. DCDuring TALK 13:26, 4 January 2013 (UTC)

At other wiktionaries, I don't know. But this is done in the Robert Dictionnaire historique de la langue française, a dictionary published in 1992 that describes the etymology and history of French words and their senses. Lmaltier (talk) 22:24, 4 January 2013 (UTC)

I agree with -sche regarding the glut of information that would end up in the etymology section. If there were one sense and one date, that could make sense; however, most words have multiple senses, and having the attestation dates after the sense in small type seems the least obtrusive way. When I initially added this information, I placed it in the etymology section, but was corrected by multiple editors. After carefully looking at the feedback that I had received, I realized that the etymology section was not the best place. Speednat (talk) 22:12, 4 January 2013 (UTC)

The date of first attestation as added after definitions is a class of information that is broken down by sense. In this it does not differ from synonyms, antonyms and translations. Thus, I would place it in a dedicated section located somewhere at the bottom of the page. Its being specific to sense does not prevent this; see synonyms, antonyms, and translations for some of the models. --Dan Polansky (talk) 22:22, 4 January 2013 (UTC)

I know I'll kick myself later for agreeing, but that idea could work. Speednat (talk) 22:25, 4 January 2013 (UTC)

I would ascribe much more weight to the opinion of those who have done the work to provide this information than to that of all those who have not. Why don't we ask them?
I don't believe that we necessarily have citations to support all the defdates. We may be relying on reference works. DCDuring TALK 22:38, 4 January 2013 (UTC)
I do not want to ask those who have created the clutter at this revision of abusive as if they knew better than I do how best to present and structure information in the dictionary. Copying and pasting information from a source is trivial; it does not earn anyone the right to more say in information presentation.

As far as I know, the content of the defdate is an import from a single reference. It borders on copyright violation, but may be barred from it by the notion that information is not protected, merely its expression. --Dan Polansky (talk) 22:46, 4 January 2013 (UTC)

And wherever we do, can we please just write plain ordinal figures (16th) and not sprinkle the text with twee Microsoftisms (16^th)? Sincerely, thank you. —Michael Z. 2013-01-24 23:01 z

Agree (though I regard them as twee faux-archaisms). Equinox ◑ 23:11, 24 January 2013 (UTC)

Abyssinian primrose

I ask a third person to take a stance on Abyssinian primrose and the use of {{reference-book}}. See also User talk:Speednat. I contend {{reference-book}} should not be used in References sections. --Dan Polansky (talk) 21:55, 4 January 2013 (UTC)

See also Wiktionary:RFM#Template:reference-book for a proposal that stands a chance of avoiding confusion in future. --Dan Polansky (talk) 21:59, 4 January 2013 (UTC)

"Contributions first, then a user page"

SemperBlotto has been deleting new users' user pages, giving the above reason. While I do think that many people join and create a user page more to showcase themselves than to help Wiktionary, this is very unwelcoming to a user who is genuinely interested in contributing. I don't think contributions should be a requirement for a user page at all; it's common for users to set up their "profile" on websites before using them. I'd do the same too, so I'd be rather annoyed if I was shooed off like that. What do others here think? —CodeCat 22:53, 4 January 2013 (UTC)

If I'm not mistaken, he usually waits a few days before deleting such a user-page, to see if the user will start contributing. (Though I'd be O.K. with waiting much longer, even a month or so.) Also — the reason that it's common for users on other web-sites to set up their profiles immediately is that those web-sites encourage that. As soon as you create an account on <insert name of web-site here>, it either starts asking you questions to populate your profile page (actually, sometimes it even starts those questions before creating the account), or else gives you a link to the page where you can fill out your profile. Wiktionary doesn't do that. —Ruakh_TALK 23:28, 4 January 2013 (UTC)

Is there a warning for users to contribute or face user page deletion? For the most part, if all someone does is create a user page and fiddle with it, it doesn't affect the project as a whole, and is probably not worth much project attention. bd2412 T 23:40, 4 January 2013 (UTC)

@Ruakh: Well, not really. The latest case that comes to mind is User:MuhdNurHidayat, deleted one day after it was created upon the user registering. CodeCat undeleted it, upon which SB redeleted it. The user now has a userpage, although he has only made 13 edits to date.

@BD: I have never seen a warning given. —Μετάknowledge (discuss/deeds) 23:57, 4 January 2013 (UTC)

You know the old saying about catching more flies with honey than with vinegar. bd2412 T 02:22, 5 January 2013 (UTC)

Every morning I look at Special:New to see if there are any new User and User talk pages. Mainly this is to weed out the many promotional pages that we have been getting lately - added by some sort of spambot. If I see any new ones (typically containing "Hi" or similar random text) that I think are from people who have no intention of contributing, I just delete them (as above). Perhaps I shall take them to RfD in future and waste other people's time as well. SemperBlotto (talk) 08:31, 5 January 2013 (UTC)

(X)SAMPA

Can we please do away with the XSAMPA transcriptions? They are redundant to the IPA transcriptions and much harder to read. They are also a pain in the ass to keep synchronized with the IPA. --Wiki Tiki 89 23:30, 4 January 2013 (UTC)

I wouldn't object, unless there are still people out there who don't have IPA-compatible fonts on their computers. —Angr 00:00, 5 January 2013 (UTC)

I wouldn't object either. —CodeCat 00:01, 5 January 2013 (UTC)

No objection, but don't people use SAMPA pronunciations? I'm not informed on this one. Mglovesfun (talk) 00:08, 5 January 2013 (UTC)

The only purpose of SAMPA is to enable IPA transcription using only ASCII characters. Before Unicode came along and started expanding the number of characters it was possible for people to use on their computers, SAMPA was the only way to indicate IPA transcription unambiguously on a computer. But now practically everyone has a font with at least the most common IPA characters installed on their computer, and the only reason to use SAMPA is that it can be typed directly from an unmodified keyboard. People may still use SAMPA for applications where they're going to be typing themselves (e.g. file names) or where only ASCII characters are accepted (e.g. passwords), but for our purposes I think it's sufficient to have only IPA, provided we really don't still have any users who will see nothing but little boxes or little question marks if we remove the SAMPA transcriptions. —Angr 00:18, 5 January 2013 (UTC)

Please note that SAMPA and XSAMPA are not at all the same thing. To quote Wikipedia: "Since SAMPA is based on phoneme inventories, each SAMPA table is valid only in the language it was created for." I think we should definitely do away with SAMPA. However, XSAMPA is a universal mapping of IPA using ASCII, and therefore isn't so limited. I've no intention of ever working in XSAMPA, but I've no personal objection to it either. --EncycloPetey (talk) 02:36, 5 January 2013 (UTC)

I think we already got rid of SAMPA some time ago, when {{SAMPA}} was made a redirect to {{X-SAMPA}}. I don't think X-SAMPA is particularly necessary in entries, but maybe there could be a gadget that includes it? If the mapping is one-to-one, then we could avoid the duplication by leaving X-SAMPA out of the entries, and letting the gadget auto-transliterate the IPA on demand or for those users that have enabled it. —CodeCat 03:00, 5 January 2013 (UTC)

It's not exactly one-to-one in the sense that each individual IPA character has a single XSAMPA character, as sometimes a single IPA character converts to a sequence of two ASCII characters, but there is indeed a specific symbol or symbols assigned to each IPA character. The only catch would occur in some situations where "fiddly bits" have been added to make the pronunciation narrowly phonetic rather than broadly phonemic. --EncycloPetey (talk) 03:08, 5 January 2013 (UTC)
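As a rough illustration of the gadget idea CodeCat and EncycloPetey describe (generating X-SAMPA from the IPA on demand instead of storing both), here is a minimal Python sketch. It is not based on any existing Wiktionary gadget, and a real gadget would be written in JavaScript. The table IPA_TO_XSAMPA and the function ipa_to_xsampa are hypothetical and cover only a small subset of X-SAMPA. It shows both of EncycloPetey's points: some IPA characters (ɹ, ɚ, ʰ) become two ASCII characters, and anything missing from the table, such as an unusual narrow-transcription diacritic, raises an error instead of being guessed.

```python
# Hypothetical, partial IPA -> X-SAMPA table: a small illustrative subset,
# not the full X-SAMPA inventory.
IPA_TO_XSAMPA = {
    # Vowels
    "ɪ": "I", "ʊ": "U", "ɛ": "E", "ɔ": "O", "æ": "{", "ʌ": "V",
    "ɑ": "A", "ɒ": "Q", "ə": "@", "ɜ": "3", "ɐ": "6", "ø": "2",
    "œ": "9", "ʏ": "Y", "ɨ": "1", "ʉ": "}", "ɯ": "M",
    # R-coloured vowels: one IPA character becomes two ASCII characters
    "ɚ": "@`", "ɝ": "3`",
    # Consonants
    "θ": "T", "ð": "D", "ʃ": "S", "ʒ": "Z", "ŋ": "N", "ɲ": "J",
    "ʎ": "L", "ɣ": "G", "β": "B", "ç": "C", "χ": "X", "ʁ": "R",
    "ʔ": "?", "ɾ": "4", "ɫ": "5",
    "\u0261": "g",  # IPA script g (U+0261), distinct from ASCII g
    # Backslash-suffixed symbols: again one IPA char -> two ASCII chars
    "ɹ": "r\\", "ɦ": "h\\", "ɸ": "p\\",
    # Stress and length marks
    "ˈ": '"', "ˌ": "%", "ː": ":",
    # Diacritics: combining marks and modifier letters
    "\u0303": "~",   # combining tilde: nasalised
    "\u0329": "=",   # combining vertical line below: syllabic
    "\u0325": "_0",  # combining ring below: voiceless
    "\u02B0": "_h",  # modifier letter small h: aspirated
}


def ipa_to_xsampa(ipa: str) -> str:
    """Convert an IPA string to X-SAMPA, one character at a time."""
    out = []
    for ch in ipa:
        if ch in IPA_TO_XSAMPA:
            out.append(IPA_TO_XSAMPA[ch])
        elif ch.isascii():
            # Lowercase ASCII IPA letters (p, t, k, s, a, i, ...) use the
            # same symbol in X-SAMPA; delimiters such as / [ ] . pass through.
            out.append(ch)
        else:
            # Unmapped symbol, e.g. a "fiddly bit" of narrow transcription:
            # fail loudly instead of emitting a wrong transcription.
            raise ValueError(f"no X-SAMPA mapping for {ch!r} (U+{ord(ch):04X})")
    return "".join(out)


if __name__ == "__main__":
    print(ipa_to_xsampa("/ˈwɔːtə/"))  # water
```

Run as a script, this prints /"wO:t@/ for /ˈwɔːtə/. Failing on unmapped characters is a deliberate choice in this sketch: for a narrow transcription with unusual diacritics, a gadget could simply show the IPA alone rather than a misleading X-SAMPA string.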

Support. Every time someone uses SAMPA the dead creators of Unicode roll in their graves. — Ungoliant ^(Falai) 03:27, 5 January 2013 (UTC)[reply]

Or a kitten dies? Mglovesfun (talk) 18:34, 5 January 2013 (UTC)[reply]

No, it makes me cry whenever I see X-SAMPA. Isn't that a truly terrible thing? -- Liliana • 19:06, 6 January 2013 (UTC)[reply]

I thought that Wikimedia does have users (I remember some small number - 5%? less?) who see mojibake or boxes or something instead, can't remember the versions or specs. Might be worth looking into that, because except when Liliana cries, the X-SAMPA really doesn't hurt our entries any, and it's better if more people can view pronunciation info. —Μετάknowledge^{discuss/deeds} 19:25, 6 January 2013 (UTC)[reply]

Looking at water, it certainly adds to the line noise feeling of the pronunciation section, and it's a good way for contradictions to creep in; everytime anyone touches the pronunciation of any line that has a X-SAMPA version, they need to update both.--Prosfilaes (talk) 10:37, 7 January 2013 (UTC)[reply]

While we are here: some of our entries use {{shavian}} and {{deseret}} for pronunciation information. Is that something we want? -- Liliana • 19:36, 6 January 2013 (UTC)[reply]

We have Deseret and Shavian transcriptions? Please can we not? I think the risk that they'll fall out of sync with the IPA is too great and the possibility that someone will understand them too small. I'd keep examples of them in the entries [[Shavian]] and [[Deseret]], but not elsewhere. — This unsigned comment was added by -sche (talk • contribs).

If we switch X-SAMPA to be a Javascript gadget, it probably wouldn't be too hard to make Shavian and Deseret options.--Prosfilaes (talk) 00:33, 10 January 2013 (UTC)[reply]

They would be much harder since they don't have direct IPA equivalents. It would only work if we 100% standardize our English IPA, which will never happen. --Wiki Tiki 89 00:49, 10 January 2013 (UTC)[reply]

I don’t see why not. — Ungoliant ^(Falai) 02:45, 9 January 2013 (UTC)[reply]

I don't use it, but on fr.wikt there is a gadget for automatically displaying IPA as X-SAMPA, if you prefer X-SAMPA. This is probably the best solution. Lmaltier (talk) 05:54, 7 January 2013 (UTC)[reply]

{{look}}

I'd propose we set up a vote on whether to replace X-SAMPA templates with a JavaScript gadget in the IPA template. --Wiki Tiki 89 00:38, 9 January 2013 (UTC)[reply]

Since X-Sampa is a direct transcription/recoding of IPA, a gadget or template makes much more sense than separate text.

Does anyone use X-Sampa? Does anyone actually know of someone who does? Maybe nothing makes even more sense. —Michael Z. 2013-01-24 23:19 z
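For illustration, the gadget idea discussed above (converting the IPA on display instead of storing X-SAMPA in entries) can be sketched as a simple table lookup. This is a minimal sketch in Python, not the fr.wikt gadget (a real gadget would be JavaScript); the table is a small hypothetical subset of the standard X-SAMPA correspondences, and the name ipa_to_xsampa is made up. It shows the two points EncycloPetey raised: some IPA characters become two ASCII characters (ɹ becomes r\), and "fiddly bits" such as combining diacritics missing from the table are reported instead of being silently passed through.

```python
# Hypothetical partial table: one IPA character -> its X-SAMPA spelling.
IPA_TO_XSAMPA = {
    "ˈ": '"',      # primary stress
    "ˌ": "%",      # secondary stress
    "ː": ":",      # length mark
    "ə": "@",
    "æ": "{",
    "ɑ": "A",
    "ɒ": "Q",
    "ɔ": "O",
    "ɛ": "E",
    "ɪ": "I",
    "ʊ": "U",
    "ʌ": "V",
    "ɜ": "3",
    "θ": "T",
    "ð": "D",
    "ʃ": "S",
    "ʒ": "Z",
    "ŋ": "N",
    "\u0261": "g",  # IPA script g
    "ɾ": "4",
    "ɹ": "r\\",    # one IPA character becomes two ASCII characters
}


def ipa_to_xsampa(ipa: str) -> str:
    """Convert an IPA string to X-SAMPA, one character at a time."""
    out = []
    for ch in ipa:
        if ch in IPA_TO_XSAMPA:
            out.append(IPA_TO_XSAMPA[ch])
        elif ch.isascii():
            # Assumption: plain ASCII letters and delimiters (p, t, w, /, [, ])
            # are written the same way in both systems.
            out.append(ch)
        else:
            # Narrow-transcription diacritics and anything else not in the
            # table: report it so the caller can fall back to showing only IPA.
            raise ValueError(f"no X-SAMPA mapping for {ch!r} (U+{ord(ch):04X})")
    return "".join(out)
```

For example, ipa_to_xsampa("/ˈwɔːtə/") gives /"wO:t@/, while "t̪" (t plus a combining bridge below) raises ValueError because that diacritic is not in this partial table.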

Lists of related terms

To quote Metaknowledge: "I will keep removing excessively long lists [of related terms] like my edit at patria." I feel that it is poor editing and worse policy to remove lists of related terms from entries simply because (a) they are long, or (b) they are perceived to clutter or swamp an entry. --EncycloPetey (talk) 02:32, 5 January 2013 (UTC)[reply]

To quote WikiTiki in the same discussion: "I think it is perfectly OK for patria to have pater as a related term. All the other ones, however, are just overkill and belong as derived terms of pater." Diff removed this list of terms that did not actually do anything to improve the entry's quality. But what I think EncycloPetey is trying to say (he did add that he "concede[s]" the issue of patria specifically) is that editor input on how to use the 'Related terms' header is wanted. Is it appropriate at corvīnus, for example, where the so-called related term is actually the etymon? —Μετάknowledge^{discuss/deeds} 02:49, 5 January 2013 (UTC)[reply]

See User talk:Metaknowledge for the prior discussion, but please comment here. --EncycloPetey (talk) 02:55, 5 January 2013 (UTC)[reply]

I don't think DRY across pages is an accepted principle here, and I doubt that, in general, it should be. If it were, we would have long since deleted cognate lists from English etymologies, depending instead on Latin, Proto-Germanic and PIE descendants lists.

As it is not, I don't think invoking it has much normative force. IMO, a rule with more normative force is: don't delete correct content without supporting consensus, for which WT:TR is a good forum. DCDuring TALK 03:56, 5 January 2013 (UTC)[reply]

Actually, those cognate lists annoy me. I think that the etymological coverage on a page like water#Etymology or lai#Latvian is amazing, but it would be more appropriate to have it be autocollapsed. —Μετάknowledge^{discuss/deeds} 04:46, 5 January 2013 (UTC)[reply]

The big problem with the cognate lists is the "me-too" addition of cognates in contributors' pet languages, such as often happens with Kurdish and Albanian (though at least Albanian is a separate branch of PIE). The only language that should get that treatment is English, since it's a language of special interest to most users. Otherwise, it should be limited to those forms that say something interesting about the history and distribution of the term among related languages. Chuck Entz (talk) 06:08, 5 January 2013 (UTC)[reply]

The question of etymology section cognates is a bit off-topic, but I'd think that certain Classical languages also benefit from cognate lists. I expect to see them in Latin, Ancient Greek, Sanskrit, and such. --EncycloPetey (talk) 09:47, 5 January 2013 (UTC)[reply]

I agree with EP here. If the Related terms section is cluttering the page, {{rel-top}} can be used; if some related terms are more important than others, they can be emboldened; if repetition is the biggest problem, a system similar to {{list}} can be created. — Ungoliant ^(Falai) 03:57, 5 January 2013 (UTC)[reply]

And, what Ungoliant said. DCDuring TALK 03:58, 5 January 2013 (UTC)[reply]

@Ungoliant: Related-list templates? Maybe not a bad idea. But I don't plan on implementing that. Bolding is too subjective, IMO. I will try to remember to use the rel templates, though. —Μετάknowledge^{discuss/deeds} 04:46, 5 January 2013 (UTC)[reply]

I oppose the removal of related terms from "patria" done by Metaknowledge in diff. I have reverted his edit. Note that the related terms that he has removed were hidden in a collapsible box, so did not visually clutter the page. His edit is not based on previous common practice regarding related terms. --Dan Polansky (talk) 10:07, 5 January 2013 (UTC)[reply]

I don't oppose the removal; I assume he removed it because he saw my edit summary one edit before saying "some of these are only tangentially related to patria", which they are. I don't think one term derived from pater really needs a list of all other terms that are also derived from pater. They should at least have some semantic connection to patria itself, and none of the terms currently in the box, except maybe patrius, do. Just because the terms can be collapsed so they don't take up space, that doesn't mean they're relevant to the term you're looking at. —An gr 19:41, 6 January 2013 (UTC)[reply]

I didn't see your edit summary, actually, and I agree that they're "tangential". Yes, patrius does have somewhat of a semantic connexion... but that's because it's etymologically related in a more direct manner than the table might suggest. Dan, you oppose just about everything, so I'm not surprised. It seems that you're preferring so-called common practice over what actually belongs in an entry. —Μετάknowledge^{discuss/deeds} 21:33, 6 January 2013 (UTC)[reply]

"in other languages"

At the bottom left of our Main Page is a list of pointers to other Wiktionaries. The words chosen are the names of the languages in their own language. Of those, the following words are red links. Should they be? SemperBlotto (talk) 08:11, 5 January 2013 (UTC)[reply]

Asturianu Български Brezhoneg Català Česky Ελληνικά Español Hrvatski Bahasa Íslenska Kurdî Lietuvių မြန်မာဘာသာ Português Română Русский Sicilianu Српски Svenska Українська Tiếng

Apart from capitalisation: Ukrainian and Belarusian language names are feminine adjectives agreeing with an omitted мова ("language"), e.g. Ukrainian англійський -> англійська (мова). tiếng just means "language" in Vietnamese and should be lower case. lietuvių is the genitive plural of lietuvis, so "lietuvių kalba" is "language of Lithuanians"; lietuviškai is used to say "in Lithuanian". --Anatoli ^{(обсудить}/^вклад) 04:27, 10 January 2013 (UTC)[reply]

In some cases this is because the language in question does not capitalise its own name. I suppose the capitalisation on the Main Page is just "title case". Equinox ◑ 08:15, 5 January 2013 (UTC)[reply]

In all these cases either Eq. is right or you cut off the language's name improperly. Alone, bahasa and tiếng just mean "language" and shouldn't be capitalized. The actual names of these languages are bahasa Melayu and tiếng Việt. —Μετάknowledge^{discuss/deeds} 08:30, 5 January 2013 (UTC)[reply]

Correcting the Czech spelling of Czech (it ends in ý, not y) and replacing capital letters with lower-case letters (except for Burmese, which doesn't make that distinction) we get:

asturianu български brezhoneg català český ελληνικά español hrvatski bahasa íslenska kurdî lietuvių português română русский sicilianu српски svenska українська tiếng

So it looks like the only "real" red link is မြန်မာဘာသာ. But we do have other Burmese names for the Burmese language: မြန်မာစာ and ဗမာစာ for the written language, and မြန်မာစကား and ဗမာစကား for the spoken language. —An gr 12:26, 5 January 2013 (UTC)[reply]

It looks like every page with an interwiki link to Czech misspells it with y instead of ý. And not just here, but at WP as well. Whom do we notify to get that changed? Also, I remember a few months ago many interwiki links were being written lower-case if the language name was usually written lower-case in the language in question, but now they're all back to upper-case again (again, not only here but also at WP). So maybe someone has been fiddling with the interwiki language names. —An gr 12:32, 5 January 2013 (UTC)[reply]

"česky" is an adverb, while "český" is an adjective. "Mluvíte česky?" (Do you speak Czech?) uses an adverb. The noun form of the language is "český jazyk" or "čeština". Compare Russian по-русски and Я говорю по-русски (ja govorjú po-rússki, "I speak Russian"). --Dan Polansky (talk) 16:53, 5 January 2013 (UTC)[reply]

So when you say you speak Czech, you really say you speak "Czechly"? —CodeCa t 16:58, 5 January 2013 (UTC)[reply]

Right. It makes sense when you think of the language as an attribute of the verb "speak" in *"I speak Czechly", much like "I speak fast". --Dan Polansky (talk) 17:05, 5 January 2013 (UTC)[reply]

Perhaps it's easier thought of, from an English-speaker's standpoint, as being analogous to "to speak in English", where "in English" is an adverbial complement-or-modifier-or-something to the verb "speak". (Not a perfect analogue — "I can speak in English" does not have the full range of meanings that "I can speak English" has — but it at least helps show how it could make sense.) —Ruakh_TALK 17:12, 5 January 2013 (UTC)[reply]

(after an edit conflict) You'll find these things at do you speak English, which has cs:mluvíte anglicky?, where "anglicky" is an adverb; it has Bulgarian говориш ли английски?, which sounds similar, suggesting английски is an adverb. Obviously, you will not find this at do you speak ... or the like. So much for the usefulness of some of our phrasebook entries. --Dan Polansky (talk) 17:15, 5 January 2013 (UTC)[reply]

Hm, but английски is indicated as an adjective and a noun. I cannot make a grammatical sense of "говориш ли английски", but I know no Bulgarian. --Dan Polansky (talk) 17:23, 5 January 2013 (UTC)[reply]

Need to confirm this but I'm pretty sure that английски (anglijski) is both adjective and adverb, it's similar to Serbo-Croatian енглески/engleski. Czech and Russian (and some other Slavic languages) are different in this respect from both bg and sh (and some other Slavic languages). --Anatoli ^{(обсудить}/^вклад) 03:59, 10 January 2013 (UTC)[reply]

I found out that Slovene also uses an adverb to refer to speaking in the same way that Czech does. But it has a different form. The adverbial form of an adjective is normally the neuter singular, which ends in -o. So "I speak English" is govorím angléško while the adjective is angléški and the name of the language (as a noun) is angléščina. —CodeCa t 04:10, 10 January 2013 (UTC)[reply]

Yes, angleško is an adverb. I don't have access to my handy tools and keyboards and a bit limited on time at the moment (I am not on my regular computer today) but I could give a quick summary of how "in other languages" is formed in all Slavic languages later, especially East Slavic. The translations of I don't speak English can give an idea but there are nuances on some language names, e.g. "я не говорю на хинди" or "я не говорю на латыни" when a language name is a noun, not an adjective. --Anatoli ^{(обсудить}/^вклад) 04:27, 10 January 2013 (UTC)[reply]

One more thought: "The book was originally written in English" is an English analogue, in which the verb "write" is modified by adverbial prepositional phrase "in English"; in Czech, "kniha byla původně napsána anglicky". Another one: "How do you say this in English?" -- "Jak se to řekne anglicky?" -- again featuring "in English". Just a small addition to what Ruakh has already said. --Dan Polansky (talk) 17:28, 5 January 2013 (UTC)[reply]

OK, I didn't realize [[česky]] is a real word of Czech and not just a mistake for [[český]]. So that was (until about an hour ago) a genuine redlink of ours. As for using an adverb to mean "in language X", it's not that rare. Latin uses adverbs that way (latīnē (“in Latin”), anglicē (“in English”), etc.), and Welsh uses the adverb-forming particle yn + Soft Mutation with a language name to say "in X" (yn Gymraeg "in Welsh" from Cymraeg (“Welsh”), just like yn gryf "strongly" from cryf (“strong”)). —An gr 18:14, 6 January 2013 (UTC)[reply]

(if I understand the question!) On my screen, with the Vector skin, all are shown in blue and "Ελληνικά" - normally spelt "ελληνικά" - links to "el.wiktionary.org/wiki/" — Saltmarsh^{απάντηση} 14:10, 10 January 2013 (UTC)[reply]

Using space before defdate

I propose this about {{defdate}}:

Let a space be inserted before each use of {{defdate}} after a definition rather than the definition being separated by no whitespace from the defdate information.
- A space is already used at cat: "A spiteful or angry woman. [from earlier 13th c.]". A space is also used at word and may.
- A space is not used at abaca: "... called Manila hemp.[First attested in the mid 18th century.]" A space is not used in the entries to which the template was added by an editor who has been recently adding the template in volumes, proceeding alphabetically.

Thoughts? --Dan Polansky (talk) 09:54, 5 January 2013 (UTC)[reply]

Yes I've spotted this error too. I don't think we need a discussion about it, it's just an error. A space is required after a full stop before anything else. Mglovesfun (talk) 12:07, 5 January 2013 (UTC)[reply]

Sure. DCDuring TALK 13:01, 5 January 2013 (UTC)[reply]

Fourthed.—msh210℠ (talk) 07:06, 7 January 2013 (UTC)[reply]
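Since everyone agreed this is simply an error, the fix could in principle be applied mechanically, e.g. in a bot or AWB-style regex pass. The following is a minimal Python sketch under that assumption; the function name and the restriction to lines beginning with # (definition lines) are choices made for illustration, not an existing Wiktionary tool. Lines that already have the space are left unchanged, so running it twice does no harm.

```python
import re

# Matches "{{defdate" only when it directly follows a non-whitespace character.
DEFDATE_WITHOUT_SPACE = re.compile(r"(?<=\S)(\{\{defdate\b)")


def space_before_defdate(wikitext: str) -> str:
    """Insert one space before {{defdate}} on definition lines that lack it."""
    lines = []
    for line in wikitext.split("\n"):
        # Assumption: only definition lines (starting with "#") are touched.
        if line.startswith("#"):
            line = DEFDATE_WITHOUT_SPACE.sub(r" \1", line)
        lines.append(line)
    return "\n".join(lines)
```

For example, "# Manila hemp.{{defdate|First attested in the mid 18th century.}}" becomes "# Manila hemp. {{defdate|First attested in the mid 18th century.}}".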

Template reference-book and sections

I surmise that {{reference-book}} should not be used in "References" and "External links" sections. In diff, I have said so in the template documentation.

What makes me think so:

The template creates text ending in a colon.
The template uses the same format as {{quote-book}} and as described in WT:QUOTE. This format starts with a boldfaced year. By contrast, reference templates such as {{R:Webster 1913}} and {{R:Century 1911}} start with the name of the reference work and do not place the year in boldface. A part of the format prescribed by WT:QUOTE and used by {{reference-book}} is this:
Year, Author, Source title, Publisher, pages #–#:

By contrast, {{R:Webster 1913}} produces this:

entry in Webster’s Revised Unabridged Dictionary, G. & C. Merriam, 1913

Any comments? --Dan Polansky (talk) 11:54, 5 January 2013 (UTC)[reply]

Request to be allowed to edit using AWB

Before I get bombarded by questions and comments let me clarify the request. I am requesting access for multiple reasons.

As I edit more outside English WP I am finding that AWB needs some tweaking to work better on the sister projects. Most of its development emphasis has been on EN and IMO it's time to start expanding that out a bit. Part of this need is to determine what AWB does here and to help refine some of the functions to do things the way they are done here rather than the way they are used in English Wikipedia.
I would also like to use the functionality to compare groups of things that appear in Wikipedia and may need to be created here (some terms, acronyms, etc.).
Additionally I would like to use it to do some changes. I have reviewed how other authorized users here have used it and I understand the rules here pretty well.

Also to clarify my edit count. I haven't done a large number of edits here yet, but have done a few and have read a lot about how things are done here. I am good at using AWB in English WP and in Simple WP and I think I can do some good here too. I am not planning on doing any major changes or being controversial. I also understand if the desire is there for me to get more edits first. No big deal, just let me know. Kumioko (talk) 17:47, 5 January 2013 (UTC)[reply]

I'm not too familiar with our AWB policies, but my impression is that it's a bit premature. As far as I know, no one has nominated you for our autopatroller whitelist. That means we still don't know if your edits are reliable enough to not need checking. AWB could be interpreted as a force magnifier: it takes your normal editing ability and multiplies it exponentially by speeding things up and reducing the repetitive overhead. That means many more edits to check. You have experience on other Wikimedia projects, which is both a blessing and a curse: the similarities give the illusion of familiarity, but hide a number of critical differences. I know you're aware of this, but sometimes it takes a while to see all the ways this can play out in real editing. Your heart and head seem to be in the right place, but only experience will tell. Chuck Entz (talk) 18:26, 5 January 2013 (UTC)[reply]

Wiktionary:AutoWikiBrowser/CheckPage does say to ask here, but I too would tend to decline the request at this point as 'too soon'. Mglovesfun (talk) 18:29, 5 January 2013 (UTC)[reply]

FWIW I changed some of your initialism entries to proper nouns and nouns, as they are. Mglovesfun (talk) 18:33, 5 January 2013 (UTC)[reply]

No worries, I'll probably ask again in a few weeks or months once I get some more edits in. Thanks for looking over my edits Mglovesfun, I'll check those over. Kumioko (talk) 19:01, 5 January 2013 (UTC)[reply]

Slovene accent and vowel length

I'd like to help improve our Slovene entries but I've run into a problem. Like in Serbo-Croatian, Slovene uses diacritics to indicate stress and such but only as an extra feature, not in normal writing (so we'd use head=). But confusingly, there are two competing systems since Slovene can be both tonal and stress-based depending on dialect. In the tonal orthography, ´ marks a long rising vowel, ^ a long falling vowel, and ` a short vowel. In the stress orthography, ´ marks a long vowel, ^ a long open-mid vowel, and ` a short vowel. The conflict is in the use of the circumflex which denotes vowel quality in the stress system but tone in the tonal system. Thus stress é could be tonal ê [èː], and stress ê could be tonal é [ɛ́ː]. So which of these two systems should we use on Wiktionary? It seems that Wikipedia's w:Slovene language article uses the stress system, but it's not really a terribly well developed article and I haven't really been able to figure out that much from it. To make matters worse, some sources say vowel length is not phonemic in Slovene and say that it is conditioned by stress only, so I'm not sure what essential difference the diacritics represent in that case. Is anyone here knowledgeable enough in Slovene to shed light on the matter? (Addition: w:Pitch accent#Slovenian language may also be helpful) —CodeCa t 23:50, 5 January 2013 (UTC)[reply]

I'm sorry that no-one who is actually knowledgeable has commented here, but my weak understanding of Slavic linguistics suggests that the pitch accent ("tone") is the only one deserving to be in transcription. The reasoning has something to do with representation and phonemicity in related languages. —Μετάknowledge^{discuss/deeds} 06:18, 9 January 2013 (UTC)[reply]

It seems that the stress-based orthography is actually the more common one, though. Two Slovene dictionaries I found use it, and put the tonal spelling in brackets after the word instead. Slovene Wikipedia also uses the stress orthography to disambiguate or to clarify the headword. And it seems that quite a few of our Slovene entries already use it too. So it looks like the stress spelling is the "default" one that Slovene speakers expect to see. —CodeCa t 04:03, 10 January 2013 (UTC)[reply]

Re: "Two Slovene dictionaries I found use [the stress-based orthography], and put the tonal spelling in brackets after the word": Sounds like a winner — best of both worlds — at least on the headword line. (If it's too awkward to do this on pages that link to the entry, then I guess use the stress-based orthography for those.) —Ruakh_TALK 17:43, 10 January 2013 (UTC)[reply]

For now I've used the stress orthography in the alt-text, and added pronunciations in full tonal IPA. The IPA kind of makes the tonal orthography redundant as they map almost one-to-one to each other (none of the orthographies distinguish the two pronunciations of e and l, which I mentioned below). I suppose we could add tonal spelling to the pronunciation section in the same way we do with enPR already, if someone feels the need to. —CodeCa t 17:49, 10 January 2013 (UTC)[reply]

Saterlandic and East Frisian again

A month ago, certain Latvian loanwords of possible Germanic origin were edited by User:-sche in accordance with the decision to consider East Frisian as meaning either Saterlandic Frisian (stq) or else simply Low German (nds). I have one doubt, though. It seems that Saterlandic Frisian now refers to a small dialect spoken in only one town, whereas the impression I have from my etymological source (the Latvian Etymological Dictionary) is that the words in question were derived from a Germanic dialect, there being doubt about whether the actual source was Middle Low German, Middle Dutch (from the times of the w:Duchy of Courland and Semigalia), or Eastern Frisian -- meaning by the latter not the present-day one-village dialect, but a larger dialectal continuum spoken over a much larger territory around the 15th century. Is it not misleading to place these words in Category:Latvian terms derived from Saterland Frisian, as if there had been contact between this one Saterland Frisian village and Latvia? Wouldn't Eastern Frisian be a better label in this case? The previous discussions didn't address this point (i.e., the difference between the present-day dialect and its presumed predecessor). --Pereru (talk) 16:25, 6 January 2013 (UTC)[reply]

Some of the previous discussions addressed this question. Liliana suggested during the (AFAICT) very first discussion of this mess, WT:RFM#Template:frs_-_Template:stq, that because no code exists for Frisian "East Frisian", we either create a dedicated code for East Frisian, or make no distinction between it and stq. There is precedent for creating exceptional codes for languages which have none (see Wiktionary:Languages#List_of_languages_with_exceptional_codes), and there is precedent for not distinguishing modern languages from their extinct predecessors (e.g. our use of {{he}} to refer not only to living modern Hebrew but to Biblical Hebrew). In contrast, use of frs (which is a living language, and thus is not the Frisian "East Frisian" and is probably the Low German one) to refer to Frisian "East Frisian" would as Liliana put it "make a precedent case, and almost certainly require a vote". I favour creating an exceptional code for East Frisian for use in etymologies like the Latvian words', and suggested as much on RFM; please consider voting in favour of the suggestion: WT:RFM#frs_to_gmw-fre. - -sche (discuss) 19:36, 6 January 2013 (UTC)[reply]

I can't imagine Latvian would have ever borrowed words from East Frisian, or any variety of Frisian. Its distribution was wider in the 15th century than just Saterland, but it was still a very small language whose speakers used Low German as their language of wider communication with outsiders. I have no doubt the Hanseatic League brought lots of Low German words to Latvian, just as it did to Estonian, Swedish, Danish, etc., etc., but not Frisian. —An gr 19:53, 6 January 2013 (UTC)[reply]

WT:ELE#Additional headings

There are a couple of things I would like to change about this section.

There are additional headings which you should include if possible, but if you don’t have the necessary expertise, resources or time, you have no obligation to add them, with the possible exception of “References”.

I'd drop the underlined bit, as we don't want empty reference headers. And we prefer primary sources (actual use) to secondary ones (dictionaries and so on)

In the order of the headings:

the horizontal line ---- should have an additional line break above it and below it (very minor)
References header level should be L3 not L4 (corrected by Mglovesfun (talk) 20:21, 7 January 2013 (UTC))[reply]
External links header level should be L3 not L4 (corrected by Mglovesfun (talk) 20:21, 7 January 2013 (UTC))[reply]
In the Finnish section, the quote translation should be prefixed with #*: not #**

Thoughts? Vote needed? More than one vote? Mglovesfun (talk) 18:48, 7 January 2013 (UTC)[reply]

At the least, remove the underlined part. Even better, replace the text on the numbered list item with something like 'A comprehensive entry features additional headings, including "Etymology", "Pronunciation", "Synonyms", and others'. The text "you should include if possible" is a misplaced allocation of someone else's resources; it is not the business of WT:ELE to tell editors where they should focus their efforts. --Dan Polansky (talk) 19:03, 7 January 2013 (UTC)[reply]
I support removing the references bit, adding line breaks around ---- and updating the quote indentation, but not making References and External links L4. I don’t think a vote (other than one here) is necessary; certainly not multiple votes. — Ungoliant ^(Falai) 19:08, 7 January 2013 (UTC)[reply]

Sorry, I meant making them L3 (from L4 in the original). Corrected and dated above. Mglovesfun (talk) 20:21, 7 January 2013 (UTC)[reply]

I agree with all points. I was surprised when I saw a few days ago that ELE suggests References should be at L4. - -sche (discuss) 20:27, 7 January 2013 (UTC)[reply]

References and External links should be L3 if they are references/links relating to the whole L2-entry and L4 if they're relating to one L3-POS/Etym (and I think Etym comes up more often than POS for this). I've seen both used right.—msh210℠ (talk) 21:48, 7 January 2013 (UTC)[reply]

Standards of Identity

Discussion continued from WT:RFD#mayonnaise.

In 2009, the community (bd2412, DCDuring, Lmaltier, Ruakh, msh210, Equinox, EncycloPetey, Carolina wren) discussed and decided (not unanimously) to include legal definitions of terms like "semisweet chocolate" ("sweet chocolate that contains not less than 35 percent by weight of chocolate liquor complying with the requirements of Sec. 163.111 and calculated in the same manner as set forth in paragraph (a)(2) of this section") and "ground beef" ("chopped fresh or frozen beef without the addition of beef fat as seasoning, with no more than 30 percent fat, and with no added water, phosphates, binders, or extenders"). Unaware of that discussion, I nominated the legal definition of "mayonnaise" for deletion, and a number of editors who had not contributed to the old discussion voted "delete" (including Chuck Entz and Widsith). Because the decision to keep or delete (or appendicise) Standards of Identity is wider than one entry and would be either a confirmation (if we keep them) or a departure from (if we delete them) the previous BP decision, discussion has been moved here.

So: should we keep such sense-lines? Move them into an Appendix of Legal Definitions? Or delete them?

In favour of keeping such senses, "any reference to a product being bought or sold in a supermarket or put on a hamburger in a fast food restaurant in the United States, is [a use of the term] mayonnaise under the legal definition" ("an edible emulsified semisolid made of: vegetable oil (at least 65%); vinegar and/or lemon juice; raw egg (whole eggs or yolks); and, optionally, any of various flavor-related ingredients, sequestrants, acids and crystallization inhibitors").
In favour of deleting such senses, any reference to a product being bought or sold in a supermarket or put on a hamburger in a fast food restaurant in the United States is use of the term mayonnaise with its usual definition ("a dressing made from vegetable oil, raw egg yolks and seasoning, used on salads and in sandwiches"), even if a more specific definition also applies. This is true across jurisdictions, whereas legal definitions also vary from jurisdiction to jurisdiction. In the same discussion that decided to define "semisweet chocolate" as "sweet chocolate that contains not less than 35 percent by weight of chocolate liquor complying with the requirements of Sec. 163.111 and calculated in the same manner as set forth in paragraph (a)(2) of this section", it was admitted that each of the fifty US states has a different legal definition of "assault"; every other English-speaking nation (and many a subdivision thereof) also has a different definition. "Mayonnaise" itself has different legal definitions in the UK/EU and Australia, New Zealand, etc. Because there are sure to be three English-language books referring to mayonnaise sold in Russian/Japanese/Egyptian grocery stores, even the Russian/Japanese/Egyptian legal definitions can be shoehorned into Wiktionary. And such definitions change in minor ways from time to time. Do we really want two dozen senses at "mayonnaise" like (to give a made-up example):
# {{historical|Australia|legal}} An edible emulsified semisolid made of: vegetable oil (at least 55%); vinegar; raw egg (whole eggs or yolks); and, optionally, any of various flavor-related ingredients, sequestrants, acids and crystallization inhibitors. {{defdate|1987–1998}}

# {{Australia|legal}} An edible emulsified semisolid made of: vegetable oil (at least 65%); vinegar and/or lemon juice; raw egg (whole eggs or yolks); and, optionally, any of various flavor-related ingredients, sequestrants, acids and crystallization inhibitors. {{defdate|since 1998}}

# {{Russia|legal}} An edible emulsified semisolid made of: vegetable oil (at least 70%); vinegar; egg yolks; and, optionally, flavoring ingredients.

? (If we do, we're sure to get the defdates wrong a lot of the time, because no-one has time to check every nation's laws on food products and add new senses to our entries whenever the laws change.) - -sche (discuss) 19:33, 7 January 2013 (UTC)[reply]

I oppose the creation of definitions like these, except when the legal definition is significantly different from the popular definition. It’s appendix material at best. — Ungoliant ^(Falai) 19:43, 7 January 2013 (UTC)[reply]

I oppose them too; also, I wonder if these could pass RFV? How would citations be able to show the 'at least 65% vegetable oil' and so on? Mglovesfun (talk) 19:56, 7 January 2013 (UTC)[reply]

If a significant number of items sold in the stores carry a label, and if failure to adhere to the legal definition of the label results in significant fines for the selling entity, then the items so labeled are an attestation of sorts, aren't they? --Dan Polansky (talk) 20:06, 7 January 2013 (UTC)[reply]

Yes, and any quotation like "bought some mayonnaise [from the store]" likewise attests the legal definition at the same time as it attests the common definition. - -sche (discuss) 20:15, 7 January 2013 (UTC)[reply]

Not necessarily. He who reports to have bought mayonnaise is probably not under legal obligation to use the term in compliance with the legal definition, unlike the seller. I am not defending having the legal definitions in Wiktionary; I am trying to figure out how they could be attested, and, from what I can see, actual use of the legal definitions by the sellers applying a label seems very likely given the sold items so labeled. --Dan Polansky (talk) 20:37, 7 January 2013 (UTC)[reply]

I agree (with Dan P., 20:37, 7 January 2013 (UTC)).—msh210℠ (talk) 22:00, 7 January 2013 (UTC)[reply]

I agree (with Dan P., 20:06, 7 January 2013 (UTC)).—msh210℠ (talk) 22:00, 7 January 2013 (UTC)[reply]

In his 1985 article "Lexicalization", Andrew Pawley presented numerous criteria for inclusion of terms in the lexicon. Under "3.1 Criteria related to institutional status in a culture" was "3.1.4 Legal status" from which I quote in full for reference:
The customary status of some form-meaning pairings if formalized to the extent of being codified in legal statutes. Many phraseological terms have legal status, e.g., driving without due care and attention (due care is in numerous dictionaries), with malice aforethought, breach of promise, breaking and entering, strike with intent to injure, accidental death (MacM, CALD). A concept codified in law may have more than one legally acceptable designation, but it is usually the case that certain designations are regarded as more proper, or are more frequently used than others.

This is one of 25 distinct criteria.

As I understand it, the "standards of identity" allow a manufacturer whose product complies with the definition not to provide an ingredient label. In the case of mayonnaise, many folks would accept a product called "Miracle Whip" as mayonnaise, but, as it does not comply with the standard of identity (too little oil), it is labelled as a "dressing". One might say it is not "true" mayonnaise. Of course, some would say that only fresh mayonnaise is "true" mayonnaise, but such a product is not a major item of commerce and culture.

I could certainly understand why we would include an appendix, linked by reference, containing the wordy definitions. Anyone could enter any such legal definition they chose, but I expect that the US, UK, EU, Australia and Canada would be the most we would be likely to see. Many smaller countries essentially piggyback on these regulatory regimes, with occasional exceptions. DCDuring TALK 20:13, 7 January 2013 (UTC)[reply]
I would expect that between large industrialized countries like the U.S., EU nations, and maybe even Russia and China, there is likely to be some similarity in standards of identity due to treaties governing commerce in these items. bd2412 T 01:29, 8 January 2013 (UTC)[reply]
I still think that, in the case of food, what laws present as legal definitions are actually rules about the product, intended to guarantee quality. A product (e.g. mayonnaise) that does not meet the rules cannot be called by that name, not because it isn't mayonnaise, but only because its quality is deemed too low to deserve the name. This means that the only definition worth including is the usual sense. The existence of legal definitions might be mentioned, but would that be useful to readers? However, I feel that the existence of a legal definition is a sufficient condition to warrant inclusion of the term (even if it wouldn't have been included otherwise). Note that legal definitions also exist for road types, etc. Lmaltier (talk) 21:14, 7 January 2013 (UTC)[reply]

(Reply to the original post by -sche: in particular, to the comparison to assault.) Mayonnaise must by law be used as defined in the CFR; assault need not (except by government officials, but those are all effectively the same author so don't count as independent citations). So any use you see of assault is likely using it in the usual sense: on the other hand, any product label with mayonnaise is not.—msh210℠ (talk) 22:00, 7 January 2013 (UTC)[reply]

(re "any use you see of assault is likely using it in the usual sense"): To use legal terminology, that is nothing but the truth—but it is inaccurate because it is not the whole truth. Any news article which writes that someone has been accused or convicted of "assault" is using the legal definition of assault used by the state where, and at the time that, the accusation or conviction is made (while simultaneously and by definition using the general sense). If a person is said in a work of fiction to have committed "assault", the definition is probably less rigorous, but there are (sadly) hundreds of newspaper articles from each US state and each other English-speaking nation (of which there are sufficiently many which are also hosted online in places we can easily find them to cite them) in which an "assault" is said to have occurred. I can furthermore find books about how people have been accused or convicted in Russia, Germany etc of "assault"; those are all using the Russian and German legal definitions. It is possible to add at least 65 legal subsenses to [[assault]] (and correspondingly many to [[murder]], etc) in this way. - -sche (discuss) 22:35, 7 January 2013 (UTC)[reply]

I've started adding legal senses to [[murder]] and [[first-degree murder]]. I haven't started adding many citations yet, but any quotation like "convicted by a French court of murder" supports the French legal sense, etc, so they're easy to find. - -sche (discuss) 23:56, 7 January 2013 (UTC)[reply]

(re "any product label with mayonnaise is not" using the general sense): Huh? The legal sense is a subsense of the general sense; uses of the legal sense are by definition uses of the general sense. It is amply possible to find uses of the general sense which are not simultaneously uses of a legal sense, for example in works describing the early history of mayonnaise or in cookbooks that teach "how to make your own mayonnaise", but any use of the US legal sense, even on a product label, is also a use of the general sense by definition. - -sche (discuss) 22:35, 7 January 2013 (UTC)[reply]

Re: "uses of the legal sense are by definition uses of the general sense": I don't think so. "This is a cat" does not attest "cat" to mean the same thing as "animal" when the referent of "this" is a cat (domestic animal) and thus also an animal. The term in the predicate applies one particular restriction, rather than all restrictions that are strictly weaker than that restriction. Put differently, the use of a narrow sense is not at the same time a use of all broader senses. --Dan Polansky (talk) 18:27, 8 January 2013 (UTC)[reply]

(Reply to the original post by -sche: in particular, to the question of adding definitions as laws change.) Yes, we should add definitions as laws change, much as we add definitions as other uses change: we have obsolete and current senses generally, so why not these? (Assuming, of course, some contributor wants to add them.)—msh210℠ (talk) 22:00, 7 January 2013 (UTC)[reply]

(General comment.) I think the 'definition' "See quotation" (as we have at [[semisweet chocolate]]) is a poor one, and the definition line should be more like what we have at [[ground beef]] (with a citation to the law in the Etymology section). (But I think that, while it needs improvement, the sense of semisweet chocolate should be kept.)—msh210℠ (talk) 22:00, 7 January 2013 (UTC)[reply]

With respect to standards of identity, I would like to propose a two-part resolution. For terms that only exist because a legal entity has established them as a name by which to call a particular thing, such as partially defatted pork fatty tissue or tomato concentrate, we should have an entry. For pre-existing terms for which legal standards were later established, we should create an appendix and note under the generic entry that legal definitions are available in the appendix, by country if multiple examples can be found. Exceptions to this division might be made where the legal definition of a term is surprisingly different from what one would expect. For example, § 102.33, "Beverages that contain fruit or vegetable juice", states that if a drink product contains less than 100% juice, but the common or usual name uses the word "juice", then the name of the product "shall include a qualifying term such as "beverage", "cocktail", or "drink" appropriate to advise the consumer that the product is less than 100 percent juice (e.g., "diluted grape juice beverage" or "grape juice drink")". In other words, if the product is sold as "grape juice cocktail", that by itself indicates that the product is not 100% juice. bd2412 T 01:24, 8 January 2013 (UTC)[reply]

Fine by me. DCDuring TALK 02:14, 8 January 2013 (UTC)[reply]

And by me, I suppose.—msh210℠ (talk) 06:50, 8 January 2013 (UTC)[reply]

Well, I suppose at some point the thing to do would be to propose a formal policy that can be voted on by the community. I think we should narrow the points in this discussion and the previous discussion into a general policy on legal definitions, with a specific policy within it on standards of identity. bd2412 T 04:47, 15 January 2013 (UTC)[reply]

Let's not forget about this.

Pursuant to bd2412's last comment, I've attempted to condense his proposed guidelines into something that could go into a WT:IDIOM-like page. (I don't think this should go into WT:CFI itself; I think it's inherently too nebulous to be an absolute rule rather than a guideline.)

1. If a term exists only because it has a specific legal meaning, it[s legal meaning] should be included. Example: "partially defatted pork fatty tissue".

Alternative wording: […] exists only because it was established as the designation for a thing or act which has a legally-standardised identity, it[s legal meaning] should be included.

2. If a legal standard of identity has been established for a general term, that legal standard definition should be in an appendix, which the general term should link to. Example: "margarine".

See this diff, this diff and this diff for legal senses of "murder" to go in the appendix.

3. If a legal standard of identity established for a general term is surprisingly/unusually different from the common definition of that term, the legal definition should be included in the entry. Example: murder.

I used "murder" as the example instead of "juice" because I think it's clearer.

"Murder" has, in some jurisdictions, the sense "the abetting of any criminal act which results in the death of a person", such that w:Ryan Holle was actually convicted of, and is serving life in prison for, "murder" because he loaned his car to his housemate so his housemate could go get food, and his housemate instead drove three people to a location a mile away, and one of those people inflicted an injury on a fourth person, and that fourth person died of that injury. Any definition of "murder" according to which loaning someone a car is "murder" is clearly nothing like the common definition of "murder". (In contrast, I don't grasp how the fact that "juice" has to be qualified if used to designate a product that cannot legally be designated as "juice" is a definition, as opposed to a usage note. Could you explain?)

- -sche (discuss) 08:36, 12 February 2013 (UTC)[reply]

The point about the modifiers for "juice" is not so much about "juice" itself as about those modifiers. If you see the word "beverage" or "drink" or "cocktail" appended to the word "juice", that means that the product contains something other than juice. The common use of words would suggest that a "cranberry juice cocktail" contains only cranberry juice, or at most some mix of fruit juices, but use of that phrase legally indicates that it includes something that is not a fruit juice. bd2412 T 18:42, 12 February 2013 (UTC)[reply]

By the way, I like the shape of this proposal, and will support it. bd2412 T 04:32, 13 February 2013 (UTC)[reply]

Are we going anywhere with this? bd2412 T 16:56, 30 March 2013 (UTC)[reply]

Yes, thank you for continually reminding everyone (me) about this. We should draft, discuss, and hold a vote. I've set up Wiktionary:Votes/pl-2013-03/Standards of identity and legal definitions of terms (I'm in the process of adding wording from here now). - -sche (discuss) 19:20, 30 March 2013 (UTC)[reply]

Definitions from technical or legal authorities don’t have to be attested in the usual way. We don’t need to find three general quotations for pound proving that the writer meant 453.592 g and not 453.58 g, for example. This is one reason why we have technical-subject labels: to show that the definition is given by authorities and accepted by practitioners in specific fields. (This is our common practice – but is it in our guidelines?)

But certainly we should not quote the letter of every single legal jurisdiction. Legal definitions relating to food, alcohol, highway safety, house wiring, taxation, and a thousand other subjects may vary in individual countries, provinces, counties, and municipalities, and may change regularly. We are not a legal reference.

Just as we don’t list every person named “Cher,” or every railway stop or lat-long coordinate named “Paris,” we should list every significantly different sense of these terms, but not offer a catalogue of the things or legal codes that they represent. I think USDA ground beef and mayonnaise go too far; they are just sense no. 1 decorated with one jurisdiction’s lawyerspeak. The entry murder seems to have about the right specificity.

If we’re discussing this, then we should identify and improve whatever guideline should be guiding us on this. —Michael Z. 2013-03-30 17:56 z

I agree with you. I'm setting up a vote based on the wording suggested earlier in this discussion; note that I intend that (creation of the vote) to ensure further discussion of this issue (on the vote talk page, here, or in a new BP thread), not to cut it off: the vote shall not start until discussion is finished, but its existence as a draft will, I hope, keep everyone moving towards formulating a policy. - -sche (discuss) 19:20, 30 March 2013 (UTC)[reply]

Inflection tables for modern Greek verbs

Discussion moved to Wiktionary talk:Greek verb inflection-table templates — Saltmarsh^{απάντηση} 04:35, 25 January 2013 (UTC)[reply]

Using ł and ə in Slovene alt-text

Slovene spelling is mostly phonemic but not quite. There are two unpredictable ways of pronouncing e and l: e can be pronounced either as e or as schwa, and l can be pronounced either as l or as w (v). We can (and should) of course make this clear in the pronunciation section. But seeing as we already include accent marks in the alt-text (as Serbo-Croatian does), I wonder if it would be okay to include ł and ə in it as well, to indicate when l and e are to be pronounced as w and schwa. So, for example, Slovenec would have Slovénəc as the alt-text, and prevajalec and prevajalca would display as prevajáləc and prevajáłca. I have done this in a few entries already, but I'm hesitant to add it to more of them because I'm not sure there would be consensus for it. I am worried that it might disrupt the visual image of the words too much and make them no longer recognisable as Slovene. On the other hand, it can be a valuable teaching aid, and I found at least one textbook that does this consistently, which I found very useful myself. —CodeCa t 14:03, 8 January 2013 (UTC)[reply]

Isn't that what phonemic transcriptions are for in the pronunciation sections? --Wiki Tiki 89 15:54, 8 January 2013 (UTC)[reply]

Yes, but the same could be argued for all the macrons and accents we already include in other languages, couldn't it? —CodeCa t 16:11, 8 January 2013 (UTC)[reply]

Latin texts are frequently published using macrons. Serbian and Croatian dictionaries are published with accents. Unless a similar thing can be said of Slovene regarding "ł" and "ə", then they should not appear in the headword. --Wiki Tiki 89 16:36, 8 January 2013 (UTC)[reply]

By "alt-text" you mean the text displayed as the headword? Then I think I oppose inclusion of a headword different from the way the word's spelled in use. (In Latin, from what I understand, the macrons are used in uses of the word.) It'd be too confusing for readers: they'd think the word's actually spelled that way.—msh210℠ (talk) 16:27, 8 January 2013 (UTC)[reply]

Macrons are not normally used for Latin outside of dictionaries and other learning materials. Running texts do not use them. The same for all other old languages (although acute accents are standard in normalised Old Norse). And the same for accents in Serbo-Croatian, Russian etc. So should our current practice for Serbo-Croatian (which includes accents as alt-text in all mentions, including links and inflection tables) be changed as well? Also note that use of macrons and accents in the headword is standard in dictionaries of the respective languages, so since we're a dictionary we should do it too. —CodeCa t 16:35, 8 January 2013 (UTC)[reply]

Do Slovene dictionaries and learners' textbooks and the like use ł and ə? —An gr 18:14, 8 January 2013 (UTC)[reply]

I found at least one that does, and it states the following (in Swedish):

I vanlig slovensk stavning används bokstaven e även för ljudet [ə]. I läroböcker och ordböcker används i stället ofta det särskilda tecknet ə.

"In normal Slovenian writing the letter e is also used for the sound [ə]. In learning books and dictionaries the special sign ə is often used instead."

För att markera uttalet [w] används i vissa handböcker skrivtecknet ł.

"To mark the pronunciation [w], the sign ł is used in certain handbooks."

I have no idea if their use is as common as this book claims, but the book itself consistently uses those two signs everywhere in all Slovene words. Two major Slovene online dictionaries don't use them, but they do mark accents. The alternative pronunciation of e and l is instead shown after the word when necessary: [1] [2] —CodeCa t 18:28, 8 January 2013 (UTC)[reply]

As far as Serbo-Croatian is concerned, the use of macrons and accents in the headword is not that frequent in bilingual/bidirectional dictionaries, if it exists there at all; it is seen only in monolingual/unidirectional dictionaries. So, now that I think of it, maybe we should abandon it altogether. As for Slovene, I don't find the use of ł and ə that helpful in learning it. As msh210 puts it, it can contribute to readers' confusion. I am sorry I didn't grasp this when I discussed the issue with Codecat in the first place. Also, IIRC, the only dictionary of Slovenian I have encountered (an SL-SC dictionary, back in the eighties) didn't have any ł or ə; I don't remember it even having accents. Also, one online dictionary offered by PONS GmbH [3] doesn't use ł or ə. But I don't know how they are used in other dictionaries. --biblbroks_{дискашн} 18:52, 8 January 2013 (UTC)[reply]

Requesting an admin intervention for User:Zabadu

I would prefer not to block him, but a firm warning, if he continues with his behavior, would be more than helpful. Dijan already warned him on Talk:čuka, and afterwards I repeated that he is on the verge of being blocked when discussing with him on Talk:ska probljem, but he seems oblivious to any warning, as he repeated his whole insulting comment in several edits on Talk:čuka. I hope someone finds a way to deal with him other than blocking. --biblbroks_{дискашн} 19:10, 8 January 2013 (UTC)[reply]

I forgot to add one important thing: on Talk:frend, a user of the Croatian variant of Serbo-Croatian was upset by his insulting comments, while I am what one would call a Serbian-affiliated native speaker of Serbo-Croatian, as is user Zabadu. I think this says more than anything about the amount of patience invested in conversing with him. --biblbroks_{дискашн} 19:28, 8 January 2013 (UTC)[reply]

Wow. SC hell goes on, but I would recommend against blocking him if his edits are good quality, as judged by native speakers. —Μετάknowledge^{discuss/deeds} 06:21, 9 January 2013 (UTC)[reply]

I felt the need to make this comment, input welcome. Mglovesfun (talk) 17:30, 9 January 2013 (UTC)[reply]

I blocked him for a week because he started calling people racists again. Now he's angry at me. I'm really not sure if I did the right thing here? —CodeCa t 17:24, 10 January 2013 (UTC)[reply]

Yes, you did. His subsequent rant has earned him a permanent block. SemperBlotto (talk) 17:27, 10 January 2013 (UTC)[reply]

I ask the administrators to consider allowing the user to edit his talk page. If he understands this act of goodwill, he might respond constructively to the last comment by Mglovesfun on his talk page. --biblbroks_{дискашн} 21:27, 10 January 2013 (UTC)[reply]

Personally, I believe that after the comments he has made, there is no turning back. I appreciate your goodwill, Biblbroks, but I just don't foresee anything past insults coming out of him from now on, and so I see no reason to give him a place to rant. —Μετάknowledge^{discuss/deeds} 04:08, 11 January 2013 (UTC)[reply]

I think I would've blocked for six months or a year. Mglovesfun (talk) 11:47, 11 January 2013 (UTC)[reply]

When the most conservative opinion is to block for six months, that says a lot! Mglovesfun (talk) 01:17, 13 January 2013 (UTC)[reply]

I started off with a week, though. Am I too nice? I kind of like to be nice... You better say I am nice. Otherwise I'll block all of you for over nine thousand years. *deranged laughing* —CodeCa t 01:39, 13 January 2013 (UTC)[reply]

I think the underlying problem was his not understanding and/or accepting that Wiktionary is descriptive rather than prescriptive. From his point of view, he was protecting the dictionary from contamination by people who didn't understand what the real Serbian language was, and everyone else was trying to fob off fraudulent entries for political reasons. Add to that the paranoia that comes from lacking people skills and no doubt having attracted abuse from countless people unintentionally ticked off by him over the years, and you have someone primed not to listen to anyone.

Even if we had gotten him to stop doing the bad stuff that precipitated this, we would have spent an inordinate amount of time walking on eggshells and talking him down from one crisis after another for a long time to come. I don't know if it would have been worth the effort. Chuck Entz (talk) 03:14, 13 January 2013 (UTC)[reply]

Be a Wikimedia fundraising "User Experience" volunteer!

Thank you to everyone who volunteered last year on the Wikimedia fundraising 'User Experience' project. We have talked to many different people in different countries and their feedback has helped us immensely in restructuring our pages. If you haven't heard of it yet, the 'User Experience' project has the goal of understanding the donation experience in different countries (outside the USA) and enhancing the localization of our donation pages.

I am (still) searching for volunteers to spend some time on a Skype chat with me, reviewing their own country's donation pages. It will be done in a 'usability' format (I will ask you to read the text and go through the donation flow), and I will be asking for your feedback along the way.

The only prerequisite is for the volunteer to actually live in the country and to have access to at least one donation method that we offer for that country (mainly credit/debit card, but also real-time banking like iDEAL, e-wallets, etc.) so we can do a live test and see if the donation goes through. **All volunteers will be reimbursed for any donations that succeed (and they will be very low amounts, like 1-2 dollars)**

By helping us you are actually helping thousands of people to support our mission of free knowledge across the world. If you are interested (or know of anyone who could be) please email ppena@wikimedia.org. All countries needed (except the USA)!!

Thanks!

Pats Pena
Global Fundraising Operations Manager, Wikimedia Foundation

Sent using Global message delivery, 20:49, 8 January 2013 (UTC)

Sgs vs bat-smg

I tried to fix a link to the Samogitian Wikipedia (bat-smg.wikipedia.org), but Template:bat-smg doesn't exist. Should it be redirected to Template:sgs? πr² (talk • changes) 06:48, 10 January 2013 (UTC)[reply]

For some reason this was missing from the special list of Wikimedia language codes. I fixed that, so you can use {{sgs}} as usual. -- Liliana • 07:23, 10 January 2013 (UTC)[reply]

WOTD update request

Could someone please update the definition in today's WOTD to the new first sense added to the entry for zeptomole? Thanks. Astral (talk) 17:38, 10 January 2013 (UTC)[reply]

Abbreviation/acronym as PoS is deprecated?

I remember seeing a discussion about this not so long ago so I'd like to confirm this. Are such headers deprecated in favour of a "proper" part of speech? —CodeCa t 14:05, 12 January 2013 (UTC)[reply]

I thought that was the prevailing view expressed. Would it need a dreaded VOTE to be mass-implemented? DCDuring TALK 14:29, 12 January 2013 (UTC)[reply]

No, it'd need a miracle; it's an insane amount of work because they all need to be sorted by hand. I do them as I find them; I have no plans to make lists of them because I'd be at it for years. Mglovesfun (talk) 18:18, 12 January 2013 (UTC)[reply]

You could use whatlinkshere for the list of entries using the templates, but, yes, there is a lot of work, especially as many have more than one PoS. Or, one could make the templates categorize into language-specific clean-up categories. That would enable some of our more diligent contributors to clean up their language's use of the offending templates.

And that would still leave headers that didn't use the templates. DCDuring TALK 19:27, 12 January 2013 (UTC)[reply]

If anyone's curious, I've generated a list at Wiktionary:Todo/Abbreviations, acronyms, and initialisms of pages containing any of these L3 headings: Abbreviation, Acronym, Initialism, {{abbreviation}}/{{abbreviation|lang=[a-z-]+}}, {{acronym}}/{{acronym|lang=[a-z-]+}}, {{initialism}}/{{initialism|lang=[a-z-]+}}. It's a long list, and a lot of them are not so easy to fix, because abbreviations often don't have obvious POSes. But hey, feel free to work on them. —Ruakh_TALK 20:47, 12 January 2013 (UTC)[reply]
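For anyone who wants to regenerate a similar list from a dump, here is a minimal Python sketch that matches the heading forms listed above. It is not Ruakh's actual script. It assumes page wikitext uses the standard ==Language== and ===Heading=== syntax, and the name deprecated_headings is hypothetical. It also records which L2 language section each heading falls under, since fixing the entries requires knowledge of that language.

```python
import re

# L2 language header, e.g. "==English==" (but not "===Noun===").
_L2 = re.compile(r"^==([^=].*?)==\s*$")

# L3 headings listed above: bare names, or the templates with an optional lang= code.
_L3 = re.compile(
    r"^===\s*(?:(Abbreviation|Acronym|Initialism)"
    r"|\{\{(abbreviation|acronym|initialism)(?:\|lang=[a-z-]+)?\}\})\s*===\s*$"
)


def deprecated_headings(wikitext):
    """Return (language, heading) pairs for each deprecated L3 heading, in page order."""
    results = []
    language = None
    for line in wikitext.splitlines():
        m2 = _L2.match(line)
        if m2:
            language = m2.group(1).strip()
            continue
        m3 = _L3.match(line)
        if m3:
            results.append((language, (m3.group(1) or m3.group(2)).lower()))
    return results
```

Deeper headings such as ====Acronym==== are deliberately not matched, in keeping with the list being restricted to L3 headings.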

I noticed Wiktionary:Initialisms thanks to the list... we should probably update it so people don't create any new malformatted entries, eh? - -sche (discuss) 22:55, 12 January 2013 (UTC)[reply]

Ruakh, thank you for the list. I have two suggestions for it... could it be split up by language (since converting them requires language knowledge) and could the lists be numbered so that it's easy to see how many items there are? —CodeCa t 22:57, 12 January 2013 (UTC)[reply]

So let's say that I want to help out, or at least not cause more problems. If an abbreviation refers to a word that can be an adverb, an adjective, and a preposition, do we create a header for each one? The one in question is ab. for about. Simple ones like antibody make sense. Also, shouldn't notes be made on the help pages, and shouldn't the headers be removed as options when you create a new page? Thanks Speednat (talk) 17:54, 29 January 2013 (UTC)[reply]

Yes, the implication is that each PoS header which would be needed for attestable usage should be added. In the case of about I'd be skeptical concerning the adverb and especially the adjective. I don't think I've ever seen about abbreviated at all. DCDuring TALK 18:31, 29 January 2013 (UTC)[reply]

Yes, we should generally create a header for each one. The fact that it's an abbreviation, an acronym, etc. belongs to the etymology. Also don't forget that words such as OK (or RFD here) are used as verbs as well as nouns... It's necessary to state that they can be used as verbs, etc. Lmaltier (talk) 18:35, 29 January 2013 (UTC)[reply]

I see the abbreviation of template, but how about for symbols? Speednat (talk) 04:31, 7 February 2013 (UTC)[reply]

I started on some of these as I ran across them, and I notice that the end product has a lot of red links because, unlike the first, they don't have an option to link over to Wikipedia, at least not the way I am doing it. I am using the noun/adj... header and then the abbreviation of or initialism of template. Can I still link over to Wikipedia to avoid these red links? Speednat (talk) 22:25, 1 March 2013 (UTC)[reply]

Something analogous to {{taxlink}}? I've used them inside other templates, but it does not always work. DCDuring TALK 22:31, 1 March 2013 (UTC)[reply]

How do I change user names?

Sorry, but I signed up with a user name slightly different than my Wikipedia user name so when I go back to Wikipedia I am automatically logged in there with the wrong name. I want to change my user name here to match my user name on Wikipedia. Should I create a new account here and delete the old account? How do I delete an account? Thanks. JimDerby (talk) 19:26, 12 January 2013 (UTC)[reply]

See here. — Ungoliant ^(Falai) 19:32, 12 January 2013 (UTC)[reply]

Thanks. I did look for a way but did not find it on my own. --JimDerby (talk) 02:25, 13 January 2013 (UTC)[reply]

By the way: this probably would have been better taken to the Information Desk rather than here (not that it matters much, since it would have been resolved quickly by your reply wherever it was moved). Chuck Entz (talk) 03:20, 13 January 2013 (UTC)[reply]

Category:en-verb with to

Am I ok to empty this category (as much as I can) by bot? It doesn't seem uncontroversial enough for me to just do it. For the record, having opposed the removal of 'to' I've now changed my mind and I think we shouldn't use 'to' in the head word (but it's ok in definition or non-English words, such as 'to go', 'to be'). Mglovesfun (talk) 13:52, 13 January 2013 (UTC)[reply]

I agree. Now that it has come up, should we do similar things for verbs in other languages that commonly place a particle before the infinitive, such as Norwegian or Romanian? —CodeCa t 14:02, 13 January 2013 (UTC)[reply]

I agree for English. But there cannot be a general rule. It all depends on what the language considers as the lemma form. Lmaltier (talk) 14:11, 13 January 2013 (UTC)[reply]

I agree. Even if we decide to display to before English verbs, a better solution would be having {{en-verb}} display it automatically. — Ungoliant ^(Falai) 16:16, 13 January 2013 (UTC)[reply]

I agree. (I created this category, actually, after Brett (talk • contribs) modified {{en-verb}} not to add to for regular verbs (without affecting verbs that had the explicit inf=). If we ever decide to change back, it would make sense to just add to regardless of inf=, so that it will be easier to be consistent about it.) —Ruakh_TALK 18:21, 13 January 2013 (UTC)[reply]

I agree for English, plead ignorance and apathy for other languages. I've been removing them as I go because I thought that was the consensus. DCDuring TALK 16:57, 13 January 2013 (UTC)[reply]

I agree. --Dan Polansky (talk) 18:23, 14 January 2013 (UTC)[reply]

I just looked, and there were fourteen entries in the category (plus one false positive), so I emptied it (semi)manually.—msh210℠ (talk) 20:41, 14 January 2013 (UTC)[reply]

I see where I went wrong with MglovesfunBot now: I forgot to click 'skip if only minor replacement', so in those 14 entries it made minor edits, but the entries did not contain {{en-verb|head=to, which is what the bot was looking for. Mglovesfun (talk) 20:46, 14 January 2013 (UTC)[reply]

Asturbot to be donated

If anyone with a bot fancies running Asturbot and generating a few thousand decent Asturian entries, it's available for them. The code is at User:Asturbot/code and the model .txt file at User:Asturbot/ast.txt. --Donkeyhorse (talk) 21:52, 13 January 2013 (UTC)[reply]

Accelerated entry creation from translations using User:Ruakh/Tbot.js

At Metaknowledge's and my request, User:Ruakh has created a nice tool, User:Ruakh/Tbot.js, that creates entries from translations:

This is how I have created воздержание (vozderžanije) from abstinence#Translations. I clicked on the green link (after an initial setup):

=={{subst:ru}}==

===Noun===
{{ru-noun|tr=vozderžánije|g=n}}

# [[abstinence]] {{gloss|the act or practice of abstaining}}

The tool knows the language, the part of speech, the gender and transliteration.
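To make concrete what the tool fills in, here is a minimal Python sketch that assembles the same wikitext as the воздержание example from those four pieces of information plus the English gloss. It is not Tbot.js (which is JavaScript); the function names template and build_entry are hypothetical, and the head-template name (ru-noun) is passed in explicitly rather than derived, since how the real tool picks it is not described here.

```python
def template(name, *params):
    """Render a wikitext template call, e.g. template("gloss", "x") -> "{{gloss|x}}"."""
    return "{{" + "|".join((name,) + params) + "}}"


def build_entry(lang_code, pos, head_template, translit, gender, english, gloss):
    """Hypothetical sketch: build a bare entry like the one Tbot.js produced above."""
    lines = [
        "==" + template("subst:" + lang_code) + "==",          # L2 language header
        "",
        "===" + pos + "===",                                    # L3 part-of-speech header
        template(head_template, "tr=" + translit, "g=" + gender),  # headword line
        "",
        "# [[" + english + "]] " + template("gloss", gloss),    # definition from the translation table
    ]
    return "\n".join(lines)


print(build_entry("ru", "Noun", "ru-noun", "vozderžánije", "n",
                  "abstinence", "the act or practice of abstaining"))
```

Anything language-specific beyond this (declension tables, Japanese readings and so on) would still need to be added by hand, as described in the comments that follow.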

I have added the declension manually. I have created a few Russian entries using this accelerated method. The tool works fine for Russian and some other languages with similar templates. Ruakh said he is open to discussion about other languages. It may not be very useful for Mandarin and Japanese, though, because of their complex templates. --Anatoli ^{(обсудить}/^вклад) 04:13, 14 January 2013 (UTC)[reply]

I'm using it with Japanese now, and it is helpful, although a lot of tweaking is required to adapt the output to the elaborate system of templates. It took me a little while to figure out how to make the links green, and I couldn't make it work until I did that. By the way, does anyone think it is bad style to leave the gloss snippet intact? I think it looks good, and I find that it clarifies the gloss in ways I wouldn't even have realized were necessary had I not seen the other senses on the English entry. Maybe it depends on the word. --Haplology (talk) 16:03, 14 January 2013 (UTC)[reply]

new-user-page tag

I've created this tag with the abuse filter. It tags an edit whenever a user with fewer than five edits edits their own user page. So far it has been very effective in tracking down the typical "put promotional material on user page" kind of edit, so I hope it's useful. —CodeCat 04:41, 14 January 2013 (UTC)[reply]
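The rule as described boils down to a two-part condition. Below is a Python sketch of that condition only; the real filter is written in the AbuseFilter extension's own rule language and its text isn't given here, so the function and parameter names are hypothetical, and treating the edit count as the count before the current edit (and ignoring user subpages) is an assumption.

```python
USER_NAMESPACE = 2  # number of MediaWiki's built-in User: namespace


def should_tag_new_user_page(edit_count: int, namespace: int,
                             page_title: str, user_name: str) -> bool:
    """Sketch: fewer than five edits, and the page being edited is the user's own user page.

    Assumptions: edit_count is the user's count before this edit; page_title is given
    without the "User:" prefix; subpages such as User:Name/sandbox are not matched.
    """
    is_own_user_page = namespace == USER_NAMESPACE and page_title == user_name
    return edit_count < 5 and is_own_user_page
```

For example, a first edit by "Example" to User:Example would be tagged, while the same edit to a main-namespace entry would not.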

Using numbers to indicate tone in IPA pronunciation transcriptions

Is it something we want to do? See [4] and Talk:五. (I wondered whether to post this here or on WT Talk:AZH. Because such numbers are not, AFAICT, official IPA symbols, and because they could be used in other languages that exhibit tone sandhi, not just Chinese, I decided that this was the more appropriate place.) - -sche (discuss) 07:08, 14 January 2013 (UTC)[reply]

I was brought up on old country, homemade IPA. But I wouldn't mind this, if it weren't for the inconsistency. —Μετάknowledge^{discuss/deeds} 07:25, 14 January 2013 (UTC)[reply]

The main problem is that different languages use different numbering systems for tones, so this would work against the relative universality of the IPA. For instance, Mandarin has 4 tones (5 if you count toneless as a tone), while Cantonese has from 6 to 9, depending on how you describe them. Both have their own numbering systems, which mostly don't match, especially since there's not that much overlap, and that's just two out of dozens of major tone languages. In other words, the number means nothing unless you know the tone structure of the language and how the numbers are allocated among the tones. Chuck Entz (talk) 08:54, 14 January 2013 (UTC)[reply]

You don't understand. This is not using 1 for Mandarin's first tone, it's using 55 as a translingual, easily convertible form of the IPA ˥˥. —Μετάknowledge^{discuss/deeds} 09:15, 14 January 2013 (UTC)[reply]

Indeed. A testament to my keen powers of observation and superior facility for overlooking the obvious. D'oh! :( Chuck Entz (talk) 13:30, 14 January 2013 (UTC)[reply]

Sorry for my rather rude way of broaching that above. —Μετάknowledge^{discuss/deeds} 15:59, 14 January 2013 (UTC)[reply]

Well, I don't know much about tones, but what are the advantages of "5" over "˥"? --Wiki Tiki 89 15:12, 14 January 2013 (UTC)[reply]

The idea is for the number to indicate the height of the pitch: if I remember correctly, the higher the number, the higher the pitch. The advantage is that it quantifies things, so you can compare a 15 to a 25 and easily see that the 15 starts from a lower pitch than the 25, but ends up on the same pitch. Graphical symbols can be harder to tell apart at a glance when they deal with lots of minor differences. Of course, it also means that you have to assign a number, even if you don't know the exact pitch, so it might give a false sense of more precision than there really is. Chuck Entz (talk) 15:43, 14 January 2013 (UTC)[reply]

You essentially have to assign a number anyway, and for languages with good literature, that's easy. Lao, for example, would be a bit more difficult, due to broad dialectal variation and scholarly non-consensus. —Μετάknowledge^{discuss/deeds} 15:59, 14 January 2013 (UTC)[reply]

The pitch numbers are convenient and no more abstract than the pitch letters, so they should be applicable in theory to all languages with pitch tones, but in practice they're used almost exclusively for Chinese languages. I've never seen them used for pitch tone languages outside Asia, such as those in Africa and North America. Users familiar with languages like Xhosa or Navajo would probably be quite astonished to see them there. —Angr 16:37, 14 January 2013 (UTC)[reply]

Swedish and Norwegian pitch accents are sometimes numbered 1 and 2. —CodeCat 17:27, 14 January 2013 (UTC)[reply]

But that's like numbering Mandarin's pitches 1–4. This is about something different: using 1, 2, 3, 4, 5 as synonyms for ˩ ˨ ˧ ˦ ˥ respectively. —Angr 17:31, 14 January 2013 (UTC)[reply]

I oppose since apparently the numbers are used completely differently for different languages. (In some languages, 5 is the highest tone, in others, 1 is.) As such, only the tone bars should be used. -- Liliana • 17:32, 14 January 2013 (UTC)[reply]

I think you're misunderstanding the proposal. IPA apparently includes tone numbers that are language-independent, and which don't correspond to tones in any given language. See Angr's comment immediately above. —CodeCat 17:39, 14 January 2013 (UTC)[reply]

Well, strictly speaking, IPA doesn't include them. But they are widely used in Chinese linguistics as a typographically friendlier equivalent to the IPA tone letters. However, according to w:Tone letter#Numerical values, Liliana is right, because numbers are also used to stand for tones in Mesoamerican linguistics, but there 1 = ˥, whereas in Chinese linguistics 1 = ˩. There's also a potential for confusion because a contour tone like ˦˥ (i.e. 45) can look an awful lot like a numeral 1 in a sans-serif font. —Angr 17:57, 14 January 2013 (UTC)[reply]

In that case I don't think I support this. The purpose of IPA is to have a single system that everyone understands. If we start to include our own extensions to it, it loses some of its purpose. —CodeCat 21:29, 14 January 2013 (UTC)[reply]

Yeah, I don't think this is a good or useful idea. The tone symbols in IPA are very easy to understand: the higher the bar, the higher the tone. Numbers would only be easier for those who are more used to numbers, but then you can also say that about any IPA symbol, not only tones. --Wiki Tiki 89 21:50, 14 January 2013 (UTC)[reply]

As long as this notation is used only with East Asian languages, this alternative notation should be fine. The tone letter symbols are not as easily discernible as the numbered values, especially when combined and when the font is missing them: cf. ˨˧˨ vs ²³², ˨˦˨ vs ²⁴², ˦˨˧ vs ⁴²³. The numbers also offer an easy way to indicate tone sandhi: e.g. 五 /u²¹⁴/; when in compounds 五官 /u²¹⁴⁻²¹¹ ku̯an⁵⁵/, 五感 /u²¹⁴⁻³⁵ kan²¹⁴⁻²¹⁽⁴⁾/, 十五 /ʂʐ̩³⁵ u²¹⁴⁻²¹⁽⁴⁾/. 129.78.32.21 03:43, 15 January 2013 (UTC)[reply]

I don't see why the same sandhis can't be indicated by the IPA symbols. --Wiki Tiki 89 04:04, 15 January 2013 (UTC)[reply]

As 五 /u˨˩˦/; when in compounds 五官 /u˨˩˦-˨˩˩ ku̯an˥˥/, 五感 /u˨˩˦-˧˥ kan˨˩˦-˨˩(˦)/, 十五 /ʂʐ̩˧˥ u˨˩˦-˨˩(˦)/,

老保管好好保管好好老鼠 ("(speaking to the old keeper) keep the good mice well")

/lɑʊ̯²¹⁴⁻²¹¹pɑʊ̯²¹⁴⁻³⁵ku̯an²¹⁴⁻²¹⁽⁴⁾ xɑʊ̯²¹⁴xaʊɻ²¹⁴⁻⁴ pɑʊ̯²¹⁴⁻³⁵ku̯an²¹⁴⁻³⁵xɑʊ̯²¹⁴⁻²¹⁽⁴⁾ xɑʊ̯²¹⁴⁻²¹¹lɑʊ̯²¹⁴⁻³⁵ʂu²¹⁴⁻²¹⁽⁴⁾/

or /lɑʊ̯˨˩˦-˨˩˩pɑʊ̯˨˩˦-˧˥ku̯an˨˩˦-˨˩(˦) xɑʊ̯˨˩˦xaʊɻ˨˩˦-˦ pɑʊ̯˨˩˦-˧˥ku̯an˨˩˦-˧˥xɑʊ̯˨˩˦-˨˩(˦) xɑʊ̯˨˩˦-˨˩˩lɑʊ̯˨˩˦-˧˥ʂu˨˩˦-˨˩(˦)/? 129.78.32.21 04:30, 15 January 2013 (UTC)[reply]

I find the graphical symbols easier to read because they seem more intuitive. You can visually see the tone contour, which is far less obvious with numbers. —CodeCat 04:32, 15 January 2013 (UTC)[reply]

I agree with CodeCat. Also I don't think hyphens belong in IPA at all so it doesn't matter how they look with the tone bars. --Wiki Tiki 89 04:35, 15 January 2013 (UTC)[reply]

How can the tone letters in the example be the easier ones to read? I find them hard to picture; my head hurts just typing the different levels, because they are so structurally alike. But fair enough. 129.78.32.21 04:45, 15 January 2013 (UTC)[reply]
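The numeric and tone-letter notations compared in this thread convert into each other mechanically, which is also why the two numbering conventions Angr mentions can collide. A minimal Python sketch, assuming digits 1 to 5 on a five-level scale and the five IPA tone letters; the names are hypothetical, and applying the reversed (1 = highest) order to a full five-level scale is an assumption, since Mesoamerican descriptions may use fewer levels.

```python
# Chao-style pitch digits -> IPA tone letters.
# Chinese-linguistics convention: 1 = lowest pitch, 5 = highest.
LOW_IS_ONE = {"1": "˩", "2": "˨", "3": "˧", "4": "˦", "5": "˥"}
# Assumption: the reversed order (1 = highest) on the same five-level scale.
HIGH_IS_ONE = {d: LOW_IS_ONE[str(6 - int(d))] for d in LOW_IS_ONE}
# Superscript digits, as in /u²¹⁴/, are mapped to plain digits first.
SUPERSCRIPTS = str.maketrans("¹²³⁴⁵", "12345")


def to_tone_letters(pitch: str, one_is_highest: bool = False) -> str:
    """Convert a pitch-number string such as "214" or "²¹⁴" into tone letters."""
    table = HIGH_IS_ONE if one_is_highest else LOW_IS_ONE
    digits = pitch.translate(SUPERSCRIPTS)
    if not digits or any(d not in table for d in digits):
        raise ValueError(f"expected pitch digits 1-5, got {pitch!r}")
    return "".join(table[d] for d in digits)
```

With the default convention, "55" gives ˥˥ and "²¹⁴" gives ˨˩˦, matching the pairs quoted above; with one_is_highest=True the same "1" comes out as ˥ instead of ˩, which is exactly the ambiguity Liliana and Angr raised.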

Shortcuts to policy pages' talk pages

I've found myself wanting to link to policy pages' talk pages before, particularly the talk pages of "WT:About Language" pages. Does anyone else think it would be useful to have shortcuts to such pages? They could be of the form WT:T:foo, or we could ask the devs to make us a short synonym of "Wiktionary talk:" (WTT:?), the way Wikipedia has WT:MOS. - -sche (discuss) 07:17, 14 January 2013 (UTC)[reply]

Easy solution: just make the page Wiktionary:T:foo and have it redirect to whatever you want. Redirects by no means must be in the same namespace as what they redirect to. —Μετάknowledge^{discuss/deeds} 16:00, 14 January 2013 (UTC)[reply]

I've thought of creating [[WT:FOO talk]] as a shortcut to the talkpage subsidiary to the target of the shortcut page [[WT:FOO]] (for various values of FOO). This is memorable (for me): "CFI talk" is talk about the CFI etc. Of course, having one shortcut to the CFI's talkpage doesn't preclude having another.—msh210℠ (talk) 21:10, 14 January 2013 (UTC)[reply]

I've created a few redirects of the form WT:T:ADE. I like your idea (WT:ADE talk, right?), too. Let's have both kinds. :) - -sche (discuss) 22:05, 14 January 2013 (UTC)[reply]

Attestation, idiomaticity, predictable word forms and combinations

The discussion at RFD#fasque has made me wonder about a few things. It is common practice to treat inflected forms of words as part of the same lemma as their base form, and so they count as a single "unit" for attestation purposes. That means that 3 attestations of any form of the lemma count as attestations for the lemma itself. This practice is normally extended in the other direction as well: once a lemma is attested, all of its inflected forms are also considered includable. This practice is what allows us to add inflection tables to entries and to run bots to create all inflected forms of words. It makes sense, because inflected forms are considered predictable and readily productive. It would not make much sense to include, say, the past singular of openen but omit the past plural. Certain words and forms are, therefore, considered "available" to be produced at any time, and so they technically exist in the language even if unattested by sheer accident.

It's not always easy to judge when this kind of reasoning applies, though. The debate about fasque shows that it's not always clear whether something forms part of the repertoire of forms readily available to any speaker, or whether it is a separate lemma, or a sum of two or more lemmas. That is, is fasque a separate lemma with its own part of speech (I wouldn't know which!), is it simply a "form" of fas, or is it a phrase (in which case it is not idiomatic)? Current practice on Wiktionary has been to treat words as idiomatic by default, based on the mission statement "all words in all languages". But this has been questioned many times, mainly because it is not always clear what a word is and when a certain combination of lexical units is idiomatic. I think for simplicity we can distinguish a few kinds of combination:

Base + inflection - These are currently considered part of the base lemma and so are not lemmas in their own right. Examples: opened, civitatem, grandes, yöllä
Base + derivation - This type involves combinations where only one part is a proper lemma, and another part is not an independent unit but is rather prefixed or suffixed to create shades of meaning or to convert one part of speech to another. Examples: unopened, faithless, tenedor, verbranden.
Base + base - Also called "compounds", often a subtype of the above. Many languages (in particular Indo-European ones) create compounds by converting one of the bases into a special derivational form which has the specific purpose of being attached to another lemma. Examples: weekend, луноход, staatsgeheim, tietokone.
Phrasal part of speech - This differs from a compound in that it is connected not at the morphological level but at the syntactic level. In languages like English, each part still functions as a distinct word with its own recognisable part of speech, and the order between parts may change. The combination also has its own part of speech which is determined by the "head". On the other hand, in languages like Finnish or Zulu there can also be prefixation or suffixation involved, and the parts can't stand by themselves. Examples: give up, aufmachen, at home, my friend, taloni (talo + -ni).
Phrase - A true phrase has no real part of speech, and typically includes a finite verb. Examples: my name is..., I am hungry. I think fasque may also belong to this category, as it has no part of speech of its own. It is technically only a "half phrase", as it can't stand on its own syntactically, but it is not a single word either, just like its translation "and religion".

It's not always obvious which type something belongs to. Some grammars treat a certain kind of word formation as inflection where others treat it as derivation; the formation of comparatives and superlatives is a good example. We also treat the types differently on Wiktionary depending on whether they are orthographical units: whatever we consider a "word" is generally considered idiomatic by default. That in turn has led to WT:COALMINE as a "fix", since coalmine and coal mine are lexically identical and both belong to type 3. I think the real problem is that we look too much at what a "word" is and not enough at the underlying structure. Analytic languages like Chinese often treat all five of the above types as separate words, so our idiomaticity requirement is extended to all five as well. On the other hand, there are also polysynthetic languages, in which all five types can be incorporated into single words. Our mission statement "all words in all languages" is therefore seriously skewed and probably doesn't reflect what we really want.

Current policy, including the statement "all words in all languages", does not help to clarify this at all, and the continuing lack of consensus means that the same discussions crop up over and over again. So I'd like to call on other editors to come to some kind of agreement on just what a word is and what idiomaticity is, in particular a single definition that is not biased towards one kind of language or another. I think that this will mean either that we clarify what a word is, which will probably mean that a "word" can contain a space, such as coal mine, or that something without spaces can be more than one "word", like fasque; or that we treat words as purely orthographical and adopt some other terminology for describing the "units" that we deal with. —CodeCat 20:02, 15 January 2013 (UTC)[reply]
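The practice described in the first paragraph above, where citations of any inflected form count towards the lemma, amounts to counting per lemma rather than per form. A toy Python sketch, assuming a hand-made form-to-lemma table (hypothetical data) and the bare threshold of three from the text; Wiktionary's actual criteria for inclusion impose further conditions on citations that are not modelled here.

```python
from collections import Counter


def attested_lemmas(cited_forms: list[str], form_to_lemma: dict[str, str],
                    threshold: int = 3) -> set[str]:
    """Count citations per lemma rather than per form (toy model of the practice described).

    Forms missing from form_to_lemma are treated as their own lemma.
    """
    counts = Counter(form_to_lemma.get(form, form) for form in cited_forms)
    return {lemma for lemma, n in counts.items() if n >= threshold}


# Hypothetical toy data: one list element per citation found.
forms = {"opened": "open", "opens": "open", "opening": "open"}
print(attested_lemmas(["opened", "opens", "opening"], forms))  # {'open'}
```

No single form has three citations here, yet the lemma passes; under the practice described, every inflected form of open then becomes includable.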

I don’t think the problem is in our mission statement, I think it’s in the interpretation that anything without spaces or hyphens is a word. It’s a reasonable interpretation for English lexicography, but falls short for other languages like Finnish, Portuguese and Latin. Even in English it fails sometimes. — Ungoliant ^(Falai) 23:58, 15 January 2013 (UTC)[reply]

That's why I said either our idea of a word is wrong, or we shouldn't use "word" in our mission statement. The five types I listed above show that it's not easy to define "wordness", because when multiple morphemes are involved there is a gradual scale. —CodeCat 00:12, 16 January 2013 (UTC)[reply]

I agree that the word "word" doesn't really belong in our mission statement if you take it literally. Many things that are not words do belong here and many things that are words don't. On the other hand, the mission statement does not have to be taken literally, since, after all, it is explained in more detail. Maybe we should mention in that explanation that the word "word" (which is the only thing that we can really put there and maintain euphony) should not be taken literally and give an explanation of what we mean by it. --Wiki Tiki 89 00:37, 16 January 2013 (UTC)[reply]

Not a single one of the five terms in our slogan can be read without qualification or extension. The slogan is about as helpful in guiding a dictionary as "Liberté, Égalité, Fraternité" is in running a government. DCDuring TALK 02:59, 16 January 2013 (UTC)[reply]

Then maybe we should qualify it more explicitly, because from time to time editors try to use it as an argument against deleting something... —CodeCat 03:15, 16 January 2013 (UTC)[reply]

Please, no. —Μετάknowledge^{discuss/deeds} 03:31, 16 January 2013 (UTC)[reply]

I've stated before and still think that, in foreign languages (like Latin, Finnish, Russian, and Hebrew) (and probably English too) in which words are generally separated by a space, we should include any contiguous string of letters attested in works that also generally separate words by spaces (so not, for example, including the entirety of a work that is written wholly without spaces as a lark), because that is what anglophones unfamiliar with the term will look up, since they won't, in all likelihood, know where to split it. Thus we should include (if attested) fasque, aufmachen, and ולחשך. This is independent of the classification above. That said, I think most people here disagree with me (though I don't know whether they disagree with the proposal to include what anglophones are likely to look up, or with my conception of what anglophones are likely to look up). Discounting my own view as just stated, the classification above seems to be a useful start. People might look up name when they come across misname. (If we don't include misname on those grounds, though, then we'd have to include sive, since people are likely to look it up when they see missive.)—msh210℠ (talk) 18:15, 16 January 2013 (UTC)[reply]

I tend to agree with Msh210 above, for the following reason:

The question of what is or isn't a word cannot really be decided here; linguists are baffled by it (as they are by the question "what is a language?"), and well they should be, because "word" is really a category with a prototype but with countless deviations from this prototype that different languages may classify differently (what is a word in English may be three or four in Chinese; see also polysynthetic languages like Eskimo). So whatever solution Wiktionary gives to this problem has to be pragmatic, not scientific (because the science of it is that there is no exact solution: "word" is not a precisely definable technical term). And the only pragmatic solution that I think fits the bill is to consider anything written between spaces in a given language as a word. Yes, that includes irksome cases like fasque, or like Romance verbs with pronominal clitics like Italian dammelo, Spanish pregúntaselo, Portuguese dar-se-ia, Catalan despenja'l, French parle-m'en; and yes, that would exclude the perfectly parallel cases in which the clitics are written as separate words (mi hai dato, se lo pregunté, eu te disse, em dic, tu le dis, etc.). But the point is pragmatic: what is it that a casual user of Wiktionary is likely to search for as a single word? Answer: basically anything written between spaces, especially if said casual user doesn't know much about word morphology in the language of the word s/he is looking up.

In principle, here is my view: because languages, words, phrases, sentences, etc. are continua without sharply marked boundaries, any solution to the question "what is a word at Wiktionary" will necessarily include irksome cases. We will always have "words" that we wish we could get rid of but which fit the criteria we have, so they have a right to be here. The best we can hope for is not to have too many of them, and also not to make things too complicated for the casual user. --Pereru (talk) 18:03, 18 January 2013 (UTC)[reply]

The trouble with that pragmatic solution is that it only works for languages whose written form uses spaces. It still doesn't offer us a solution for Chinese and Japanese, or Burmese (where online material tends to use spaces, but rather stingily, and printed books don't), and for that matter is hardly helpful for Vietnamese, which uses too many spaces: all sorts of things that intuitively "feel like" single words are written with spaces in Vietnamese, and I don't want us to have to decide for every single one of them whether it's SOP or not. —Angr 19:13, 18 January 2013 (UTC)[reply]
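To make the trade-off concrete, here is a minimal Python sketch of the "anything written between spaces" rule Pereru proposes, applied to his examples and to the Chinese case Angr raises. The function name is hypothetical, and trimming ASCII punctuation from the edges of each chunk is an added assumption.

```python
import string


def space_delimited_words(text: str) -> list[str]:
    """Split on whitespace, then trim ASCII punctuation from the ends of each chunk.

    Word-internal hyphens and apostrophes (dar-se-ia, parle-m'en) are kept.
    """
    chunks = (chunk.strip(string.punctuation) for chunk in text.split())
    return [chunk for chunk in chunks if chunk]


print(space_delimited_words("dammelo"))      # one "word"
print(space_delimited_words("mi hai dato"))  # three "words"
print(space_delimited_words("老保管好好保管好好老鼠"))  # the whole sentence is one "word"
```

The Italian clitic cluster counts as one word while its spaced equivalent counts as three, and the whole Chinese sentence from the tone discussion comes out as a single "word", which is Angr's objection.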

These issues can be solved easily by explicitly using one of the senses of word, the purely linguistic one, i.e. an element of the vocabulary of the language. After writing this sentence, I Googled "word" + "element of the vocabulary" and found these pages http://www.pulib.sk/elpub2/FF/Ferencik2/pdf_doc/29.pdf and http://dictionary.reference.com/help/faq/language/t02.html. I think that it makes sense, and would make RFD discussions much easier. It would be necessary to add that all forms of words are accepted (probably with some exceptions, especially most verbal phrase forms) + affixes + characters + ... Lmaltier (talk) 21:45, 22 January 2013 (UTC)[reply]

Doesn't that really shift the problem rather than solve it? What is an element of the vocabulary? —CodeCat 21:50, 22 January 2013 (UTC)[reply]

It's something you might have to learn in vocabulary lessons when you learn the language (this also applies to dead languages). It can also be called lexeme, or lexical unit. Yes, this helps: is fasque an element of the Latin vocabulary, something that could be learned during a vocabulary lesson? No. Is red fox an element of the English vocabulary, despite its space? Yes. Always keeping this principle in mind does not solve everything, but it helps very much. Lmaltier (talk) 22:03, 22 January 2013 (UTC)[reply]

I highly doubt anyone learns "red fox" as a vocabulary word when learning English. --Wiki Tiki 89 22:18, 22 January 2013 (UTC)[reply]

Of course. But it's possible, especially if you study zoology and its vocabulary at university. Even specialists may learn new words throughout their lives (especially new words). I don't propose to include only common words, not at all. Even famous hapaxes may be included. But you understand what I mean. Lmaltier (talk) 22:29, 22 January 2013 (UTC)[reply]

In other words, lexical items are stored by the brain to be available at will, they are not necessarily reconstructed at each use (although they may be in some cases, e.g. most -like adjectives). Atlantic salmon is stored in the brain (as the common species name), while red bicycle or fasque are not stored, only red, bicycle, fas and -que are stored. When there is a systematic reconstruction by the brain from its parts, it should be included only when it's very clearly considered as a word by the language nonetheless (e.g. compound nouns in German). Lmaltier (talk) 21:38, 23 January 2013 (UTC)[reply]

I think you're confusing psychology with linguistics. --Wiki Tiki 89 21:40, 23 January 2013 (UTC)[reply]

No, he's discussing psycholinguistics, which is fair enough, only we as a volunteer-run dictionary with no budget for such things are not in a position to be conducting psycholinguistic experiments on native speakers of all the world's languages in order to decide what is and isn't stored as a lexical item in their brains. (And for extinct languages it wouldn't be possible anyway.) —Angr 21:45, 23 January 2013 (UTC)[reply]

I wasn't aware that it was psycholinguistics. And I think that, in most cases, common sense can decide. What I explain is directly related to the definition of set phrase. Lmaltier (talk) 07:00, 24 January 2013 (UTC)[reply]

Unblock request

GhalyBot was blocked on 6 January; it had started adding interwiki links that day. Originally I was not aware of the policy that prevents global bots from working on enwiktionary. I have now changed the setting, and it will not work on enwiktionary. Would you mind unblocking it, please? I am happy to apply for the required permission and do any test edits required, but I can't do any test edits while the bot is blocked. --Ghaly (talk) 15:38, 18 January 2013 (UTC)[reply]

Done. Good luck. — Ungoliant ^(Falai) 15:56, 18 January 2013 (UTC)[reply]

Thank you very much.--Ghaly (talk) 16:36, 18 January 2013 (UTC)[reply]

List templates, {{list helper}}, improvements, format

I've recently nominated {{list:days of the week/lv}}, {{list:seasons/lv}} and {{list:basic colors/lv}} for deletion, because I tend to agree with those who, like Liliana, criticize list templates for being too onerous resource-wise, and also because I personally find the format chosen for those templates rather unpleasant. I've therefore reverted changes that inserted these templates into Latvian pages. But now CodeCat tells me that {{list helper}} is being improved so as to become leaner and meaner, something I would support, since list templates are in principle a good idea for standardizing list-like materials like points of the compass, days of the week, basic colors and numbers, etc. I wonder where this is taking place (does anybody know of any ongoing discussions about it?), and I also wonder whether it would be a good idea to discuss the format that a revamped, spiffed-up list template should have. Here are my personal €.02:

The current format, with its less-than-one-line spacing and superscripts, is irksome and rather non-standard, since it looks like nothing else used here at Wiktionary. I suggest that something like a table format be used, as is already the case for translations and for long lists of derived or related terms (with e.g. {{top4}}, {{mid4}} and {{bottom}}). Also, when there is a logical ordering, this ordering should be followed (spring-summer-fall-winter) rather than alphabetic ordering. Also, in some cases, something like {{picdic}} is probably better than a list (say, points of the compass, perhaps also seasons of the year), though one might argue that it's possible to have both {{picdic}} at the beginning and a list at the end of the entry (and a category including only those terms!) since redundancy is not necessarily bad here at Wiktionary. --Pereru (talk) 18:21, 18 January 2013 (UTC)[reply]

The thing is, any list that is so long it needs a table should be a category instead. But I agree that the superscript looks weird; perhaps a non-superscript “(see more)” text following the items and linking to the category should be used instead. — Ungoliant ^(Falai) 02:08, 19 January 2013 (UTC)[reply]

I don't understand the objection to the template's appearance. In what way does it not look like anything else on Wiktionary? Alternative forms, derived terms, related terms, synonyms and other semantic relations are all usually listed * foo, bar, and only put in tables if exceptionally numerous (as in [[mundungus]]). I agree that the superscript formatting of the last bit can be removed; it was, AFAICT, simply carried over from the old {{list}}, not innovated by CodeCat. Thank you, CodeCat, for redesigning the template(s) to be less slow and greedy with resources! - -sche (discuss) 03:24, 20 January 2013 (UTC)[reply]

When we list related or derived terms, we usually get one per asterisk, i.e. * foo /carriage return * bar. There are sometimes hierarchies, with the second term getting two asterisks for indentation. Except for the list templates (and lines introduced by {{sense}} under ====Synonyms====, ====Hyponyms====, etc., as in Spain), I don't really see something like * foo, bar, ber, bir... In some cases -- see {{list:countries of Europe/en}} -- there are lots of terms; wouldn't a table be better? Aren't these cases better served by columns, with {{top3}}, {{mid3}}, {{bottom}}, or even with a real table, like {{der-top4}}? --Pereru (talk) 22:43, 22 January 2013 (UTC)[reply]

A table would be better, but I maintain that lists with too many items should be categories instead. — Ungoliant ^(Falai) 22:50, 22 January 2013 (UTC)[reply]

By the way, I've just had a look at the new list: templates, and they do look a lot better structure-wise. I didn't know CodeCat had redesigned them -- that is indeed a giant leap forward. Aside from the formatting, I have no objections to the use of list templates anymore.--Pereru (talk) 22:43, 22 January 2013 (UTC)[reply]

Hm... I can see your point, Ungoliant. But then again, we do have tables with very long lists in many entries -- derived terms, related terms, synonyms, etc.; why not have a long list of "see also" items, too? Then you don't have to leave the entry to see all the terms, as you would have to if you wanted to check the category. (Or else, why not make long "derived terms" and "related terms" lists into categories, too? Something like Category:Terms derived from "fire" under ====Derived terms==== in fire, instead of the table that is currently there?)

Now, just to show what I'm suggesting, compare the Slovak list of countries of Europe at {{list:countries of Europe/sk}} -- the original list template format, kept by CodeCat -- with two versions of it I've just sketched at User:Pereru/list:sk:countries of Europe and User:Pereru/list:sk:countries of Europe-2. It seems to me these are better (and more similar to other such tables for related and derived terms, used all over Wiktionary) than the original format.--Pereru (talk) 23:55, 22 January 2013 (UTC)[reply]

Well, for excessively long synonyms, antonyms, ..., coordinate terms we have Wikisaurus; for derived terms there is no need for such a system because each different set of derived terms will only be used in one entry (and I’ve seen categories for derived terms); for related terms, we should create a system similar to lists.

The drop-down table looks pretty good though. It would certainly be an improvement. — Ungoliant ^(Falai) 22:03, 23 January 2013 (UTC)[reply]

Perhaps the best solution is to create two or three list helper templates, one with the original list format and the other one(s) with table formats, so that each individual list:XXX template can have the format deemed most appropriate by its users? (I would be in favor, by the way, of creating a consistent system for related terms, and even for derived terms -- depending on how inclusive you want to be, you could mention terms like unreadability under unreadable, readable, read... In fact, all terms derived from readable could in principle also be listed under read, unless we have a "level priority" system so that only terms directly derived from the base word with only one affix are listed under ===Derived terms===, or something like that.) --Pereru (talk) 16:19, 24 January 2013 (UTC)[reply]

On derived term categories see #Subcategories_for_compounds_by_component below and previous discussions linked from there. DCDuring TALK 18:17, 24 January 2013 (UTC)[reply]

Pronunciations

I've been around for a while, but this is something I've never been sure about. In the pronunciation section, is it acceptable to put the IPA pronunciations for regional accents? I see a lot of words that have pronunciations tagged with, e.g., UK, but there is such a variety of accents within the UK that the label is pretty much meaningless. So, where appropriate, is it all right to have a few different pronunciations, each clearly marked with the part of the country it corresponds to?

This leads onto another question about rhymes: are rhymes pages such as Rhymes:English:-ʌri based on Received Pronunciation? Or a different accent? Because what rhymes to one person might not to another, and vice versa. For example, in Lancashire where I'm from, most of the words on that page actually end in /ʊrɪ/ rather than /ʌri/. BigDom (t • c) 23:51, 18 January 2013 (UTC)[reply]

Are you from Bury? I notice that this rhyme was removed from that page, and it belongs there in that town. Dbfirs 15:37, 26 January 2013 (UTC)[reply]

It’s acceptable (and, personally, I very much encourage it). The problem is that people use {{a|UK}} (and I’m guilty of this too) when they mean {{a|RP}}. As for rhymes, usually a note is added to the rhymes page (see Rhymes:English:-ɔːɹə for an example). — Ungoliant ^(Falai) 00:00, 19 January 2013 (UTC)[reply]

As explained at Rhymes:English, it's based on the pronunciation familiar to the person who started it, who is from the UK. That's our current de facto compromise on UK-vs-US variation: since there are good arguments for either choice, we just arbitrarily stick with whichever was used first in a given place. A dialectologist could quibble a bit about the exact relationship of generalized UK (really English) pronunciation to true RP, but "RP" is close enough for our purposes. As for regional rhymes, it wouldn't hurt to include them, with a parenthetical note on where each is used, as long as they don't overwhelm the entries. Chuck Entz (talk) 00:24, 19 January 2013 (UTC)

As for regional rhymes, fuck handles this quite well by saying the /ʊk/ rhyme is in some parts of the North of England (northwest for my money; the Yorkshire pronunciation is /fʌk/). Mglovesfun (talk) 10:46, 19 January 2013 (UTC)

It also rhymes in some parts of Ireland, including Dublin. —CodeCat 14:07, 19 January 2013 (UTC)

The UK pronunciation is usually a version that would not identify the speaker as being from one particular region. I prefer to call it "modern BBC English" rather than "RP", which was used by most BBC announcers 50 years ago. The rhyming problem can be solved by adding more of the rhyme notes mentioned by Ungoliant. Regional accents usually have fairly consistent vowel changes. Perhaps we could link the general pronunciations to a page showing regional variants for particular vowels? Dbfirs 15:28, 26 January 2013 (UTC)

"A previously men's game"

I was talking to someone about the existence of a separate Women's Grandmaster title in chess (which is a strange gender split in a purely non-physical game, to me), and suggested it might be a relic "from when women entered a previously men's game". This immediately set off some kind of grammar alarm. "A previously male game" would be fine. "What was previously a men's game" would be fine. Is "a previously men's game" grammatically sound or not, and why? Equinox ◑ 01:58, 19 January 2013 (UTC)

"a previously men's game" seems off, if not outright wrong. I suspect it's the difference between a genitive construction and a true adjective. Perhaps the gradability test for adjectivity has more to do with being able to be modified by an adverb? Chuck Entz (talk) 02:18, 19 January 2013 (UTC)

I agree with the oddness assessment and Chuck's first two sentences.

CGEL prefers not a general gradability test with arbitrary degree adverbs, but specifically the degree adverbs very and too, because they cannot modify verbs and clauses nor the closed-set adverbs in the degree-adverb sense, which is readily distinguished from other senses of the words.

I could not find men's being used that way at Google Books, though I could imagine it being attempted. DCDuring TALK 02:50, 19 January 2013 (UTC)

Right. Because there's nothing wrong with "he previously lived there". Adverbial modification can help distinguish adjectives from attributive and genitive nouns, but only a few adverbs are adjective-specific. —Ruakh_TALK 06:03, 19 January 2013 (UTC)

I think this ("A previously men's game") is fine. Mglovesfun (talk) 10:42, 19 January 2013 (UTC)

I'm not sure whether this is a new development or a continuation of an older pattern. I suspect it's the latter, because in the past, genitives like "men's" could also be used in predicative position. Something like "the game is men's" used to be possible. So genitives behaved syntactically much more like adjectives before than they do now. —CodeCat 14:10, 19 January 2013 (UTC)

I'm gonna have to disagree there. Even if this kind of construction was possible sometime in the past, its use today seems like a relatively recent development (and is still nonstandard). --Wiki Tiki 89 15:19, 19 January 2013 (UTC)

Wikimedia sites to move to primary data center in Ashburn, Virginia. Read-only mode expected.

(Apologies if this message isn't in your language.) Next week, the Wikimedia Foundation will transition its main technical operations to a new data center in Ashburn, Virginia, USA. This is intended to improve the technical performance and reliability of all Wikimedia sites, including this wiki. There will be some times when the site will be in read-only mode, and there may be full outages; the current target windows for the migration are January 22nd, 23rd and 24th, 2013, from 17:00 to 01:00 UTC (see other timezones on timeanddate.com). More information is available in the full announcement.

If you would like to stay informed of future technical upgrades, consider becoming a Tech ambassador and joining the ambassadors mailing list. You will be able to help your fellow Wikimedians have a voice in technical discussions and be notified of important decisions.

Thank you for your help and your understanding.

Guillaume Paumier, via the Global message delivery system. 15:12, 19 January 2013 (UTC)

Category:English pluralia tantum

This category is a heterogeneous mix of types of English nouns that do not have uniform implications for usage, principally agreement with the verb, but also compatibility with the determiners that mark uncountability. Templates such as {{en-plural noun}} display (plural only), with the piped link not explaining usage.

I think that the information displayed, the piped link, and the categorization fail to convey information useful to normal users. Accordingly, I would like to replace such information and categories for English entries with information about verb agreement and countability. Such information might be placed on the inflection line (where plural only is often displayed) or on sense lines where it applies only to particular senses, though I could see advantage in putting the information only on sense lines. I do not think this kind of information should be in Usage notes, as it can be conveyed rather tersely (eg, with plural forms of verbs). DCDuring TALK 19:12, 21 January 2013 (UTC)

I kind of agree. A plurale tantum is specifically a word that is grammatically plural but doesn't have a (strictly) plural meaning. I think that glasses would be a true plurale tantum, since it refers to a single object, which of course is described by the two pieces of glass, but that is etymology. It is kind of an analogue to deponent verbs in Latin (which are grammatically passive but have active meaning). I don't know whether plural nouns that refer to substances or collections of objects that are seen as a substance would count, though. —CodeCat 19:21, 21 January 2013 (UTC)

Maybe the problem is the existence of several different classes of words as regards their behavior with respect to the category of number -- formally plural words with singular referents, plural words that lack a singular counterpart, plural words seen as "mass", plural-collective words, plural words which theoretically have a singular form which is however almost never used.... and similar categories for singular words. How would the information provided convey these nuances so as to enable the reader to understand correct usage? --Pereru (talk) 00:16, 23 January 2013 (UTC)

I divide the information into two components: current usage and history. For English the usage issues are just verb (and pronoun) agreement and countability/uncountability. The rest is history - which I find interesting and sometimes instructive, but suspect most normal users do not - which probably could fit into etymology or dated senses. DCDuring TALK 02:26, 23 January 2013 (UTC)

I agree with your top-level division (though I'm not sure why you bring it up, since no one else seems to have said anything different), but I disagree with your statement that the only usage issues are agreement and countability. For example, what of constructions like "pair of glasses" and "head of cattle"? Surely they bear mention at [[glasses]] and [[cattle]]. —Ruakh_TALK 05:18, 23 January 2013 (UTC)

I brought it up because Pereru was discussing the heterogeneity of the category membership. I don't think the usage history is essential.

Uncharacteristically, I avoided any weasel words to see what other kinds of usage concerns folks here might have, because I would like to surface all issues to get to an implementable conclusion. [[cattle]] is an exemplary entry which contrasts with the low information content of the category and the usual display. The inflection line specifically mentions that the word is both singular and plural and is usually - not only - plural. The usage notes go into considerable detail. This is the kind of thing that characterizes most of the members of the category, though the category membership and display say plural only.

As we are unlikely to match the detail of [[cattle]] for more than dozens of the category members, I think we should have some kind of generic coverage, possibly in an appendix or appendix section, which could be referenced in a context tag or in the usage notes. DCDuring TALK 14:10, 23 January 2013 (UTC)

Picture of the Year voting round 1 open

Dear Wikimedians,

Wikimedia Commons is happy to announce that the 2012 Picture of the Year competition is now open. We're interested in your opinion as to which images qualify to be the Picture of the Year for 2012. Voting is open to established Wikimedia users who meet the following criteria:

Users must have an account, at any Wikimedia project, which was registered before Tue, 01 Jan 2013 00:00:00 +0000 [UTC].
This user account must have more than 75 edits on any single Wikimedia project before Tue, 01 Jan 2013 00:00:00 +0000 [UTC]. Please check your account eligibility at the POTY 2012 Contest Eligibility tool.
Users must vote with an account meeting the above requirements either on Commons or another SUL-related Wikimedia project (for other Wikimedia projects, the account must be attached to the user's Commons account through SUL).

Hundreds of images that have been rated Featured Pictures by the international Wikimedia Commons community in the past year are all entered in this competition. From professional animal and plant shots to breathtaking panoramas and skylines, restorations of historically relevant images, images portraying the world's best architecture, maps, emblems, diagrams created with the most modern technology, and impressive human portraits, Commons features pictures of all flavors.

For your convenience, we have sorted the images into topic categories. Two rounds of voting will be held: In the first round, you can vote for as many images as you like. The first round category winners and the top ten overall will then make it to the final. In the final round, when a limited number of images are left, you must decide on the one image that you want to become the Picture of the Year.

To see the candidate images just go to the POTY 2012 page on Wikimedia Commons.

Wikimedia Commons celebrates our featured images of 2012 with this contest. Your votes decide the Picture of the Year, so remember to vote in the first round by January 30, 2013.

Thanks,
the Wikimedia Commons Picture of the Year committee

This message was delivered based on m:Distribution list/Global message delivery. Translation fetched from: commons:Commons:Picture of the Year/2012/Translations/Village Pump/en -- Rillke (talk) 04:18, 22 January 2013 (UTC)

Subcategories for compounds by component

Category:Compound words by language has member categories for 149 languages, populated by template {{compound}}. For each language, it's just a flat category of words, without subcategories. Only a handful of languages have any large number of member articles in these categories (Finnish 11K, English 8K, and 2K each for Dutch, German, Hungarian, and Swedish). The template {{compound}} is related to {{prefix}}, {{suffix}} and {{confix}}, but these create subcategories for each pre- and suffix. This makes it easy to find, for example, English words prefixed with tele- or French words suffixed with -graphe. Wouldn't it be nice to have such subcategories for components of compounds as well? Then we could easily find English words that are compounds of bird (birdbrain, blackbird, bluebird). We could also see at a glance whether bird is more or less productive in compounds than bank. I guess all it takes is a trivial change to {{compound}} (plus the creation of a number of new categories). Is that a good or bad idea? --LA2 (talk) 09:31, 23 January 2013 (UTC)

I think there already are some categories like this, but I'm not sure if they are welcomed by everyone. —CodeCat 10:33, 23 January 2013 (UTC)

I'm fairly sure they were hated by some. We have a few, which are a kind of museum exhibit of the concept, like the entries with the "Shorthand" heading. DCDuring TALK 14:15, 23 January 2013 (UTC)

A partial implementation that might be more acceptable would be to create a category only for a term that is the head of a compound, ie, start in redstart. Another, more limited partial implementation would be only for terms that can justifiably be labeled "often in compounds". DCDuring TALK 14:20, 23 January 2013 (UTC)

Any pointers to previous discussions? A partial implementation might be that I create my own template sv-compound that passes all parameters to {{compound}} and then adds subcategories, and gradually introduce this template for my Swedish compounds. --LA2 (talk) 01:12, 24 January 2013 (UTC)

I'll dig 'em up. There were two waves of implementation. One was an open experiment intended to lead to auto-population of derived terms by creating categories for every word that was an etymon within a language. Eventually all derived terms would have been a category display. Something similar could have been done for related terms, albeit less completely. There was a substantial movement to delete all the categories in support of the experiment and the template {{derv}} that populated the categories. I begged that they not be deleted but ceased populating them.

Later someone else started to apply {{derv}} again, ignored warnings, and possibly was blocked (I don't remember).

Your idea of only doing compounds in a single language might be a far better experiment, but some apparently don't like where it could lead. I personally believe in a high degree of autonomy for languages, but this has implications. DCDuring TALK 01:37, 24 January 2013 (UTC)

See Wiktionary:Beer_parlour/2010/September#.22Synchronic.22_and_.22diachronic.22_etymologies and Wiktionary:Beer_parlour/2011/July#Derived_terms. DCDuring TALK 02:43, 24 January 2013 (UTC)

Thanks for pointing out those discussions from 2010 and 2011, to which I contributed, albeit not in regard to categories. Most of my contributions fall between August 2010 and March 2011, when the Swedish entries grew from 11K to 80K. In addition, there is a Template talk:compoundcat. As a very small experiment, I have now created {{sv-compound}}, {{sv-compoundcat}}, and {{compoundsee}} and introduced subcategories for 3 compounds, of which the interesting example is Category:Swedish compounds with maskin. These compounds are listed under Derived terms in maskin#Swedish. (The +expand/-collapse bullet brings up an error text I don't understand, but I think that is a minor detail.) --LA2 (talk) 10:56, 24 January 2013 (UTC)

One possible goal for the experiment might be an estimate of the number of categories that would be required for various implementations (eg, head-only) of the concept in Swedish. A theory-based estimate could also be derived from some assumptions, but I don't have a good basis for making the relevant assumptions, especially in a language other than English. DCDuring TALK 14:26, 24 January 2013 (UTC)

I have asked that the "template tiger" utility on the Toolserver be activated for en.wiktionary. I don't know if or when that will be possible. But it can provide counts of the various parameter values used by each template. As I mentioned, there are some 2-3 thousand Swedish articles that use {{compound}}, so at most 5-6 thousand categories would be needed for Swedish, perhaps only 1/10 of that. --LA2 (talk) 15:23, 24 January 2013 (UTC)

Cool. I could use that. DCDuring TALK 18:11, 24 January 2013 (UTC)

The word maskinist (engineer, machine operator) uses {{suffix}} and is thus categorized among Swedish words suffixed with -ist. Should it also be categorized among Swedish compounds with maskin or maybe "Swedish words with the stem maskin" or what is the appropriate terminology? --LA2 (talk) 16:40, 24 January 2013 (UTC)

I personally wouldn't call it a compound in my idiolect. But you might want to get other opinions. Presumably all words formed from maskin whether by compounding or by affixation could be in the same category, rather than having three (or four [infixes]?) categories. DCDuring TALK 18:11, 24 January 2013 (UTC)

Yes, I want them in one category. The question is whether it should be called "Swedish compounds with ..." or something else. --LA2 (talk) 20:03, 24 January 2013 (UTC)

Then I don't see how you can use the word compound in the category name. It could be 'Swedish words (or terms) with the stem (or derived from) maskin'. 'Terms' only if you intend to include multi-word phrases, etc. 'Derived from' implies morphological derivation to me, which may not be the same as a word's diachronic derivation. In English this would be a significant issue. In German, for example, less so. Swedish is more like German in this regard, isn't it? DCDuring TALK 20:13, 24 January 2013 (UTC)

I don't like that very long category name. Would "Swedish words related to maskin" be too generic? --LA2 (talk) 22:43, 24 January 2013 (UTC)

Sorry. I meant the parentheticals as alternatives. I've done some long category names and regretted them. I do think that "related" is much too generic. So "Swedish words derived from maskin" or "Swedish words with the stem maskin" would be my favorites. Others may differ. DCDuring TALK 23:19, 24 January 2013 (UTC)

In analogy with the short and nice category name category:Swedish words prefixed with tele-, would "Swedish words stemming from maskin" sound too much like a pun? Stemming = originating, deriving from, and stem = component of a compound. --LA2 (talk) 09:23, 25 January 2013 (UTC)

That sounds good. In any event if all the categorization is by template (no hard categorization) then renaming would be easy. DCDuring TALK 14:25, 25 January 2013 (UTC)

The most detailed discussion on categories of compounds derived from a particular word is at Template_talk:derv#Deletio_debate (should be Template_talk:derv#Deletion_debate), AFAIK. I oppose creation of categories like 'English terms derived from "time"' or 'English compounds derived from "time"' or Category:Swedish compounds with maskin. I oppose a plan or scheme that leads to the creation of a large number of small categories with 5-10 entries. --Dan Polansky (talk) 10:19, 27 January 2013 (UTC)

The remaining categories of the previous experiment mentioned above by DCDuring are these: Category:English words derived from: load (verb), Category:English words derived from: load (noun). --Dan Polansky (talk) 10:43, 27 January 2013 (UTC)

Thanks for these interesting pointers. However, "a large number of small categories with 5-10 entries" is exactly what we have under Category:English words by prefix. Are those 700 subcategories something we should regret? --LA2 (talk) 22:42, 29 January 2013 (UTC)

It's always interesting to note the number of items in Special:UnusedCategories, ie, categories with no members whatsoever: now 2,376. They don't seem to be bringing the servers down, nor is there any obvious rush to clean them up with any other motivation. DCDuring TALK 23:07, 29 January 2013 (UTC)

Special maintenance pages

bugzilla:15434

Four special pages have not been run at all since 2009, Special:DeadEndPages, Special:AncientPages, Special:FewestRevisions, and Special:WantedPages. Of those, Wanted Pages seems the most useful. It is fairly clear that the technical folks at MediaWiki do not want to run Wanted Pages for large wikis at all, even annually, apparently for resource reasons.

Do we want these pages to be updated? Is Wanted Pages the most useful?

A possibility is that we could give up some of the special pages which are not particularly useful to us in exchange for getting some runs of those that have not been run for three years. There are a few that seem to me to be quite useless because the number of members far exceeds the 5,000 to which the display is limited and the vast bulk of the members are in the category as a result of decisions that we are unlikely to reverse. These are Special:OrphanedPages, Special:WithoutInterwiki, Special:UncategorizedTemplates, Special:UnusedCategories, Special:UnwatchedPages.

Does anyone make use of these pages?

Even some of the other pages would not necessarily need to be run as often as they are now. If our entire maintenance run could be run less often, we might be able to get other resource-intensive items, such as Wanted Pages. This would include the pages like most-used pages, most linked-to templates, etc.

Do we need any of the maintenance pages updated every four days, or would weekly, biweekly, or monthly runs suffice?

I would be willing to take our requests to the MW technical folks, but someone more technically adept than I might be better suited for the job, and to such a volunteer I would defer. DCDuring TALK 15:44, 23 January 2013 (UTC)

I think Special:UnusedCategories could still be useful, because it points out which categories were created to be filled but never were. Special:UncategorisedTemplates would be useful if only we could exclude the mass of language code templates from appearing in it. —CodeCat 16:04, 23 January 2013 (UTC)

How often would we need Unused Categories updated? Monthly?

I don't think we can ask MW for a special run of Uncategorised Templates of that type. I think a dump run is needed for that. DCDuring TALK 16:51, 23 January 2013 (UTC)

Monthly, maybe less. I don't think it would be needed that often... more something that people should occasionally check on. —CodeCat 16:58, 23 January 2013 (UTC)

Even quarterly seems like enough, but perhaps the experiment should be for a month. I've come to have the feeling that none of these need to be updated as frequently as twice a week. It may be that some or all of these are a byproduct of something the MW guys have to do twice a week to maintain the integrity of the underlying database, in which case perhaps some of them may be virtually free.

A possibility might be to have different ones (or groups) updated at each of the periodic maintenance runs that MW does. DCDuring TALK 18:26, 23 January 2013 (UTC)

At User:DCDuring/SpecialPagesRelativeCost are the times to run several special pages for Wikipedia. I take the times as a reasonable indication of the relative costs of various of our pages as well. Very expensive pages include:
Most linked pages

Most revisions

Fewest revisions

Less expensive are:
Most linked templates

Wanted pages

Do we get any benefit whatsoever from once-every-four-days runs of the most expensive pages?

OTOH, Most linked templates seems vital, though I am not sure we need it every four days. DCDuring TALK 17:30, 30 January 2013 (UTC)

Polish pronunciation

I have started Appendix:Polish pronunciation, ripped more or less wholesale from w:Help:IPA for Polish. It would be great if people who (unlike me) actually know Polish took a look at it and made sure it corresponds to the reality of our transcriptions. —Angr 19:30, 23 January 2013 (UTC)

I'm not sure if the transcription of the nasal vowels is accurate. It may be the normal phonemic way of writing it, but in practice they are nasal closing diphthongs: the first element is like the corresponding normal vowel e or o, but the second is more like a nasal u. —CodeCat 19:41, 23 January 2013 (UTC)

But the question isn't so much whether the page is phonetically accurate, it's whether the page reflects how we actually transcribe Polish words here. And having looked at some Polish entries with nasal vowels, I see we aren't consistent. Some use plain ɛ̃ and ɔ̃, while others use things like ẽĩ̯ and ɔ̃w̃. Anyway, I just wanted to announce this page's existence to the community; people who work on Polish are welcome to tweak it in whatever way necessary to bring it into line with actual practice. —Angr 20:02, 23 January 2013 (UTC)

Created some new entries

Hey all, I created some new entries and while I think I'm getting the format down I'm sure there are some minor details somewhere I'm overlooking. Could someone please double check the entries? They are geopotential height, virtual temperature, potential temperature, equivalent potential temperature, convective temperature, Skew-T Log-P diagram and pressure gradient force. The one I'm most concerned about is Skew-T Log-P diagram due to the capitalization and the fact I was having a hard time defining it without getting too detailed (thus both an etymology section and the definition, which may be better as one or something). Thanks in advance, Ks0stm (T•C) 14:24, 24 January 2013 (UTC)

ISO 639-3 changes

SIL has posted a summary of changes made for change requests from 2012. Usually enwikt implements changes that aren't controversial (often for codes that we have no entries in). It's also a good time to debate the changes that are controversial or non-trivial. Check to see if your favorite unappreciated language has been added or updated! --Bequw → τ 15:00, 24 January 2013 (UTC)

They added Talossan! — Ungoliant (Falai) 20:27, 24 January 2013 (UTC)

But they still haven't clarified exactly what language vmf is supposed to be. —Angr 23:01, 24 January 2013 (UTC)

They haven't clarified frs vs stq, either. They did do away with mld and myq, though.

I note that they added a code for "Slavomolisano", a Čakavian dialect we might prefer to call "Molise Croatian" (or at least "Slavo-molisano", with a hyphen) if we accept it at all. I don't have time to look into the other codes they added. As for the renames, they should probably be looked at individually. - -sche (discuss) 02:04, 25 January 2013 (UTC)

Is the new {{xdk}} (Dharuk) describing the same thing as our {{aus-syd}}? —Μετάknowledge (discuss/deeds) 02:46, 25 January 2013 (UTC)

Whether aus-syd was intended to be Dharuk/Dharug or not (it probably was, but it's less than completely clear and people have wondered since 2008), in practice, the language aus-syd is used for is the same as xdk, yes. We should migrate our aus-syd entries to xdk. - -sche (discuss) 06:37, 10 February 2013 (UTC)

It definitely was, because before 2008, it was called Category:Dharuk language. -- Liliana • 08:55, 10 February 2013 (UTC)

I have deleted {{mld}} and {{myq}}. I started adding the new codes, but realised it would be better for a bot to do that — anyone up for it? ({{xbp}} through {{unu}} need to be added.) I think it would be best to review the rename proposals individually, and plan to do so. - -sche (discuss) 04:04, 10 February 2013 (UTC)

See Wiktionary:Grease_pit/2013/February#Import_new_ISO_codes regarding what, specifically, AFAICT, should be done. - -sche (discuss) 04:29, 10 February 2013 (UTC)

I've moved Talossan to {{conl:tzl}}; if anyone spots other constructed-language codes, they should be moved as well. - -sche (discuss) 18:48, 10 February 2013 (UTC)

Category:Commonwealth English

Hi, all.

The category is described thus:

This category is for words used in British English and in most Commonwealth countries, as opposed to North American English. In most English dictionaries this corresponds to the label British.

It is applied to a term by use of the regional label {{Commonwealth}}. I just added the label "except Canada" to 14 of the 41 items in this category (34%).

Problem is that the British Commonwealth doesn't correspond to a linguistic group. It includes Canada, but “British English” never includes Canadian English, because en-CA is an example of the other English, (North) American English. The term Commonwealth English doesn’t have any academic currency nor an unambiguous definition. I won't be around forever to check on the way this is being used. —Michael Z. 2013-01-24 19:44 z

It is really only spelling-related. Supposedly the English-speaking world is divided along spelling lines into the US on one side, and the Commonwealth on the other. But it's not always that clear-cut, and there are countries like Ireland that use those spellings but are not in the Commonwealth. —CodeCat 20:45, 24 January 2013 (UTC)

Even just in spelling, Canada sometimes goes with American spelling (tire, curb, yogurt, analyze), so tyre, kerb, yoghurt, analyse can't really be called Commonwealth English. —Angr 21:02, 24 January 2013 (UTC)

We also have the uniquely-Canadian yogourt. —Michael Z. 2013-01-24 22:18 z

CodeCat, the majority of these are not spellings, but regionalisms or regional senses. Canadians don't use the “Commonwealth”-labelled senses of borstal, building society, char, clangour, denizen, driving licence, fanny, flat, hockey, member of parliament’s legislative motion, member's bill, nappy, rubbish, but they do use Act of Parliament, belt up, counsellor, Crown prosecutor, gallon, graduand, indictable offence, Mareva injunction, private member's bill, simple majority, summary offence, and zed.

Be that as it may, Canadian English has much vocabulary, usage, and spelling in common with either British or US English, but not the other. Its existence disproves the usefulness of any so-called “Commonwealth English.” —Michael Z. 2013-01-24 22:11 z

I don't think Act of Parliament should be marked Commonwealth. Americans use it too (to refer to British Acts of Parliament for example). --Wiki Tiki 89 22:16, 24 January 2013 (UTC)

Good point. The term has been labelled with the regionality of the thing. I've removed the label & category. —Michael Z. 2013-01-24 22:23 z

So no ideas on how to deal with this problem? I can only interpret Commonwealth English as “British English,” and most Commonwealth countries as a euphemism for “all the British-English–speaking countries, and not Canada, but the term ‘British English’ isn't politically correct for us.”

I will delete the template and recategorize the terms. Cool? —Michael Z. 2013-01-30 02:26 z

I don't like the way that template "alternative spelling of" turns "British" into "UK" and "Commonwealth" into "Commonwealth of Nations", but I'm not sure how to improve the display to make usage clear. Any suggestions? Dbfirs 08:12, 13 June 2013 (UTC)

I agree. These things have been discussed many times, but can’t be properly resolved until we can agree on a scheme for regional tags for English. —Michael Z. 2013-06-13 14:14 z

I don't think your replacement of "Commonwealth" with "Canada" is helpful. What about NZ, Oz, SA etc? Dbfirs 16:30, 14 June 2013 (UTC)

Commonwealth is replaced by British, as in British English. Commonwealth wasn’t in the least bit helpful in the first place, because there isn’t a single language feature that is used in the Commonwealth and not outside of it. Furthermore, some editors interpret Commonwealth as “the United Kingdom and the Commonwealth,” others as “the Commonwealth countries outside of the UK.”

Canada doesn’t conform to British spellings, so it has to be labelled in cases where Canadian usage coincides with the British.

New Zealandisms, Australianisms, and South Africanisms, etc., should be labelled as such. But a normal British-English spelling entry needn’t be considered incomplete because it doesn’t have 50-odd country labels after it. —Michael Z. 2013-06-14 18:22 z

That would be fine if the template didn't convert "British" to "UK" (as I complained about above). Your alteration produced "UK and Canada", which is misleading. "UK and Commonwealth" is accurate because Canada does conform to British spellings in general, but allows American spellings in some cases, and also uses American senses of some words. I'd be happy with your suggested use of "British" if it worked. Dbfirs 01:04, 15 June 2013 (UTC)[reply]

I was opposed when all of our British labels were changed to “UK,” wholesale. Since there was no corresponding effort to change the way they are applied, or even clear recognition of the difference in their meaning, I am treating these as poor synonyms. I am trying to encourage improvements in our English regional labelling, starting with this little-used and practically meaningless Commonwealth label.

UK and Commonwealth is redundant, because the UK is part of the Commonwealth.
... Or is it implying that “Commonwealth” means the Commonwealth countries outside of the UK? I don’t believe this has ever been suggested.
UK and Commonwealth is inadequate, because it omits Ireland, for example.

Canada does conform to British spellings in general – I don’t think I agree, or that this idea is helpful. The regional labels are applied to terms, senses, usages, and spellings. As a branch of (North) American English, Canadian English is generally different from the English of Britain and most other Commonwealth countries. The Commonwealth is a political entity, and doesn’t correspond to a language variety. —Michael Z. 2013-06-15 15:14 z

There seems to be a move back to British spellings in Canada, but I agree that Canadian spelling is much more tolerant of American variants. I agree with you that the template should not change "British" to "UK". How can we stop it doing this? Dbfirs 08:09, 16 June 2013 (UTC)[reply]

What evidence is there of a move (back?) to British spellings in Canada? I know of some people and organizations who overcompensate by choosing the British spelling dictionary in MS Office, because its misguided idea of “Canadian” is to accept all US and British spellings. But Canadians seem to consistently use Canadian spellings (cf. http://joeclark.org/en-ca), and are tolerant of both US-isms and Britishisms in the things they read. In my experience it is unexpected British spellings that stand out more.

Thanks for the link to Canadian spellings, though Joe Clark does make the error of describing the "ize" forms as American when they are the only forms accepted by the Oxford English Dictionary. I admit that there are some exceptions to British spelling mentioned by Cloudcuckoolander below, but, in general, when I read Canadian text, I don't recognize it as being close to American. (The evidence was a recent article, but, apologies, I can't remember where, so I'll have to withdraw the claim for now.) Dbfirs 16:16, 18 June 2013 (UTC)[reply]

... (later) ... The evidence for the change (back) to British spellings in Canada is summarized in the Wikipedia article on w:Canadian English (quoted below):

"More recently, Canadian newspapers have adopted the British spelling variants such as -our endings, notably with The Globe and Mail changing its spelling policy in October 1990. [Allemang, John (1 September 1990), "Contemplating a U-turn", The Globe and Mail, page D6] Other Canadian newspapers adopted similar changes later that decade, such as the Southam newspaper chain's conversion in September 1998. ["Herald's move to Canadian spellings a labour of love" (2 September 1998), Calgary Herald, page A2] The Toronto Star adopted this new spelling policy in September 1997 after that publication's ombudsman discounted the issue earlier in 1997. [Honderich, John (13 September 1997), "How your Star is changing", Toronto Star, page A2] The Star had always avoided using recognized Canadian spelling, citing the Gage Canadian Dictionary in their defence. Controversy around this issue was frequent. When the Gage Dictionary finally adopted standard Canadian spelling, the Star followed suit." Dbfirs 16:36, 22 June 2013 (UTC)[reply]

This change may be controversial. I will make a BP proposal about English language labelling shortly. —Michael Z. 2013-06-16 18:50 z

Thanks. Dbfirs 16:16, 18 June 2013 (UTC)[reply]

Jumping in as another Canadian to confirm that Canadian spelling is distinct from British spelling. Canadian spelling might be described as a sort of hybrid between British spelling and American spelling. As a rule, Canadians use colour, metre, and travelled like Britons, but we tend to favour -ize and -yze like Americans:

"Brandon teens charged after cars vandalized by slingshot", CBC.ca, 3 June 2013
Ashley Prest, "Paralyzed paramedic inspires can-do project", Winnipeg Free Press, 15 June 2013
Sean Silcoff, "Trudeau offers to reimburse organizations $20,000 for speeches", The Globe and Mail, 16 June 2013

Not to mention tire over tyre, aluminum over aluminium, mom over mum, airplane over aeroplane, curb over kerb, fetus over foetus, draft over draught, gynecology over gynaecology, and skeptical over sceptical. -Cloudcuckoolander (formerly Astral) (talk) 08:25, 17 June 2013 (UTC)[reply]

The "ize" forms are equally acceptable in British English (Oxford spelling as opposed to Cambridge spelling), and "mom" is common in some regions of the UK, but I accept the exceptions that need special treatment in the entries. Do Canadians not allow the "draught" spelling for any senses at all? Dbfirs 16:23, 18 June 2013 (UTC)[reply]

Draught and draft have a lot of senses, some of which seem to overlap, some of which don't. I know Canadians tend to favour the draft spelling for the more common/familiar senses (the "current of air" sense, for example). I'm not sure about the less common/familiar senses. I remember seeing draught used for the "amount of liquid drunk in one swallow" and "dose of medicine in liquid form" senses, but can't recall whether it was in specifically Canadian sources.

Sorry I can't provide a greater insight in this area. -Cloudcuckoolander (formerly Astral) (talk) 22:46, 18 June 2013 (UTC)[reply]

Canadians expect the road to be plowed after a snowfall, but might not be surprised to find a plough in a farm field. I suspect they might talk about the aesthetics of fine art while getting a treatment at an esthetics place.

These things are not easy to convey succinctly and clearly with our labelling system. —Michael Z. 2013-06-19 04:03 z

List Helper encore

The discussion on list: templates above is winding down, so I thought it may be a good idea to just bring up the main question: should there be more than one main List Helper -- the one CodeCat did, preserving the original format (see {{list:countries of Europe/sk}}), and perhaps one with a table format (like the ones I sketched, User:Pereru/list:sk:countries of Europe and User:Pereru/list:sk:countries of Europe-2)? Editors could then decide which one to use in a given list: template (if too many, then use table). --Pereru (talk) 23:15, 24 January 2013 (UTC)[reply]

There could be {{list table helper}} or something like that? —CodeCa t 01:19, 25 January 2013 (UTC)[reply]

Yes, or maybe {{list helper/table}}, or whatever name seems more appropriate. Likewise, {{list helper/columns}}, {{list helper/line}} (the latter being the original list format). What do y'all think? If nobody objects, I'll probably create these templates officially in the template namespace. --Pereru (talk) 23:18, 27 January 2013 (UTC)[reply]

How exactly would a single helper template help in case of columns? For derived terms and translations, we normally use three. Would an extra helper template add anything to that? (Not that it can't... I just wonder how) —CodeCa t 23:21, 27 January 2013 (UTC)[reply]

Citation format vs quotation format

User talk:Astral#Citations_format has made it apparent that WT:Citations currently suggests (by linking to Citations:trade and Citations:parrot) the use of a different format for quotations than WT:" prescribes. I propose that we change Citations:trade and Citations:parrot (which use two different formats), and all other citations pages, to format their quotations according to the master policy, WT:". If we like the dashes better than commas, we should change WT:"... but I see no reason to reformat the date-separator in a quotation every time I move it from a citations page into an entry or vice versa. - -sche (discuss) 02:25, 25 January 2013 (UTC)[reply]

I understand the wish to ensure uniformity, but I don't see an issue with having one format for entries, and another for citations pages. I can't say why, exactly, but I think dashes look optimal on citations pages and commas look best in entries. Maybe it's to do with spacing: streamlined commas look better in the smaller drop-down quotations space in entries, dashes look better in the more open citations page space. Astral (talk) 03:07, 25 January 2013 (UTC)[reply]

There is no firm justification for the inconsistent appearance of the same thing in different places. The inconsistency interferes with practical editing and introduces errors. And it looks sloppy. It will have to go away if we’re ever able to eliminate redundancy by using some sort of database scheme to have citations maintained in one place rather than duplicated all over the wiki. —Michael Z. 2013-01-25 15:12 z

Illustrations and the noun project

I found today commons:Category:The Noun Project. Probably you're already aware of it, but shouldn't more icons be imported and added to en.wiktionary entries? Then all the other Wiktionaries could import them as well. --Nemo 13:19, 25 January 2013 (UTC)[reply]

Any Commons file can be used without being imported:

. DCDuring TALK 14:32, 25 January 2013 (UTC)[reply]

Icons don’t illustrate words very well. — Ungoliant ^(Falai) 14:59, 25 January 2013 (UTC)[reply]

They could conceivably be useful for categories. The most important categories at a multilingual dictionary are languages. There is an option at gadgets under preferences to "Add country flags next to language headers."

Among the characteristics of many words, especially in English, is that they have many definitions and might be represented by multiple icons, which would have to be small to avoid clutter. Another major difficulty is what Ungoliant is referring to: many important definitions are figurative and/or abstract and do not have any particular obvious icon that would be intelligible to users from diverse cultural backgrounds.

I have been working on taxonomic names, for which some simple icons might be helpful, especially for higher taxa (families and higher) as any particular photo is likely to be misleadingly unrepresentative of the taxon. For example, only some fungi produce mushrooms, which would seem to be a perfect candidate to be the basis for an icon. Further, what would be a good icon for viruses? for bacteria? for liverworts? DCDuring TALK 16:45, 25 January 2013 (UTC)[reply]

Do those words have any icon at the noun project? You have to worry about their quality only if they exist... And no, I'm not proposing to add each illustration to all the vaguely related entries, only to the most relevant one. Sure, only basic vocabulary would be covered, but isn't it important? In February en.wikt had only 21 thousand images included (and that's not even as many entries with an image). --Nemo 21:03, 26 January 2013 (UTC)[reply]

We use Commons photos all the time to illustrate things. We try to avoid using photos which are pretty, but misleading, or which clutter entries which contain many definitions. Some of the paltry few icons now in the noun project may solve the problem of illustrating in a not-too-misleading way very inclusive terms, like bug. Yes, there are lots of terms that need illustrations, some of which are in Category:Requests for photographs, which now has mostly taxonomic names and other names of living things. Many technical terms, such as from mechanical engineering, badly need illustrations, but they are not easy to find at Commons, if they exist there at all. DCDuring TALK 23:42, 26 January 2013 (UTC)[reply]

Sure, that's why I'm asking if there's interest in adding to Commons some images specifically for Wiktionary use. There are many more on that website that nobody else will ever bother uploading, as far as I can see. Nemo 22:05, 27 January 2013 (UTC)[reply]

I'm still not sure what you mean by "uploading". We don't want files uploaded to Wiktionary, we just want links to Commons files to be included in Wiktionary entries that might benefit from them.

There are many definitions that could use visual help. Many of them are technical, in fields ranging from knitting to furniture to machine design to aerospace. Technical terms outside of computing and linguistics tend to get short shrift here so the need is not well recorded in requests for photographs. You could start with any technical "usage context", say, those entries in Category:en:Knitting or Category:en:Juggling and find many terms needing illustrations. You could mark (and thereby categorize) them using {{rfi}} or just go to Commons to find illustrations as you find terms needing them.

I am not sure what benefit there is to have really simple concepts like "bug" illustrated, except with small icons that might offer users a way to determine whether an entry or definition was close to their interest without having to read everything. DCDuring TALK 00:01, 28 January 2013 (UTC)[reply]

Sort parameter in German templates

Some German templates implement a sort= parameter, but it is not listed in the documentation. I have made an experiment with (deprecated template usage) ägyptisch and it seems to work. Are we supposed to use it or not? (There is no mention of it in Wiktionary:About German). SemperBlotto (talk) 15:14, 25 January 2013 (UTC)[reply]

I think it can be used... I don't agree with sort parameters in general, though, because they don't really do what we want them to do. Instead of actually adapting the sort order to the language, it adapts the language to the sort order. I think I mentioned before that Wiktionary badly needs a way to customise the sorting order for each category individually, to match the customary sorting order of the language. —CodeCa t 15:24, 25 January 2013 (UTC)[reply]

OK, I think that I'll carry on not using them for German. The alternative is to add them to all the many existing words - but even then, their inflected forms would remain. SemperBlotto (talk) 17:22, 25 January 2013 (UTC)[reply]

A blast from the past.

The following article is from Notes and Queries, Sept. 22, 1855:

Affected Words — In Phillips's New World of Words (first edition, 1657; fourth, here used, 1678) is "a collection of such affected words from the Latin or Greek, as are either to be used warily, and upon occasion only, or totally to be rejected as barbarous, and illegally compounded and derived." These words are 188 in number; those which have lasted, though sometimes in an altered sense, sometimes in a cognate form, are as follows:

"Agonize, aetiology, autograph, aurist, bibliography, bimensal, cacography, cacology, cacophony, regurgitate, evangelize, euthanasia, ferocious, hagiography, holographical, homologation, imprescriptible (?), incommiscibility, inimical, misanthropist, misogynist, oneirocriticism, terraqueous."

So that a little more than ten per cent have lasted. Those which are marked as "most notorious" are as follows:

"Acetologous, acercecomic, alebromancy, ambilogie, anopsie, aurigraphy, circumbilivagination, clempsonize, colligence, comprint, cynarctomachy, emgiate, essentificate, fallaciloquent, flexiloquent, helispaesrical, hierogram, holographical, homologation, horripilation, humidiferous, illiquation, importuous, imprescriptible, incommiscibility, indign, inimical, logographer, lubidinity, lubrefaction, luctisonant, miniography, nihilification, nugisonant, nugipolyloquous, olfact, onologie, parvipension, plastography, plausidical, quadrigamist, quadrisyllabous, repatriation, scelestick, solisequious, superficialize, syllabize, syncentrick, transpeciation, tristitiation, vaginipennous, viscated, ultimity, vulpinarity."

Among words, the loss of which may be regretted for serious purposes, are, transpeciation, circumstantiation, the establishing by circumstances, and flexiloquent, speaking persuasively. For comic and sarcastic purposes, symbolic, not paying "shot or reckoning" (pay the shot, pay the reckoning), bovicide, hydropotist, monophagous, omnitinerant, polyphagian, ventripotent. It may be worth adding, that among the words which were actually proposed, and sometimes used, is honorificabilitudinity. M.

It seems we weren't the first to have an RfD, of sorts. Cheers! bd2412 T 17:14, 25 January 2013 (UTC)[reply]

pay the shot was apparently in Cymbeline, not one of Shakespeare's well-known works, but likely to have had imitators. pay the reckoning also seems likely to be attestable, but scores high on my non-idiomaticity meter. DCDuring TALK 17:24, 25 January 2013 (UTC)[reply]
- By the way, I came across this while searching for alebromancy (which occurs on one of our lists of missing words); apparently it means divination by reading barley, but only appears in dictionaries. bd2412 T 17:37, 25 January 2013 (UTC)[reply]

Category:Verbs vs. Category:Verbs by language?

What is the difference between Category:Verbs and Category:Verbs by language? --KoreanQuoter (talk) 12:31, 26 January 2013 (UTC)[reply]

Category:Verbs ought to be only a holder category for things like Category:en:Verbs and Category:de:Verbs, which include not all verbs of those languages, but all words (of any part of speech) whose semantics makes them somehow related to verbs (entries like verb, transitive, intransitive, stative, subjunctive, indicative, and so forth). In practice, however, this doesn't seem to always be the case. —An gr 13:28, 26 January 2013 (UTC)[reply]

I think it is a bit redundant, though. There aren't that many words related to verbs specifically. Maybe it should be deleted? —CodeCa t 14:14, 26 January 2013 (UTC)[reply]

Yeah, also because of easy confusion with POS cats.—msh210℠ (talk) 20:44, 27 January 2013 (UTC)[reply]

DARE is coming to the web

An interesting article: “Coming to the Internet: the ‘Dictionary of American Regional English’” There's a beta program to sign up for. —Michael Z. 2013-01-26 15:21 z

Very cool. Thanks a lot. It's a great resource. Eventually they will have to charge for it, I expect. We should probably take advantage of it while it is free to verify and perfect some of our US regional distribution information. I doubt that we will often have more specific authoritative information than what they have, so any geographic categories or labels would be interesting, too. DCDuring TALK 16:29, 26 January 2013 (UTC)[reply]

Interwikis of unattested terms

I noticed this edit and it seems to go against our interwiki policy, but I don't know how better to handle it. --Wiki Tiki 89 18:52, 27 January 2013 (UTC)[reply]

Does it? Categories and templates have IWs with different names, so the policy doesn’t apply to every namespace. In any case, I support keeping that IW, as it would be selfish to expect other Wikis to name their reconstructed form pages exactly like we do (and in English). — Ungoliant ^(Falai) 18:57, 27 January 2013 (UTC)[reply]

Note that on en.wikt it is in the Appendix namespace, but the link links to the Main namespace in ko.wikt. --Wiki Tiki 89 19:05, 27 January 2013 (UTC)[reply]

That's their problem. I see nothing wrong with the interlanguage link; it links two pages concerned with the same entity. —An gr 19:11, 27 January 2013 (UTC)[reply]

What Ungoliant said.—msh210℠ (talk) 20:46, 27 January 2013 (UTC)[reply]

Definitely keep; es: also allows reconstructed terms in the main namespace. Our rules are that in our main namespace interwikis must be identical to the page name. Appendix:Proto-Slavic/gvězda is not in our main namespace. Mglovesfun (talk) 00:21, 11 February 2013 (UTC)[reply]

Keep too. I would phrase it a bit differently. The purpose of interwikis is to link between equivalent pages. Obviously, our own entry on the word house is equivalent to the Spanish es:house. If the names themselves differ, equivalence still applies, so we should take that into account. —CodeCa t 00:26, 11 February 2013 (UTC)[reply]

Actually, viewed that way, these pages aren't equivalent, in that Appendix:Proto-Slavic/gvězda is about a specific form in a specific reconstructed language, whereas ko:*gvězda is about all unattested forms represented by gvězda regardless of language. (That is: in the main namespace, we use interwikis even when the target-language entry is currently about something completely different from our entry, because the assumption is that eventually we will add what they have and vice versa. The entries are counterparts even if their contents don't match. But here, the opposite is true: the content currently matches (same extension), but the definition of what belongs on each page is different (different intension).) —Ruakh_TALK 01:47, 11 February 2013 (UTC)[reply]

That's true, but interwikis don't have to be one-to-one, do they? And even if we do want them to be one-to-one, we can at least be sure that one of our reconstructed pages will never correspond to multiple entries on the Korean Wiktionary. Of course, one of theirs can correspond to several of ours, but that is up to them. —CodeCa t 01:51, 11 February 2013 (UTC)[reply]

The RFD and RFV pages are getting too large

Maybe we should start having these discussions on the entries' talk pages and leave the RFD and RFV pages to just be a list of active discussions? --Wiki Tiki 89 23:12, 27 January 2013 (UTC)[reply]

That wouldn't be a bad idea. It would make the discussions self-archiving, like the monthly subpages we already have for BP and GP. —CodeCa t 23:15, 27 January 2013 (UTC)[reply]

I'm mixed on that. Currently, changes to all discussions will come up on my watchlist, which is a lot more handy than watchlisting the entry itself, which would cause more junk in my watchlist, or just checking now and then. Honestly, the real solution is to ~~make -sche tend to it more quickly~~ archive stuff. IMO, RFVs should be closed and archived pretty soon after they are cited, instead of sitting around. —Μετάknowledge^{discuss/deeds} 00:09, 28 January 2013 (UTC)[reply]

When the RFD and RFV pages get too large, it forces us to start closing the nominations; so their excessive size is a necessary evil. — Ungoliant ^(Falai) 00:23, 28 January 2013 (UTC)[reply]

Except the current system is not scalable. As Wiktionary grows, so will the number of new RFDs and RFVs per day. --Wiki Tiki 89 00:38, 28 January 2013 (UTC)[reply]

I just archived around 11 kilobytes of RFV. If everyone who sees this archives a couple kilobytes, we'd be fine. —Μετάknowledge^{discuss/deeds} 00:54, 28 January 2013 (UTC)[reply]

We could also make monthly subpages, but with the catch that the oldest pages get deleted eventually after their requests are closed, so that there is a continuous rolling "window" of monthly pages. That may be a good compromise? —CodeCa t 01:20, 28 January 2013 (UTC)[reply]

I would support going to a monthly page system, with a goal of closing and archiving all discussions within 60 days of their initiation. That's plenty of time for verification to be obtained and deletion rationales to be hashed out. I would propose that Wiktionary:Requests for deletion#Names of individuals from the w:Romance of the Three Kingdoms can be archived to Appendix talk:Romance of the Three Kingdoms. We have some other discussions that are nearly that old, that need to be brought to a close one way or another. bd2412 T 04:03, 28 January 2013 (UTC)[reply]

By the way, just going down the list at RfD, the following entries have already been struck as decided one way or the other (some for a long time, some only recently): "I need a postcard", "LY335979", "rule with an iron fist", "infraspecific taxon", "open data access", "arnaut", "non-exclusive list", "out-Herod", "sug-", "browser hijacking", "revista porno", "biker gang", "willie-wag", "man-", "Java Persistence API", "chai tea", "apical abscess", "acute abscess", "smoking section and nonsmoking section", "for instance", "lion#French", "parallelled", "Rico Suave", "ニゴロブナ", "Indo-European root", "automatic indexer", "tetherless computing", "help page", "web-based operating environment", "local area network", "wide area network", "supercluster of galaxies", "house-proud", "give someone the shits", "see you in hell", "human resource management", "Care Bears", "right to work", "I need weed", "right to privacy", "right to life", "absolution", "ostracization", "Pandeism". bd2412 T 05:23, 28 January 2013 (UTC)[reply]

I don't mind archiving RFD and RFDO at irregular intervals, but I cba doing other pages like RFV, RFC and RFM. I'd never leave the house if I did them all. Mglovesfun (talk) 10:09, 28 January 2013 (UTC)[reply]

baked goods

We say that this is a plural-only term. However, Google Books gives almost 7,000 hits for the singular form and a simple Google search 126,000. Should we change this opinion? --Hekaheka (talk) 11:24, 28 January 2013 (UTC)[reply]

Are some of them for "baked well"? Dbfirs 13:28, 28 January 2013 (UTC)[reply]

"Baked good" definitely exists. But I'd say that both the plural and the singular are SOP. --Wiki Tiki 89 13:46, 28 January 2013 (UTC)[reply]

There are 3360 Google Books results for "a baked good". This seems to be more of a Tea Room question, though (unless there are weighty policy implications that I'm missing). Chuck Entz (talk) 13:51, 28 January 2013 (UTC)[reply]

That compares with over half a million in the same source for "baked goods", so the singular is relatively rare. I would regard the singular as mildly incorrect, and would correct it to "baked product" when proof-reading, but we certainly can't say "plural only" with so many hits (even if many of them are by authors with foreign-sounding names (yes, I know that this doesn't necessarily mean that English is not their first language)). Very few words are truly "plural only". If this is not deleted as SoP, should we add a usage note on the rare singular? Dbfirs 14:58, 28 January 2013 (UTC)[reply]

I don't think it needs a usage note. The singular seems to be rare not because of grammar, but because of the rarity of the need to use the singular. The singular is used in cases such as "Bagels are a baked good." But these kinds of sentences are not very common since most people already know what bagels are. The more common uses would be to say "This factory produces baked goods." --Wiki Tiki 89 15:23, 28 January 2013 (UTC)[reply]

This is an example of an entry needing clarification of presentation of its usage by number. It is typical of many of the items in Category:English pluralia tantum, as referred to in #:Category:English pluralia tantum above. A good presentation could be applied to many entries of this type, though not to all members of the category. DCDuring TALK 16:47, 28 January 2013 (UTC)[reply]

I have created [[baked good]]. I don't think that this is SoP in the singular or the plural in its overwhelmingly most common use, where it only refers to items of food with a flour as a main ingredient. DCDuring TALK 17:19, 28 January 2013 (UTC)[reply]

Not SOP. Baked ham, baked beans, macaroni casserole, meatloaf, baked mackerel, beggar’s chicken, and perhaps a coat of paint on a car are all goods that have been baked, but are not baked goods. Not plural only, merely plural often, because a singular baked good has a specific name.

I don't know if flour is defining, though. Isn't cheesecake a baked good? Is a corn tortilla a baked good? (Maybe.) Pizza? Something about bakers, bakeries, dough, or pastry might be more useful. —Michael Z. 2013-01-28 22:49 z

Insofar as a cheesecake is considered a baked good it might be because it typically has flour either in its body (eg, some Italian cheesecakes) or in any crust. Baked goods can be made at home so at least one definition can't be dependent on that, though much of the usage is in business and government statistics, where the usage is limited to commercial products. I suppose we might weasel out by using "typically" or "especially" in the "flour" portion of the definition, but most usage isn't especially precise. DCDuring TALK 23:44, 28 January 2013 (UTC)[reply]

Wikipedia says: 'As well as bread, baking is used to prepare cakes, pastries, pies, tarts, quiches, cookies, scones, crackers and pretzels. These popular items are known collectively as "baked goods," and are sold at a bakery.' --BB12 (talk) 08:57, 29 January 2013 (UTC)[reply]

That is a pretty inadequate definition, mostly because of the "sold at a bakery" clause. I doubt that we should take that definition to mean that baked good(s) excludes those items if they are given away, yet to be sold, never at a bakery (homemade), sold off the bakery's premises, etc.

Wordnet has "foods (like breads and cakes and pastries) that are cooked in an oven". That would make roasted meat a "baked good" if, say, it was made like a pastry by encasing it in pastry, like a meat pie, Beef Wellington, or an empanada. Any cheesecake not baked in an oven would be excluded, as would crustless tarts. What about an item of which the only part cooked in an oven was the crust? And what about fried items like donuts?

The effort to make a precise definition soon founders on the imprecision of usage.

The central items in the category are, 1., food items (potentially edible), 2., baked ("dry heat", not roasted) in an, 3., oven, 4., with a flour as an important ingredient. They are also traditionally sold at bakeries, but now more often at grocery stores. The exceptions to 1 are very rare, to 2 and 3 less so at least in modern times, and to 4 less again. Historically, breads were made over open fires and some similar items are made in pans or on griddles. It's a fuzzy category indeed. DCDuring TALK 10:02, 29 January 2013 (UTC)[reply]

Roasting is also typically by dry heat, but via direct radiation, rather than convection.

The term “baked goods” sounds to me like it’s from a commercial context, where goods = merchandise, commodities, etc. Sure, what came out of my mom’s oven were baked goods, but could a definition not be something like “bread, pastries, cakes, pies and other similar products typically from a bakery”? I think crustless tarts and even fridge cheesecakes are baked goods by association, similarity, or origin – you would find them on the baker’s table. Certainly a meat pie is as much a baked good as an apple pie. Doughnuts, jambusters, pampushky, &c? – not baked, but if they were in a bag with other stuff from the bakery, I wouldn't object to the collective term for the whole lot.

I think it all ends with “and similar items.” I think sometimes a correct definition is necessarily imprecise. —Michael Z. 2013-01-30 02:42 z

Yes, a lot of the usage of baked good(s) is from the food industry, but more of the Books usage is from recipe books. That's why I'd like to skip the "bakery" part. Perhaps: "food items, typically baked, using a dough, such as bread, ..., and similar items otherwise prepared." But cleaning up the wording would be nice. DCDuring TALK 03:51, 30 January 2013 (UTC)[reply]

Help turn ideas into grants in the new IdeaLab

I apologize if this message is not in your language. Please help translate it.

Do you have an idea for a project to improve this community or website?
Do you think you could complete your idea if only you had some funding?
Do you want to help other people turn their ideas into project plans or grant proposals?

Please join us in the IdeaLab, an incubator for project ideas and Individual Engagement Grant proposals.

The Wikimedia Foundation is seeking new ideas and proposals for Individual Engagement Grants. These grants fund individuals or small groups to complete projects that help improve this community. If interested, please submit a completed proposal by February 15, 2013. Please visit https://meta.wikimedia.org/wiki/Grants:IEG for more information.

Thanks! --Siko Bouterse, Head of Individual Engagement Grants, Wikimedia Foundation 20:19, 30 January 2013 (UTC)[reply]

Distributed via Global message delivery.

Indexing appendices

We now have thousands, perhaps tens of thousands of appendices, and no index or system of organization that I'm aware of, other than categories. I have created Appendix:Appendices to begin to get a handle on them. This is a necessary thing, but far too big a project for one editor to handle. Help, please - particularly if you've created appendices, make sure they are listed and properly classified. The classification that I have begun is rudimentary and entirely subject to whatever change will make this directory most comprehensible.

I note also that there is a lot of variation in the naming of appendices, and at some point, some conformity will need to be imposed. I would begin by proposing that all appendices for terms in fictional universes should be moved from "Appendix:Name of work" titles to "Appendix:Fiction/Name of work" titles. Cheers! bd2412 T 03:14, 31 January 2013 (UTC)[reply]

Note to those interested: see WT:RFDO#Appendix:Appendices. —Μετάknowledge^{discuss/deeds} 06:17, 31 January 2013 (UTC)[reply]

Speedily merged and redirected to WT:Index to appendices, which is equally incomplete and in need of restructuring. Cheers! bd2412 T 12:18, 31 January 2013 (UTC)[reply]