
Wikipedia talk:Bot policy

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Susvolans (talk | contribs) at 17:05, 18 December 2005 (Bot permission please?: support). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

This is not the page to request a bot.

Post here to ask permission to run a bot yourself. To request that someone write a bot to do something, please see Wikipedia:Bot requests instead.

How to ask for permission:

If you want to run a bot on the English Wikipedia, please follow the policy at Wikipedia:Bots, explain the use of your bot on this talk page, wait a week, and then, if no objections were made, ask for a bot flag at m:Requests for bot status.

How to file a complaint:

If a bot has made a mistake, the best thing to do is to leave a message on its talk page and/or that of its owner. For bots on their initial one-week probation, please also leave a note on this page.

If a bot seems to be out of control, ask an administrator to block it temporarily, and make a post below.

Authorized bots do not show up in recent changes. If you feel a bot is controversial and that its edits need to be seen in recent changes, please ask for the bot flag to be removed at m:Requests for bot status.


Footnote bot

This bot will be used to assist with and correct pages that use footnotes. No username has been chosen yet, and the code is still unwritten.

I am proposing that the bot operate as follows (all pages refer to subpages of the bot's user space or a Wikipedia project page); a rough sketch follows the list:

  1. Every hour, check for articles listed by users at a /To Do/<current date> subpage, either on its user page or a Wikipedia project page
  2. Fix the footnote template usage on each listed page and re-arrange the footnotes in order
  3. Move the handled article from the /To Do/ page to a /Completed/<current date> subpage.
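
A minimal sketch of steps 1 and 3 above, assuming the old pywikipedia framework and a hypothetical bot account name; the actual footnote re-ordering of step 2 is left as a stub:

    # Sketch only: poll a hypothetical /To Do/ subpage once an hour and
    # "fix" each listed article; moving finished entries to the dated
    # /Completed/ subpage is omitted here.
    import re, time
    import wikipedia  # the pywikipedia framework

    site = wikipedia.getSite()
    TODO_PAGE = 'User:FootnoteBot/To Do'  # hypothetical name

    def fix_footnotes(text):
        # re-order and renumber the footnote templates (not implemented)
        return text

    while True:
        todo = wikipedia.Page(site, TODO_PAGE)
        for title in re.findall(r'\[\[([^|\]]+)', todo.get()):
            article = wikipedia.Page(site, title)
            article.put(fix_footnotes(article.get()), 'Re-ordering footnotes')
        time.sleep(3600)  # once an hour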

The initial suggestion was to actually browse all pages using the template. I think that's a bad idea: we're at half a million articles, and the number of pages the bot needs to work on is really limited. Personally, I think a dog-like bot, where you tell it "Fix, bot, fix", is a better implementation. That way, 1) it doesn't need to bog Wikipedia down searching for articles it needs to fix, and 2) articles can be fixed when they need to be. Users would simply leave the footnotes out of order and the bot would come around to correct the ordering. -- AllyUnion (talk) 06:53, 15 Mar 2005 (UTC)

There have been several different footnote proposals and corresponding templates. Would this theoretical bot be working with Wikipedia:Footnote3? Certainly crawling all articles that use footnotes would put undue burden on the site. But much of the point of the bot would be to catch footnotes which accidentally get put out of order. A good compromise would be to load the "todo" page with the results of a Wikipedia:Database download analysis. It would also be nice to convert all pages to use the same footnoting system, but that might require broader community consensus. (Unless there are only a few non-standard pages, in which case they can be re-done by hand.) -- Beland 01:08, 15 September 2005 (UTC)[reply]
I don't know about the theoretical bot, but you can look at the SEWilcoBot contributions and see what it does to articles when I use my References bot. It's under development and I run it as a Footnote3 helper on specific articles. There is a style discussion at Wikipedia_talk:Footnote3#Footnotes_vs._inline_web_references. (SEWilco 02:40, 15 September 2005 (UTC))[reply]

Underscore replacement bot

This could optionally be added to Grammar bot as an extension... anyway...

Basic idea: change text in {{text}} and [[text]] to remove all underscores. The only exception is that it will not change anything within <nowiki></nowiki> tags. Are there any other considerations that this bot would need to make? -- AllyUnion (talk) 08:56, 15 Mar 2005 (UTC)
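
For illustration, a rough sketch of the replacement using plain regular expressions; it deliberately ignores the harder cases raised below (articles whose real titles contain underscores, {{wrongtitle}} pages, underscores used for layout in templates and tables):

    import re

    def deunderscore(text):
        # stash <nowiki>...</nowiki> spans so they are never touched
        stash = []
        def save(m):
            stash.append(m.group(0))
            return '\x00%d\x00' % (len(stash) - 1)
        text = re.sub(r'(?s)<nowiki>.*?</nowiki>', save, text)

        # strip underscores inside [[...]] links and {{...}} templates
        fix = lambda m: m.group(0).replace('_', ' ')
        text = re.sub(r'\[\[[^\]]*\]\]', fix, text)
        text = re.sub(r'\{\{[^}]*\}\}', fix, text)

        # put the protected <nowiki> spans back
        return re.sub(r'\x00(\d+)\x00',
                      lambda m: stash[int(m.group(1))], text)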

I can program that. I will run off the basic query, but it will have many, many false positives. (No worries, it won't make edits that are false positives.) I have the 09-03-2005 dump. r3m0t talk 17:20, Mar 15, 2005 (UTC)
There will be a few articles that have "_" as part of their proper name; the only one I can think of off-hand is _NSAKEY. Links to such articles shouldn't have underscores removed. — Matt Crypto 18:06, 15 Mar 2005 (UTC)
I found the following articles with _ (space) at the beginning: [[_]] (yes, that's right, it does exist) _Hugh_Sykes_Davies _Swimming_at_the_2004_Summer_Olympics_-_Men's_400_metre_Freestyle and about 40 not in the main namespace. I found stuff with underscores inside the {{wrongtitle}} tag: FILE_ID.DIZ linear_b mod_parrot mod_perl mod_python _NSAKEY Shift_JIS Shift_JIS art strongbad_email.exe (here showing the ideal names, not the actual ones). Any which do not have a wrongtitle tag deserve the changes they get. Hmmph. r3m0t talk 20:55, Mar 15, 2005 (UTC)
Is this bot still planned? There's a small discussion of the underscore habit at the VP. — Matt Crypto 16:10, 28 Mar 2005 (UTC)

This sounds like a solution in search of a problem, to me. I'd be obliged if it could avoid any templates or tables in its processing, because I use the underscore separator in order to keep multi-word links on the same line, rather than having a wrap at the internal space. I changed the section name as well, to reflect that it's not a correction, but a replacement. Noisy | Talk 16:43, Mar 28, 2005 (UTC)

There is certainly a problem. All right, it's no more than an aesthetic annoyance, but I've seen no end of links like Main_Page in the main text for absolutely no reason.
The VP discussion also says something about using non-breaking spaces in links, but that doesn't seem to work. Nickptar 14:57, 30 Mar 2005 (UTC)

pending deletions

Can someone write a bot to move all articles in category:pending deletions out of the main namespace, e.g. to talk:foo/pending or Wikipedia:pending deletions/foo and then delete the resultant redirect? Is that a good idea? 131.251.0.7 12:30, 16 Mar 2005 (UTC)

It's possible. I think it's a good idea. Maybe I'll program that, and it could be made by the weekend (I think). You could ask at Wikipedia:Bot requests for somebody to program it. r3m0t talk 13:57, Mar 16, 2005 (UTC) PS you ought to register.


Numbers and commas - possible additional use for Grammar bot?

Just an idea... maybe convert all those large numbers like 100000000 to something with commas like 100,000,000. Some false positives to consider are article links, years, stuff in mathematical articles, and stuff in formatting. -- AllyUnion (talk) 08:18, 19 Mar 2005 (UTC)
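
As a toy sketch of the idea, assuming numbers of five or more digits in running text; it makes no attempt to skip the false positives listed above (links, years, maths, formatting):

    import re

    def add_commas(text):
        # 100000000 -> 100,000,000; anything under five digits is left alone
        return re.sub(r'\b\d{5,}\b',
                      lambda m: '{:,}'.format(int(m.group(0))), text)

    print(add_commas('The galaxy contains roughly 100000000 stars.'))
    # -> The galaxy contains roughly 100,000,000 stars.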

Years are easy: ignore everything under 10000. I will consider it. r3m0t talk 10:10, Mar 19, 2005 (UTC)
Please consider that comma use is not universal: in large parts of Europe it is common to interchange the decimal point and the comma. We have a decimal comma and write 100.000.000. I would rather see a form like 10^8. Siebren 16:48, 15 October 2005 (UTC)[reply]
Yeah, but this is the English-language Wikipedia, and as far as I know most or all English-speaking countries use a comma. Martin 20:15, 28 October 2005 (UTC)[reply]
Not all. South Africa uses the comma as the decimal separator, and a space as the thousands separator. As far as I know, this is the SI standard, since in mathematics a dot means multiply (in grade 7 my maths teacher told us that this is why SA uses the decimal comma) --Taejo | Talk 12:19, 9 November 2005 (UTC)[reply]
Also note that even in English, scientific writing tends not to use commas. I think the SI standard separator is a thin space, when they are used at all. --Bob Mellish 21:41, 28 October 2005 (UTC)[reply]
There was discussion of this on Wikipedia talk:Manual of Style (dates and numbers)#number notation at some length recently. Some people proposed making the SI style the standard for all of Wikipedia and banishing all use of commas as number separators; others objected. It might be a good idea to read this before starting any bot of this type. DES (talk) 21:52, 28 October 2005 (UTC)[reply]

You can work out how to separate it, but I think this is a great idea and should be implemented in one form or another. HereToHelp (talk) 02:20, 2 November 2005 (UTC)[reply]

Bot to update Dutch municipalities info

I'd like permission to use a bot to update the pages on Dutch municipalities. Things the bot wants to do: update population etc. to 2005; add coordinates to infobox; add articles to the proper category. After that, I may use it as well for adding infoboxes to the articles on Belgian municipalities, and perhaps those of other countries. Eugene van der Pijll 21:45, 27 Mar 2005 (UTC)

Psychology and mental health content only bot

I'd like permission to maintain a small collection of read-only psychology and mental health-only related material through a bot. This bot would prevent search engine indexing, and limit accesses via its interface. Given the tiny percentage of material I'm interested in, downloading and parsing the enormous database dumps is not an option given my limited server resources. Thank you. -- Docjohn (talk) 12:31, 31 Mar 2005 (UTC)

RCBot

I'm requesting permission for a bot still in development: User:RCBot. Its purpose is to help with issues on the Wikimedia Commons. There is only one task that the bot is supposed to do at the moment: help rename media files on the Commons. Suppose there were two identical files (e.g. images) on the Commons that might even have been uploaded from different language Wikipedias. Media files on the Commons can be used in any language Wikipedia, so all languages need to be checked and – if necessary – the reference to the first file replaced by one to the other. This is what the bot is supposed to do, and it therefore also needs permission in the English Wikipedia. — Richie 18:24, 4 Apr 2005 (UTC)

McBot

I have received a copy of the software Kevin Rector uses for KevinBot. It is used to transwiki pages marked with {{move to Wiktionary}} to Wiktionary. I just registered the username McBot for it and plan to use that for any botting. Requesting permission... --Dmcdevit 05:51, 9 Apr 2005 (UTC)

I'd like to verify that I did give him a copy of the bot software for transwikification so that more than one person could patrol this. I've also trained him in how to use it and would support his bot account being flagged. Kevin Rector 14:50, Apr 9, 2005 (UTC)


JdforresterBot ("James F. Bot")

Heya.

I've created this account to do some boring and exhausting little jobs, like correcting the 700 inbound links each to a set of 40 or so pages to be moved.

May I please have bot status?

James F. (talk) 22:50, 12 Apr 2005 (UTC)

Bot status request

I would like to request bot status for User:Diderobot. It is semi-automated and I plan to use it to fix grammar, spelling, punctuation, as well as wiki syntax and double redirects. Sam Hocevar 10:17, 13 Apr 2005 (UTC)

Can you be a little more specific? -- AllyUnion (talk) 10:33, 23 Apr 2005 (UTC)
By semi-automatic you mean it's manually assisted? And how does it fix grammar, spelling, and punctuation? Based on what type of dictionary? -- AllyUnion (talk) 06:51, 24 Apr 2005 (UTC)
By semi-automatic I mean all changes are validated by hand. It first runs offline on a database dump and displays a list of the modifications it is going to apply. I accept or refuse each of them manually. Then the bot runs online, downloading, modifying and saving articles according to the validated changeset.
As for the dictionary, it uses a wordlist and a set of regexp generators matching common mistakes. For a very simple example, the function gen_check_with_suffix('..*', 'iev', 'eiv', 'e|ed|er|ers|es|ing|ings') will generate the following regexp: \b([nN]|[sS]|[aA](?:ch|ggr)|[bB](?:el)|[dD](?:isbel)|[gG](?:enev|r)|[fF](?:latus-rel)|[hH](?:andkerch)|[kK](?:erch)|[mM](?:ake-bel|isbel)|[oO](?:verach)|[nN](?:eckerch|onach|onbel)|[rR](?:el|epr|etr)|[uU](?:nbel|nderach|nrel)|[tT](?:h))eiv(e|ed|er|ers|es|ing|ings)\b which matches spelling errors such as "theives", "acheive", "disbeleiving", etc. Sam Hocevar 08:15, 24 Apr 2005 (UTC)
Dear lords, that is a scary regex... Clever though, and a good idea. You get a cookie for it. :p --Veratien 18:33, 7 August 2005 (UTC)[reply]
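
For readers curious what such a generator might look like, here is a much-simplified sketch; unlike Sam's version it merely OR-joins a list of known word stems instead of factoring out their common prefixes, and the function name is invented for this example:

    import re

    def make_misspelling_fixer(stems, bad, good, suffixes):
        # stems: word beginnings known to take the 'good' spelling, e.g. 'th' for thieve
        alternation = '|'.join(re.escape(s) for s in stems)
        pattern = r'\b(%s)%s(%s)\b' % (alternation, bad, suffixes)
        fix = lambda m: m.group(1) + good + m.group(2)
        return re.compile(pattern), fix

    rx, fix = make_misspelling_fixer(['th', 'ach', 'bel', 'rel'],
                                     'eiv', 'iev', 'e|ed|er|ers|es|ing|ings')
    print(rx.sub(fix, 'He theives to acheive his ends.'))
    # -> He thieves to achieve his ends.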

Wikipedia:Introduction, Sandbot header enforcement

Adding Wikipedia:Introduction header enforcement for the top two lines. -- AllyUnion (talk) 09:59, 14 Apr 2005 (UTC)


LupinBot

This bot has been uploading maps for a little while now, blissfully unaware of the existence of this page. It's a little wrapper around upload.py

Apparently there's a bot flag I need, so I'm listing it here. Lupin 00:29, 20 Apr 2005 (UTC)


Request bot permission for mathbot

I use a bot called mathbot, primarily to do housekeeping in the math articles. So far it has been concerned with removing extra empty lines and switching to some templates. More uses might show up. I make sure I never use it for more than 60 edits in one day (there is no rush :) Could I register it as a bot? Thanks. Oleg Alexandrov 17:59, 20 Apr 2005 (UTC)

Can you be more specific? -- AllyUnion (talk) 10:36, 23 Apr 2005 (UTC)
Specific about what? Oleg Alexandrov 14:06, 23 Apr 2005 (UTC)
By housekeeping you mean exactly... what type of tasks? Spell checking? Formula correction? What? -- AllyUnion (talk) 06:50, 24 Apr 2005 (UTC)
I wrote above I used it for removing extra empty lines and switching to some templates. I also did lots of semi-automated spelling, but preferred to use my own account for that, as this was trickier and I wanted the resulting pages on my watchlist.
So, I used my bot for nothing else than what I wrote above, that's why I can't be more specific. About the future, I don't know what will show up. Oleg Alexandrov 15:04, 24 Apr 2005 (UTC)
Why don't you use it for more edits? Why does it restrict itself to math articles? r3m0t talk 13:51, Apr 23, 2005 (UTC)
To do more general work, I would need, in most cases, to have a local copy of Wikipedia to do queries. I can't afford to download the whole Wikipedia and the mysql database. If, at some point, jobs show up for which I do not need to have the local download, I can take care of them. Oleg Alexandrov 14:06, 23 Apr 2005 (UTC)

Tagging

I just started developing a bot which, at the moment, bears the name Tagbot. It would help with the tedious work that is image tagging. Its task would be simple: it would find untagged images and tag them with {{No source}}. Then it would find the user who first uploaded it and leave a message on his/her talk page saying something like "Hi! Image XXX that you uploaded is untagged. This is bad because..... You tag an image by putting......etc.". The purpose of this would be:

  1. Many images would be automatically tagged by the user
  2. It would tag all images with {{unverified}}, and that's better than nothing.
  3. It would make actual image tagging much easier to organize, as you simply would have to look at Category:Images with unknown source.

I was thinking it might be best to hear comments and/or get permission before I got too deep into the process. So what y'all think? By the way, it might also be a good idea to change Mediawiki to automatically add an unverified-tag to a new image, to ensure no untagged images. So, what y'all think? Gkhan 01:23, Apr 30, 2005 (UTC)
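
A minimal sketch of the tagging step, assuming the old pywikipedia framework; finding the untagged images and their uploaders (from a database dump or category scan, as discussed below) is taken as given, and the notice text is only a placeholder:

    import wikipedia  # the pywikipedia framework

    site = wikipedia.getSite()
    NOTICE = ("\n\n== Untagged image ==\nHi! [[:%s]], which you uploaded, has no "
              "copyright tag. Please add one; see [[Wikipedia:Image copyright tags]].")

    def tag_image(title, uploader):
        # title and uploader are assumed to come from a dump or category scan
        image = wikipedia.Page(site, title)
        image.put('{{No source}}\n' + image.get(),
                  'Tagging image with no source information')
        talk = wikipedia.Page(site, 'User talk:' + uploader)
        try:
            old = talk.get()
        except wikipedia.NoPage:
            old = ''
        talk.put(old + NOTICE % title, 'Note about an untagged image you uploaded')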

One problem with this: a number of images have copyright information included, but it's in a pre-tagging form. Would it be possible to make this bot into a semi-automated process? What I'm envisioning is
  1. The bot finds an untagged image
  2. It shows the image, on its Wikipedia page, to the user monitoring the process
  3. The user either indicates the tag that should be added to the page, or indicates that the bot should tag it {{unverified}} and post on the uploader's user page.
If the whole system is made simple enough, copies of the bot software could be distributed to everyone who's participating in the image tagging process. How does this sound? --Carnildo 01:44, 30 Apr 2005 (UTC)
It does sound good; I like it a lot. One thing though: the way I was imagining the bot was that I would download the database and look up the images to be tagged, because that's the only way I can think of that would work short of making database requests. I suppose I can do two versions: one that tags all images with something like {{to-be-tagged}}, and then the tagging-"client" that is distributed could read off that list (the category list, that is). I am not qualified to assess the server hog this would create, but it sounds good (great!) to me. Gkhan 01:57, Apr 30, 2005 (UTC)
My understanding is that, with the current server software, categories with large numbers of pages should be avoided if at all possible. Viewing such a category puts almost as much load on the server as viewing each individual article in the category. --Carnildo 02:10, 30 Apr 2005 (UTC)
That can't possibly be true! When you add or remove a category from an article, doesn't some sort of list get edited that the category page reads from (at least that's what I, an amateur, incompetent and inexperienced programmer, would do (I'm not really that bad, only as a comparison to MediaWiki developers))? Viewing a category page should only be as hard as reading from that list, right? Although when you look at an image category, it displays all the images, and that most certainly is a server hog (it would be nice to have some developer input, though). Gkhan 02:41, Apr 30, 2005 (UTC)


WouterBot

I would like to use pywikipedia's solve_disambiguation.py to facilitate my disambiguation work. I created an account User:WouterBot for this and now I would like your permission to use it. WouterVH 14:27, 4 May 2005 (UTC)[reply]

  • Feel free to do so. Running that script requires human intervention to complete a disambiguation. It's what I originally set up my bot account to do. RedWolf 04:47, May 17, 2005 (UTC)



Pending deletion script

Shouldn't User:Pending deletion script be listed on Wikipedia:Bots? Gdr 20:03, 2005 May 12 (UTC)

No, because it's a dev's bot, and should be watched anyway. -- AllyUnion (talk) 00:52, 14 May 2005 (UTC)[reply]
That and it seems to be done. -- AllyUnion (talk) 00:53, 14 May 2005 (UTC)[reply]


tickerbot

I've finished a bot that can upload quotes as well as other financial data (such as market cap, dividends, etc.) on publicly traded companies daily, from the machine-readable interface provided by Yahoo Finance. It downloads a list of stock symbols and the format template from wiki pages, then sticks the data for each symbol in its own page in the Template namespace (something like Template:Stock:AAPL, for example). Is this something that people are interested in? I guess it boils down to a somewhat philosophical question as to whether this kind of timely information should be considered "encyclopedic".

Questions:

  • Does this violate Yahoo Finance's Terms of Service or any other legal considerations? Most stock ticker apps that I know of use Yahoo Finance's interface for data, but it may be different for something of this scale.

Taak 20:51, 23 May 2005 (UTC)[reply]
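
As an illustration of the mechanics only (not Taak's actual code), a sketch assuming the CSV quote interface Yahoo Finance offered at the time and the old pywikipedia framework; the URL and format codes are from memory and may not be exact:

    import urllib
    import wikipedia  # the pywikipedia framework

    site = wikipedia.getSite()
    # s = symbol, l1 = last trade, j1 = market cap (format codes are assumptions)
    QUOTE_URL = 'http://finance.yahoo.com/d/quotes.csv?s=%s&f=sl1j1'

    def update_symbol(symbol):
        symbol_q, last, cap = urllib.urlopen(QUOTE_URL % symbol).read().strip().split(',')
        page = wikipedia.Page(site, 'Template:Stock:%s' % symbol)
        page.put('Last: US$%s, market cap: %s' % (last, cap),
                 'Updating quote for %s' % symbol)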

Exactly how many symbols are we talking about? I think it is a good idea, but it seems a bit excessive. I find that "linking" the stock symbol would be better. -- AllyUnion (talk) 08:07, 25 May 2005 (UTC)[reply]
It could work for as many symbols as desired. It would download the list from a wiki page. Taak 18:40, 31 May 2005 (UTC)[reply]
What I'm mainly concerned about is that there are so many symbols for companies, and that such a bot may overtax the servers by updating market information every time the market closes... depending on which market you're talking about and such... -- AllyUnion (talk) 08:53, 2 Jun 2005 (UTC)
That's a valid concern, but how many edits/day would it take to be noticeable? If it was updating daily it could certainly run during only off-peak hours. If we could get some numbers here we could figure it out. Taak 02:22, 7 Jun 2005 (UTC)
The question would be more of how many pages and how many symbols we are talking about, and which markets you are referring to. Furthermore, I'd very much prefer to keep the bot's edits out of the template space; Template:Stock:AAPL seems a bit excessive and rather unusual. Let me see here... the NYSE opens at around 9 AM ET and closes at 5 PM ET, so you'd have about 16 hours outside trading to update NYSE stocks. Assuming you updated a page every 30 seconds, which is the typical recommended editing speed, you'd only be able to update 1920 stocks out of the 2800 stocks traded on the NYSE. That only gives you 68.6% of all the stocks on the NYSE. Assuming we allow you to "push" the limit with edits 10 seconds apart, that's 28000 seconds, which is 466 minutes and 40 seconds, which is 7 hours, 46 minutes, and 40 seconds. So, you'd have to have your bot edit about 15 seconds apart each day to update stock information on all NYSE-traded stocks. Do you think it still is a bit excessive? -- AllyUnion (talk) 07:11, 10 Jun 2005 (UTC)
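
Restating the arithmetic above as a quick back-of-the-envelope check:

    window = 16 * 3600             # 57600 seconds of off-market time per day
    stocks = 2800                  # roughly the number of NYSE-listed stocks

    edits_at_30s = window // 30    # 1920 edits, about 68.6% of the stocks
    full_pass_at_10s = stocks * 10 # 28000 s = 7 h 46 min 40 s for every stock
    max_spacing = window // stocks # about 20 s between edits to cover them all
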
I've been thinking of something similar for currencies. How about this: only do full updates once a week (but stagger it through the week) except for particularly volatile stocks (say, stocks that have gone up/down by more than a certain amount since the previous day). But anyway: surely not every NYSE company is notable enough to have a wikipedia article? (I checked - 192 articles in Category:Companies traded on the New York Stock Exchange) --Taejo | Talk 12:48, 9 November 2005 (UTC)[reply]

Requesting permissions for running bots (clarification)

I feel that the policy needs a bit of clarification. I believe that bots should be permitted to run before they receive their bot flag, but only once their purpose has been described clearly here and a sysop has reviewed what the bot does. In order to meet the burden of proof, we must see a week's worth of edits from the bot while it is running; otherwise, we have no way to judge it without a sample. The bot flag is given to bots once they have met that burden of proof, not as permission to run the bot. Furthermore, in its comment, it should declare what it is doing... as all good bots should. -- AllyUnion (talk) 06:52, 27 May 2005 (UTC)[reply]

I think your proposal is a little draconian. I do think it's fair to ask bot authors to announce their bot here, to describe what their bot is doing, to run under a bot account, and to limit the bot's rate of activity. And it's fair to have a policy of block-on-sight for rogue bots. But I don't think it's fair to require administrator approval to run a bot at all. What if no sysop is available or interested enough to comment? And what gives administrators the insight and knowledge to judge whether a bot deserves to be given a chance to prove itself? Administrators are supposed to be janitorial assistants, not technical experts. Gdr 22:21, 2005 May 29 (UTC)
Ditto; There's more than enough red tape for bots already, please let's not make it any worse than it already is by applying pseudo-legal concepts to wiki bots. Perhaps ask people to go slowly in the first week the bot is run, thus allowing time for feedback before a full rollout; but let's not discourage contributions (or encourage illicit bots) by making the bar so excessively high that people either say "stuff that", or start operating without first requesting approval (which are precisely the outcomes that excessive regulation will encourage). Furthermore, in its comment, it should declare what it is doing - this I do agree with, but as a guideline or recommendation. -- Nickj (t) 02:50, 30 May 2005 (UTC)[reply]
Well, maybe not necessarily an administrator. I think someone should review what a bot does... because I think the whole "if no one says anything" policy should be slightly reconsidered. What I'm trying to say is that bots should be always given a trial period of one week, then stopped and reviewed over by someone. I suggested sysops, because we don't need anyone technical to review if a bot's edits are harmful. Usually a smart person can determine whether a bot's edits are harmful. I suggested sysops because they have the power to block bots anyway... If a bot's edits are harmful, then they should be blocked, however it should be here where a bot's owner can attempt to discuss in order to unblock their bot for another trial run. -- AllyUnion (talk) 05:20, 31 May 2005 (UTC)[reply]


Policy: spelling

There should be no (unattended) spelling fixing bots; it is quite simply not technically possible to create such a bot that will not make incorrect changes; if the big office utility companies can't make a perfect automated spellchecker, you most likely can't either

It seems like this is too harsh. The comparison with word processing spellcheckers isn't a fair one, because the needs are different -- we don't need a bot that would be a "perfect" spellchecker, just one that would (1) make many useful corrections, but (2) when in doubt leave articles alone. It seems like it wouldn't be impossible to make a spelling bot that would have a "certainty metric" for how sure it was that the correction was a needed one, and only make it if it was extremely sure.

I know this page is for requesting bots, but there ought to be a place to discuss the content of this page as well, and if not here I don't know where. Zach (wv) (t) 16:00, 31 May 2005 (UTC)[reply]

Well, why not just have User:Humanbot, released today? ;) r3m0t talk 16:21, May 31, 2005 (UTC)
Wow, that looks really awesome. Is it free? (I.e. can I use it on my wiki too?) BTW thanks for cleaning up those extra posts, it wasn't letting me remove them. Zach (wv) (t)
I agree with these comments. We have spam filters that have exceedingly low false-positive levels (on the order of maybe 1 in 10,000 mails). Why shouldn't it be possible to build a similarly conservative spell checker? Besides this, I can tell you that "the big office utility companies" are almost certainly working on context-sensitive spellchecking technology with much-reduced false-positive and false-negative rates and larger dictionaries. There have been recent papers on context-sensitive spellchecking which found about 97% of errors and produced almost no false positives at all; it could even be trained on the existing body of Wikipedia text, so that it recognizes many specialised terms. I was thinking of creating a tool that does this type of spellchecking. This rule suffers from a lack of imagination.
On the other hand, I really don't see any need for fully automatic spellchecking. Page update speeds are slow enough that human review shouldn't cost much relative time, provided that it's not wrong too often. It can simply list its corrections and ask them to hit ENTER if they're all good. Deco 17:56, 31 May 2005 (UTC)[reply]

A small note here... part of the reason is that we have differences between various spellings in English. While some people might write Internationalization, others might write it as Internationalisation. Both are still correct, but one is used over the other depending on the country you are from. Furthermore, a spellbot must recognize and skip over Unicode characters as well as HTML codes and wiki codes, in addition to any languages that use partial English for pronunciation or actual words; examples are romaji, Spanish, and so on. An automatic bot correcting this may accidentally change something in an article that was not intended... and a simple revert defeats the purpose of the bot, especially if it made only one mistake on the entire page that it just corrected. (Assuming that the one mistake is not easily corrected by a human editor.) Just some of my thoughts on the matter. -- AllyUnion (talk) 09:13, 2 Jun 2005 (UTC)

Oh yes, don't forget proper nouns such as names of places, people, things... We don't know how accurate it will be on the Wikipedia. It might be 97%, it might be lower. -- AllyUnion (talk) 09:15, 2 Jun 2005 (UTC)
I think you are right to point out these difficulties, but I think that they are surmountable, and that someone could conceivably make a useful spell checker (defined as finding some small but nontrivial number of wrong spellings, while making false positives, i.e. "corrections" that shouldn't be made, almost never). You would just have to make it extremely conservative, as Deco was saying. It wouldn't get all, or maybe not even most, misspellings, but as long as it got some of them, and didn't make any false positives, it would be useful. You could program it to not correct anything with a capital letter, for example, which would cover the proper-name problem. You could also program it to recognize all the various English spellings pretty easily. It's probably not a huge issue, since I almost never find misspellings, but I think it could be done, and I'd hate to see someone be discouraged from trying if the policy is worded too strongly. Zach (wv) (t) 23:28, 4 Jun 2005 (UTC)
What about scientific names? There is a certain degree of intelligence required... a person can easily tell the difference between a typo and a misspelling; I'm not completely certain a computer can. I would not mind if we had a bot that "prompted" the spelling errors or highlight them in some manner. Nor would I mind a bot that corrects the most common misspellings; correcting the most common misspellings would yield better results, I believe. I'm just really concerned that the bot will run into a strange word it doesn't know... but find a match in its dictionary and change the word. -- AllyUnion (talk) 06:03, 5 Jun 2005 (UTC)
I would not mind if we had a bot that "prompted" the spelling errors or highlight them in some manner. Wake up and smell the roses! r3m0t talk 19:28, Jun 10, 2005 (UTC)

This really isn't possible because there are a number of circumstances in which we'd want to deliberately misspell a word, for example in an exact quotation with [sic] or mentioning "this name is commonly misspelled as foo". We even have articles about spelling usage like Misspelling or Teh. What's considered a misspelling in modern English might have been used in the past before spelling was standardized, so it could appear in quotations from old sources or in an explanation of the etymology of a word. Articles often give "pronounced like" spellings of words that may be the same as common misspellings. What is a misspelling in English might be the correct spelling in a foreign language we're quoting. I'm sure there's other situations I haven't thought of. There's just no way you can rule all of this out, no matter how conservatively you construct your word list. DopefishJustin (・∀・) June 30, 2005 21:36 (UTC)

Quite. There are also rare words like specialty, which commercial programs routinely change to speciality, and compAir, which is a company name but would be auto-corrected to compare. (doodlelogic)


Procedure for user supervised scripts

I am presently designing RABot to simplify some of the tasks associated with maintaining the requested articles pages, namely deletion of created articles and sorting / tidying up article lists.

Because of the variations in formatting across different request pages and the complexities of the ways in which people make requests, this is a fairly complicated parsing task. The scripts I've created so far are fairly accurate and do a good job of handling many things, but it is unlikely that this will ever be able to run unsupervised, since there will likely always be people adding bizarrely formatted requests that the scripts choke on. (It is supposed to ignore things it can't parse, but sometimes it thinks it understands things it really doesn't.) So as a result, I plan to manually check and approve all its proposed edits before committing them.

So my question is, what is the procedure for getting approval to run scripts like this? It is not really a bot in the sense that it will not be running independently, but it does use the Python bot library and have a separate user account. The bots page isn't particularly clear on what is expected before running supervised scripts like this.

Once it reaches a point where I feel it is well-behaved enough that I want to start using it in maintaining the requested articles pages, I intend to write a detailed description of what it is trying to do (e.g. remove created articles and sort lists) and discuss its presence on the RA talk pages. Is that enough?

Dragons flight 00:46, Jun 10, 2005 (UTC)
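
A minimal sketch of the easiest sub-task (spotting requests that have turned blue), assuming the Python bot library mentioned above; parsing the many mixed request formats is, as Dragons flight says, far messier than this:

    import re
    import wikipedia  # the pywikipedia framework

    site = wikipedia.getSite()

    def created_requests(ra_title):
        # yield requested titles on a requested-articles page that now exist
        ra_page = wikipedia.Page(site, ra_title)
        for title in re.findall(r'\[\[([^|\]#]+)', ra_page.get()):
            if wikipedia.Page(site, title.strip()).exists():
                yield title.strip()

    # proposed removals would then be shown to the operator for approval
    # before anything is actually edited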

I personally don't think that this particular bot is a good idea. If a certain link in Wikipedia:requested articles becomes blue, that is, the article has been created, I think it is good for a human editor to inspect that article, see if it indeed corresponds to the title, clean it up if necessary, and only later remove it. Also, having a blue link or two in this list is not something so serious as to justify spending a lot of time creating and debugging a bot. I wonder what others think. Oleg Alexandrov 03:34, 10 Jun 2005 (UTC)
If you haven't recently, you might want to take a look over Category:Wikipedia requested articles; there are a lot of links, even if you aren't paying attention to somewhat ugly dump pages like Wikipedia:Requested articles/list of missing pharmacology. Some users, like User:DMG413, climb high in the edit count primarily by removing blue links from those lists. In a perfect world you would be right, and people would take a careful look at each article before removing it, but a single request page might get 5-10 blue links a day and go for a week without being cleaned, at which point no reasonable person is actually going to look through the articles. It is also worth noting that I am (at least for now) ignoring the much larger Category:Wikipedia missing topics. Dragons flight 04:03, Jun 10, 2005 (UTC)
P.S. If you have other suggestions on how better to deal with the growth of and turnover in RA, then that is also a worthwhile conversation, but I find it difficult to accept an objection to simplifying a process that is already consuming a significant amount of users' time. Dragons flight 04:06, Jun 10, 2005 (UTC)
Technically, your bot would qualify as a manually controlled "bot assistant", more or less a specialized editor, if you will. Regardless, asking permission from the community helps to confirm what you are doing. I'd recommend that you perform all edits under what "seems" to be a bot account, and give it a test run. Since your bot has an objection from Oleg Alexandrov, please run your bot slowly (maybe with edits at least 1 minute apart), and for about a week. When the week is up, ask Oleg Alexandrov if he still objects to the use of your manually controlled bot. Stop immediately if anyone complains that your bot is not working properly or is doing something not intended. In any case, I'd like to give you a trial week, and even if you screw up, it would still be easy to revert your edits anyway. -- AllyUnion (talk) 06:49, 10 Jun 2005 (UTC)
OK, OK, I don't want to be blocking people willing to do good work. So please feel free to create your bot and use it as you see fit. However, it is sad that the bot will unemploy User:DMG413 who is increasing his/her edit count by removing blue links from said list. :) Oleg Alexandrov 14:46, 10 Jun 2005 (UTC)
Thanks for the clarification. Dragons flight 03:09, Jun 11, 2005 (UTC)

SEWilcoBot

User:SEWilcoBot has been created for various automated tasks, based on the pywikipediabot, often with modifications, to do various things that its creator was too lazy to do by hand. See user page for details and current status.

    • First task: Populating country template arrays for Wikipedia:WikiProject Flag Template.
      • Starting from United Nations list. Downloaded templates which point to flags. Built database of names, flags, and abbreviations.
      • Template array being initialized by script which does nothing if a template exists, and has 60+ seconds delay.

(SEWilco 06:24, 20 Jun 2005 (UTC))

I approve the running of this bot. -- AllyUnion (talk) 06:28, 20 Jun 2005 (UTC)
I disapprove for it getting flag status at this time. -- AllyUnion (talk) 23:00, 21 Jun 2005 (UTC)
      • Present version: custom pywikipedia bot, given a parameter which directs it at a directory containing country info. If a template already exists, does nothing. Else creates template with country name/abbreviation/flag info. Using 61-second put_throttle. (SEWilco 00:44, 22 Jun 2005 (UTC))
  • June 23 2005: Most UN countries initialized. Various sources combined. Initialized table for finding ISO codes when a country name is known. Filling in gaps. (SEWilco 05:15, 24 Jun 2005 (UTC))
  • June 24 2005: All ISO countries defined. All non-obsolete FIFA countries defined. List of FIFA country codes converted to {{country}} format. Next will do conversion of existing ISO templates (ie, United Nations member states list) with 120-second put_throttle. (SEWilco 02:54, 25 Jun 2005 (UTC))
  • June 26 2005 ISO and many non-ISO countries updated. Converted Major League Baseball rosters to {{flagicon}} format. (SEWilco 00:23, 27 Jun 2005 (UTC))
  • July 3 2005 Testing footnote/references helper. (SEWilco 3 July 2005 23:21 (UTC))
  • July 7 2005 Manually used as helper to upgrade footnotes/References in various articles. Converting inline external links and others to Wikipedia:Footnote3 format. (SEWilco 7 July 2005 06:05 (UTC))
  • July 10 2005 Bot flag request: I will be cleaning up {{main}} and related template references and believe it would be polite to reduce Changes clutter. (SEWilco 07:44, 10 July 2005 (UTC))[reply]
    • Details: The misapplication of {{main}}, which is supposed to be used at the top of an article, has been brought up several times. I will modify a pywikimedia tool to change {{main|}} references which are "too far" from the top to a cousin such as {{seemain|}}. "Too far" will be a small number of text lines which have no leading nonalphabetic characters, thus accepting various templates and Wiki markup as being part of the "top". The above test tends to fail by overlooking some candidates, which merely does not change the current situation. I have not here defined the template to change to due to an ongoing discussion (improperly taking place in TfD) which will end soon and may change details, but the discussion will not change the existing tangle to be cleaned up. The needed code is trivial compared to the previous activities. (SEWilco 07:44, 10 July 2005 (UTC))[reply]
    • Schedule: I request the bot flag because by the time there is a week of history its work will be done, and a history of the behavior of my other tools exists. Special:Contributions&target=SEWilcoBot I point out that the mostly repetitious edits of User:SEWilco/Sandbox are examples of the testing given to tools. (SEWilco 07:44, 10 July 2005 (UTC))[reply]

To elaborate my reasoning: SEWilco's bot is described as being for "one-time" usage. I will approve a bot flag for SEWilco if he can boil down the specifics of what the bot will be used for. The reasoning is that if the bot is going to run, it should run for something very specific and not at whim for one-time tasks. This is only insurance, so that we know what SEWilco's bot is doing and whether or not the community approves of it. --AllyUnion (talk) 06:40, 18 July 2005 (UTC)[reply]

  • Based upon the descriptions of existing bots, and the variety of existing tools, I had not realized the bot flag was assigned for a specific task. It had seemed to me that the bot flag was a courtesy to reduce cluttering the changes visible to users during routine/maintenance alterations. I assumed this bot-usage community would still know what a bot is doing despite the flag (or due to it, if some people/bots have a "show only bot activity" ability). I thought these descriptions are part of that exchange, although some are rather brief. The speed privileges are minimal, and their import is actually in the rules throttling the unflagged to reduce speedy damage from the careless. (SEWilco 09:04, 19 July 2005 (UTC))[reply]
    • There was a whole issue with Anthony DiPierro where his bot was doing something that it was not supposed to do. A complaint was made here about it. What I am finding is that fewer and fewer people are worrying about what bots are doing, and more and more are paying attention to vandalism. I am not saying you are a bad user, but I don't want users somewhere down the line complaining about why this or that person is permitted to run a bot that is doing something it wasn't designed or supposed to do. I don't mean to offend you or sound very bureaucratic, but I feel that very few people come to review this page. --AllyUnion (talk) 04:58, 29 July 2005 (UTC)[reply]

If you can generalize this down to something like, "bot specifically to maintain templates," then I'd be happy to accept that generalization. --AllyUnion (talk) 03:27, 9 September 2005 (UTC)[reply]

User:Brendan OShea running unauthorised spelling bot

User:Brendan OShea User talk:Brendan OShea special:contributions/Brendan OShea has been running an unauthorised spelling bot. I've blocked him for 24 hours, and to be fair I couldn't see him doing any damage, but can someone with a bit more knowledge have a word with him? Dunc| 2 July 2005 18:26 (UTC)

Are you sure it's a bot? It could just be a user using a spellcheck plugin in their browser or something similar. Plugwash 2 July 2005 22:11 (UTC)
Any bot which requires human review of its edits is not a bot but an alternate interface, and should not provoke a ban. Besides, it's courteous to ask them about it first on their talk page. Also see User:Humanbot. Deco 2 July 2005 23:48 (UTC)
I have offered them to set up a User:Humanbot project on their user talk. r3m0t talk July 3, 2005 11:50 (UTC)
BTW In case someone wanted to do this as well: Everyking already unblocked. -- User:Docu



SEWilcoBot flag request

Bot flag requested for SEWilcoBot. Details at: Wikipedia_talk:Bots#SEWilcoBot above. (SEWilco 07:47, 10 July 2005 (UTC))[reply]

User interface helper under development

I am currently developing a tool to help me with my manual edits (especially RC/New Pages patrol), i.e. I visit a page and tell it manually what to do with it. Planned features are: adding (categorised) stub tags, if this works I'll probably add VfD nomination (I think the current process is a bit tedious to do manually), of course with user-supplied reasoning texts. I would like to emphasise the fact that this tool is not planned to have any spidering or other autonomous/high-volume features and will not create any new pages (other than VfD discussions once this feature is added). It is therefore believed that the tool does not have the problems as listed under WP:Inherent bot drawbacks. This is therefore not a request for a bot flag (as I think it shouldn't have one; they're still low-volume user-initiated edits), but a general check for concern. Similarly, like Humanbot, I doubt the desirability of a separate user account. The bot will be tested on the Sandbox during development (providing manual reverts if the bot malfunctions). It is being developed using PHP/cURL. Your thoughts? --IByte 22:49, 12 July 2005 (UTC)[reply]

Stub tag addition has been implemented; I'll be editing some pages with it. See my bot page for further information. --IByte 20:51, 16 July 2005 (UTC)[reply]


Permission to run SecuniBot

I request permission to run User:SecuniBot to update the vulnerability counts in Comparison of operating systems.

The bot fetches all Secunia pages linked to in the article, counts the critical advisories, and updates that number in the article accordingly. It also updates the footnote that specifies when the table was last updated. The secunia.com terms and conditions [3] seem to permit such usage of their site, since their information is not redistributed, only linked to.

I plan to run the bot manually at first, and if it works well, run it as a cron job once a day.

The source is available for review at User:SecuniBot/source.

--K. Sperling 14:54, July 19, 2005 (UTC)
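
For readers who don't want to open the source page, a rough sketch of the counting step; this is not the actual SecuniBot code, and the Secunia page format and criticality labels used here are assumptions:

    import urllib

    LEVELS = ('Less critical', 'Moderately critical',
              'Highly critical', 'Extremely critical')  # assumed labels

    def count_advisories(secunia_url):
        # crude: count advisory entries rated "less critical" or worse
        html = urllib.urlopen(secunia_url).read()
        return sum(html.count(level) for level in LEVELS)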

I'd say go for it, neat little bot offering helpful functionality. --Yogi de 11:09, 23 July 2005 (UTC)[reply]
I'm not too experienced with Python (I prefer Perl), but it looks like the script will edit the page each time it is run, regardless of whether the numbers have changed or not. Would it be possible to change this so it only edits when needed? --Carnildo 05:11, 24 July 2005 (UTC)[reply]
Well, I would have preferred Perl myself, but the pywikipedia modules that handle all the low-level details of interacting with Wikipedia are written in Python, so I just used those :-)
The bot only makes an edit if anything is to be changed, which will be once a day because it also updates the "this information was last updated on" line below the table. Of course it would be possible to not do that, but I think it is important to tell the reader of the table whether the data is current or not. Note that the edit is marked as minor if only the date is updated, or as a normal edit otherwise. --K. Sperling 11:38, July 24, 2005 (UTC)
Technically, you could call Perl from command line. --AllyUnion (talk) 05:01, 29 July 2005 (UTC)[reply]
I like this idea, and I thank K. Sperling. Would it be possible to apply this to Comparison of web browsers#Security as well, writing the severity of the highest critical unpatched vulnerability for each product and the date of the oldest vulnerability? I understand that K. Sperling's script only updates the number of vulnerabilities, so we would still have to update the dates (and, for browsers, the severity) manually. --Ptomes 06:42, 24 July 2005 (UTC)[reply]
Currently it updates the number of unpatched vulnerabilities marked "less critical" and above, and also the date of the oldest of those, if any. However it only understands the format that is used in Comparison of operating systems. I'm sure you could adapt it to other uses, though. --K. Sperling 11:38, July 24, 2005 (UTC)

I would like to run pywikipediabot with bot status to fix links to disambig pages here on en. This will use the solve_disambiguation.py script. – ABCD 02:37, 20 July 2005 (UTC) (edited at 23:27, 22 July 2005 (UTC))[reply]

Approved for test run of one week. --AllyUnion (talk) 05:02, 29 July 2005 (UTC)[reply]

Harley-Davidson bot

As both a grammar-nazi and a motorcycle enthusiast, seeing "Harley Davidson" just irks the crap out of me.

It's not a guy, Mr. Harley Davidson. It was William Harley and the three Davidson brothers who made "The Harley-Davidson Motor Company".

I've been fixing such mistakes manually when I come upon them, but there are a lot ([4]) to do. All I want to do is make a bot which strafes the wiki entries that use the phrase "Harley Davidson" and changes it to "Harley-Davidson".

I did a (naughty) single-page test on Gay Byrne with the python script to verify that I know what I'm doing (turns out I do, on the first try). I would like to let the bot walk all referring pages.

Comments or Objections?

boinger 21:07, 21 July 2005 (UTC)[reply]
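
A sketch of what the whole job might look like with the pywikipedia framework; the list of affected page titles is assumed to come from the search linked above:

    import re
    import wikipedia  # the pywikipedia framework

    site = wikipedia.getSite()

    def hyphenate(title):
        page = wikipedia.Page(site, title)
        text = page.get()
        fixed = re.sub(r'Harley Davidson', 'Harley-Davidson', text)
        if fixed != text:
            page.put(fixed, 'Harley Davidson -> Harley-Davidson')

    # titles would come from a text search, e.g. the single-page test above
    for title in ['Gay Byrne']:
        hyphenate(title)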

What is the bot's name? Approved for test run for one week after posting of bot's name on this page. May apply for bot status after one week test run duration if no complaints are made. --AllyUnion (talk) 05:04, 29 July 2005 (UTC)[reply]
How about HarleyBot? --boinger 16:12, 1 August 2005 (UTC)[reply]
Sign up User:HarleyBot, run it for one week, then you may apply for bot status. --AllyUnion (talk) 04:30, 20 August 2005 (UTC)[reply]


Hi, I've put together a little bot that will read all the pages linked to from the Main Page and parse them, searching for bad grammar. I then take the result file and copy-paste it to my user talk page. It does NOT make any changes to any pages, or even have the capability to do so. I have more info at my user talk page. Since it seemed to be the norm, I created an account, GrammarGremlin, for running the bot. My own account is the username Tubelius. Do I need any special permissions to run this? GrammarGremlin 23:32, 27 July 2005 (UTC)[reply]

Approved for test run for two weeks. Do you plan to apply for a bot flag or not? --AllyUnion (talk) 05:06, 29 July 2005 (UTC)[reply]
If you are the person making the changes, you should make them under your own account. I cannot see that many of the mistakes it shows are real - are you improving this? :) r3m0t talk 14:00, July 29, 2005 (UTC)
I don't know that I need a bot flag because my bot does not make changes. That is my current understanding of the designation. Please inform me otherwise. R3m0t, I am working on improving the percentage of actual grammar mistakes caught. The biggest problem seems to be abbreviations being clipped as sentences.Tubelius 03:42, 30 July 2005 (UTC)[reply]
It seems his bot generates stats on the most common grammar errors. --AllyUnion (talk) 08:10, 30 July 2005 (UTC)[reply]


I would like to run this bot to fix links to redirects to lists (see the page for an explanation). ~~ N (t/c) 01:08, 5 August 2005 (UTC)[reply]

Approved for test run for one week, then may apply for bot status after no complaints. --AllyUnion (talk) 08:38, 6 August 2005 (UTC)[reply]
Personally I don't think its an advantage to have
  • [[List of minor Star Wars characters#Panaka, Captain|Captain Panaka]]
  • instead of [[Captain Panaka]]
it adds a lot of wiki syntax to pages. In general, Redirects with possibilities should never be replaced with their target. Captain Panaka isn't one though. -- User:Docu


Hello. I exist. I'm not doing anything automatically, yet. So my owner doesn't want a 'bot flag for me, yet. My owner will let you know when I'm ready to start doing real work. Uncle G's 'bot 11:22:55, 2005-08-05 (UTC)

DELBOT

SANDBOT

  • After successful manual test runs, SANDBOT is now scheduled to be automatically run, editing under the aegis of User:Uncle G's 'bot. See the user page for full details of SANDBOT. Like VFDBOT, SANDBOT will be making at most 2 edits per day. This is because it only performs two of the sandbox cleaning tasks that used to be performed by User:Sandbot, and those not as frequently. More sandbox cleaning tasks can be added, or a greater frequency configured, upon request. Uncle G 02:01:58, 2005-08-06 (UTC)
    • After successful test runs, SANDBOT has now been extended to clean the tutorial and template sandboxes once per day as well. This increases the maximum number of edits per day (some sandboxes are not frequently used, and so the edits to clean them will be null edits) to 24. Uncle G 14:01:48, 2005-08-16 (UTC)
    • SANDBOT is still cleaning the template sandboxes, but has been configured to not clean the main and tutorial sandboxes. The number of edits per day is thus reduced. Uncle G 00:43:01, 2005-08-20 (UTC)
      The template sandboxes are not edited very much... at least from what I can see. --AllyUnion (talk) 06:40, 2 September 2005 (UTC)[reply]

Uncle G's major work 'bot (talk · contribs) has been created to handle major tasks. The 'bot flag will be requested for this account, as it is intended, per the name, to be for tasks that require large numbers of edits. For details of current work, see the user page. For details of planned work, see Wikipedia talk:Votes for deletion. Uncle G 18:48:45, 2005-08-28 (UTC)

  • Permitted to apply bot flag, one week from 8/28. --AllyUnion (talk) 06:37, 2 September 2005 (UTC)[reply]
    • I'm a little concerned that the bot has blanked all of the comments in the process of moving VfD to AfD [5] --Tabor 22:52, 29 September 2005 (UTC)[reply]
      • The 'bot doesn't edit articles. It only renames them, using Special:Movepage. An edit+rename would show up as two separate entries in the history, in any case. What we are seeing here is one. Indeed, I cannot think of a way to actually do what appears to have been done here — to rename an article and modify it at the same time. There's no way to do that using the normal web interface as far as I know. I suspect a server error. Perhaps some edits are not being displayed in the history. Uncle G 23:30, 29 September 2005 (UTC)[reply]
      • I've contacted Brion Vibber, who has done some checking. As I understand xyr explanation: The wrong version of the page was listed in the server database as the current version, causing the server to revert to that version when the article was renamed. This is a database problem and nothing to do with the 'bot. The exact same thing would have happened had anyone renamed the article manually. Uncle G 12:36, 30 September 2005 (UTC)[reply]

I'm writing a bot which I plan to run as User:Cobo once it's finished. The basic idea is detailed on its user page. In short, it'll hunt out copyvios in new articles and change them to the {{copyvio|url=blah}} template. I might also add in the ability for it to detect and revert common types of obvious vandalism (also detailed on the user page), but this will need extensive testing on a dummy Wiki before it would ever come near here.

Ideas? Suggestions? Thoughts? Threats? :p --Veratien 19:53, 7 August 2005 (UTC)[reply]

Sounds like it might catch pages with a long fair-use quote. But good idea. ~~ N (t/c) 20:28, 7 August 2005 (UTC)[reply]
I'm going to try and account for that. It'll check the article for, "fair use", "public domain", and other buzzwords that will reduce the score, and check the source article for the same as well. --Veratien 20:36, 7 August 2005 (UTC)[reply]
How is that going to work? Will it google for things and look for text that is basically identical, or what? By the way, don't forget that a lot of copyvio'ers use the "borrowed" material as a source, but it's not necessarily listed properly. And it's a good idea to put the {{nothanks}} template on their user page. Maybe make a version that notes that it was done by a bot, and give a link to your talk page, so they can ask about it. I think you could do this, and do it quite well, personally. --Phroziac (talk) 23:39, 7 August 2005 (UTC)[reply]
As I said, the basic way it'll work is detailed above. I've already made some changes to things, and the basics are currently running in #en.wikipedia.copyvios. It doesn't actually do anything yet except flood the channel with new-post notifications and assign them a score, which it will need to do in much, much more detail. But it's a start. :) It needs to check for a lot more conditions before it actually does the investigation part of the botting... :/
I will be making a template that the bot will append to peoples user_talk pages to point out that copyrighted material is bad, mmmkay. :p --Veratien 03:30, 8 August 2005 (UTC)[reply]
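
A toy sketch of the scoring idea with made-up weights; fetching candidate source pages (for example via a web search on a distinctive sentence) is left out entirely:

    BUZZWORDS = ('fair use', 'public domain', 'gfdl', 'permission granted')

    def copyvio_score(article_text, source_text):
        # crude word-overlap score, reduced when licensing buzzwords appear
        a_words = set(article_text.lower().split())
        s_words = set(source_text.lower().split())
        overlap = len(a_words & s_words) / float(len(a_words) or 1)
        penalty = sum(0.2 for w in BUZZWORDS
                      if w in article_text.lower() or w in source_text.lower())
        return max(0.0, overlap - penalty)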

Approved to run for one month of testing; you may apply for a bot flag after that. Nothing against you, Veratien, it's a great idea, but as you said, it's still experimental, and I want your bot to be more solid before it applies for a bot flag. I'm trying to take the stance that this page is a proposal page for running a program on Wikipedia, and that people have to have a reasonable and solid proposal for running their bot. This is only to demonstrate the burden of proof and that a user will be responsible enough to take care of their bot and not let it run amok. --AllyUnion (talk) 04:28, 20 August 2005 (UTC)[reply]

Cool idea, but I don't actually think this one should run with the Bot flag at all. It won't have a high edit rate, and it'll strongly benefit from having the odd user look over its shoulder. Make sure to also have it add an entry to WP:CV by the way. --fvw* 12:41, 30 September 2005 (UTC)[reply]

FlaBot

I have unblocked User:FlaBot on personal request of User:Flacus. AllyUnion blocked this bot indefinitely earlier for messing up, but Flacus assured me any issues have been resolved, and AllyUnion did indicate there would be no particular problem if an eye was kept on it. If anything goes wrong, I'm (at least partially) responsible. JRM · Talk 22:38, 10 August 2005 (UTC)[reply]


Andrewbot

I would like to run bot User:Andrewbot, which is pywikipediabot using the double redirect fixer. It will be run occasionally. Andrew pmk 02:09, 17 August 2005 (UTC)[reply]

Is this manually assisted or what? --AllyUnion (talk) 21:24, 19 August 2005 (UTC)[reply]
Approved after speaking with author on IRC. -- Pakaran 01:26, 30 August 2005 (UTC)[reply]
The details of what it does should be posted here and its userpage. Without that understanding, we don't know what the bot is intended for. --AllyUnion (talk) 04:39, 1 September 2005 (UTC)[reply]


AgentsooBot

I would like to use pywikipedia's solve_disambiguation.py to facilitate my disambiguation work. I created an account (User:AgentsooBot) for this and now I would like your permission to use it. Soo 17:32, 20 August 2005 (UTC)[reply]

Please set up the user page for the bot. After that is done, you may run your bot for one week. If no complaints are made, you may apply for a bot flag. --AllyUnion (talk) 17:34, 23 August 2005 (UTC)[reply]

I have been doing a number of what I call disambiguation drives manually, and would now like permission to trial a bot to see if it is a more efficient means of doing this. The bot has a user account ready (Robchurch-A) and would be manually assisted, running solve_disambiguation on selected items. Rob Church Talk | Desk 01:15, 25 August 2005 (UTC)[reply]

The name is very close to your own, and it does not help in clearly spelling out that it is a bot rather than a sockpuppet account. --AllyUnion (talk) 23:15, 25 August 2005 (UTC)[reply]
I have created a different account, DisambigBot to address both of these concerns. Rob Church Talk | Desk 01:30, 26 August 2005 (UTC)[reply]
Approved to run for a duration of one week. If no complaints are made, may apply for a bot flag. Please make sure you list your bot at Wikipedia:Bots, and add your bot to the Category:Bots --AllyUnion (talk) 11:07, 27 August 2005 (UTC)[reply]


Why spellbots are bad

See http://pseudodoxia.flawlesslogic.com/index.php?title=Special:Contributions&target=Spellbot for a practical example of why automated spellchecking is bad. This particular spellbot has produced such gems as "SpongeBob SquarePants" -> "Sponger Smartypants", or "placing intermediate-range nuclear missiles in Cuba" -> "placing intimidatingly nuclear missiles in Cuba" --Carnildo 07:59, 25 August 2005 (UTC)[reply]

  • Someone’s been boasting about their vandalbot.[6] Susvolans 16:04, 25 August 2005 (UTC)[reply]
  • We only allow manually assisted spellchecking bots, or bots that generate automatic stats. --AllyUnion (talk) 23:12, 25 August 2005 (UTC)[reply]
  • In the context of the rest of the discussion linked to, it is clear that that isn't a spellbot at all. It's a 'bot that deliberately introduces subtle vandalism into articles, and is simply styled as a spellbot either for ironic effect or to mislead. As such, it really indicates nothing at all about why automated spellchecking is bad, since that's not what it is doing. Uncle G 02:43:09, 2005-08-28 (UTC)


Curps' autoblocker

Curps is running a blocking bot which he refuses to get Wikipedia:Bots approval for. Further discussion is here. --fvw* 08:23, September 1, 2005 (UTC)

Traditional bots do leisurely janitorial work: fixing an interwiki link here, a bit of text there. However, the block bot is being run as an emergency measure in response to the events of August 26, when "Willy on Wheels" pagemove vandalism reached a new and much more dangerous level. It was necessary to run it immediately, and I posted a notice at Wikipedia:Administrators' noticeboard/Incidents.
See AN/I (permanent link: [7]), and please see Special:Log/move for August 26 to see what happened on that day.
As fvw suggests, discussion is at Wikipedia:Administrators'_noticeboard#Curps.27_block_bot.
-- Curps 09:04, 1 September 2005 (UTC)[reply]

Discussion belongs here, on this page. Uncle G 09:54:22, 2005-09-01 (UTC)

  • I did indeed unblock fvw right away. However, I am running the bot 24/7, which means it is occasionally unsupervised. This is unfortunate, but I am arguing urgent practical necessity. For those who weren't involved in the events of August 26, please see the move log and AN/I discussion for that day. -- Curps 09:35, 1 September 2005 (UTC)[reply]
  • The bot has made the following blocks. Each time it reacted faster than a human could have. So it has a fairly strong track record of success.
  • I tend to think the bot should be OK in this instance as long as the set number of moves is higher than what any normal editor can reasonably be expected to make and also lower than or equal to the number of moves Willy usually makes in a minute. It may be tricky to find that balance; Curps should be careful to check and quickly unblock any legitimate user affected by it. The problem that occurs to me is that the bot may ironically end up blocking people who are trying to move pages back to where they should be. Is there a way to deal with this? Everyking 09:10, 1 September 2005 (UTC) (copied from the administrator's noticeboard to here, as it deals with proposed modifications to the 'bot before community approval. Uncle G 10:18:39, 2005-09-01 (UTC))[reply]
  • When it comes to running automated tools under the aegis of accounts that have been granted administrator privileges by the community, I'm largely in agreement with Netoholic (see above). All of my automated tools run under the aegis of unprivileged accounts (User:Uncle G's 'bot and User:Uncle G's major work 'bot) in accordance with the principle of least privilege. Exercise of administrator privileges by a tool that does not require a human to explicitly pull the trigger each time is not something that should be accepted as a matter of routine. fvw has already described how this 'bot has already made a false positive. User:Uncle G's major work 'bot does page moves, with a delay in between each operation. I have no way of knowing whether it will be hit by this 'bot. Indeed, how is Tim Starling to know whether Portal namespace initialisation script (talk · contribs), or any other mass rename done by developers under role accounts, will be hit by Curps' bot? Furthermore, why does the 'bot exclude administrators? It seems to operate on the premise that ordinary users are not allowed to revert vandalism. Speaking as an editor who reverted vandalism as an ordinary user here, and who still reverts vandalism (including page move vandalism) as an ordinary user elsewhere, I find that premise unacceptable. The more ordinary users who help in protecting against vandalism, the better.

    Certainly this 'bot should not be approved unless its operation is more thoroughly documented, and the concerns raised here addressed. Uncle G 10:18:39, 2005-09-01 (UTC)

Fvw did a very large number of moves in rapid succession and triggered the threshold; I have since set the threshold even higher. Also, he was only blocked because I inadvertently left him off the list of admins because Wikipedia:List of administrators listed him as "inactive"; I have since added all "inactive" administrators to the "do not block" list.
"Why does the bot exclude administrators?" Because it's intended solely to stop pagemove vandals and it's assumed that by definition administrators can't be vandals (perhaps they could be, but then we'd have far more to worry about than just page moves). Of course ordinary users are "allowed" and even expected to help protect against vandalism. There's no intent to discriminate, but excluding admins is simply practical and sensible.
This is not a traditional bot; it does not edit any pages. Also, Wikipedia:Bots is very quiet compared to AN and AN/I and far fewer eyes see the discussions here. I posted originally to AN/I and fvw has started a discussion at AN, so it's best to consolidate the discussion there.
-- Curps 10:57, 1 September 2005 (UTC)[reply]
PS, the latest Willy vandalism does 75 pagemoves per minute. If the block bot is disabled and we end up with many hundred pages to be moved back, can we count on you to pitch in? And then do it all over again, nine times in one day? -- Curps 10:57, 1 September 2005 (UTC)[reply]
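
For illustration only, here is a rough sketch of the kind of per-user rate check being described (the threshold value, the whitelist handling and the block call are hypothetical placeholders; the actual bot's code has not been published):

 import time
 from collections import defaultdict
 
 MOVE_THRESHOLD = 20           # hypothetical moves-per-minute limit
 ADMIN_WHITELIST = set()       # filled from Wikipedia:List of administrators
 move_log = defaultdict(list)  # user -> timestamps of recent page moves
 
 def record_move(user):
     now = time.time()
     # keep only moves from the last 60 seconds, then add this one
     move_log[user] = [t for t in move_log[user] if now - t <= 60] + [now]
     if user not in ADMIN_WHITELIST and len(move_log[user]) > MOVE_THRESHOLD:
         block(user)  # placeholder: issue the block and post a notice to AN/I
 
 def block(user):
     print("would block", user)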

First of all, I would like to say that I have no objections to this Bot, for the most part. On the other hand, a bot is a bot is a bot. I think that every bot used on the Wikipedia should have approval for continued use on the Wikipedia, and it should have its own account. Also, since this Bot is running 24/7, but Curps can't be monitoring it 24/7, there should be some provision for a couple of other trusted editors to help supervise it.

For one example of where a change might be needed: the odds are that the Willy vandal reads many of the discussions about him, so he now knows that he needs to throttle down the number of page moves that he does.

Also, since this bot has the possibility of false positives (someone with a broadband connection and using a tabbed browser like Mozilla can do plenty of edits in a minute), there should be greater publicity--perhaps a mention in Wikipedia:Signpost. BlankVerse 13:32, 1 September 2005 (UTC)[reply]

Why does the bot need to do an indefinite block? How about a block for something like half an hour, if it detects 3 pagemoves a minute? It could then put an automated notice on a page where other admins could review these blocks and see they should be upgraded to a permanent block, or unblocked as a false positive. This would immediately chill willy's actions, and a half an hour false positive isn't going to kill anyone. As admins are whitelisted, they wouldn't be caught for reverting damage. -- Norvy (talk) 03:16, 2 September 2005 (UTC)[reply]

The problem is the way multiple blocks work. In recent cases, when the bot has blocked, very often another admin or two blocks a minute or two later (unfortunately, at 75 pagemoves per minute, even a minute's delay can be very expensive). However, if the bot blocks for half an hour and other admins block indefinitely, ALL blocks get removed when the half hour's up. That's the way the software works. So there's no way for the bot to block for a short time and still allow other admins to block indefinitely, unless they laboriously unblock and reblock (which is downright unwise if Willy's bot gets unblocked even momentarily). In any case, I do review any blocks the bot has made (immediately, if I'm around at the time). -- Curps 04:13, 2 September 2005 (UTC)[reply]
Again, I support the running of this bot, but I recommend a trial run period of two weeks. This bot should immediately stop running if no one is taking care of it. By "taking care of it," that user must have access to the physical code of the bot, and whatever account it is run on. Just covering all the bases, Curps. As tragic as it may be, if, for whatever reason, you decide to leave the Wikipedia community, the bot unfortunately has to stop running on your own account. I'm trying to think... Curps, is there any way for you to add some kind of form system that allows you to log unblock requests? Although such a form would be subject to abuse, it's just a suggestion. --AllyUnion (talk) 06:03, 2 September 2005 (UTC)[reply]
Unfortunately, due to the very nature of what the bot is fighting against, it needs to run at all times, 24/7. I have now added a feature to make the bot post a notice to an admin discussion page (AN/I) when it does a block. Unfortunately this failed the last time because the page move vandal moved AN/I itself and turned it into a redirect (!) (I thought such pages were protected against moves?). This should ensure that any accidental block will not last long. -- Curps 06:30, 6 September 2005 (UTC)[reply]
I agree with your suggestion, AllyUnion. It shouldn't be that hard to do, even if it was just a PHP and MySQL setup on an external website. All that would be needed is the form, he could look in the MySQL database manually to see the requests. --Phroziac (talk) 14:30, September 2, 2005 (UTC)

By the way, a good filter would be not to block users with "old" edits; at least I would guess most vandals register a new account and don't go on to (usefully) edit some article and then wait for weeks. Old edits can be checked by looking at the user's oldest contributions. A reasonable "minimal user age" (age of oldest edit) and "minimal number of non-suspicious edits" could be settled on. That would minimise blocking real editors. My 0.02. --grin 15:27, 2005 September 3 (UTC)

The problem with this suggestion is that pagemove vandal bots are moving pages at a rate of 75-90 pages per minute. They have to be stopped immediately. In the time it takes to look up a user's edit history a lot of damage can be done (especially given Wikipedia's very uncertain response time, and the fact that it takes two page fetches to see a user's earliest contributions). -- Curps 06:30, 6 September 2005 (UTC)[reply]
How about running the checks after blocking, and unblocking if it appears to have been a mistake? --Carnildo 07:22, 7 September 2005 (UTC)[reply]
I always do this right away, if I'm around at the time. The problem is the bot has to be run 24/7. Hopefully by posting to AN/I, other admins will be able to take a look if I can't. -- Curps 07:24, 8 September 2005 (UTC)[reply]
What about checking against the admin list? Or bot list? --AllyUnion (talk) 04:08, 8 September 2005 (UTC)[reply]
Admins are immune from the bot, because the bot is intended to work against vandals and we have to assume that admins aren't vandals. As for other bots, well, do any of them do page moves? If they do, they should do them at a reasonable pace under normal circumstances. If anyone writes a pagemove-revert bot to undo willy damage, which needs to do a lot of pagemoves quickly, they could perhaps let me know. -- Curps 07:24, 8 September 2005 (UTC)[reply]


NekoDaemon modifications

NekoDaemon will be scheduled to empty categories using the soft redirect of {{categoryredirect}}. Code is in testing phases. --AllyUnion (talk) 06:11, 6 September 2005 (UTC)[reply]

Code complete. --AllyUnion (talk) 04:36, 8 September 2005 (UTC)[reply]

Bot flag for ZwoBot

I've now been running my interwiki bot ZwoBot since July, and there have been no serious problems so far. AllyUnion has asked me to request a bot flag for the bot's account, so that it doesn't appear on recentchanges. Does anyone disagree on this issue? --Head 09:41, September 6, 2005 (UTC)

You've already requested above. --AllyUnion (talk) 00:49, 7 September 2005 (UTC)[reply]

HasharBot

Please add User:HasharBot on Wikipedia:Bots. It has been around for a year. It uses the python wikipedia framework and I use it for interwiki updates as well as sometimes for solving disambiguations.

Hashar 14:06, 6 September 2005 (UTC)[reply]

Approved to run, if no objections are made within a week, continue on. --AllyUnion (talk) 00:51, 7 September 2005 (UTC)[reply]

NotificationBot

The purpose of this bot is to notify users on their talk page of an event or reminder, based on whatever they schedule the bot to do so and whatever message they set up with the bot. More details found at User:NotificationBot. --AllyUnion (talk) 02:15, 9 September 2005 (UTC)[reply]

NotificationBot has been granted bot status. If there'll be any problems with him, please notify the Stewards at m:Requests for bot status. Datrio 09:46, 12 November 2005 (UTC)[reply]

Pearle to auto-select articles from categories

I'd like to modify Pearle to select articles from categories and add those selections to wiki pages. The first application will be Template:Opentask. As discussed on Template talk:Opentask, I'd like to throw up a selection of articles from four different categories, once every 24 hours or so. (I'm tired of doing it myself.) I will have to embed some HTML comments which the bot will look for so it will know where to insert article lists. -- Beland 09:37, 9 September 2005 (UTC)[reply]

I'm rather technical; do you mind highlighting how Pearle will do this? --AllyUnion (talk) 23:02, 9 September 2005 (UTC)[reply]
I wrote a description at User:Pearle#Opentask_demo. The last Pearle edit of User:Beland/workspace shows the finished demo code in action. I've also uploaded the source code for this new feature to User:Pearle/pearle.pl. The new functions are near the bottom; starting at "sub opentaskUpdate". Enjoy, Beland 02:55, 11 September 2005 (UTC)[reply]
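
(Pearle itself is written in Perl; purely as an illustration of the HTML-comment marker idea described above, a Python sketch with invented marker names might look like this:)

 import re
 
 def update_between_markers(wikitext, marker, articles):
     # Replace whatever sits between <!-- marker begin --> and <!-- marker end -->
     # with a freshly selected list of articles.
     pattern = re.compile(r"(<!-- %s begin -->).*?(<!-- %s end -->)" % (marker, marker), re.DOTALL)
     body = "\n".join("* [[%s]]" % a for a in articles)
     return pattern.sub(lambda m: m.group(1) + "\n" + body + "\n" + m.group(2), wikitext)
 
 # e.g. update_between_markers(text, "cleanup", ["Some article", "Another article"])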

Kurando-san WikiProject Archival

Kurando-san will automatically archive any WikiProject that has not been edited in the past 6 months, and whose talk page hasn't been edited in the past 2 months. If the talk page doesn't exist, it is treated the same as a talk page that hasn't been edited in the past 2 months. --AllyUnion (talk) 22:58, 9 September 2005 (UTC)[reply]

What do you mean by "archive"? -- Carnildo
It will mark the project with {{inactive}} and remove it from the Category:WikiProjects. --AllyUnion (talk) 05:42, 11 September 2005 (UTC)[reply]
It would be splufty if it also updated Wikipedia:List of inactive WikiProjects and Wikipedia:List of WikiProjects, but it sounds useful as it is, too. -- Beland 03:24, 13 September 2005 (UTC)[reply]
I don't know how to make the bot sort it into the correct category. It already posts the results into Wikipedia talk:List of inactive WikiProjects if any changes have been made. --AllyUnion (talk) 04:08, 13 September 2005 (UTC)[reply]
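
As a side note, the inactivity test described above is simple enough to sketch; this is only an illustration (the function name is invented, and the real bot runs on the pywikipedia framework):

 from datetime import datetime, timedelta
 
 def is_inactive(project_last_edit, talk_last_edit, now=None):
     # Inactive = project page untouched for ~6 months and talk page (if any)
     # untouched for ~2 months; a missing talk page counts as untouched.
     now = now or datetime.utcnow()
     project_stale = now - project_last_edit > timedelta(days=182)
     talk_stale = talk_last_edit is None or now - talk_last_edit > timedelta(days=61)
     return project_stale and talk_stale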

Bot for interwiki es -> en

Hi, I'm es:Usuario:Armin76 from the Spanish Wikipedia and I want to get a bot flag for this user to create interwiki links from the Spanish Wikipedia to the English Wikipedia.

  1. Autonomous bot and sometimes manual
  2. For as long as I can
  3. pywikipedia framework
  4. interwiki from es to en

--KnightRider 16:19, 12 September 2005 (UTC)[reply]

I'm skeptical that it's a good idea to have a bot automatically make any interwiki links. Shouldn't they all be checked by a human to make sure the target actually covers the same topic as the source? It's perfectly fine to have an automated process suggest interwiki links and/or help a human make links faster, though. Out of curiosity, how will your bot be deciding which links to suggest? -- Beland 03:31, 13 September 2005 (UTC)[reply]
Well, it depends on how much you will be checking those interwiki links. --AllyUnion (talk) 04:21, 13 September 2005 (UTC)[reply]
The idea is to put the es interwiki link in the en Wikipedia. I mean, in the es Wikipedia the en interwiki links are there, but in the en Wikipedia the es interwiki links aren't. The bot I want is for inserting the es interwiki link into the en article that the es article refers to. --KnightRider 11:32, 13 September 2005 (UTC)[reply]
So long as your bot is not removing any other language interwiki links, I would say your bot is okay. Approved for a test run for one week. --AllyUnion (talk) 19:55, 14 September 2005 (UTC)[reply]
This bot (KnightRider) does appear to be removing other interwiki links during some edits. See [8] [9] [10]. I request that it be turned off or blocked until fixed. -- DrBob 19:33, 15 September 2005 (UTC)[reply]
Now seems to be fixed. -- DrBob 17:57, 21 September 2005 (UTC)[reply]
Trial period reset. Test run for 1 week. User has indicated that interwiki removal is done when the bot is running manually. --AllyUnion (talk) 19:52, 21 September 2005 (UTC)[reply]
Finally what? --Armin76 | Talk 16:59, 1 October 2005 (UTC)[reply]

Proposal for front page

Is the following text reasonable enough to put on the project page? -- Beland 05:02, 13 September 2005 (UTC)[reply]

What counts as a bot?
If you are doing offline processing of database dumps, and using the result to guide manual edits, there is no need to request permission here. A script that helps a human editor make changes one at a time (like a javascript script) and where the human checks the output is not a bot, and no permission is required. Any process that makes edits automatically, or which makes multiple edits in response to a human's command, is a bot, and requires community pre-approval. Automated processes using admin privileges should also be approved here. Read-only spiders are discouraged. To reduce load on Wikimedia servers, please use a database dump instead. Dynamic loading of pages may also be inappropriate; see Wikipedia:Mirrors and forks.

Sounds good to me. --AllyUnion (talk) 11:07, 15 September 2005 (UTC)[reply]

That definition would make some things we have traditionally been treating as bots, such as semi-automated disambiguation tools (retrieve what-links-here for a disambiguation page and then let a user quickly process each of the pages linking to it by selecting one of the items on the disambiguation page to change the link to), not count as bots any more. I personally think this is a good thing but it's certainly a point to consider. Plugwash 22:19, 15 September 2005 (UTC)[reply]

Plant bot

I would like to request to be able to add all known species of plant life by scientific and common name to the Wikipedia automatically with a taxobox on each page. I don't have an exact source yet and I don't have any code yet. -- AllyUnion (talk) 08:41, 11 Mar 2005 (UTC)

Just out of curiosity, how many articles is this going to add to Wikipedia? Am I going to need to adjust my guess for the Wikipedia:Million pool? --Carnildo 09:08, 11 Mar 2005 (UTC)
If the bot will only add a taxobox, what is the added value compared to Wikispecies? --EnSamulili 21:19, 31 May 2005 (UTC)[reply]

I think the only list of plants I can add is of established and very well known plants whose scientific name classification hasn't changed in the past 25-50 years. -- AllyUnion (talk) 19:25, 12 Mar 2005 (UTC)

On somewhat of a sidenote, I've considered doing a similar thing for the fish in fishbase, but I have not had the time to put into that yet. I suppose the request would be quite similar, so I'll throw it out as an idea. -- RM 14:04, Mar 23, 2005 (UTC)
Both of the above sound like good ideas to me. Indisputably encyclopedic, and the stub articles are likely to expand in the fullness of time. Soo 13:18, 21 September 2005 (UTC)[reply]

Taxobox modification for plant articles

In addition to this request, I wish to use a bot to automatically correct and add the taxobox to all plant articles. -- AllyUnion (talk) 09:46, 13 Mar 2005 (UTC)

Status?

What is the current status of this proposal? -- Beland 05:11, 13 September 2005 (UTC)[reply]

Inactive at the moment. It's filed away until I run into a database with this information. --AllyUnion (talk) 07:43, 15 September 2005 (UTC)[reply]

Could it be a good idea for someone to create a bot that looks for new users who don't have a user page set up, and edits the user page to add some helpful links that the new user could use to start his/her career in Wikipedia? --Admiral Roo 11:47, 21 September 2005 (UTC)[reply]

You mean like a bot that does welcoming committee duties? And I assume you meant the User talk page, right? If it was a manual tool assisting users, run by hand by several users, I would have no problems with it. If it was an automated bot... rather a cold welcome, don't you think? --AllyUnion (talk) 19:49, 21 September 2005 (UTC)[reply]
No, I don't mean a bot that does welcoming duties, I mean a bot that puts up helpful links on the user's page, links that would help a new user to become familiar with Wikipedia. --Admiral Roo 11:21, 22 September 2005 (UTC)[reply]
Why don't you just modify the welcoming template? --AllyUnion (talk) 17:09, 26 September 2005 (UTC)[reply]

I finally have cable again, after almost a month after the hurricane. Whobot is running again, until final approval. See original request here. Who?¿? 21:33, 22 September 2005 (UTC)[reply]

Wahey, welcome back! --fvw* 21:36, 22 September 2005 (UTC)[reply]
Thankies.. life is good now :) Who?¿? 23:10, 22 September 2005 (UTC)[reply]

I wanted to inform everyone that Whobot has been running at 10-second intervals, as I had planned to have a bot flag by now. I do not wish to flood RC, but cfd has been backed up, and I've been quite busy. There are a huge number of naming conventions that have become speedy, and it affects a great deal of categories. Until the bug gets fixed on Meta, bot flags can't be set. If there are any objections, please let me know. Who?¿? 10:05, 6 October 2005 (UTC)[reply]

Interwiki bots and Unicode characters above U+FFFF

Interwiki bots should take extra care when doing interwiki links involving Unicode characters larger than U+FFFF. These don't fit into 16 bits, and must be represented by a surrogate pair; see UTF-16.

For example, this edit by FlaBot wrecked the zh: interwiki link for Bohrium. The Chinese character in question is U+28A0F (or &#166415;) and looks like 金+波. -- Curps 09:45, 24 September 2005 (UTC)[reply]

MediaWiki does NOT use UTF-16, so surrogate pairs have no relevance to MediaWiki itself. What it looks like is happening is that the bot is using UTF-16 as an internal format without realising that it contains surrogates. Plugwash 15:19, 24 September 2005 (UTC)[reply]
Curps, please take up the issue with Flacus. I have already blocked his bot once for not properly taking care of his bot, but JRM let him run the bot again. --AllyUnion (talk) 08:39, 25 September 2005 (UTC)[reply]
I have already left a message at User talk:FlaBot. -- Curps 16:26, 25 September 2005 (UTC)[reply]
Curps, in your initial post you said "for example"; does this mean you have noticed other bots doing it, and if so, do you remember which ones? Plugwash 17:57, 25 September 2005 (UTC)[reply]
No, I just phrased it that way because I wanted to avoid singling out FlaBot. The only articles I know of that have such Unicode characters in interwiki links are Bohrium, Hassium, Dubnium, Seaborgium, in the zh: interwiki link. Yurikbot handles these correctly, and I haven't seen other bots alter these. Also Gothic language has such characters in the intro paragraph, as someone pointed out, but I don't think any bots have touched that. -- Curps 17:55, 26 September 2005 (UTC)[reply]
This can't happen with UTF-8 wikis MvR 10:35, 8 October 2005 (UTC)[reply]
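
To illustrate the problem being described (this is just a demonstration, not code from any of the bots): U+28A0F lies above U+FFFF, so UTF-16 stores it as a surrogate pair of two 16-bit code units, and a bot that treats those units as two separate characters will mangle the text, whereas UTF-8 encodes it as a single four-byte sequence.

 ch = u"\U00028A0F"                        # the character from the Bohrium zh: link
 print(len(ch.encode("utf-16-be")) // 2)   # 2 UTF-16 code units (a surrogate pair)
 print(ch.encode("utf-8"))                 # b'\xf0\xa8\xa8\x8f' -- one 4-byte sequence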

Open proxy blocker

Open proxies have been becoming more of a problem again lately, and the blacklisting incorporated into MediaWiki isn't making much of a dent in it. Since Wikipedia was upgraded a while back and the block list now finally scales nicely, I was thinking of resurrecting the preemptive proxy blocker bot. Any comments? --fvw* 15:21, 26 September 2005 (UTC)[reply]

I think it's a really good idea provided that MediaWiki can handle it now. I'm just interested in the specifics of how the open proxies are detected. Is a scan of IP ranges performed or do we import some sort of blacklist? Carbonite | Talk 15:30, 26 September 2005 (UTC)[reply]
Check the archives for the long version, but basically I grab all the open proxy lists off the web I can and try to edit Wikipedia through them. If it works, I block 'em. --fvw* 15:41, 26 September 2005 (UTC)[reply]
Go for it. I remember your bot and my disappointment when the list it had carefully generated had to be removed from the block-list for performance reasons. If the block list scaling is good now, then we should all welcome the bot back with open arms. Shanes 15:39, 26 September 2005 (UTC)[reply]
Yeah, provided everyone agrees here I'll go bug someone who knows to check exactly which code is running on the wikimedia servers. --fvw* 15:41, 26 September 2005 (UTC)[reply]
I have no problems with it, it was working great. --AllyUnion (talk) 17:00, 26 September 2005 (UTC)[reply]
Definitely go for it. -- Curps 17:57, 26 September 2005 (UTC)[reply]
If it stops the spammers, please... Shimgray | talk | 01:13, 27 September 2005 (UTC)[reply]
The spammers are back at WP:HD today. Of the four IPs so far, two were on the RBLs I checked. Even if it only serves to slow them down, I think it's worth trying. Provided, of course, that it doesn't hit the servers too hard. Off to look at the archives for more details. --GraemeL (talk) 15:04, 29 September 2005 (UTC)[reply]
Here's one discussion on the open proxy blocker bot. There may be more. Carbonite | Talk 15:09, 29 September 2005 (UTC)[reply]
Wahey! I just had a long and initially rather confused chat with Tim Starling, and it turns out the inefficient code (which is a mysql3 workaround (and mediawiki is mysql4 only now)) is still there but no longer getting hit. Since I've just discovered I have no class tomorrow I'm going to start hacking on this right away. Thanks for your support everyone! --fvw* 00:22, 30 September 2005 (UTC)[reply]
Up and running for HTTP proxies, now I have to go learn how to talk to SOCKS proxies. --fvw* 12:44, 30 September 2005 (UTC)[reply]

Ok, I'm going to start it on blocking the first batch now. I'm going to be behind the computer for the entire run (email me if I don't respond to talk quickly enough, I may just not have loaded any fresh wikipedia pages), but if you see anything you feel is wrong by all means please unblock. --fvw* 18:53, 1 October 2005 (UTC)[reply]

Ok, on the first run (which took rather long because it involved writing the necessary tools along the way) it's blocked 1016 open proxies, and even managed to unblock a few that weren't open proxies any more. In future runs I'll try and gather up a larger starting set and see how that influences the number of blocks (if the number of blockable proxies doesn't increase much that would be a sign we're nearing saturation of the well-known open proxies, which is sort of what I'm hoping for). Let me know if you encounter any more open proxies or if open-proxy-related problems have ceased or diminished. --fvw* 04:46, 2 October 2005 (UTC)[reply]
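
For readers wondering what "grab the lists and try to edit through them" amounts to, a minimal sketch of the idea follows (the proxy addresses are examples only; the real bot attempts an actual edit rather than a plain page fetch, and blocks through an admin account):

 import urllib.request
 
 CANDIDATES = ["203.0.113.5:8080", "198.51.100.17:3128"]  # example addresses only
 
 def is_open_proxy(host_port):
     # Try to reach Wikipedia through the candidate proxy.
     proxy = urllib.request.ProxyHandler({"http": "http://" + host_port})
     opener = urllib.request.build_opener(proxy)
     try:
         opener.open("http://en.wikipedia.org/wiki/Main_Page", timeout=15)
         return True
     except Exception:
         return False
 
 for candidate in CANDIDATES:
     if is_open_proxy(candidate):
         print("would block", candidate)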

I'm no longer putting time into this one; is there anyone who's willing to take over the job of supporting the blocks (a few emails a week with people who don't understand they're running tor or where there's an open proxy running through their ISP proxy and you need to contact the ISP)? If not I'll get rid of its blocks, which seems a bit of a waste. --fvw* 02:01, 20 October 2005 (UTC)[reply]

Bot for disambiguating Native Americans and American Indian

I want to run a simple bot, to be titled User:PhD-Econobot, for the purpose of disambiguating between the two pages that used to be Native Americans. This requires checking a long series of links, including redirects as well as American Indian and its redirects. My bot would be completely manual and would not be making superrapid edits. Do I need a bot flag? Looking at Wikipedia:Bots, it appears that there are some bots running in a similar fashion that do not require flags. - Nat Krause 08:26, 27 September 2005 (UTC)[reply]

Interesting. Regardless of the bot, care needs to be taken to differentiate the two as well as (Asian) Indians.
Well, it depends how frequently you will be running your bot and whether you will have it on auto-pilot or manually assisted. --AllyUnion (talk) 20:32, 1 October 2005 (UTC)[reply]
It will be all manually assisted. And I will run it about as often as I normally edit, I suppose. I don't have any specific plans. - Nat Krause 05:57, 2 October 2005 (UTC)[reply]
Approved for a trial run of one week. May apply for bot flag if there is a high frequency of edits. (i.e. 100 edits a day) --AllyUnion (talk) 18:41, 3 October 2005 (UTC)[reply]
Trial run begins today! - Nat Krause 11:45, 11 October 2005 (UTC)[reply]
Why the odd name, though? Rmhermen 16:43, 11 October 2005 (UTC)[reply]
I think it's a great idea, if you can get it to work. Indian should be the Asian, Native American should be the original people of the continent. HereToHelp (talk) 02:33, 2 November 2005 (UTC)[reply]

I would like to request permission to use KocjoBot on :en. Its primary mission will be updating links between :en, :sl, :bs and :hr. So far the bot has been running on the other 3 WPs (with bot flag) and without problems. Regards, --KocjoBot 22:12, 30 September 2005 (UTC)[reply]

I see you understand English, Slovene, and German. I'm uncertain what languages bs: and hr: run in; can you elaborate? --AllyUnion (talk) 20:35, 1 October 2005 (UTC)[reply]

I don't understand your question; I also understand Croat and Bosnian (very similar languages to Slovene). Also, after I posted this request, I was asked by the Serbian community to use KocjoBot on :sr, so this bot will be running on :bs, :en, :hr, :sl and :sr. A lot of these Wikipedias have foreign interwiki links while others don't have theirs, so IMHO this bot is greatly needed to coordinate these Wikipedias. Regards, --Klemen Kocjancic 09:01, 3 October 2005 (UTC)[reply]

We have a policy on the English Wikipedia, which is to ask Interwiki linking bot operators to understand the language that they plan to interwiki with. If you do understand the languages written on bs, en, hr, sl, & sr, then you are permitted to run a trial period of one week. If no complaints are made within that week, you can continue to run your bot. This is provided that you will be checking the interwiki links that your bot posts. --AllyUnion (talk) 18:38, 3 October 2005 (UTC)[reply]

I already have bot status on the other 4 Wikipedias, and the bot has been running there for some time now (over 20,000 edits on :sl, 900+ on the others). So far there have been no problems with interwikis. Regards, --Klemen Kocjancic 19:24, 3 October 2005 (UTC)[reply]

It is still general English Wikipedia policy for bots to run a trial run of one week, even if you have been running it elsewhere. Sorry, we deal with a lot of stupid vandals around the English Wikipedia. It is not to say you are one, but it needs to be proven that you aren't one. A trial run of one week will assert whether your bot is: harmless, useful, and not a server hog. --AllyUnion (talk) 08:30, 4 October 2005 (UTC)[reply]

OK, I'll run it. Regards, --Klemen Kocjancic 20:39, 4 October 2005 (UTC)[reply]

The bot appears to be adding and removing interwiki links for languages beyond what it's approved for. For example, removing ast:, which appears to be an incorrect removal. I've blocked the bot until this is sorted out. --Carnildo 18:23, 5 October 2005 (UTC)[reply]

The bot recognizes such "articles" (minimal text) as non-articles and removes them from the list. If you look at what is on the page, you'll see. If that is a problem, OK; I've already made a modification. About "beyond what it's approved for": I meant that I'll be running the bot on 5 Wikipedias, fixing interwiki links for all languages, not just these 5. If this is a problem, sorry for not making that clearer. BTW, I would have replied earlier, but Carnildo blocked my IP address and not just KocjoBot, so I couldn't edit. Regards, --Klemen Kocjancic 20:37, 5 October 2005 (UTC)[reply]

Are there any further objections to this being flagged as a bot now? Please leave a note on m:requests for permissions to clarify this. Thanks. Angela. 10:38, 25 October 2005 (UTC)[reply]
Responded to Angela's request on Meta. Stated that, at present, we want to resolve exactly what the bot will be doing, as it's still not clear - the user now claims to want to update all interwiki links across five Wikipedias, not the aforementioned five languages' worth of interwiki links on our Wikipedia; since this is confusing (and violates our general "must understand language" requirement) I've asked them to hold fire. Rob Church Talk | FAHD 03:32, 27 October 2005 (UTC)[reply]

A question about this bot: it added interwikis to my user page, and I want to control the interwikis I put on each user page myself, for personal efficiency. Has this bot been authorized to do that? Sebjarod 13:10, 18 December 2005 (UTC)[reply]

Collaboration of the week update bot.

I wish to run a bot which will automatically update the dates and prune unsuccessful nominations on Wikipedia:Collaboration of the week. The bot would likely run daily, using the pywikipediabot framework. I've registered the name CollabBot for this. Talrias (t | e | c) 12:40, 2 October 2005 (UTC)[reply]

Are you restricting your bot to only Wikipedia:Collaboration of the week or will your bot maintain all collaborations of the week? Furthermore, how do you determine which nominations are successful and which nominations are not? (The latter question is just to find out how you are making your bot tick, which, for all intents and purposes, just helps to assert that your premise of getting your bot to function is valid.) --AllyUnion (talk) 18:40, 3 October 2005 (UTC)[reply]
At this time it will be restricted to the main collaboration of the week, and possibly the UK collaboration of the fortnight (as I am semi-involved in this). The bot would update the "number of votes by X" by counting the number of votes made (the number of lines starting with #, without a : (which would indicate a reply to a support vote) and with a signature in the line in the support section), discarding any votes from IP addresses, and changing the date and number of votes required appropriately. If a nomination failed to get the correct number of votes, as calculated above, it would remove it from the page and add it to the archive page. For both stages I would run trials where the bot would suggest a change to make before enabling the bot to do it automatically. Talrias (t | e | c) 19:45, 3 October 2005 (UTC)[reply]
What about sockpuppets? I'm thinking here of cases like the collaboration drive nomination for Seduction community, where all the support votes were by sockpuppets trying to avert the closure of the AfD on that article. Not a huge problem, but it might suggest some human intervention would be useful. --fvw* 19:52, 3 October 2005 (UTC)[reply]
A fair point. This is something which would be a minor problem with the automatic removal of unsuccessful nominations rather than updating the no. of votes required and date for ones which get the required number of votes (as someone could remove the sockpuppet votes when they were detected, and the bot would correct the no. of votes required and date). It would only be a problem with removal because the article would stay nominated even though it may have been unsuccessful (and only kept due to the sockpuppets). However, I think that having an article stay by mistake is better than having an article removed by mistake, which is more likely if the bot attempted to do sockpuppet detection. I don't see this bot as being a substitute for people looking over the nominations, just a substitute for updating the no. of votes required and date, which is menial. Talrias (t | e | c) 21:45, 3 October 2005 (UTC)[reply]
How will you deal with unsigned nominations? --AllyUnion (talk) 21:34, 3 October 2005 (UTC)[reply]
Nominations are not signed (the first vote is that of the nominator); I'm assuming you mean unsigned votes, and the answer is that typically the signature is the only text left by the voter, so I believe a vote without a signature is a non-issue. In the event it occurred, the bot would not count it as a vote (but would if the person later added their signature or someone used the {{unsigned}} template). Talrias (t | e | c) 21:45, 3 October 2005 (UTC)[reply]
I wouldn't so much mind if it intervened in cleanup, or pushing the updated dates... but I'm not so certain I'd want it actually selecting the winner. --AllyUnion (talk) 08:20, 4 October 2005 (UTC)[reply]
It would not do this - updating the actual collaboration of the week is non-trivial. Talrias (t | e | c) 13:34, 4 October 2005 (UTC)[reply]
Approved for a trial run of one week. --AllyUnion (talk) 23:38, 7 October 2005 (UTC)[reply]
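
Purely as an illustration of the counting rules Talrias describes (a vote is a line starting with '#' but not a '#:' reply, carrying a signature, and not from an IP address), a simplified sketch; the regular expressions are deliberately naive:

 import re
 
 SIG_RE = re.compile(r"\[\[User:([^|\]]+)")
 IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")
 
 def count_votes(support_section):
     votes = 0
     for line in support_section.splitlines():
         if line.startswith("#") and not line.startswith("#:"):
             m = SIG_RE.search(line)
             if m and not IP_RE.match(m.group(1).strip()):
                 votes += 1
     return votes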

Chobot exceeding its approved behavior

The approval given to Chobot (see Wikipedia_talk:Bots/Archive_interwiki#Chobot) covers only links between the English and Korean wikipedias. Recent edits by the bot include interwiki links to other languages, exceeding the bot's approved behavior.

I have left a note on the operator's talk page. User:ChongDae indicates on their user page that they can speak English, Korean, and Japanese. I'm fine with expanding the permission to include these three languages, but our policy requires that we ask them to please not modify interwiki links to any other languages.

Misbehaving bots are subject to blocking on sight by administrators.

-- Beland 01:59, 5 October 2005 (UTC)[reply]

Yeah, that policy may be a pain some times but it's there for a good reason. It appears to not be running currently though, let's hope ChongDae reads his talk before restarting it. --fvw* 02:09, 5 October 2005 (UTC)[reply]

I was running the bot in autonomous mode. When running in autonomous mode, the bot tries to collect interwiki links and update them. (The bot doesn't run on en: now.)

When I got bot permission on en:, the bot could update only the home-language Wikipedia. At that time, to update a link using the bot, I had to run the bot on ko: first; analyze the bot's log; change the language setting to en:; and update the missing links depending on the log. This process is error-prone and sometimes out of date. (The time difference between running the bot on ko: and on en: can be a week or more.)

The pywikipedia software was updated after July to allow working on multiple sites simultaneously (cvs log), so I have been depending on it. Many other bot operators also use it. It is hard to say that the program is perfect, but it works well.

BTW, is there any renewal procedure for a bot's approved behavior? I want to update it (for example, permission for a generic interwiki bot, an image mover to commons:, ...). -- ChongDae 18:24, 5 October 2005 (UTC)[reply]

How does it determine which articles are the same without user intervention? Just because nl:A links to ko:B and ko:B links to en:C doesn't necessarily mean nl:A should link to en:C (or vice-versa). Article topics are frequently chosen differently, which causes the link between nl:A and ko:B to be off but a reasonable approximation; however, compounding the topic approximation errors of nl→ko and ko→en is going to give an even worse match for nl→en.
If it's determining which articles to link in a different way, how? --fvw* 18:46, 5 October 2005 (UTC)[reply]
That's the way interwiki bots work. Do you have a better idea? -- ChongDae 20:25, 5 October 2005 (UTC)[reply]

The pywikipedia framework just assumes that any link to or from en:A is just as good as any other link, and that if fr:B links to en:A, any links to and from fr:B might as well go directly to en:A. Unless there is a conflict, e.g. nl:C links to fr:B and en:D, in which case it asks for manual intervention.

We should be explicit - do we want to establish a policy that all interwiki links must be manually reviewed? Personally, I think an interwiki link to a slightly-off article is better than none at all, and humans can check that sort of thing after the fact.

I think it's important to insist that bots only modify articles in languages the bot operator can understand, to deal with complaints from denizens of that Wikipedia.

But what about links to various other Wikipedias? If we insist on manual review of all interwiki links, then obviously bot operators must speak the target language. (And by doing so we miss out on some of the links suggested by the pywikipedia method.) But with an automated default, what's wrong with bot operators replying to a complaint, "I don't speak German; if you do and you think the link should be changed, go ahead"?

If we insist on manual review (whether while the bot is running or afterwards, by looking at contribs) we have a lot of enforcement to catch up on. If we don't, then we can basically give blanket permission to run any standard interwiki bot based on pywikipedia to anyone who can speak English and whom we trust to run a bot. Personally, I'm fine with the latter. Some WikiProject can systematically check interwiki links, if they want, either by following bots around, or in some other fashion. (After all, people make bad interwiki links, too, whether because they don't speak a certain language very well, or because they didn't know that there's a better article to link to.) -- Beland 02:10, 6 October 2005 (UTC)[reply]

Oh, and if you want to do something new with your bot, the procedure is to let us know here, and we discuss it. (As we've started doing with the idea of a generic interwiki bot.)
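
For readers unfamiliar with how the pywikipedia assumption described above plays out, here is a toy sketch of transitive link-following with conflict detection (an invented function for illustration, not pywikipedia's actual code):

 def merge_interwiki(start, links):
     # links maps a page like ("en", "A") to a list of (lang, title) tuples.
     found = {start[0]: start[1]}
     queue = [start]
     conflicts = []
     while queue:
         page = queue.pop()
         for lang, title in links.get(page, []):
             if lang not in found:
                 found[lang] = title
                 queue.append((lang, title))
             elif found[lang] != title:
                 # two different articles reached in the same language:
                 # this is where a human has to decide
                 conflicts.append((lang, found[lang], title))
     return found, conflicts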

Chobot wanting to move images to Commons

As for automated image moving to Commons, I would say that any such bot would need to:

  • Detect naming conflicts (e.g. if there was already an image with the same name on Commons)
  • Preserve all image description information
  • Note that the image was originally uploaded to Wikipedia, and who uploaded it
  • Preserve template information (especially licensing info). Remember that just because a template is defined on Wikipedia doesn't necessarily mean it's defined on Commons, or that it has the same definition there.

Images that have more than one version are somewhat problematic, because several people may have copyright on the image, or maybe the image has been wholly replaced, and only the latest uploader (or maybe only the 2nd and the 4th) have copyright on it. I guess these would have to be done manually. -- Beland 02:22, 6 October 2005 (UTC)[reply]

I'm moving images manually. If images are required when writing articles on ko:, I first look at Commons. If I cannot find any suitable one, I look around images in other Wikipedias, en/de/ja/..., and transfer them to Commons if the license is suitable for Commons: PD/GFDL/SA/... (Note that these templates are almost the same on Commons and the other Wikipedias, as far as I know.) I'm using imagetransfer.py in the pywikipedia framework. It automatically leaves the {{NowCommons}} macro on the original one. If your complaint is about moving images under a hidden (bot) account, I can change that. -- ChongDae 08:09, 6 October 2005 (UTC)[reply]
Wouldn't the history for the file upload need to be copied? --AllyUnion (talk) 00:51, 8 October 2005 (UTC)[reply]

Interwiki by RobotJcb

I would like to ask permission to do some interwiki botwork with my bot: RobotJcb. I run a multi-login bot, doing interwikis on several wikis. Today I already did a few edits on EN.wikipedia with the robot: [11]. I did not know that permission was needed; I'm sorry. But you may review those 18 edits to see what kind of upkeep I would like to do. Jcbos 15:27, 8 October 2005 (UTC)[reply]

Is this a pywikipedia bot? If so, then I am ok with you running it, as long as you only make links to articles in languages you can understand. (This is based on the existing policy, which we are discussing, so this may change shortly.) Please let us know which languages you can understand well enough to judge whether or not two articles are on the same subject. -- Beland 03:09, 9 October 2005 (UTC)[reply]
This is a pywikipedia bot: interwiki.py. It runs on about 20 languages, of which I can understand about 10. But I use the autonomous option, so if there are any doubts, it will not change anything and will just skip the page. Jcbos 11:49, 9 October 2005 (UTC)[reply]
i'd like to know if the en: community is approving of this bot or no. btw this bot does good work on nl:, and imho he deserves a little bit more feedback than from just 1 user responding to his request... oscar 00:33, 17 October 2005 (UTC)[reply]
Not many people from the en community reply on this page. Usually I'm the one who does. The only problem the English community has is if the bot starts to remove interwiki links. We don't want it to do that. Modify, yes, delete, no. --AllyUnion (talk) 13:45, 17 October 2005 (UTC)[reply]
OK, I will keep that in mind and I will not remove broken interwiki links. Jcbos 23:22, 17 October 2005 (UTC)[reply]
In a few pages I already removed an interwiki yesterday, but I just restored them manualy. Jcbos 23:31, 17 October 2005 (UTC)[reply]
It's good you fixed it manually, but have you also fixed the bot to prevent it doing this again? Angela. 10:35, 25 October 2005 (UTC)[reply]
It has been fixed and as far as I can see it didn't happen anymore. I will check my botedits from time to time as well. Jcbos 17:23, 26 October 2005 (UTC)[reply]

Interwiki bot policy

The existing policy on the project page is a bit vague on what interwiki bots are "known safe", etc., and I have posed some questions above that affect a lot of interwiki bots, including the two most recent requests for permission.

I propose the following slightly revised policy on Interwiki bots. I have posted an RFC pointing here. -- Beland 03:32, 9 October 2005 (UTC)[reply]


pywikipedia interwiki bot operators making links from the English Wikipedia:

  • Must run the latest version.
  • Must update on a daily basis.
  • Must speak English well enough to respond to complaints about incorrect links or other bot behavior.
  • Must actively monitor their English talk page, or have a note directing English speakers to meta or another page which is actively monitored and where they are welcome to post in English.
  • Must ask for permission on Wikipedia talk:Bots (primarily to allow people to voice any objections based on the history and reputation of the operator).
  • May conduct a short demonstration run immediately after requesting permission, so that participants can examine bot edits.
  • Need not speak the language of Wikipedias they are linking to, but this is encouraged. Obviously they will not be able to provide manual disambiguation for links to languages they do not understand.
  • Are encouraged to indicate on their English user pages which languages they speak.
  • Need not manually check all links made by their bot.

Non-pywikipedia interwiki bot operators:

  • Must ask for permission before doing a demonstration run.
  • Will need to have all links manually checked, at least during a test phase. This means they must speak the language of the target language(s), or find a volunteer who does.
  • Must speak English well enough to respond to complaints about incorrect links or other bot behavior.
  • Must actively monitor their English talk page, or have a note directing English speakers to meta or another page which is actively monitored and where they are welcome to post in English.
  • Are encouraged to indicate on their English user pages which languages they speak.

Other editors who see semantically inappropriate interwiki links being made by bots are encouraged to fix those links manually. In general, our feeling is that automatic links which are mostly right are better than no links at all. If one or more bots are making a high percentage of erroneous links, editors are encouraged to leave a note on Wikipedia talk:Bots and the talk page of the bot or bot operator. General feedback on interwiki bot performance is also welcome.

Comments/support/objections

  • Part of the reason I'm comfortable with bots making links to articles in languages the operators don't understand is that these links are being made in the English Wikipedia. Anyone who knows enough to complain about any such links should be able to do so in English. As long as the bot operator also speaks English, that should be fine. If the bot operator doesn't speak the target language, then it should be fine to trust the judgment of a human who claims to know the right answer over a bot which could easily have made a bad assumption. I'm comfortable letting pywikipedia bots make the assumptions they do because there seem to be a lot of them running, and there do not seem to be any complaints that they are habitually inaccurate. -- Beland 03:36, 9 October 2005 (UTC)[reply]
    • What about the removal of (not the modification of) interwiki links? They should also be listed under the Interwiki bot section on Wikipedia:Bots. --AllyUnion (talk) 04:06, 9 October 2005 (UTC)[reply]
    • pywikipedia doesn't remove links without manual prompting, does it? Certainly no one who doesn't speak the target language (not just the source language, which in our case would be English) should remove an existing interwiki link. (Unless we have a policy encouraging removal of links to non-existing foreign-language articles?) Is that what you are getting at? -- Beland 03:06, 20 October 2005 (UTC)[reply]

Kakashi Bot

Kakashi Bot will be used for two purposes: One time requests & Marking short articles (less than 1K) as stubs. Anything that is less than 42 bytes will be marked with {{db}} and anything less than 14 bytes will be auto-deleted under my account. This is per the discussion held at Wikipedia:Village pump (proposals)#Auto deletion of nonsense. --AllyUnion (talk) 04:10, 9 October 2005 (UTC)[reply]

May hold off on this for a bit. --AllyUnion (talk) 05:16, 9 October 2005 (UTC)[reply]

Auto deletion of nonsense

Repast, The Tempest (Insane Clown Posse), Formal amendment, Hay sweep, Actual effects of invading Iraq, W. Ralph Basham, Adam bishop, Harrowlfptwé, Brancacci Chapel, Acacia ant (Pseudomyrmex ferruginea), Cyberpedia, Principles of Mathematics, Moss-troopers, Gunung Lambak

All of the articles above were created by people as a "test". On average there are 2-20 of these posts, which keep admins busy unnecessarily. All the pages have one thing in common: they are less than 16 bytes. 15 bytes is a magical number because it is the smallest article size possible for a redirect: "#redirect [[A]]" = 15 chars. --Cool Cat Talk 02:30, 21 September 2005 (UTC)[reply]

14 bytes. The space after #redirect can be omitted [12] -- Curps 08:42, 21 September 2005 (UTC)[reply]
I already replied to this on your talk page; it is trivial to code in a halt on pages containing "#redirect". Thanks for the input. --Cool Cat Talk 11:08, 21 September 2005 (UTC)[reply]

We already have a way to detect such pages. That'll be my bot in #en.wikipedia.vandalism. It is trivial to add a function to make it delete any newly created page with less than 15 bytes. I intend to do so; objections? --Cool Cat Talk 02:30, 21 September 2005 (UTC)[reply]

Several people suggested simply disallowing the creation of pages smaller than 15 bytes; well, people would just create a slightly larger page with nonsense. People should be able to experiment, and if it is detectable that's better. This way admins can worry about real problems rather than wasting hours on people's "tests" --Cool Cat Talk 03:03, 21 September 2005 (UTC)[reply]
I intend to restrict the bot to the article namespace --Cool Cat Talk 03:03, 21 September 2005 (UTC)[reply]
A valid tiny edit could be adding {{WoW}} to a userpage, but the bot will ignore it as it is not in main article name space. --Cool Cat Talk 03:54, 21 September 2005 (UTC)[reply]
I was the one that originally started bugging people about this idea. The bot is very reliable at detecting articles too small to be a #redirect, and I thought it would be good to automatically delete them to save the admins time. Later, if this works, I think it would also be cool to test out the possibility of auto-reverting page blanking, where if more than 90% of the page is blanked without an excuse, it automatically reverts it. One thing at a time though; I hope to see this implemented soon --appleboy 03:17, 21 September 2005 (UTC)[reply]
I support the bot addition. In not a single case that I've seen has one of those tiny entries been a valid one. -- (drini|) 03:16, 21 September 2005 (UTC)[reply]

It might be a good thing to avoid deleting anything that's a template. One example is a {{deletedpage}} (which happens to be 15 chars, but no reason it couldn't have been shorter). Or, it might theoretically be possible to have a very short template that performs some logic based on the {{PAGENAME}}. -- Curps 09:01, 21 September 2005 (UTC)[reply]

The bot would not delete admins creating tiny pages. Only admins can have a valid reason (that I can see) to create a tiny page, I think we can trust the admins for this. --Cool Cat Talk 09:40, 21 September 2005 (UTC)[reply]
Admins shouldn't, as I understand things, have editing privileges different from those of ordinary users in good standing. A page that uses templates could have an arbitrarily long expansion for a very short source text. Either the bot should ignore any page using transclusion, or it should count the length of the text after transclusion, whichever is easiest to implement, IMO. Recognizing the presence of a template is easy, after all. DES (talk) 14:51, 21 September 2005 (UTC)[reply]
True, true, but {{deletedpage}} is an admin-only template, like {{protected}}; regular users should not create any page smaller than 15 bytes in the regular namespace. If someone can come up with legit instances I can write exception cases for those. --Cool Cat Talk 15:11, 21 September 2005 (UTC)[reply]

Pages created in the past 10 minutes: Jamie Lidell, Katy Lennon, Cassa Rosso

--Cool Cat Talk 10:45, 21 September 2005 (UTC)[reply]

Exceptions the bot will not delete:

  • #redirects
  • templates

Given that, how large (in bytes) should a newly created article have to be in order to be kept?

  • Aside from templates and #redirects, is there anything else, let's say less than 30 bytes, that is a legitimate page?

objection to 15 limit - Please note: During debate on potential deletion of stub template redirects at WP:SFD and WP:TFD it is often easier - so as not to get the "this may be deleted" message on the real template - to replace the redirect message with a simple template message. Thus, if a redirect to {{a}} was being debated, the nominated template would contain the text {{tfd}}{{a}}. 12 characters. Then there's pages containing simply {{copyvio}} - 11 characters. I can't think of any smaller possibilities, but they may exist - and 15 is thus too big! Grutness...wha? 01:49, 22 September 2005 (UTC)[reply]

Grutness, he already said he wouldn't delete templates. --Golbez 01:59, 22 September 2005 (UTC)[reply]
Oops - missed that bit - sorry! Grutness...wha?

Will this bot automatically post an explanation to creators' talk pages explaining why their page was deleted? Perhaps it should, if that's feasible. --Aquillion 16:09, 26 September 2005 (UTC)[reply]

First of all - good idea, go to it (with the no-templates, no-#redirects limits). 15 chars seems fine to me; we can expand it upward if we still get lots of slightly longer bad pages. I find it ironic that people have often cried that their articles were deleted automatically, and we were always able to say - no, we don't delete things automatically, someone was just watching really fast and deleted it by hand. Now we won't be able to say that. It probably would be good to put a tiny note in the start-a-new-article MediaWiki text saying something like "Articles with less than 15 characters will be automatically deleted" - so we can point people there when they complain. In any case, good idea, go to it. JesseW, the juggling janitor 00:24, 4 October 2005 (UTC)
Sounds good. Might I suggest also - if there wouldn't be one automatically - a 10 minute delay between the posting of the article and the automatic deletion, just in case the article was a valid one which glitched while being saved (as can happen with IE from time to time). Grutness...wha? 00:43, 4 October 2005 (UTC)[reply]
Can be done. We can raise the cap a bit as JesseW suggested. What should be the cap? 42 bytes? ;)
The bot will need "some" admin power, at least enough to delete pages. Should we start a WikiProject? It's easy for me to code the bot; however, we will have people complaining unless we have a page explaining this entire mess. --Cool Cat Talk 22:33, 7 October 2005 (UTC)[reply]

Cool Cat has asked for my assistance, but given that Cool Cat doesn't have admin powers, I have taken it upon myself to write this bot. I have decided on three levels for this bot: pages under 15 bytes that are not redirects or templates will be deleted automatically by the bot; pages under 42 bytes are automatically marked with {{db}} with the delete reason "Bot detected new article less than 42 bytes, possibly spam."; anything under 1k will be marked with the generic stub template. --AllyUnion (talk) 23:57, 8 October 2005 (UTC)[reply]
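
For illustration, here is a rough Python sketch of the three thresholds described above. The byte limits and the redirect/template exemptions come from this discussion; the function itself, its name, and its return values are hypothetical, not the actual bot code.

```python
# Rough sketch only -- not the actual bot.  Thresholds and exemptions follow
# the proposal above; the return codes are invented for illustration.
def classify_new_article(wikitext):
    text = wikitext.strip()
    if text.lower().startswith("#redirect") or text.startswith("{{"):
        return "ignore"            # redirects and template-only pages are exempt
    size = len(text.encode("utf-8"))
    if size < 15:
        return "delete"            # automatic deletion (needs an admin account)
    if size < 42:
        return "tag-db"            # mark with {{db}} as a possible speedy
    if size < 1024:
        return "tag-stub"          # mark with the generic stub template
    return "ok"
```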

Does this include the 10 minute (or whatever) delay I suggested? I still think it would be useful... Grutness...wha? 00:13, 10 October 2005 (UTC)[reply]

Further discussion

  • Just a note to state the obvious: Pages with edit histories, especially where any single version would not be eligible for deletion, should not be auto-deleted. -- Beland 03:02, 20 October 2005 (UTC)[reply]
  • We have an existing stubsensor, though it may be inactive. Any new bot that's marking things as stubs should coordinate to make sure multiple bots or reports (not to mention WikiProjects or random human editors) aren't working at cross-purposes. It might be worth it to look at the more sophisticated detection methods it uses, and the lessons learned from that project. -- Beland 03:02, 20 October 2005 (UTC)[reply]

Bot submissions

Ok, so I'm not sure if this is the right place for this, but I couldn't find any other talk pages devoted to Wikipedia bots, so here goes: if a bot grabs a page via the http://en.wikipedia.org/wiki/Special:Export method (both for ease of parsing, and to ease the load on the server), is there any way to submit its edits? By checking the page's source while editing, it seems each submission must be accompanied by an opaque hash value. Is this actually enforced? Or is there something I'm missing here? Thanks in advance. porges 11:03, 13 October 2005 (UTC)[reply]

You mean the wpEditToken? You can get that by loading the edit form; it's just there to detect edit conflicts. If you choose to use the special:export interface for getting the wikitext instead of ripping it out of the edit form (why?), make sure you first get an edit token and then load the text from special:export though, as otherwise there's a race condition where you could overwrite someone else's edits. --fvw* 12:03, 13 October 2005 (UTC)[reply]
I think you need to fetch and repost wpEdittime, wpStarttime, and wpEditToken from the edit form, to prevent various kinds of problems. Some of those are for security, and some are to detect edit conflicts. It would save you from doing an additional HTTP transaction (and reduce server load, which is required of bots) if you also extracted the wikitext from the edit form. -- Beland 02:23, 20 October 2005 (UTC)[reply]
It would be nice if the tokens needed to edit a page were provided via the export interface (maybe if provided with an extra parameter or something)... I don't need to be grabbing all the page formatting etc whenever the bot needs to grab a page. porges 23:23, 24 October 2005 (UTC)[reply]
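
For anyone wondering what the round trip looks like in practice, here is a minimal, illustrative Python sketch of scraping the hidden fields from the edit form and posting them back, roughly as described above. The field names (wpEditToken, wpEdittime, wpStarttime, wpTextbox1, wpSummary) are the ones mentioned in this thread or visible in the edit form; everything else is an assumption, not a supported interface, and a real bot should also respect the throttle and bot policy.

```python
# Illustrative sketch only: scrape the hidden fields from the edit form and
# post an edit back with them.  The HTML scraping is deliberately naive (it
# assumes name= appears before value= in each hidden input).
import re
import urllib.parse
import urllib.request

INDEX = "https://en.wikipedia.org/w/index.php"

def fetch_edit_form(title):
    url = INDEX + "?" + urllib.parse.urlencode({"title": title, "action": "edit"})
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

def hidden_field(form_html, name):
    match = re.search(r'name="%s"[^>]*value="([^"]*)"' % name, form_html)
    return match.group(1) if match else ""

def save_edit(title, new_text, summary):
    form = fetch_edit_form(title)
    fields = {
        "title": title,
        "action": "submit",
        "wpTextbox1": new_text,
        "wpSummary": summary,
        "wpEditToken": hidden_field(form, "wpEditToken"),
        "wpEdittime": hidden_field(form, "wpEdittime"),    # edit-conflict detection
        "wpStarttime": hidden_field(form, "wpStarttime"),
        "wpSave": "Save page",
    }
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with urllib.request.urlopen(INDEX, data) as resp:
        return resp.status
```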

Notification Bot: cleaning up of Images with unknown....

I would like NotificationBot to be allowed to run to notify uploaders about their images in Category:Images with unknown source and Category:Images with unknown copyright status. I would also like to have the bot use two (not yet created) templates: {{No source notified}} & {{no license notified}}. The only difference in these two templates is that the following text will be added: The user has been automatically notified by a bot. Also, two new categories will be created: Category:Images with unknown source - notified & Category:Images with unknown copyright status - notified, with the corresponding templates. These templates will replace the ones on the image, and the bot will sign a date on the page to indicate when it first notified the person. I would also like the bot to give a second notification on the 5th or 6th day, before the end of the 7-day period, then a final notification on the 7th day. On the 8th day, it will change the image page and add a {{db}} with the reason: "User already warned automatically 3 times about changing copyright information on image. Final notice was given yesterday, at the end of the 7 day period mark." This bot will run daily at midnight UTC.

Oh, and the notification text will be something that looks like this:

First notice:

[[{{{1}}}|75px|center|]]
The image you uploaded, [[:{{{1}}}]], has no {{{2}}} information. Please correct the licensing information. Unless the copyright status is provided, the image will be marked for deletion seven days after this notice.

Second notice:

[[{{{1}}}|75px|center|]]
The image you uploaded, [[:{{{1}}}]], has no {{{2}}} information. This is the second notice. Unless the copyright status is provided, the image will be marked for deletion in 2 days after this second notice.

Third and final notice:

[[{{{1}}}|75px|center|]]
The image you uploaded, [[:{{{1}}}]], has no {{{2}}} information. This is the third and final notice. Unless the copyright status is provided, the image will be marked for deletion tomorrow.

--AllyUnion (talk) 11:34, 16 October 2005 (UTC)[reply]
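
To make the proposed timetable concrete, the following is a small, hypothetical Python sketch of the escalation schedule described above (second notice around day 5-6, final notice on day 7, {{db}} on day 8). The function name and return strings are invented for illustration; how the bot actually records the first-notice date is not specified here.

```python
# Hypothetical sketch of the escalation schedule proposed above; days are
# counted from the first notice left on the uploader's talk page.
from datetime import date

def next_action(first_notice: date, today: date) -> str:
    age = (today - first_notice).days
    if age >= 8:
        return "tag image with {{db}}"
    if age >= 7:
        return "post third and final notice"
    if age >= 5:
        return "post second notice"
    return "wait"
```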

Hi, take a look at Template:No source since and see if that one will do you much good. I recently started using it. «»Who?¿?meta 11:36, 16 October 2005 (UTC)[reply]
Sounds good. PS. Who, template sigs are evil. Can we get a bot to kill them? Alphax τεχ 11:40, 16 October 2005 (UTC)[reply]
Corrected. I was wondering if it meant template namespace or not. trying to save edit space :) «»Who?¿?meta 12:15, 16 October 2005 (UTC)[reply]
Um, I meant of the form {{User:Foo/sig}} - but they should definitely NOT be in Template namespace. Alphax τεχ 13:12, 16 October 2005 (UTC)[reply]
It kind of does... but the category isn't what I want... and I think I should just use a clean template anyway. --AllyUnion (talk) 04:11, 17 October 2005 (UTC)[reply]
Green tickY Go for it. --Cool Cat Talk 11:46, 16 October 2005 (UTC)[reply]
  • I think it would be great for a bot to automate these notifications as they are a real pain to do manually. I think 3 notices might be excessive though. Why not just have the first and only notice state the date on which the image will be scheduled for deletion unless the source/copyright info is supplied? I don't think cluttering up the uploader's talk page with 3 messages for each image is going to be all that effective. The 7 day notice is already provided on the upload page as well. RedWolf 17:50, 16 October 2005 (UTC)[reply]

Revised notice [modified further, see source link below]:

The image you uploaded, [[:{{{1}}}]], has no {{{2}}} information. The image page currently doesn't specify who created the image, so the copyright status is therefore unclear. If you have not created the image yourself then you need to argue that we have the right to use the image on Wikipedia (see copyright tagging below). If you have not created the image yourself then you should also specify where you found it, ie in most cases link to the website where you got it, and the terms of use for content from that page. See Wikipedia:Image copyright tags for the full list of copyright tags that you can use. Unless the copyright status is provided, the image will be marked for deletion on {{{3}}}.

--AllyUnion (talk) 04:22, 17 October 2005 (UTC)[reply]

Here is the source of the actual template: User:NotificationBot/Image source. --AllyUnion (talk) 04:30, 17 October 2005 (UTC)[reply]

How will this bot handle the existing thousands of no-source images that are over seven days old? What about cases where someone has uploaded several dozen unsourced images? --Carnildo 06:56, 17 October 2005 (UTC)[reply]
It goes through the above categories. Unfortunately, it would have to make several notifications, one after another. The other alternative is for me to program it far more extensively to have it operate on a flatfile database, collect all the information on which user needs to be notified of what, notify them, and change all the images thereafter. That would invariably be more complex than it is now, and far more time-consuming for me to write. As for any existing no-source image over seven days old, it won't mark a speedy on those. It could, but the point of the bot is to give the uploader seven days' notice, THEN mark the image with a speedy deletion tag. --AllyUnion (talk) 13:09, 17 October 2005 (UTC)[reply]
It is under the assumption that anything in the former categories, the user has not been notified. --AllyUnion (talk) 13:42, 17 October 2005 (UTC)[reply]
  • I agree one notice is plenty. Either they ignore it, they aren't editing this week, or they respond to the first message, and multiple messages will just be annoying if they're ignoring it or not around. -- Beland 02:17, 20 October 2005 (UTC)[reply]
  • It would definitely be annoying for someone to continually be getting image-deletion notices on their talk page while they were trying to edit. People who upload lots of images (assuming they are worth keeping) are making important contributions, and it would be nice to keep them happy. At least on the first run, a simple compromise might be to just drop a line and say, "We noticed you have uploaded multiple images with unknown source or copyright status. To avoid future messages like the one previously posted, you may wish to review your contributions and identify sources and licensing." But this risks them not knowing which 2 out of the 20 they uploaded were problematic, or someone deleting those while the bot was waiting (perhaps another 7 days) to post the threatened messages.

You could also just skip repeat customers on the first run, on the assumption that it will take a few days to run through all 20,000 or however many images need to be processed, and that a lot of people will probably check their other images after becoming aware that this might happen. I don't know which alternative is worse (annoyance or ignorance) but be prepared for complaints if you post more than two or three messages of the same kind to someone's user page in a relatively short timeframe. In the long run, will the bot check these categories every hour or every day or something? If it's hourly or thereabouts, I wouldn't worry about posting multiple messages. After the first one or two, they should get the idea, and stop uploading without attribution. It would be nice to batch-process, but I'd hate for that to delay implementation of this bot. Images are being deleted all the time, so people need to get notices ASAP. -- Beland 02:17, 20 October 2005 (UTC)[reply]

  • Oh, and what criteria does the bot use for marking {{db}}? I would think that if anyone has edited either the image description page or the corresponding talk page since the notification, it should hold off and/or flag for manual review. And I presume responses to the bot owner's talk page would prevent an article being so flagged, if only through prompt manual intervention. -- Beland 03:23, 20 October 2005 (UTC)[reply]

Issues

There are some issues I'm trying to resolve. One of them is that the bot is overwriting itself. My initial idea for the project/program was that the bot would make a first pass on the category of images and move them into a notified category. After the 7-day period in the notified category, it would be presumed that the images can be deleted if they are still in that category. The problem now seems to be that I'd have to build up a list or category or a hash table based on repeat customers. A database may seem like overkill, but it seems to me to be the most reasonable solution. We are talking about a great number of images in that category that really need to be deleted, and all the users need to be notified about them, even if they are no longer active. This covers our butts against someone getting really pissed about an image deletion that they were not notified about. It's more of a, "Yes, we did let you know, and you failed to do anything about it, so it's not our fault." More information on my new project board: User:AllyUnion/Project board --AllyUnion (talk) 08:55, 20 October 2005 (UTC)[reply]

If you can do it, go for it. HereToHelp (talk) 02:35, 2 November 2005 (UTC)[reply]

301 redirects fixing

As part of the project Wikipedia:Dead_external_links I would like to fix 301 redirects. The bot will be run as User:KhiviBot.

This will be a manually assisted bot. It will be run using the Perl module WWW-Mediawiki-Client-0.27. I believe cleaning up 301 redirects is a nice goal to have. Generating the list of URLs is a manual process, since sometimes the redirects might not be valid. Hence human intervention is needed to generate a list of URLs. Once the URL list is obtained, the bot can fix them.

Example of this is 114 instances of http://www.ex.ac.uk/trol/scol/ccleng.htm .
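
As an illustration of how such a candidate list might be gathered for human review (KhiviBot itself uses Perl and WWW-Mediawiki-Client), here is a small Python sketch that checks whether a URL answers with 301 Moved Permanently and, if so, reports the new location. It is an assumption about the workflow, not the bot's actual code, and its output would still need the manual vetting described above.

```python
# Illustrative helper for building the manually reviewed URL list: report the
# Location header when a URL answers 301 Moved Permanently, otherwise None.
import http.client
from urllib.parse import urlparse

def moved_permanently(url):
    parts = urlparse(url)
    conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                  else http.client.HTTPConnection)
    conn = conn_class(parts.netloc)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn.request("HEAD", path)
    response = conn.getresponse()
    location = response.getheader("Location") if response.status == 301 else None
    conn.close()
    return location

# Example: moved_permanently("http://www.ex.ac.uk/trol/scol/ccleng.htm") would
# return the replacement URL if that server reports a permanent move.
```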

Seems like a good plan, only thing that has me slightly worried is that there could be people abusing HTTP 301 for temporary name changes or round robinning or such. Barring any examples of this actually happening I'd support a test run though. --fvw* 16:18, 18 October 2005 (UTC)[reply]
Sounds like a good idea, in general. The bot will need to retain the text which is displayed to the reader, if it already exists. It will also need to be excluded from talk pages of all namespaces. It might be prudent to exclude it from Wikipedia: space as well. There is one case where a simple replacement might do the wrong thing: if the URL is part of a citation that gives the date accessed. If the citation says "URL X, accessed on day A", then knowing both X and A, someone can find the document referenced on archive.org. The change we want to make is to "URL Y, accessed on day B", but a simple bot would do "URL Y, accessed on day A", which might not exist at archive.org. You might need to experiment a bit to find a heuristic to detect these cases and flag them for manual review. -- Beland 01:40, 20 October 2005 (UTC)[reply]

Bots and double redirects

I notice User:Kakashi Bot is fixing double redirects like A -> B -> C, it sounds like by just assuming that A should point to C. Looking at Special:DoubleRedirects, this is incorrect maybe 1% of the time. And many of the problems seem to occur with loops or bad human edits. Have we decided that any collateral damage here is acceptable, and in general, bots should just make this assumption? That would obviate the need for an entire project. Unless the bot could flag loops - that would be really handy. -- Beland 03:33, 20 October 2005 (UTC)[reply]

I see there are about 991 now, split across two pages. If we think this is OK, another run or two could clear these up entirely...I wonder how fast they are being created. Theoretically, this Special page is updated periodically. Hopefully, it's on a cron job, and it would be possible to put a bot on a cron job that ran just after the Special update. -- Beland 03:50, 20 October 2005 (UTC)[reply]
It would be a lot more efficient to hook a bot up to the Toolserver, Beland: query the double redirects, dump them into a file, then correct them. --AllyUnion (talk) 08:50, 20 October 2005 (UTC)[reply]
  • There have been some objections on Wikipedia_talk:Computer_help_desk/cleanup/double_redirects/20051009 to making the "change A->B->C" to A->C, B->C" assumption in the case where A is a misspelling of B. -- Beland 18:34, 22 October 2005 (UTC)[reply]
    • By the way, the list at the Special page is only 80% to 90% accurate. There are some inaccuracies due to false positives in the data, which I have already reported as a bug. I eventually wrote Kakashi Bot on the following premises:
      1. If A equals C, then do nothing
      2. If A exists and C exists and A is a redirect page and A does not redirect to C and A redirects to B, we assume B redirects to C and therefore redirecting A to C is not harmful.
      3. If A's redirect target is not B, then do nothing and report the page as an error.
    • Of course, it would be far more logical not to trust the Special page, find all the redirects that exist, determine which of them are double, triple, etc. redirects, and then have a bot operate on the following logic (a rough sketch of this loop appears below):
    • Premise: A is a double (or deeper) redirect.
    • In a do-while loop fashion:
      1. do:
      2. if page X is a redirect, add page X to queue Q, get the redirect target for page X, and set page X to the redirect target
      3. else terminate the loop
      4. (after the loop) Change all redirects in Q to point to X, which should be the article that all the redirects ultimately point to.
    • That would technically solve all redirects. Of course, this is on the theory that all the redirects in Q are valid moves and redirects. --AllyUnion (talk) 11:49, 25 October 2005 (UTC)[reply]
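
As a concrete illustration (not Kakashi Bot's real code), the loop above might look like the following Python sketch, where the three callables stand in for whatever framework calls an actual bot would use:

```python
# Sketch of the redirect-chasing loop described above.  is_redirect, get_target
# and set_target are placeholders for real framework calls; the guard flags
# circular or suspiciously deep chains for a human instead of "fixing" them.
def fix_redirect_chain(start, is_redirect, get_target, set_target, max_hops=10):
    chain = []
    page = start
    while is_redirect(page):
        if page in chain or len(chain) >= max_hops:
            return None             # loop detected (or chain too deep): skip
        chain.append(page)
        page = get_target(page)
    for redirect in chain:
        set_target(redirect, page)  # point A, B, ... directly at the article
    return page
```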

After pondering what "safe" assumptions would be, I propose the following:

Then we'll see if there are any further complaints or problems. And yeah, querying a database directly would be great, though in the short run, I'm happy to have anything. Oh, and will someone be checking for double redirects that cannot be fixed automatically? It's fine with me if you want to just dump them into some category for random editors to fix. -- Beland 08:03, 26 October 2005 (UTC)[reply]

Welcome bot

A bot that welcomes new users, provided they aren't vandals and don't have offensive usernames. The bot uses the new user list combined with a "memory", then waits for a specified amount of time (e.g. 30 minutes) before welcoming them. If the user account is blocked before the time limit, the account won't be welcomed.

67.60.52.155 14:34, 20 October 2005 (UTC)[reply]

I think this is a bad idea. I believe it's like 5% or less of the users who create an account that actually stay, so it would be a resource hog to continuously make new user talk pages. Plus, I think the Welcoming committee does a better job, even if it's slow sometimes, and I know I would prefer to be welcomed by a human. Plus each user gets their own unique welcome message, and you can reply to the welcomer and ask questions; it's part of being in the community. IMHO, I prefer there not be one. «»Who?¿?meta 15:15, 20 October 2005 (UTC)[reply]
Thank you for your feedback - all opinions are welcome. With regard to the "resource hog" issue, the robot can be made a little more sophisticated by only welcoming users who have made edits, using the User Contributions feature. While I agree that it is better to be welcomed by a human, there is no reason why the message could not indicate a person to contact on the welcoming committee, designated on a rolling basis. 67.60.52.155 16:30, 20 October 2005 (UTC)[reply]
I have put a note on the welcome committee page asking for any feedback they may have. 67.60.52.155 16:35, 20 October 2005 (UTC)[reply]
If we want to do this (and I'm not sure we do), why not just make it the default content for the talk page of newly created users? Should be a trivial mediawiki patch. --fvw* 18:42, 20 October 2005 (UTC)[reply]
Actually I was thinking the same thing, having a default message for the talk page. Wouldn't be a bad idea, and there are plenty of users already on WC that could probably come up with something decent. «»Who?¿?meta 18:57, 20 October 2005 (UTC)[reply]
The point of a greeting isn't to give information to newcomers. The point is to welcome them into the Wikipedia community. A bot can't do that. Isomorphic 02:57, 21 October 2005 (UTC)[reply]

Personally, welcome messages, whether from bots or humans, annoy me. If people need information, we should make that information easily accessible from the Main Page, Community Portal, or other obvious location. -- Beland 22:54, 22 October 2005 (UTC)[reply]

I think users should be welcomed by humans, not bots. The main reason I have for this is that many users who are welcomed go immediately to the welcomer's talk page and leave them a message and/or question. If someone is welcomed by a bot, they can't do this. Also, welcomes are much more personal when left by a human; it makes the user think, "Wow, there are people out there who care, and one of them noticed me." A bot makes someone think, "Wow, they have a bot that welcomes people...kind of cool, but if this is a community, couldn't someone have taken 60 seconds of their day to welcome me personally?" EWS23 | (Leave me a message!) 04:12, 24 October 2005 (UTC)[reply]

I had the same thought months ago. We have bots to show people the door; we have humans welcome them to a community of humans. --AllyUnion (talk) 11:52, 25 October 2005 (UTC)[reply]

When I welcome people, I like to comment on some of their edits and point them to WikiProjects or other project pages related to the articles they've edited. A bot can't do that (without complications, anyway). --TantalumTelluride 05:21, 2 December 2005 (UTC)[reply]

I'd like permission to use a bot (Mairibot) to assist with renames from WP:SFD. It would use the pywikipedia bot framework: template.py for template renames and touch.py for category renames (I don't know if the latter needs approval, as it's not actually changing anything, but it's better to be safe). --Mairi 22:51, 20 October 2005 (UTC)[reply]

If it is making edits, yes it would need approval, unless you felt that it needs the watch of human editors on RC patrol. --AllyUnion (talk) 11:54, 25 October 2005 (UTC)[reply]
I figured that, but touch.py makes null edits, so they wouldn't actually show up anywhere regardless. But either way, I'd need approval for the template renaming... --Mairi 17:13, 25 October 2005 (UTC)[reply]
  • I assume the pywikipedia framework respects our link ordering convention - i.e. at the end of the page, category links come first, followed by interwiki links, followed by stub notices? I've had to add support to Pearle (another category bot) to do that, and anything her parser can't handle gets dumped in Category:Articles to check for link ordering. Human editors don't always follow the convention, but I'm thinking if there are multiple bots rearranging things, they should converge on the same ordering. (Whobot also does category work, and is actually based on Pearle.) -- Beland 07:31, 26 October 2005 (UTC)[reply]
I'm not sure about the other parts of pywikipedia, but template.py replaces any occurrences of the existing template with the new template, regardless of where they are (it isn't specific to stub templates, even though that's what I intend to use it for). So the ordering wouldn't matter, and the existing order would be preserved, I believe.
Incidentally, I wasn't aware that was our link ordering convention. I generally put stub notices before category links, and I think that's where I usually see them... --Mairi 07:56, 26 October 2005 (UTC)[reply]

It looks like I'll occasionally have to use replace.py when renaming redirects; because Mediawiki now apparently considers using Template:A (that redirects to Template:B) as only a use of and link to Template:B. --Mairi 05:31, 8 November 2005 (UTC)[reply]

I'd like permission to run a bot to find and list orphaned afds, of the sort that I've been doing by hand for the last month and a half or so. It is written in Perl and will be manually-assisted. Since it will probably only be making a dozen or so edits a day, I don't plan on asking for a flag. —Cryptic (talk) 20:37, 21 October 2005 (UTC)[reply]

You could easily generate a list of stats... actually, I've been asked to have the AFD Bot generate a summary list, but I haven't yet figured out how that could be done. Well, I just had a small thought, but that would assume everyone's sig ends with "(UTC)". --AllyUnion (talk) 11:56, 25 October 2005 (UTC)[reply]
I actually already use a script to generate a list of pages that are in Category:Pages for deletion but not listed on a current afd subpage; the point of the bot is so I can take the list it generates and say "Y Y Y Y Y N Y Y" to have it timestamp the relistings and put them on afd instead of copy-pasting it in by hand. My current method of doing the last part by hand still takes me about an hour of repetitive work every day, and it's starting to get really old. —Cryptic (talk) 06:02, 26 October 2005 (UTC)[reply]

Expansion for archival duties

As requested at Wikipedia:Bot requests#Archival Bot and announced at Wikipedia:Village pump (technical)#Village Pump Archival Bot and Wikipedia talk:Administrators' noticeboard#AN&ANI ArchiveBot, I'm planning on adding an archival feature to Crypticbot (unless, of course, the editors of both WP:AN and WP:VP, or those here, object). The way this will work is:

  1. Fetch the current contents of the page.
  2. Split it into separate sections (using ^==[^=].*==\s+$ as a delimiter) (for those reading this who don't speak regexp, that's any line starting with exactly two equals signs, followed by at least one character where the first isn't an equals sign, and then ending with at least two equals signs, optionally followed with whitespace).
  3. Find the latest timestamp in the section's text.
  4. Optionally look for comments like <!-- don't touch this section --> or <!-- don't copy this section to the archive when it's removed -->, if folks think such would be useful.
  5. Add any sections where the latest timestamp was more than seven days in the past to the respective archive page. (In the case of Wikipedia:Village pump (technical)/Archive and the other village post archives, the sections are instead just removed.)
  6. Replace the current contents of the original page with all the remaining sections.

Like Crypticbot's current AFD-orphan task, this will be run once a day; unlike the AFDs, I won't be vetting the edits beforehand. Not sure if it merits a bot flag or not; if used on all the village pump (six, and their archives) and administrator's noticeboard pages (three, and their archives), it'll be at most eighteen edits a day. Perhaps one more to update Template:Administrators' noticeboard navbox, but I expect to be doing that by hand at first. —Cryptic (talk) 20:02, 16 November 2005 (UTC)[reply]
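
Crypticbot itself is written in Perl, but to make the splitting and age test concrete, here is an illustrative Python sketch of the steps above. The heading test is a line-by-line version of the delimiter quoted in step 2, and the timestamp pattern assumes the standard en.wikipedia signature format; none of this is the bot's actual code.

```python
# Illustrative sketch of steps 2-5 above: split a page into ==level-2==
# sections and test whether a section's newest signature timestamp is more
# than seven days old.
import re
from datetime import datetime, timedelta

HEADING_RE = re.compile(r'^==[^=].*==\s*$')
STAMP_RE = re.compile(r'(\d{2}:\d{2}, \d{1,2} \w+ \d{4}) \(UTC\)')

def split_sections(text):
    """Return (preamble, [section, ...]); each section starts at a heading."""
    preamble, sections = [], []
    current = preamble
    for line in text.splitlines(keepends=True):
        if HEADING_RE.match(line.rstrip("\n")):
            sections.append([])
            current = sections[-1]
        current.append(line)
    return "".join(preamble), ["".join(s) for s in sections]

def is_stale(section, days=7, now=None):
    """True if the newest signature timestamp is older than `days` days."""
    now = now or datetime.utcnow()
    stamps = [datetime.strptime(s, "%H:%M, %d %B %Y")
              for s in STAMP_RE.findall(section)]
    return bool(stamps) and max(stamps) < now - timedelta(days=days)
```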

RussBot: proposed expansion

I am requesting permission to expand the operations of RussBot to include a regularly scheduled counting of links to disambiguation pages. I have a script that performs the link count and formats the output. For sample output, please see User:RussBot/Disambig maintenance bot test. If this is approved for regular operation, I would run it weekly and direct the output to Wikipedia:Disambiguation pages maintenance instead of to the User page. Russ Blau (talk) 11:16, 23 October 2005 (UTC)[reply]

So it would count the links for every disambig it needs to, every time it runs? Or only for new links? What happens when the list is excessive? Is there any way for you to control the number of pages the bot needs to count links for? --AllyUnion (talk) 11:59, 25 October 2005 (UTC)[reply]
The bot would count all the links to pages that are listed on Wikipedia:Disambiguation pages maintenance. There are about 365 pages listed currently. For each one of those articles, the bot retrieves Special:Whatlinkshere/Article title and counts the number of links there. It does not retrieve all the referencing pages. For pages that have more than 500 inbound links (currently there are only two pages out of 365 that fall into this category), it makes one call to [[Special:Whatlinkshere/...]] for every 500 references. So the bot would send about 370 HTTP requests to the server. It uses the pywikipediabot framework, so the throttle spaces these requests out several seconds apart. What type of controls do you think would be appropriate? --Russ Blau (talk) 13:26, 25 October 2005 (UTC)[reply]
  • That sounds sane enough to me, as long as it obeys the 1-edit-every-10-seconds speed limit. Excellent idea, and looking at the demo page, excellent execution. Is the "history" in the HTML comment limited to a finite and reasonable length? And I assume the bot will not edit if it can't find its start and end markers? It looks like the bot properly ignores Talk: pages, etc.; it would be good to note that on the page once it goes live. -- Beland 07:20, 26 October 2005 (UTC)[reply]
    • Yes, the bot ignores all pages outside of the main article namespace. The history in the HTML comment will be limited to 5 entries. And you are correct that the bot will not edit if it does not find start and end markers. Thanks for the comments. --Russ Blau (talk) 12:45, 26 October 2005 (UTC)[reply]
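
Purely as an illustration of the pagination arithmetic Russ describes (one request per 500 inbound links), here is a hedged Python sketch; fetch_whatlinkshere and count_links are hypothetical placeholders, not RussBot's real functions, and a real run would go through the throttled pywikipedia framework.

```python
# Hypothetical sketch of the counting pass described above: request
# Special:Whatlinkshere in batches of 500 and add up the entries.
def count_inbound_links(title, fetch_whatlinkshere, count_links, batch=500):
    total, offset = 0, 0
    while True:
        page_html = fetch_whatlinkshere(title, limit=batch, offset=offset)
        found = count_links(page_html)
        total += found
        if found < batch:
            return total      # last (partial) batch reached
        offset += batch
```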

I would like to run a simple replace.py bot to fix various problems with ==External links== and ==See also== sections. For example, some people seem to call "External links" "Weblinks" (I think borrowed from the German wiki), and there are other mis-capitalisations and minor problems. Martin 19:25, 25 October 2005 (UTC)[reply]

Is there a specific list of replacements that folks can check for potential problems? -- Beland 07:08, 26 October 2005 (UTC)[reply]
  • ==Also see== to ==See also==
  • ==Internal links== to ==See also==
  • ==Weblinks== to ==External links==
  • ==Web links== to ==External links==

Plus all capitalisation variations, and equivalent changes for different types of headings (e.g. ===Weblinks===).

There are quite a lot, and they get introduced fairly quickly as well; I have done about 500 by hand already, but it's not worth it when a bot could do it! thanks Martin 09:02, 26 October 2005 (UTC)[reply]
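
To give reviewers a concrete idea, here is an illustrative Python version of the replacements listed above. The exact patterns fed to replace.py may differ, so treat these as an assumption rather than the bot's actual rules.

```python
# Illustrative heading fixes corresponding to the list above (case-insensitive,
# any heading level).  Not the exact replace.py rules the bot uses.
import re

HEADING_FIXES = [
    (r'^(=+)[ \t]*also see[ \t]*\1[ \t]*$', r'\1See also\1'),
    (r'^(=+)[ \t]*internal links[ \t]*\1[ \t]*$', r'\1See also\1'),
    (r'^(=+)[ \t]*web[ \t]?links[ \t]*\1[ \t]*$', r'\1External links\1'),
    (r'^(=+)[ \t]*external links[ \t]*\1[ \t]*$', r'\1External links\1'),  # capitalisation only
]

def fix_headings(wikitext):
    for pattern, replacement in HEADING_FIXES:
        wikitext = re.sub(pattern, replacement, wikitext,
                          flags=re.IGNORECASE | re.MULTILINE)
    return wikitext
```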

Has there been any decision on this bot? I'm ready to set him up with a flag. Datrio 06:53, 5 November 2005 (UTC)[reply]
Over 22,000 edits with no valid complaints counts as a decision for me! thanks - Martin 10:34, 5 November 2005 (UTC)[reply]

Roomba requesting permission to clean

As part of the ongoing effort to remove illegally used images from Wikipedia, and to clarify our license tags, I have created, and am continuing to develop, a collection of Python scripts to facilitate the bulk processing of media content. My scripts interact directly with a local copy of the Wikipedia backend database, which facilitates operations that would be unreasonable with a traditional bot. For example, a normal bot would need to load all 21,000 fair use images to determine which are linked in the main namespace, while my bot has the answer with a single query. Although I'm currently running these scripts on my account in a supervised and very slow manner, I believe it would be a good idea to move these edits into their own account and potentially operate with the bot flag. Operation of Roomba will be automatic, semi-manual, or manual depending on the exact nature of the changes, but unless I later request retasking here, all of Roomba's operation will be confined to tasks related to the bulk handling of media pages, potentially including adding notices to user talk pages; however, I would want to discuss that more with others, because I feel it is much better to have notices placed by someone who can answer questions, which is also why I have little desire to produce a bot which edits articles. --Gmaxwell 23:06, 25 October 2005 (UTC)[reply]

I suggest a new feature for Roomba, adding edit summaries on behalf of its boss. How's that sound? :) Oleg Alexandrov (talk) 03:50, 26 October 2005 (UTC)[reply]
On talk page posts no less? I dunno, sounds risky. Plus Roomba doesn't speak english. --Gmaxwell 05:42, 26 October 2005 (UTC)[reply]
  • What sort of "bulk processing" other than leaving talk messages would it do? AllyUnion is already working on a bot to leave deletion notifications on talk pages and flag pages for deletion. You would definitely need to coordinate to avoid accidental destructive interference. -- Beland 07:12, 26 October 2005 (UTC)[reply]
    • Ah, I said above that I am not really interested in doing talk page notices, and I'm aware of that. Some of our rules are difficult for other bots to help with; I gave an example above: fair use images which are not used in the main namespace are a CSD 7 days after a notice is added. The problem is we have over 20,000 images. Because my bot talks directly to a Wikipedia database, I can identify the images in question instantly; then the bot goes and tags them (about 2,200) for other users to act on. That's what I've done so far, but it will need to be done on an ongoing basis; they aren't easy to catch from the upload log because people need time to link them in the first place. Also, my database can answer questions like "what images look orphaned but are really provided as external-style links in the main namespace" and then go tag images accordingly... This was a concern for the 2,200 I tagged above; it turned out that none were linked externally, but about 50 were cases of image pages linked in the main namespace (I talked a human into retagging these since there weren't that many). So really that's it: the bot is intended to tag content using criteria that are difficult to apply in other ways. --Gmaxwell 02:14, 27 October 2005 (UTC)[reply]
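
For readers unfamiliar with what a "single query" buys here, the following is a guess at the kind of SQL involved, wrapped in a trivial Python snippet. The table and column names follow the MediaWiki 1.5 schema as understood here, and the category name is purely a stand-in, so treat the whole thing as an assumption rather than Roomba's actual code.

```python
# Hypothetical example of the "single query" idea above, run against a local
# copy of the database.  'Fair_use_images' is a stand-in category name.
FAIR_USE_ORPHANS_SQL = """
SELECT img.page_title
FROM page AS img
JOIN categorylinks AS cl
  ON cl.cl_from = img.page_id AND cl.cl_to = 'Fair_use_images'
LEFT JOIN imagelinks AS il
  ON il.il_to = img.page_title
LEFT JOIN page AS article
  ON article.page_id = il.il_from AND article.page_namespace = 0
WHERE img.page_namespace = 6          -- Image: namespace
GROUP BY img.page_title
HAVING COUNT(article.page_id) = 0     -- no main-namespace page uses the image
"""
```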

I'd like permission to run Commander Keane bot. It uses the pywikipedia framework, specifically solve_disambiguation.py and touch.py, but I may use other pywikipedia functions when I become more experienced.

Admittedly, I have been running the bot for a couple of weeks. The instructions on the project page gave me the impression that only newly designed bots needed discussion here (specifically, the use of "bot-maker"). --Commander Keane 16:00, 27 October 2005 (UTC)[reply]

Approved for the official week's trial run. Please list it on the WP:BOTS page under "bots running without a flag" - I don't foresee any problems with you applying for a bot flag after a short while, though. Rob Church Talk 21:39, 4 November 2005 (UTC)[reply]

I'm asking for permission to use this bot. It uses pywikipedia, and I'd like to use it to fix highway disambiguation pages, add templates to pages, and maybe more after I learn more about the program. --Rschen7754 talk 04:12, 1 November 2005 (UTC)[reply]

I'm happy with the highway stuff; but what templates are you adding to which pages? Rob Church Talk 21:38, 4 November 2005 (UTC)[reply]
It's a big pain to do WikiProject templates... this bot would add stuff like {{California State Highway WikiProject}} to talk pages of highway articles. --Rschen7754 (talk - contribs) 23:56, 4 November 2005 (UTC)[reply]
Should I go ahead and do the trial run? --Rschen7754 (talk - contribs) 04:35, 6 November 2005 (UTC)[reply]
Authorised for a week's trial run. Please add a link on WP:BOTS under those without a flag, throttle edits to one every 60 seconds, and report back after the trial with exact details of anything else you have planned, before applying for a bot flag. Rob Church Talk 20:48, 6 November 2005 (UTC)[reply]
A question: How do you use template.py? Because it doesn't seem to work on my computer. The disambiguation is fine however. --Rschen7754 (talk - contribs) 02:56, 7 November 2005 (UTC)[reply]
I had the same problem, and fixed it by editing family.py; in the definition of self.namespaces for namespace 10 I added a line with 'en': u'Template', (at line 368 in my version). It's not the best fix (as it just works around the problem), but it does the job. --Mairi 03:21, 7 November 2005 (UTC)[reply]
Another question: Does anyone know how to add Wikiproject notices to talk pages? Thanks... --Rschen7754 (talk - contribs) 03:55, 7 November 2005 (UTC)[reply]

Asteroid bot

I would like to run a bot with account User:HELLO, WORLD! to maintain the lists of asteroids and create new articles for asteroids. - Yaohua2000 03:44, 1 November 2005 (UTC)[reply]

Well, if you're going to use it, I recommend that you do as I suggested. You should also make sure that the asteroid articles are categorized correctly (into family level). Use also the Template:MinorPlanets Navigator, it's handy. If possible, could you add data from the PDS Asteroid Data Sets? It has plenty of information on the asteroid families, compositions, spin rates, occultations and so on.--Jyril 21:19, 1 November 2005 (UTC)[reply]
Please don't run it under that account; the name should make it clear that it's a bot. —Cryptic (talk) 01:41, 2 November 2005 (UTC)[reply]
What is the status of this request? Does the user still wish to run a bot? If so, please post here exactly what username will be used and what the bot will be doing. Rob Church Talk 20:51, 6 November 2005 (UTC)[reply]
Sure. I'd use this username: User:Asteroid botYaohua2000 02:26, 7 November 2005 (UTC)[reply]
Either create a mini user page for the bot, or redirect User:Asteroid bot to a subpage of your own user page - whichever option; have that page provide a short description of what the bot does, a link to its contributions, and instructions on getting in touch with you. Rob Church Talk 07:53, 7 November 2005 (UTC)[reply]

When to use a Bot user account

Hi! I have a question about the difference between using a bot and the very useful User:Lupin's Popups tool. I've tried working with solve_disambiguation.py in manual mode in conjunction with the popup tool and met with great success. Based on the information contained within the WP:BOT article page, any use of pywikipedia should be done under a bot account. Though, I don't see much difference between running the bot in manual mode (where each edit is scrutinized and responded to on screen) and using the Popups tool to do the same thing. Would one also need a bot account to use the automated editing available in Popups? Just wondering before I decide whether to continue with pywikipedia or stick with Popups. Thanks :-). >: Roby Wayne Talk 07:30, 3 November 2005 (UTC)[reply]

You will require a bot account because your edits would still be coming in fast enough to clog up Recent Changes. In order to avoid that, you need a bot flag; hence, we need to see a separate account. Rob Church Talk 21:31, 4 November 2005 (UTC)[reply]
See! I knew there was a logical reason :-). Thanks, RobChurch. >: Roby Wayne Talk 19:34, 7 November 2005 (UTC)[reply]

We are Luigi30. You will be assimil... wrong bot.

I (Luigi30) am looking for approval for a bot to clean out WP:RA's created articles. Obviously it'd require approval of each link deletion so I don't clean out legitimate blue links (redirects, sources). It'd run using pywikipedia. I don't think I'd need to run it more than once a week, seeing how infrequently Requested Articles is used. It takes me 3 hours to do it by hand; I'd say it'd take a couple of hours to put together the bot, and it'd be a whole lot quicker to delete each one. I've got the username LuigiBot, as you can tell. LuigiBot 22:32, 3 November 2005 (UTC)[reply]

Approved for a week's trial run. Please list it on Wikipedia:Bots under the section for bots without a flag and keep us posted here. Throttle your edits to one every 60 seconds max. and if all goes well, you'll have no problems getting consent to apply for the bot flag. Rob Church Talk 21:29, 4 November 2005 (UTC)[reply]

Bot status for User:MalafayaBot

I hereby request approval for running bot MalafayaBot. This bot will exchange interwikis between the English and Georgian Wikipedia using the pywikipedia software. Malafaya 18:01, 4 November 2005 (UTC)[reply]

Approved for a week's trial run. You can apply for a bot flag afterwards if there are no objections, but check back here first. Please list it on the WP:BOTS page, under the section for bots without a flag, and throttle edits to one every 60 seconds max. Rob Church Talk 21:41, 4 November 2005 (UTC)[reply]

Bot Status Aweebot

I am looking for approval to run my WikiBot: User:Aweebot. I will start the bot nightly or every other night at 8 PM PST. It will run on the English Wikipedia and it runs pywikipedia. This bot will do Solve Disambiguation, Categories, Redirects, interWiki. Why do I need the bot? Well, I want to help out in the Wiki community to make Wikipedia even better. Is it important enough for Wikipedia to allow my bot? I think another bot helping out would make Wikipedia better; as there is a growing number of articles, we want to keep growing and not have mistakes that never get fixed on articles. Thanks --Actown 02:55, 3 November 2005 (UTC)[reply]

You'll have to be a bit more specific on what you want to do with disambiguation, categories, etc. --AllyUnion (talk) 01:10, 5 November 2005 (UTC)[reply]
Better? If not can you explain what you mean? I changed this: This bot will do Solve Disambiguation, Categories, Redirects, interWiki. Thanks, --Actown 01:24, 5 November 2005 (UTC)[reply]
How will it perform those tasks if it is unattended? Martin 10:43, 5 November 2005 (UTC)[reply]
Oh Then I will watch over it. --Actown 15:44, 5 November 2005 (UTC)[reply]
The scripts cited require user intervention by default. You will need to monitor the thing running. Exactly what will it be doing with "categories, redirects and interwiki", however? We have a policy whereby users running bots modifying interwiki links must understand the languages being changed. Rob Church Talk 20:01, 6 November 2005 (UTC)[reply]
When talking about disambiguation, what you will have to do is not just 'watching over it'. The disambiguation bot is highly interactive; it is actually the user who decides for each separate page being changed. - Andre Engels 09:19, 9 November 2005 (UTC)[reply]
What will you be disambiguating? Everything? You'll have to be a bit specific on what you plan to disambiguate. What will you be doing with categories? Merging them? Editing them? Moving or changing the categories? What would you be doing with redirects? Correcting the redirects? Changing the redirects? --AllyUnion (talk) 12:37, 7 November 2005 (UTC)[reply]

Interwiki Bot policy

On what basis is the interwiki bot policy being made? The policy as it stands makes it impossible to actually run the bot, because it will get all interwikis. Apparently it was created by someone who had not first checked the actual working of the bot. Who was it, where was this decided, and what can I do to get the policy under discussion again? - Andre Engels 09:16, 9 November 2005 (UTC)[reply]

You may propose a new policy here. The consideration came after a considerable screw-up by Flacus and his FlaBot (FlaBot), which removed interwiki links on several articles. Flacus made no attempt to continuously check his bot at the time, and his bot was subsequently banned. He also apparently attempted to get his bot unblocked by stating that his bot had been confused with the vandal bot FIaBot (FIaBot).
According to the block log:
  • 00:22, 11 May 2005 AllyUnion blocked "User:FlaBot" with an expiry time of 3 weeks (Bot not operating as intended; Removing zh-min-nan or zh (chinese) links.)
And according to the contributions: Contributions of FIaBot (FIaBot)
  • 18:48, 11 May 2005 (hist) (diff) m Japanese cuisine (robot Modifying:de)
Flacus was also repeatedly warned, both on his German userpage and his English userpage, to fix his bot, but to no avail. See: Wikipedia_talk:Bots/Archive_interwiki#FlaBot. However, since his unbanning, his bot seems to be operating properly. The main issue of concern here is that we have bots that seem to operate fine, yet end up removing interwiki links. Shouldn't the removal of interwiki links be done by someone who is familiar with the language? --AllyUnion (talk) 21:39, 11 November 2005 (UTC)[reply]
Furthermore, shouldn't they reasonably understand English if they plan to run their bot here or at least have someone who can speak for them? --AllyUnion (talk) 22:03, 11 November 2005 (UTC)[reply]
It's hard for me to check the facts here; the edit you give above is indeed by FIabot rather than Flabot as he claims, but there are other complaints on the history page of the user discussion page which do seem to be valid.
However, I am still of the opinion that restricting bots to only edit languages known by their operators is too strict. First, the basic working of the bot includes adding interwiki links to pages for which interwiki links already exist. Secondly, a lot of information can be found in languages that one does not speak. As an example I show this recent edit by Robbot - I do not speak Swedish, but I understand enough of it and of Wikipedia to realize that sv:Färgelanda is a disambiguation page, and that of the three meanings on the disambiguation page, Färgelanda Municipality corresponds to sv:Färgelanda kommun. - Andre Engels 10:45, 13 November 2005 (UTC)[reply]
One thing I do, but which I guess is hard to get into rules, is that I check through the 'normal browser' before allowing Robbot to remove a page or before making a decision where there is more than one page found for a language. - Andre Engels 13:20, 13 November 2005 (UTC)[reply]

How about we amend the policy to the following:


If using the pywikipedia framework:

  • Please run the latest version
  • Update on a daily basis

Users who run such bots must prove themselves to be responsible and harmless, and work closely with the community of interwiki bot operators. Users who come from other-language Wikipedias must demonstrate some proof of credibility, such as being an administrator of a major-language Wikipedia, a bureaucrat, or a developer for the pywikipedia framework, or being vouched for by a credible user on the English Wikipedia.


Would that be much better? Instead of focusing on whether a user can speak the language, how about focusing on whether the user is responsible enough to keep an eye on their bot, and to stop their bot if it goes awry. The issue that I want to try to avoid is an operator of an interwiki bot refusing to stop his or her bot on the basis that they believe it is working correctly, when there is evidence that it is not. --AllyUnion (talk) 00:15, 14 November 2005 (UTC)[reply]

It would be acceptable to me, yes. The important thing in my opinion would be that people are careful when doing removals or non-trivial changes to interwiki links with their bots. My problem with your rule was that I think 'speaking the language' is too strict a rule for that, especially when it's also extended to the standard interwiki-bot additions. - Andre Engels 09:19, 17 November 2005 (UTC)[reply]

A cut and paste bot

Please see Wikipedia:Administrators' noticeboard#A cut and paste bot. Who was running this? jni 08:37, 10 November 2005 (UTC)[reply]

Likely to be a vandal bot. --AllyUnion (talk) 21:39, 11 November 2005 (UTC)[reply]
Correction, apparently run by Infohazard who has not registered for a bot. --AllyUnion (talk) 22:06, 11 November 2005 (UTC)[reply]

Reporting unauthorised bots

I'm wondering if it would be worth having a standardised means of flagging unauthorised bots. How about a template to stick on the user talk page of a suspected bot? It would serve a triple purpose:

  • to tell the user to confirm if he/she/it is a bot or not, and where to request approval before continuing (just in case the human behind it sees it) (not to forget the possibility of the same user account being used for both bot and human edits)
  • to inform others that it is a suspected bot
  • to produce a list of unauthorised bots that admins can examine, either by What links here or a category.

-- Smjg 12:04, 11 November 2005 (UTC)[reply]

Unauthorized bots should be blocked, without exception, unless they have requested permission here, or at least indicated that they are bots. For example, Kakashi Bot is not necessarily an authorized bot, but it is listed under "Bots running without a flag" on the project page. Kakashi Bot is subject to human scrutiny, and is intentionally left without a bot flag. I was blocked for running an unauthorized bot as well. Typically, it is asked that editors do not make bot-assisted or bot edits under their own account. Exceptions are made to this for administrative-oriented bots, like that of Curps, who runs an autoblocker. --AllyUnion (talk) 21:47, 11 November 2005 (UTC)[reply]
True, but my points are:
  • Not all of us have the power to block.
  • Sometimes one may be unsure whether a user making bot-like edits really is a bot.
  • There's always the possibility of a random person, especially one who hasn't gone through the bot approval process, using a single account (or raw IP address) for bot and manual edits, either to try and disguise that he/she/it is running a bot or out of not knowing better, and that this needs to be taken into consideration when verifying bot accusations.
-- Smjg 11:36, 14 November 2005 (UTC)[reply]
Then find someone who does, like at WP:AN/I. --AllyUnion (talk) 00:57, 16 November 2005 (UTC)[reply]

Bot status request

User:DHN-bot - an Interwiki bot to exchange interwiki with the Vietnamese Wikipedia. DHN 22:18, 12 November 2005 (UTC)[reply]

Please be more descriptive with your bot's user page. --AllyUnion (talk) 09:41, 14 November 2005 (UTC)[reply]
I've updated the bot's user page accordingly. DHN 06:52, 29 November 2005 (UTC)[reply]

Question

What computer language does one use to create Wikipedia bots? CG 22:35, 12 November 2005 (UTC)[reply]

Any language may be used to create a Wikipedia bot, but most typically people rely on the Python pywikipedia framework. --AllyUnion (talk) 01:01, 13 November 2005 (UTC)[reply]
Thank you. CG 12:01, 13 November 2005 (UTC)[reply]

Deletion of page

This is a request from my talk page: --AllyUnion (talk) 01:04, 13 November 2005 (UTC)[reply]


Would it be possible for you to add to sandbot (or some other regularly-run bot) the unconditional deletion of the weather in London? Wikipedia:How to edit a page uses this title as an example of a red link but of course it keeps getting created. -- RHaworth 17:02, 12 November 2005 (UTC)[reply]


The answer is yes, but it requires community approval. --AllyUnion (talk) 01:04, 13 November 2005 (UTC)[reply]

Why not use a more obscure article title to keep red? If the more obscure article title keeps getting created (which would probably be vandalism to mess with Wikipedia:How to edit a page), could we protect the page from creation? Using a bot to regularly delete a page is an abuse of Wikipedia resources.--Commander Keane 09:35, 13 November 2005 (UTC)[reply]
The page we link to must be intended as a redlink, and must remain as a redlink. Protecting the page from creation would cause the red link to show up blue. --AllyUnion (talk) 13:29, 13 November 2005 (UTC)[reply]

This is the text from Wikipedia:How to edit a page:


The weather in London is a page that does not exist yet.

  • You can create it by clicking on the link (but please do not do so with this particular link).
  • To create a new page:
    1. Create a link to it on some other (related) page.
    2. Save that page.
    3. Click on the link you just made. The new page will open for editing.
  • For more information, see How to start a page and check out Wikipedia's naming conventions.
  • Please do not create a new article without linking to it from at least one other article.

Sorry about my confusion over protecting a page. Will creation of the page show up on a watchlist? If so, then a bot is hardly necessary (but won't do any harm); if a couple of admins have the page on their watchlists, it will get deleted fast enough.--Commander Keane 05:34, 14 November 2005 (UTC)[reply]

Deleted pages can be placed on watchlists... although, the matter being discussed here is having my user account check the page's existence every so often and go off to delete the page. --AllyUnion (talk) 08:34, 14 November 2005 (UTC)[reply]
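
Purely as an illustration of that "check every so often" loop (not an endorsement, and not actual code), a sketch might look like the following, with page_exists and delete_page standing in for real, admin-capable framework calls:

```python
# Hypothetical sketch of the periodic existence check discussed above.
import time

EXAMPLE_RED_LINK = "The weather in London"

def patrol(page_exists, delete_page, interval_seconds=3600):
    while True:
        if page_exists(EXAMPLE_RED_LINK):
            delete_page(EXAMPLE_RED_LINK,
                        reason="Intentional red link example on "
                               "Wikipedia:How to edit a page")
        time.sleep(interval_seconds)
```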

This is a new user who appears to be using some sort of automated or semi-automated spellchecker script, which appears to be systematically violating spelling conventions by switching to the British style. If someone could politely handle this matter, it would be appreciated, as I have little technical knowledge in this area. MC MasterChef :: Leave a tip 09:25, 13 November 2005 (UTC)[reply]

We do not allow automated spellchecker bots. We do permit semi-automated spellchecker scripts/bots so long as they include US and internationalized versions of spelling. --AllyUnion (talk) 13:35, 13 November 2005 (UTC)[reply]

User:Bluebot and 1911

This bot has been used by its owner Bluemoose (talk · contribs) to move the 1911 template under the References heading. I disagree with this action, for a number of reasons:

  1. References are usually primary resources, and an encyclopaedia is a secondary resource
  2. References are places you are directed to to check the veracity of the text in the article: in this case, the article is based on the 1911, so you would go elsewhere to check the validity of the 1911 itself.

Is there consensus for Bluebot's actions? This particular activity is not cited in the request (above), which is for other valuable stuff. (I have also asked this question on the template's talk page, and will do so elsewhere as well.) Noisy | Talk 10:09, 13 November 2005 (UTC)[reply]

Please read the Wikipedia:Manual of style and Wikipedia:Cite sources, which say to put sources under a references heading. Also, the bot isn't really "moving" the template; it is adding the heading, because at present the 1911 tag often just floats around at the bottom somewhere, which is definitely wrong. Plenty of articles already use the 1911 tag under a references heading as well. Plus, I might add that I have been monitoring all edits this bot has been making. Martin 10:15, 13 November 2005 (UTC)[reply]
Bluebot has been blocked temporarily to resolve any issues raised by user Noisy. I have also confirmed that the bot is operating outside its specified parameters with regard to adding the 1911 template. --AllyUnion (talk) 13:56, 13 November 2005 (UTC)[reply]
Additionally, User:Beland's question was still left unresolved above. --AllyUnion (talk) 13:58, 13 November 2005 (UTC)[reply]

Is there a specific list of replacements that folks can check for potential problems? -- Beland 07:08, 26 October 2005 (UTC)[reply]

The author is requested to resolve all questions and problems before the bot is unblocked. --AllyUnion (talk) 13:58, 13 November 2005 (UTC)[reply]

As is pointed out above, Noisy is completely wrong; there is no issue to resolve. I answered Beland's question where he asked it, so why do you say it was unresolved? Since then I have moved on to do other things with my bot. I have made ~60,000 edits with my bot with no valid complaints and many compliments. I have unblocked my bot now. Martin 14:12, 13 November 2005 (UTC)[reply]

I have no objection to the bot being used for the other purposes listed above, because I think they are valid and useful. If Martin desists from using it for the 1911 purpose, and proposes its use for 1911 in the proper way in which bot announcements are supposed to be used on this page, then I have no objection to the other work being done, and the bot can be unblocked.
As AllyUnion says, a more explicit listing of the actions undertaken by the bot would be welcome, as would a direct response to the two objections that I have listed so far. Pointing to an old Wikipedia page does not mean that I can't challenge the position, and ask for a new opinion from other Wikipedians before wholesale changes are made. Noisy | Talk 15:09, 13 November 2005 (UTC)[reply]
You don't have to build a consensus to stick to the guidelines; you have to build a consensus to change the guidelines, which is what you seem to want to do. You are allowed to challenge the guideline, but not to revert my correct changes, and as it stands, no one agrees with your point of view, so I don't see the guideline changing.
As for listing what I want to do with my bot (note that I update my bot user page occasionally with a list of tasks), I don't mind doing it out of political correctness, but I don't see the point, as lots of people don't bother mentioning it every time they do a new task. Anyway, here is what I have been doing:
  • Fixing numerous ==See also== and ==External links== errors. (complete)
  • Re-categorising articles per WP:CFD.
  • Subst'ing templates.
  • "Touching" articles on request.
  • Requests from users (such as updating template names).
  • Ensuring 1911 tags are in a ==References== section.

Martin 15:50, 13 November 2005 (UTC)[reply]

Any operation you plan to make that isn't something you originally applied for needs to have a passing mention here. This gives the community some time to prepare for, or improve your idea that you plan to implement. It's an insurance type of thing. --AllyUnion (talk) 23:41, 13 November 2005 (UTC)[reply]
OK. Note that things such as the subst'ing were discussed on the subst'ing page, and changes to do with the Manual of Style were discussed on the Manual of Style page. Thanks, Martin 23:44, 13 November 2005 (UTC)[reply]
You still need to make a note here, either a link to the discussion or something. --AllyUnion (talk) 09:36, 14 November 2005 (UTC)[reply]

Notice!

Users running bots should be aware that they should make every effort to run their bots off a server, on a separate IP address. Due to changes in how the MediaWiki software deals with sockpuppets, a block of a bot that runs off the same computer its owner edits from will leave the owner unable to edit. --AllyUnion (talk) 23:57, 13 November 2005 (UTC)[reply]

..."changes"? I thought autoblocking always worked like this. (Or at least, as long as I can remember.) In any case, this is more motivation to make bots halt if their talk page is edited, as mentioned at Wikipedia:Bots#Good form. —Cryptic (talk) 00:47, 14 November 2005 (UTC)[reply]
Does the pywikipedia bot stop when the talk page is edited?--Commander Keane 05:05, 14 November 2005 (UTC)[reply]
No. At the current moment, it does not. Although it does check whether you have new messages... --AllyUnion (talk) 09:37, 14 November 2005 (UTC)[reply]
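For what it's worth, the check itself is cheap even without framework support: the bot can ask the wiki whether its account has new messages before each edit and stop if so. A minimal sketch, assuming a MediaWiki installation that exposes the modern api.php endpoint with the hasmsg flag; this is an illustration of the idea, not a description of the pywikipedia internals.

import json
import sys
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"   # assumed endpoint

def has_new_messages(opener):
    # Asks whether the logged-in account has unread talk page messages.
    params = urllib.parse.urlencode({
        "action": "query",
        "meta": "userinfo",
        "uiprop": "hasmsg",
        "format": "json",
    })
    with opener.open(API + "?" + params) as resp:
        userinfo = json.load(resp)["query"]["userinfo"]
    return "messages" in userinfo            # key is present only when there are new messages

def main():
    # The opener should carry the bot's login cookies; logging in is omitted here.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor())
    for title in ["Example article 1", "Example article 2"]:   # whatever the bot would edit
        if has_new_messages(opener):
            print("Talk page edited - stopping.")
            sys.exit(1)
        print("Would edit", title)           # the real edit would go here

if __name__ == "__main__":
    main()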

Proposed cricket bot

First, I want to apologise for running a semi-automated edit tool from my own account without even realising that there was a procedure for seeking approval. Someone soon put me right on that one, so I've stopped using it and come here.

I want to run a program that helps me correct common errors in cricket articles by identifying candidate errors for me to consider. I'm not even sure whether this counts as a bot. One proposal earlier on this page suggests to me that some people might not consider it to be a true bot, although I would tend to think of it as one; but let me give you more details.

So far, I've been running it to correct "test" to "Test" (in the sense of Test match, an international cricket match). The bot is written in Perl and it works as follows:

  1. User (me) launches the program
  2. It reads List of international cricketers, and downloads the first few articles that it hasn't read before
  3. It finds the ones containing "test", changes "test" to "Test", and pops them up in a browser for me to read
  4. When it's found 20 candidate articles, it stops
  5. Then I go through the suggested changes and commit the correct ones

So it doesn't make any edits itself. I run it occasionally, it downloads maybe fifty pages, I make twenty edits myself in the space of a minute or two, and then I wait a few minutes or a few hours before running it again. Have a look at my last 500 contributions for the typical impact.
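To make the above concrete: the original is a Perl script, but the workflow is small enough to sketch in Python. This is a rough rendering only; the use of index.php?action=raw to fetch wikitext is my assumption about the mechanism, and here candidate articles are simply opened in a browser for manual editing rather than having the replacement pre-applied.

import re
import urllib.parse
import urllib.request
import webbrowser

BASE = "https://en.wikipedia.org/w/index.php"
SEEN_FILE = "seen_articles.txt"                  # local record of articles already checked

def raw_wikitext(title):
    url = BASE + "?" + urllib.parse.urlencode({"title": title, "action": "raw"})
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

def main():
    try:
        seen = set(open(SEEN_FILE).read().splitlines())
    except FileNotFoundError:
        seen = set()

    # Step 2: read the list page and pull out the linked articles.
    listing = raw_wikitext("List of international cricketers")
    titles = [t.split("|")[0] for t in re.findall(r"\[\[([^\]]+)\]\]", listing)]

    candidates = 0
    with open(SEEN_FILE, "a") as log:
        for title in titles:
            if candidates >= 20:                 # step 4: stop after 20 candidates
                break
            if title in seen:
                continue
            log.write(title + "\n")
            # Step 3: only articles containing a lower-case "test" need review.
            if re.search(r"\btest\b", raw_wikitext(title)):
                webbrowser.open(BASE + "?" + urllib.parse.urlencode(
                    {"title": title, "action": "edit"}))
                candidates += 1
    # Step 5: the operator changes "test" to "Test" by hand where it is correct to do so.

if __name__ == "__main__":
    main()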

My ideas for future tasks that this could do

  1. Link "Test" and "Test match" to Test cricket, not to Test match or Test.
  2. Make sure all short biographies have {{cricketbio-stub}}
  3. Add country-bio-stub to all cricketbio-stubs (ask on country page first, because this could generate a lot of new stubs in their lists).
  4. Find articles entitled "N (cricketer)" but not linked from "N".
  5. Find very short (sub-stub) articles, because an anonymous user has recently been making a lot of these.

We currently have something over 1500 cricket biographies, and a few hundred other cricket articles, so a program like this is a natural way to improve conformance to the house style agreed by WikiProject Cricket.

So my questions are

  1. Is this a bot that requires approval?
  2. If so, may I have approval?
  3. Even if not, should I run it as a separate user instead of as me, and should that user have the bot flag?

Thanks very much. I hope that's enough information, but please ask if you need to know anything more.

Stephen Turner 12:54, 15 November 2005 (UTC)[reply]

We also need a bot to change everything in categories such as Category:English test cricketers to Category:English Test cricketers :) jguk 13:41, 15 November 2005 (UTC)[reply]
That should go on WP:CFD as a speedy request, I can sort that out with my bot. Martin 13:52, 15 November 2005 (UTC)[reply]
jguk: Agreed, although I think that would be a job for a bot that committed its own edits (there are too many of them, and no risk of false positives), and I'm not sure I want to get into that yet. My program is deliberately ignoring those at the moment. Stephen Turner 13:54, 15 November 2005 (UTC)[reply]
This script looks pretty safe, as you are manually checking all the edits I don't think anyone can complain. Martin 13:52, 15 November 2005 (UTC)[reply]
Approved for test run for one week. --AllyUnion (talk) 21:41, 15 November 2005 (UTC)[reply]
Thanks very much, but I'm still unclear about one thing. Should I make the edits from a bot account or from my own account? Given that the bot is not making the edits itself but just suggesting edits to me. Sorry to be slow, but I want to be sure what the correct practice is. Stephen Turner 08:04, 16 November 2005 (UTC)[reply]
Bot account. --AllyUnion (talk) 21:14, 16 November 2005 (UTC)[reply]

Purging modification

I am requesting that AFD Bot purge the page cache of Wikipedia:Articles for deletion at 00:01 UTC every day. --AllyUnion (talk) 21:53, 15 November 2005 (UTC)[reply]
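A purge like that is a one-request job once something like cron triggers it daily. A minimal sketch, assuming the modern api.php purge action (which is an assumption on my part; older installations would hit index.php?title=...&action=purge instead):

import urllib.parse
import urllib.request

# POSTing action=purge asks MediaWiki to rebuild the cached rendering of the page.
data = urllib.parse.urlencode({
    "action": "purge",
    "titles": "Wikipedia:Articles for deletion",
    "format": "json",
}).encode("utf-8")
urllib.request.urlopen("https://en.wikipedia.org/w/api.php", data)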

2005 English cricket season redirects

I have listed a series of redirects that start with "2005 English cricket season/" here: Wikipedia_talk:WikiProject_Cricket#Redirects_that_need_fixing. Do you want a bot to go and remove the pages linking to them, then mark them for speedy deletion? --AllyUnion (talk) 18:18, 14 November 2005 (UTC)[reply]

Yes please, much appreciated. If you could turn Yorkshire v Worcestershire 7-10 September 2005 and similar pages to redirects as well, that'd be lovely - the list is here. Sam Vimes 18:43, 14 November 2005 (UTC)[reply]
Redirects to... where? --AllyUnion (talk) 07:20, 15 November 2005 (UTC)[reply]
Either the home team's page, or the period pages (1-14 June and so on). Whichever is easiest to program Sam Vimes 08:05, 15 November 2005 (UTC)[reply]

Kakashi Bot will be used to perform this task.

  1. Take the list at Wikipedia_talk:WikiProject_Cricket#Redirects_that_need_fixing, make certain that no pages link to each redirect (except for Wikipedia_talk:WikiProject_Cricket and any AFD subpage), and mark the redirect for deletion. As it requires manual checking of whether or not the bot did the task correctly, I am inclined not to auto-delete. Will use: {{db|Requested deletion via bot per [[User:Sam Vimes|]]. The bot is supposed to remove everything linking here. Also specified as a task at [[Wikipedia talk:Bots]]. Please check that almost nothing [[Special:Whatlinkshere/{{PAGENAME}}|links here]] before deleting.}}
  2. Attempt to redirect articles in Wikipedia:Articles_for_deletion/Cricket_matches_articles/list to an appropriate page. (season date) --AllyUnion (talk) 22:04, 15 November 2005 (UTC)[reply]
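Roughly, task 1 could look like the sketch below. It uses the pywikipedia framework; the method names (Page.getReferences(), Page.put()) are recalled from memory and the exclusion list is simplified, so treat this as an outline rather than the actual Kakashi Bot code.

import wikipedia   # the pywikipedia framework's core module

ALLOWED = {"Wikipedia talk:WikiProject Cricket"}   # pages that may legitimately still link here
DB_TAG = ("{{db|Requested deletion via bot per [[User:Sam Vimes|]]. "
          "Please check that almost nothing links here before deleting.}}\n")

def tag_if_orphaned(title):
    site = wikipedia.getSite()
    redirect = wikipedia.Page(site, title)
    # Anything still linking to the redirect, apart from the allowed pages and AFD subpages?
    linkers = [p.title() for p in redirect.getReferences()
               if p.title() not in ALLOWED
               and not p.title().startswith("Wikipedia:Articles for deletion/")]
    if linkers:
        print(title, "is still linked from:", linkers)
        return
    old_text = redirect.get(get_redirect=True)
    redirect.put(DB_TAG + old_text,
                 comment="Tagging orphaned redirect for speedy deletion (bot task)")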

Inter-wiki image harvesting/diffusion?

I had an idea this evening: what about a bot or similar program (perhaps running on a local copy of Wikipedia) that would cross-reference articles in different languages, looking for images that are in one but not another?

Images frequently break the language barrier, and who knows what hidden gems might be found using this methodology? I propose that the results of this program would be put in a Wikipedia namespace page, so that individuals could work on cross-dissemination of images in a decentralized fashion (in the spirit of wikipedia).

A much more aggressive approach (I'm not sure what I think about this) would be to have a bot take this output and tag the bottom of every page for which an image was available in another language.

What do people think? I wouldn't mind providing some of the programming talent for this, but this kind of wikiwide action would need some pretty hefty official support. - JustinWick 01:33, 17 November 2005 (UTC)[reply]

Do you mean the images that reside on WikiCommons or just images that are missing in general? Please remember, there are fair use images on the English Wikipedia that can not be transferred to other projects. Also, we have Commons images, which are used across several projects. --AllyUnion (talk) 03:46, 17 November 2005 (UTC)[reply]
Well, fair use is a very gray area (at least in the United States), but... I am only talking about taking images from, say, the English version of Wikipedia, say for an article on robotics (how apropos), and suggesting their inclusion in articles about robots in other languages of Wikipedia (if they don't have them already). This way, if any language gets a nice image for an article, there's a good chance that it can be reused in other languages. It seems unlikely to me that this transfer would be considered illegal (Wikipedias of all languages have the same legal status, right?) but I admit I am no legal expert! - JustinWick 16:26, 17 November 2005 (UTC)[reply]
I had a similar thought to this as well; it would be best done off-line on the database dumps rather than by bot. I imagine it would look at the inter-language links on an en article, then see if any of the linked-to foreign articles (or maybe just the fr and de articles) have more images than the English one, and generate a list of these articles and the missing images. Generating a list off-line would not need any community support, and would make a nice community project. Hope that makes sense. Martin 16:46, 17 November 2005 (UTC)[reply]
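The per-article comparison is the easy half once the dump gives you the interlanguage pairs. A minimal sketch of that half only; the image regex is simplified and ignores most localized namespace prefixes and template-supplied images, so a real run would need more care.

import re

# Very rough: pull image names out of wikitext. A real run would also need the
# other localized prefixes (Imagen:, Immagine:, ...) and images supplied via templates.
IMAGE_RE = re.compile(r"\[\[\s*(?:Image|Bild)\s*:\s*([^|\]]+)", re.IGNORECASE)

def images_in(wikitext):
    return {name.strip() for name in IMAGE_RE.findall(wikitext)}

def missing_from_first(first_text, second_text):
    """Images used in the second article's wikitext but not in the first's."""
    return images_in(second_text) - images_in(first_text)

if __name__ == "__main__":
    en = "Intro text. [[Image:Robot arm.jpg|thumb|A robot arm]]"
    de = "Einleitung. [[Bild:Robot arm.jpg|thumb]] und [[Bild:Asimo.jpg|thumb]]"
    print(missing_from_first(en, de))    # {'Asimo.jpg'}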
I agree that that's the most computationally efficient way to do it; that's actually, in my mind, the easy part. To me the hard part is: people in other languages are trying to improve their articles, so how do they know someone has compiled this great list? More to the point, how do they know the particular article they are looking at has some alternate images? - JustinWick 17:05, 19 November 2005 (UTC)[reply]
Such suggested images are okay as suggestions, compiled as a report from an offline database dump, and should probably be placed on meta or something, so long as the suggested image has a free license. However, it would be more beneficial for suggested images to be moved to the Commons and placed in a gallery on the subject. --AllyUnion (talk) 16:54, 19 November 2005 (UTC)[reply]
I can see it being put in the Commons, however the big thing I'm worried about is, how do people editing articles know this resource is available for a particular article? Maybe if there was some way of getting a Wikipedia Project started for media cross-distribution, such that people in different languages would just take articles off the list and put whatever images are appropriate into each article for which they are available. (marking them off when they are done). - JustinWick 17:05, 19 November 2005 (UTC)[reply]
Perhaps you can link to the commons page on the talk page of the article which the two subject matters relate to? Or use one of those boxes that says, "The Commons has more images on <subject>" --AllyUnion (talk) 09:32, 9 December 2005 (UTC)[reply]

Shark articles unifying bot

I request permission to run a bot; here are answers to the questions to be answered when requesting to run one.

  • The bot will be manually supervised
  • The bot will run while there is a suitable task for it to do; there is no fixed set of tasks and therefore no fixed timeline as of now.
  • pywikipedia
  • The first purpose is to unify the shark articles, putting fishbase, ITIS and marinebio links in place using the correct templates. If this works out (I have not run any wiki bot before, so I am not sure how complicated it is), I might want to do other work to unify the articles later, and change/add the same templates for other fish articles too. I expect to be able to find the scientific name from the taxobox and add the fishbase template automatically (a rough sketch of that step is posted below); I need to think a bit more about the other references.
  • There are about 30+ shark articles now; a bot would make things much faster (I expect)

I will run the bot under user User:StefanBot, Stefan 04:08, 19 November 2005 (UTC)[reply]
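The taxobox-to-fishbase step mentioned above might look something like this pywikipedia sketch. The taxobox field name ("binomial") and the fishbase template's parameters are my assumptions for illustration, not the real templates' syntax, so both should be checked before running anything like it.

import re
import wikipedia   # the pywikipedia framework's core module

BINOMIAL_RE = re.compile(r"\|\s*binomial\s*=\s*'*\[*([A-Z][a-z]+ [a-z]+)")

def add_fishbase_link(title):
    page = wikipedia.Page(wikipedia.getSite(), title)
    text = page.get()
    if "{{fishbase" in text.lower():
        return                                     # already carries the link
    match = BINOMIAL_RE.search(text)
    if not match:
        print("No binomial name found in", title)
        return
    genus, species = match.group(1).split()
    # Hypothetical template call; the real template name and parameters may differ.
    text += "\n{{fishbase|genus=%s|species=%s}}" % (genus, species)
    page.put(text, comment="Adding fishbase external link (manually supervised)")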

Please create a user page for the bot. --AllyUnion (talk) 10:17, 23 November 2005 (UTC)[reply]
Talk page created Stefan 12:23, 28 November 2005 (UTC)[reply]
1 week trial run permitted; you may apply for a bot flag after that. Please make certain your bot is listed in the Category:Wikipedia bots, and on Wikipedia:Bots. --AllyUnion (talk) 10:49, 15 December 2005 (UTC)[reply]

Bot to create French administrative division info

I'd like permission to use a bot for this. There is some discussion at Wikipedia:WikiProject French communes and you can see some of the initial articles at [13] (machine generated but input by hand). I'm proposing adding arrondissements initially and then moving on to cantons and communes. I'll run it under User:Dlyons493Bot. It'll be manually supervised initially until I'm happy it's running OK. Currently working in the pywikipedia framework. I ran a test on three articles and it seems to be OK - see Arrondissements of the Eure-et-Loir département Dlyons493 Talk 18:43, 20 November 2005 (UTC)[reply]
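For anyone wondering what the machine generation looks like, a sketch is below. The data columns and the article wording are placeholders of mine, not the format agreed at Wikipedia:WikiProject French communes, and the save goes through the pywikipedia framework.

import csv
import wikipedia   # the pywikipedia framework's core module

def article_text(row):
    # Placeholder wording; the real text would follow the WikiProject's agreed format.
    return ("The '''arrondissement of %(name)s''' is an arrondissement of France, "
            "located in the ''%(departement)s'' ''département''. "
            "It has %(cantons)s cantons and %(communes)s communes.\n\n"
            "[[Category:Arrondissements of France]]\n" % row)

def run(csv_path):
    site = wikipedia.getSite()
    for row in csv.DictReader(open(csv_path)):      # columns: name, departement, cantons, communes
        page = wikipedia.Page(site, "Arrondissement of " + row["name"])
        if page.exists():
            continue                                # never overwrite an existing article
        page.put(article_text(row), comment="Creating arrondissement article (supervised run)")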

Please create a page for the bot. --AllyUnion (talk) 10:19, 23 November 2005 (UTC)[reply]
Um... is there an English name for these pages? "Arrondissement" is not a word anyone would likely type normally. --AllyUnion (talk) 09:56, 25 November 2005 (UTC)[reply]
'Fraid there isn't really an alternative - see e.g. Arrondissement in France. Mostly people will arrive at these as links from other articles anyway. And it's more typeable than Transitional low-marsh vegetation with Puccinellia maritima, annual Salicornia species and Suaeda maritima which used to be an article :-) Dlyons493 Talk 21:17, 25 November 2005 (UTC)[reply]
Approved for one week trial period. --AllyUnion (talk) 10:28, 27 November 2005 (UTC)[reply]
Thanks. Dlyons493 Talk 15:11, 27 November 2005 (UTC)[reply]
I've extended this to add individual Arrondissements e.g. Arrondissement of Brioude and it seems to be working OK. I hope to move on to individual cantons and communes over the next few weeks. Can I apply for an extension of the permission period? Dlyons493 Talk 13:53, 4 December 2005 (UTC)[reply]
Oh, if you haven't had any complaints from the first week's run, you may apply for a bot flag. --AllyUnion (talk) 09:20, 9 December 2005 (UTC)[reply]

One time run request - Kakashi Bot; Text replacement

(copied from User talk:AllyUnion.)

Am I correct in assuming that one can request jobs to be done by your bot here? I'm currently trying to orphan Image:Flag of Czech Republic.svg and replace it with Image:Flag of the Czech Republic, but the number of pages it is used in (especially the English, French and Spanish Wikipedias) is enormous... File:Austria flag large.png ナイトスタリオン 20:01, 24 November 2005 (UTC)[reply]

I don't understand the difference between the two. --AllyUnion (talk) 23:34, 24 November 2005 (UTC)[reply]
For countries/regions with the terms "Republic" or "Islands" in their short name, correct grammar is "Flag of the Czech Republic", not "Flag of Czech Republic". Image names are expected to follow the same naming rules as articles, AFAIK, so this grammar mistake should be corrected, and it'll be much easier with a bot. Furthermore, I've managed to upload basically all the national flags in SVG format, and those should replace the earlier PNG versions, so the bot could be put to further use... Sorry if I'm getting on your nerves, but your bot seemed to be available for tasks. File:Austria flag large.png ナイトスタリオン 00:04, 25 November 2005 (UTC)[reply]
I don't mind doing it, that is if we have agreement that it should be done. Thanks, Martin 15:21, 27 November 2005 (UTC)[reply]
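For the record, a supervised run of this kind reduces to a straight text replacement over the pages that use the image; pywikipedia also ships generic replacement scripts that may fit better. A sketch with placeholder image names (the real page list would come from the image description page's file links):

import wikipedia   # the pywikipedia framework's core module

OLD = "Image:Old flag name.png"    # placeholder names, not the actual files discussed above
NEW = "Image:New flag name.svg"

def replace_on(titles):
    site = wikipedia.getSite()
    for title in titles:
        page = wikipedia.Page(site, title)
        text = page.get()
        if OLD not in text:
            continue
        page.put(text.replace(OLD, NEW),
                 comment="Replacing %s with %s per request" % (OLD, NEW))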

User:Chlewbot (interlanguage bot)

I am asking for permission to run User:Chlewbot under a bot flag. Its primary goal is to check, add and fix interwikis originating at the Spanish-language Wikipedia.

Thank you.

Carlos Th (talk) 21:16, 29 November 2005 (UTC)[reply]

Are you using the pywikipedia framework? Please specify you are, or using something else. --AllyUnion (talk) 09:04, 9 December 2005 (UTC)[reply]
Yes. I am using pywikipedia framework.
Carlos Th (talk) 12:34, 10 December 2005 (UTC)[reply]
Approved for a trial test run of 1 week. Please keep your bot updated with the latest CVS code. May apply for bot flag after trial test run if no complaints. --AllyUnion (talk) 09:55, 11 December 2005 (UTC)[reply]

AzaBot

I would like to be allowed to run a manually assisted bot for the sole purpose of touching pages to find obsolete templates, and eventually changing template calls to another template. AzaToth 00:03, 1 December 2005 (UTC)[reply]

Clarification: what I meant by changing template calls is mostly to subst: them; this is for TfD semi-orphaned templates. AzaToth 00:57, 3 December 2005 (UTC)[reply]

Please create a page for your bot. What is the reason behind doing this? --AllyUnion (talk) 09:05, 9 December 2005 (UTC)[reply]
Sorry for stepping in. Currently, I'm replacing calls of template:if with the server-friendlier template:qif (AzaToth is the original inventor of qif).
We are currently accused of server strain by some Wikipedians at WP:AUM, and I feel the technology behind if/qif is a bit "at stake" at the moment. There is a possibility that lots of templates, among which are template:web reference and template:book reference, will be abolished due to pressure originating from WP:AUM.
"What links here" of template:if brings up a list of about 20'000 articles but I estimate that about 80..90% of the listed pages in fact do not use if as we are in the consolidation process to qif. So it is quite hard to find the real uses of template:if.
I've been working together with AzaToth on if/qif and the templates "web reference" and "book reference", among others. I can affirm he is a nice person and he abides by the rules of Wikipedia. I have just asked User:Bluemoose for a touch run on this, but I personally think it would be good if AzaToth were enabled to run a bot for touching, so we might relieve Bluemoose of some of these requests, although I can say he's very kind and helpful. If AzaBot gets the green light, I would ask AzaToth for such touch runs. – Adrian | Talk 21:58, 17 December 2005 (UTC)[reply]
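For anyone unfamiliar with the term, a "touch" run is just a null save of each page that still shows up in "What links here" for template:if, which makes MediaWiki re-parse the page and refresh the links tables, so the genuine remaining uses become visible. A pywikipedia-style sketch, with method names recalled from memory and no distinction made between links and transclusions:

import wikipedia   # the pywikipedia framework's core module

def touch_referrers(template_title="Template:If"):
    site = wikipedia.getSite()
    template = wikipedia.Page(site, template_title)
    for page in template.getReferences():
        try:
            # Saving the unchanged text forces a re-parse and refreshes the links tables.
            page.put(page.get(), comment="Touch: refreshing template links", minorEdit=True)
        except Exception as err:
            print("Skipped", page.title(), "-", err)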

I operate User:NetBot which can assist with these tasks. Contact me on my talk page. -- Netoholic @ 07:38, 18 December 2005 (UTC)[reply]

I oppose this bot - meta-templates ought to be fully deprecated, not bot-inserted. Phil Sandifer 07:51, 18 December 2005 (UTC)[reply]

Does this also mean you oppose exchanging calls of template:if with the server friendlier template:qif? Please note that template:if is a meta-template whereas template:qif is not. If if is replaced by qif in a template X, then that does not change the "meta-template" state of X. In fact, one template level is eliminated. – Adrian | Talk 13:03, 18 December 2005 (UTC)[reply]

KaiserbBot

I would like to request a bot flag for a manually assisted bot running on the English Wikipedia to assist in fixing a variety of firearms, woodworking, and other pages with numerous redirects and double redirects. Kaiserb 02:16, 1 December 2005 (UTC)[reply]

You'll have to be more specific about what you plan to fix. --AllyUnion (talk) 09:31, 9 December 2005 (UTC)[reply]
When testing the bot it was used to disambiguate a number of pages from Wikipedia:Disambiguation pages with links. The bot performed well for this task and would continue to disambiguate pages. Additionally, I would like to use it to resolve disambiguation and redirects on specific firearm pages. There are numerous links on each firearm page to the particular ammunition used by said firearm. One that is a particular issue is 9 mm, 9 mm Luger and 9 mm Parabellum, which all link through a redirect pointing to 9 mm Luger Parabellum. In the first case, 9 mm could refer to (9 x 18 mm), (9 x 20 mm SR), 9 mm Glisenti, (9 x 19 mm), (9 x 21 mm), (9 x 23 mm Steyr), or (9 x 23 mm Largo), while 9 mm Luger and 9 mm Parabellum are both 9 x 19 mm. It would be nice to clean up this and other ambiguous ammunition links to tidy up these pages. --Kaiserb 05:14, 10 December 2005 (UTC)[reply]
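A manually assisted pass over those links could be as simple as the sketch below: it shows each ambiguous link in context and lets the operator pick the right cartridge article, saving nothing on its own. The candidate list is taken from the cartridges named above; fetching and saving the page is left to whatever framework the bot already uses.

import re

CANDIDATES = ["9 x 18 mm", "9 x 19 mm", "9 x 20 mm SR", "9 x 21 mm",
              "9 x 23 mm Steyr", "9 x 23 mm Largo", "9 mm Glisenti"]

def disambiguate(text):
    out = text
    for match in re.finditer(r"\[\[9 mm(\|[^\]]*)?\]\]", text):
        context = text[max(0, match.start() - 60):match.end() + 60]
        print("..." + context.replace("\n", " ") + "...")
        for i, cand in enumerate(CANDIDATES):
            print(" ", i, cand)
        choice = input("Replace with which target (blank to skip)? ").strip()
        if choice:
            out = out.replace(match.group(0), "[[%s|9 mm]]" % CANDIDATES[int(choice)], 1)
    return out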
Approved for a one-week trial run. If no objections are made, you may apply for a bot flag. Please remember to leave a note on this page if you plan to change the scope of what your bot does. Also, please be more descriptive on your bot's user page... for example, by listing the same information that you described in detail here. --AllyUnion (talk) 10:41, 10 December 2005 (UTC)[reply]

OrphanBot

I would like to run User:OrphanBot to orphan images in Category:Images with unknown source and Category:Images with unknown copyright status in preparation for deleting them. The bot would go through the categories, and for each image, it would replace the image tag with &nbsp;, to keep from breaking table layouts. It would then record on the image description page which articles it has removed the image from.

It would remove images from articles in the main namespace, the Category namespace, and the Portal namespace. Images in articles in the User:, Talk:, User talk:, Template talk:, Image:, Image talk:, Category talk:, Wikipedia talk:, and Portal talk: namespaces would be ignored. Images in articles in the Wikipedia:, Template:, Help:, and Help talk: namespaces would be logged to the bot's talk page for human review, since they shouldn't have been there in the first place. --Carnildo 09:11, 1 December 2005 (UTC)[reply]
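A sketch of the per-image pass described above, to make the namespace handling concrete. The image regex is simplified (it ignores underscore/space variants and gallery syntax), and logging the removals to the image description page is omitted.

import re

IGNORE_NS = ("User:", "Talk:", "User talk:", "Template talk:", "Image:",
             "Image talk:", "Category talk:", "Wikipedia talk:", "Portal talk:")

def classify(title):
    if ":" not in title:
        return "remove"                       # main namespace
    if title.startswith(("Category:", "Portal:")):
        return "remove"
    if title.startswith(IGNORE_NS):
        return "ignore"
    return "review"                           # Wikipedia:, Template:, Help:, Help talk:, ...

def strip_image(title, wikitext, image_name):
    action = classify(title)
    if action != "remove":
        return wikitext, action
    inline = re.compile(r"\[\[\s*Image\s*:\s*%s\s*(\|[^\]]*)?\]\]" % re.escape(image_name))
    return inline.sub("&nbsp;", wikitext), "removed"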

There is a conflict with NotificationBot regarding the removal of links in User talk space, unless the only thing you plan to do is change an image into a linked image. Furthermore, removal of links may cause problems in the Wikipedia namespace when it comes to WP:IFD. --AllyUnion (talk) 09:29, 9 December 2005 (UTC)[reply]
I specified that it doesn't touch anything in the User: namespace, or in most of the talk namespaces. Also, it doesn't do anything about images that have been linked, rather than inlined, so IfD is safe.
Also, I've made a slight change in how it removes images. It now replaces them with HTML comments, to make seeing what it's done easier. --Carnildo 09:42, 9 December 2005 (UTC)[reply]
I feel safer with it commenting it out, that way people understand there WAS a picture there, and something should replace it... rather than leaving a missing image notice. --AllyUnion (talk) 10:37, 10 December 2005 (UTC)[reply]
How's this? [14] [15] --Carnildo 22:20, 11 December 2005 (UTC)[reply]

Hmm. Can we tag these with a category? I go through runs asking people to delete images, and I generally start with orphaned images, advertising to prospective deleters that the probability of someone complaining about the deletion is fairly low... But that is only true for naturally orphaned images. Also be aware that if you do this to fair use images it is going to cause them, ultimately, to become targets under the fair use CSD. (Of course, they are already CSD, so...) --Gmaxwell 03:44, 12 December 2005 (UTC)[reply]

What sort of categorization are you thinking of? The bot already puts a list of what pages the image was removed from on the image description page, see [16] for an example.
Any fair-use image that this bot removes is already a CSD, because it was unsourced. Making it an orphaned fair-use image won't change things one bit. --Carnildo 06:54, 12 December 2005 (UTC)[reply]

SuggestBot, matching people with pages they'd edit

Hi, I'm doing some research to try to help people find pages to edit -- in particular, stubs that they might be willing to contribute to. We're doing some information retrieval stuff on a dump of the database to come up with algorithms for predicting edits based on other edits, and eventually we'd like to see if we can help real humans find articles they'd like to work on.

I am considering writing SuggestBot, a bot that selects a set of users, looks at their contribution history, picks a set (10-100, not sure yet) of stub pages they might be willing to edit, and posts that set of pages to the user's talk page. All processing except posting would happen on our lab's servers, using dumps of the Wikipedia database.

1. Whether the bot is manually assisted (run by a human) or automatically scheduled to run

  • This bot would be manually assisted

2. The period, if any, we should expect it to run

  • Sometime in January for a few weeks, more if it produces favorable results.

3. What language or program it is running

  • Not yet developed, but likely a standard framework for editing Wikipedia pages.

4. The purpose of your bot

a. Why do you need it?

  • To test our algorithms for recommending work to do in Wikipedia with real people, to help reduce the number of stubs in Wikipedia, and to help increase the amount of content and the value of Wikipedia.

b. Is it important enough for the Wikipedia to allow your bot?

  • I think so. A recent paper we did showed that 4 times as many people were willing to contribute to a movie database if we picked movies they had seen compared to other plausible strategies like picking recommended movies, random movies, or movies that appeared to be missing information. Showing these techniques work on Wikipedia could lay the foundation for increasing contributions to Wikipedia (and other online communities where people build community resources) and help to reduce the number of stub articles in Wikipedia.

The slightly scary part to us is that we modify user talk pages. I'm not sure how people will react, and I'm looking for community guidance. NotificationBot alters talk pages, but at a user's request. I wonder whether/how much this would be perceived as spam.

-- ForteTuba 23:35, 1 December 2005 (UTC)[reply]

Interesting. I have some questions :
  • How does it work? Does it look at the totality of what one user has edited, and the totality of what other users have edited, and then identify similar people (giving extra weighting to those who are extremely similar, and less to those who are semi-similar), and come to some list of recommendations based on what the people most like you have edited, but you have not? If so, I have wondered (purely conceptually) about doing the same thing for music. Basically we each like to think we're unique - but we're not, and often specific people (who have never met, and who live in completely different locations) will have very similar tastes. For example, I have a good friend who lives in Sydney and who has musical tastes totally unlike anyone else I know, who one day, purely by chance, found someone in the US who had published their extensive list of music they owned on the Internet, and it was almost a verbatim match for his music collection. At first it kind of freaked my friend out, but then he promptly went out and bought those items this other person had and had liked, and he really enjoyed that new music. It saved him hours of research, of trial-and-error purchases - basically he could just go straight to the good stuff. He still checks back on the other guy's music collection, and will buy anything this other guy likes, because he knows that the chances he'll like it too are extremely high, because the totality of their musical tastes is extremely similar. Now, normally it's hard to do this for music because we just don't have the data (e.g. how do I know what music you've tried? How do you know what music I have tried? How do I know what you liked? How do you know what I liked?). However, with the Wikipedia, we do have some hard data, based on past edits. So, my question is: is this how your approach will work (look at the totality of what I've edited, the totality of what you have edited, and if we're a very good match, then recommend to you some of the articles I have edited and that you have not yet edited)? Or is it something else?
  • Spamming: Some users probably will see this as spamming. Better maybe to allow people to request this (at least at first), and give them some pages where they can provide feedback, both good and bad (e.g. positive feedback and negative feedback). You'll know after you've got 50 or 60 bits of feedback whether you're onto a winner or not. I'd certainly be willing to consider taking part in an early stage trial. -- All the best, Nickj (t) 00:56, 2 December 2005 (UTC)[reply]
We're testing a bunch of algorithms and variations against edit histories in the dump. Some are collaborative filtering style, like what you describe above (there are music recommenders, like AudioScrobbler/last.fm that are said to use CF. Amazon does this for many/all of its things, and MovieLens is our CF research site). There are a bunch of CF-based algorithms and we're going to try a number of them. We're also doing some search engine-style matching, using your edit history as a query against the wikipedia database. We won't necessarily use all edits, we're trying to figure out whether we can tell you're particularly interested in a topic by looking at characteristics of edits (length change, is it a reversion, marked minor, etc.) and maybe only use some of your edits to build the query/your profile. We'll actually deploy the algorithms that seem to do best in our offline tests.
Very good! I like that edits marked as minor (or where the amount of changed text is quite small) will be excluded. For example, as part of the Wiki Syntax Project, I frequently make small edits to topics that I personally don't care about, solely to clean up their wiki syntax. Also, I'm checking out Last.fm now - I had never heard of it before, so thank you for that! I'll also read over the collaborative filtering page (up until now I had only thought about the idea in a "wouldn't this idea be cool?" kind of way, and never knew before what the correct term was). One thing that may help you is if you can get users to show you their watchlists (if something is on my watchlist, chances are pretty good that I care about it), but since I don't think you can access this, it might be best to start off just using a user's edit history.
I was excited about watchlists until I posted on Wikipedia:Village pump asking people to mail them to me and someone said "you know, I put a lot of pages I hope get deleted on my watchlist". So, edit history it will be. We might come up with an interface that allows people to type in some arbitrary text as well, not sure yet.
On second thoughts, I agree with not using watchlists. I've reviewed the things on my watchlist, and I'm not so sure that they are representative of my interests (at least some things are listed because they were things I thought that should be deleted, some I thought should be merged, and some were things that I can no longer recall why they were originally watched, etc). Significant edits are probably the best indicator, as you say. -- All the best, Nickj (t) 01:26, 8 December 2005 (UTC)[reply]
When you say request, how do you mean? Would the bot just put a link that says "hey, we can find you more things we think you'll edit, click here" on user talk pages so it's not as intrusive as just dropping a bunch of links on someone's page? Or are you talking about trying to put a link in a public place? Someone suggested changing a welcome template up above... wasn't sure other Wikipedians would go for that. Thanks for the thoughts. -- ForteTuba 12:37, 2 December 2005 (UTC)[reply]
Well, I think you've got two issues. The first is, does it work and do people like it? For this, to start with, you might want to put up a notice at the Wikipedia:Village pump and the Wikipedia:Announcements page saying "we're running a trial and would like some participants" to get some people to test it and see what they think. This would be an opt-in type of thing (e.g. get people to add their names to a User:SuggestBot/sign me up for the trial page), so none of the participants will complain about spamming, and you're bound to learn something from the feedback that people give (both what they like about it, and what they don't like about it, and you should fine-tune it based on that feedback), and once you've done a trial with public feedback then people will be far more receptive on the second issue. The second issue is once you've got a working system, how do you tell people about articles they might like, without it being considered spamming? The danger for you is that if even a handful of people consider it to be spamming and complain, then your bot is likely to get blocked (just telling you the way it is - basically the onus is on you to structure it so that people will not get annoyed by this). For telling users about articles they maybe interested in, I think you have three categories of users:
Posting on the pump asking for watchlists didn't work very well; I've gotten 5 in two-ish weeks. But I agree, getting people to opt in would be nice. Maybe the bot could create subpages of user pages (or of user talk pages) that contain suggestions and then put a polite little note on the talk page. Not quite as intrusive as an in-your-face list of suggestions. Opting in via community pages mostly only works for experienced users... which dovetails nicely with your next point.
  • Existing established users (these you should probably require to opt-in, and should not add to their talk pages unless they opt, as they are the most likely to complain loudly and extensively, and are the most likely to have already found the topics they are the most interested in).
Heh, I agree about both points.
  • Anonymous/not logged in users who use an IP address (you should probably not do anything with them, as they may be existing established users who have not logged in, or they may be a user who makes only one edit and is never seen again).
Also agreed.
  • New users who have a named account that's quite new and who have only a few edits (and for these users you could consider adding something to their talk page). For example: "Hello, I am the SuggestBot, and I notify new users about articles that they may be interested in, based on the edits that they make. I have included a list of the top 10 articles that you have not edited yet, and which people who edited the articles you have edited also liked. My suggestions are: (insert ranked bullet list of top 10 suggestions). I hope that you like these suggestions, but if you don't then please don't be alarmed because I will not leave you any more messages (unless you explicitly ask me to at my opt-in page). If you have any comments on this list, both good and bad, we would love to hear them at (insert link to positive feedback page and negative feedback page)."
Hope the above helps. Personally I think the approach outlined above will probably work, but others may have different opinions. -- All the best, Nickj (t) 02:20, 5 December 2005 (UTC)[reply]
This is close to what we were looking at. Opt-in for experienced via community portal and a polite page creation bot that samples infrequent/new editors might be a winner. Thanks for your thoughts, let me know if you have more. -- ForteTuba 21:55, 6 December 2005 (UTC)[reply]
All sounds good, but just a quick thought: if you're going to create talk subpages, you might want to not put them under the user's talk page, but rather under the SuggestBot's talk page (e.g. User:SuggestBot/suggestions/Nickj), and then put a quick note on the user's talk page linking to that. From a technical perspective, it makes almost no difference whether it's stored under the user's page or SuggestBot's; but from a human psychology perspective, they're very different things: one is a bot walking into "my" space and messing with it, whereas the other is the bot making a list in its space and then inviting me into that space to review the list. It's quite possible to get annoyed by the former, whereas it is much harder to get annoyed by the latter. -- All the best, Nickj (t) 01:26, 8 December 2005 (UTC)[reply]
That is a super-good idea. -- ForteTuba 03:45, 10 December 2005 (UTC)[reply]
I think this is a great idea, but I wanted to note that merely looking at who has edited an article, and perhaps the number of edits they made, isn't terribly indicative. What really indicates interest is substantial edits, where they have contributed significant new content or offered up a detailed explanation in the edit history or talk page. For example, many users performing disambiguation will edit hundreds or even thousands of pages they really don't give a damn about.
Also, keep in mind that unlike Amazon, our articles are strongly interlinked (some say you can't find two articles more than 6 links apart if you tried). These links, along with properties such as belonging to the same category/list, could be a valuable way of finding related articles once you've established the sort of article a person likes.
One last comment: don't recommend articles people already know about if you can. This would be really annoying. :-) Deco 06:16, 8 December 2005 (UTC)[reply]
Definitely agree suggesting things that the user has already edited in any way, shape or form is bad. Categories could be useful; Also articles that link to an article that the person has edited, but which are not backlinked, could be good (e.g. you've extensively edited A, and B links to A, but A does not link to B, and you have not edited B in any way; In this situation, you may like B, but not know about it). Could be more a "future directions" thing though. P.s. Off-topic: For a proof-by-existence of two articles being more than 6 degrees apart, see Gameboy to Maine Coon, which are at least 10 steps apart (but in one direction only). -- All the best, Nickj (t) 07:02, 8 December 2005 (UTC)[reply]
We were thinking of using local link structure to edited articles as a reasonable baseline recommender. We've thought about building one that uses categories to help cluster edits and make sense of interests; that is probably a Next Step compared to fielding a few plausible and easy-to-construct ones and seeing how people react first.
As for the substantive edits, we're trying both using all edits and trying to filter out non-substantive ones. A naive, ad-hoc approach to filtering out non-substantive edits did pretty poorly compared to using all edits on the offline data, but it's hard to say how people would react without trying it both ways. (And there are other strategies for filtering edits that we haven't tried yet.) It's a good suggestion. -- ForteTuba 03:45, 10 December 2005 (UTC)[reply]
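One naive filter of the kind being discussed, just to illustrate the sort of heuristics involved; the record fields and thresholds here are assumptions of mine, not the project's actual criteria.

def substantive_edits(edits, min_bytes_changed=100):
    # Each edit is assumed to look like:
    #   {"title": str, "minor": bool, "comment": str, "size_delta": int}
    keep = []
    for e in edits:
        if e["minor"]:
            continue                                  # skip edits marked minor
        if e["comment"].lower().startswith(("revert", "rv")):
            continue                                  # skip likely reverts
        if abs(e["size_delta"]) < min_bytes_changed:
            continue                                  # skip tiny changes
        keep.append(e)
    return keep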

Okay, I sincerely recommend that this not be a bot, but rather a tool like Kate's contribution tool. We should have the ability to simply load, based on a user's name, what pages they should attempt next. This provides it to anyone who wants to use it. You can even sign up for a m:Toolserver account to place your new script tool there, since the Toolserver has access to database dumps anyway. --AllyUnion (talk) 09:27, 9 December 2005 (UTC)[reply]

Interesting idea; why do you recommend that? My initial reaction: I don't know enough about the toolserver to know if this is a plus or a minus at this stage. In the long run it's probably a win, but currently some algorithms are slow and would be hoggy and not suitable for interactive use (think many-word queries against the searchindex). Also, we would probably be happier in our research hearts in the short term to control who gets recommendations when, for the purpose of getting more generally useful and more likely valid results. I definitely think in the long run it would be nice for people to be able to get results interactively and of their own volition. FWIW, the bot wouldn't be running continuously: we'd probably pick a set of users, run the algos offline for them, and put up the suggestions as a batch, then get a dump a week or two later to see what happened. -- ForteTuba 03:45, 10 December 2005 (UTC)[reply]
There is one problem that I have not yet worked out of the NotificationBot, that is how to deal with user talk pages that have been redirected cross project (i.e. a person may have a meta redirect) - the closest answer I have so far is to allow the site to do the work for you, but that would involve a lot of correction and assumptions in the pywikipedia code.
Huh... yeah, that could be annoying. Is it painful to chase the redirects yourself?
That aside, it's my feeling that you could easily create some kind of tree of articles for each article on the Wikipedia. So if we have an article on A, it follows that if B relates to A, then it should be in the set S1. (Then we continue to build different sets of S for each article in the Wikipedia. I suppose somewhat on the idea of Nick's Link suggestion bot...) I figure that you could basically create a network for what relates to what article using trees and such. Then when a user queries against his own name, you could then pull up a list of articles he or she has edited, pull up the trees, and generate a list based on tree relationships. Does that make any sense to you? --AllyUnion (talk) 10:34, 10 December 2005 (UTC)[reply]
Yes, it makes sense. There are a lot of ways to define "relates": text similarity, pairs of pages that are edited by the same person, link structure, categories, and more I'm sure. For now we're going after the first three:
  • text similarity because it's so obvious to try (and is okay in our offline experiments, though slow)
  • co-editing because collaborative filtering is effective in lots of domains (and appears to do fairly well here in limited dump-based experiments, and is fairly fast)
  • a simple link-based relatedness algorithm because it's both reasonable and something that you could imagine people doing manually (still to be built)
As for building a relationship structure (trees are one way to do it), the relationships get represented differently for each. In the text similarity case, we use MySQL's built-in text indexing because it's dead simple. For co-editing, we use revision history info from a full dump, lightly processed and with text removed, to establish relationships between editors (and then to articles). Eventually I want to implement it using the revision table directly. For link-based relatedness I'm looking to use the pagelinks table. I'm hoping that with just a little optimization the last two will be quick enough to run online, so we won't have to build the whole relatedness model, which is time-consuming and space-expensive. -- ForteTuba 16:04, 11 December 2005 (UTC)[reply]
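A toy version of the co-editing recommender, to show the shape of the computation: articles count as related when the same people edited them, and the user is offered related articles they have not touched. This is my illustration, not the project's algorithm; a real run would work from the revision table and weight by edit counts rather than treating every shared editor as one vote.

from collections import defaultdict

def recommend(edits_by_user, target_user, top_n=10):
    # edits_by_user maps a username to the set of article titles they have edited.
    editors_of = defaultdict(set)                  # article -> set of editors
    for user, articles in edits_by_user.items():
        for article in articles:
            editors_of[article].add(user)

    mine = edits_by_user[target_user]
    scores = defaultdict(float)
    for article in mine:
        for co_editor in editors_of[article] - {target_user}:
            for candidate in edits_by_user[co_editor] - mine:
                scores[candidate] += 1.0           # one shared co-editor, one vote
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

if __name__ == "__main__":
    history = {"Alice": {"Cricket", "Test cricket", "Ashes"},
               "Bob": {"Cricket", "Bowling (cricket)"},
               "Carol": {"Chess"}}
    print(recommend(history, "Bob"))               # suggests Alice's other cricket articles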

This bot has been adding category tags incorrectly, so I've blocked for 3 hours and notified the author. I'm going offline now, so I hope an admin can keep an eye out to see if the problem is A) fixed before the three hours is up, so that the bot can be restarted, or B) the block extended if the bot carries on after 3 hours with the same errors. Cheers! — Matt Crypto 01:59, 4 December 2005 (UTC)[reply]

Yes, my bot went nuts. I was running it in semiautomatic mode, checking from time to time how it was doing. Due to a bug (which I meant to be a feature) in my code, it started spitting garbage. Now the bot is stopped, and I don't plan to use it until I figure out the problem (it screwed up the last 32 of the 354 articles it changed, and I reverted all of those and checked a bunch more). Sorry for that; I will pay much more attention to this kind of thing from now on. Oleg Alexandrov (talk) 02:26, 4 December 2005 (UTC)[reply]
Given the above comments, I unblocked User:Mathbot. -- Jitse Niesen (talk) 03:42, 4 December 2005 (UTC)[reply]

CricketBot bot flag

I originally proposed CricketBot further up this page. It has now been running for over two weeks with no complaints, and positive feedback from WikiProject Cricket, so I've applied for a bot flag at m:Requests for bot status#en:User:CricketBot. Thank you. Stephen Turner (Talk) 10:25, 4 December 2005 (UTC)[reply]

Pfft Bot, again

I haven't run Pfft Bot much in a while, but have recently started using it again. It still has no bot flag. Anyway, I got a request from Natalinasmpf to upload ~300 small Xiangqi-related images. They are in PNG format. I'm told they're a total of around 3 megabytes, but I have not received the images yet. I have source and licensing information on them (gfdl-her). Additionally, I'm assuming upload.py in pywikipedia will do what I want? --Phroziac . o º O (mmmmm chocolate!) 03:24, 5 December 2005 (UTC)[reply]

Ok, now she wants me to upload more, for {{Weiqi-image}}, to upgrade the images. Am I going to need to post here for every run I want to do? It's not like I'm uploading copyvios or anything :) --Phroziac . o º O (mmmmm chocolate!) 03:46, 5 December 2005 (UTC)[reply]
Dude, upload them to commons...there's no reason not to--Orgullomoore 08:48, 5 December 2005 (UTC)[reply]
I thought about that last night, and well you're right. I put a similar request on the commons village pump a few minutes ago. :) --Phroziac . o º O (mmmmm chocolate!) 16:39, 5 December 2005 (UTC)[reply]

By the way, I just noticed pywikipedia has a script to copy images to commons and put a NowCommons template on wikipedia. Any objections to me using that? --Phroziac . o º O (mmmmm chocolate!) 00:36, 6 December 2005 (UTC)[reply]

None from me. --AllyUnion (talk) 09:22, 9 December 2005 (UTC)[reply]

New bot!

I have made a new bot; well, it's not necessarily a bot, as it is primarily designed for semi-automatic editing. There are many features to come, but if anyone wants to see what it is like then I would really like some feedback; let me know if you want a copy. See User:Bluemoose/AutoWikiBrowser for more details. Martin 21:33, 8 December 2005 (UTC)[reply]

This would be better classified as a tool than as anything else... so long as this tool doesn't perform high-speed edits and makes users log in under their own accounts, we're okay. --AllyUnion (talk) 09:21, 9 December 2005 (UTC)[reply]
It can function as a bot if you want. At the moment it is lacking some functionality that the pywikipedia bots have, but ultimately it should be more powerful, and much easier to use. Martin 12:41, 9 December 2005 (UTC)[reply]
I would rather not have it function as a bot. Unless you can build in safety functions to prevent the tool from being abused and make certain that that part of the functionality is approved on a per-user basis, I would rather you not make it easier to create an automatic editor which has the potential to damage the site rather than help it. Furthermore, please ensure that you add throttles to your code, as that is an apparent dev requirement. --AllyUnion (talk) 10:12, 10 December 2005 (UTC)[reply]
It does have a throttle, and it also requires users to be logged in and their version of the software to be enabled here. I will disable the bot function on the freely available version anyway; it will only be available on request. Remember that anyone could use a pywikibot, which is potentially just as dangerous. Martin 10:48, 10 December 2005 (UTC)[reply]
The pywikipedia bot framework has some built-in restrictions to at least prevent serious abuse. Additionally, while anyone can use the pre-existing bots available in the framework, some programming knowledge is required to turn the tool into something abusive. --AllyUnion (talk) 23:08, 10 December 2005 (UTC)[reply]
I am working on making it so it only works for users who have their name on a certain page (and are logged in with that name), so that way no one will be able to abuse it at all. Martin 23:11, 10 December 2005 (UTC)[reply]
Can we make it so that they have to be a logged in user with a bot flag? --AllyUnion (talk) 09:53, 11 December 2005 (UTC)[reply]

Bot permission please?

I have just been trying out Martin's AutoWikiBrowser. I would like to register User:Bobblebot as a bot account to use it. The task is to reduce linking of solitary months, solitary years, etc., in accordance with the Manual of Style.

If another bot is already doing this task, please let me know. It is a huge slow task (for me anyway) and I would rather do something else. Bobblewik 18:06, 15 December 2005 (UTC)[reply]

Um, reduce linking? I don't quite understand. --AllyUnion (talk) 11:32, 16 December 2005 (UTC)[reply]
Yes. For example Economy of Algeria has many solitary year links, including 14 to 2004. The policy and popular misunderstandings are explained at:
Wikipedia:Make_only_links_relevant_to_the_context#What_should_not_be_linked
Wikipedia:Manual_of_Style_(dates_and_numbers)#Date_formatting
For example:
  • Text with links to solitary years: The short-lived ABC Cable News began in [[1995]]; unable to compete with CNN, it shut down in [[1997]]. Undaunted, in [[2004]] ABC launched a news channel called ABC News Now.
  • Text without links to solitary years: The short-lived ABC Cable News began in 1995; unable to compete with CNN, it shut down in 1997. Undaunted, in 2004 ABC launched a news channel called ABC News Now.
See this diff working towards that: [17]. Bobblewik 13:32, 16 December 2005 (UTC)[reply]
A bot can't distinguish between when it should or shouldn't be linked; it needs a human to watch over it. Plus it's pretty damn easy to do with the AWB! Martin 14:14, 16 December 2005 (UTC)[reply]
There is rarely, if ever, a valid reason to link to a year article when other date elements are not present, so removing such links (with proper edit summaries) is IMO one of the safest possible editing tasks for an automated or semi-automated process. Detecting the presence of other adjacent date elements (and they must be adjacent for the link to function as a date preference mechanism, normally the only valid reason for the link) is pretty purely mechanical. DES (talk) 22:30, 16 December 2005 (UTC)[reply]
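A sketch of that mechanical check, for concreteness: hide linked full dates behind placeholders, unlink whatever year links remain, then restore the dates. The regexes are simplified (piped links and unusual date orders are not handled), and as Martin says, deciding whether a remaining year link is genuinely relevant to the context still needs a human eye.

import re

MONTH = ("January|February|March|April|May|June|July|August|September|"
         "October|November|December")
# A linked day-month right next to a linked year forms a full date, which must be
# left alone because the pairing is what drives date-preference formatting.
FULL_DATE = re.compile(r"\[\[(?:%s) \d{1,2}\]\],?\s*\[\[\d{3,4}\]\]"
                       r"|\[\[\d{1,2} (?:%s)\]\]\s*\[\[\d{3,4}\]\]" % (MONTH, MONTH))
YEAR_LINK = re.compile(r"\[\[(\d{3,4})\]\]")

def unlink_solitary_years(wikitext):
    shelved = []
    def shelve(match):                       # hide full dates behind placeholders
        shelved.append(match.group(0))
        return "\x00%d\x00" % (len(shelved) - 1)
    text = FULL_DATE.sub(shelve, wikitext)
    text = YEAR_LINK.sub(r"\1", text)        # [[1995]] -> 1995
    for i, original in enumerate(shelved):   # put the full dates back untouched
        text = text.replace("\x00%d\x00" % i, original)
    return text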
If the dynamic date formatting feature can recognize correctly-linked dates, there's no reason a bot can't. These links are pointless, look silly and should be removed by a bot. — Omegatron 22:41, 16 December 2005 (UTC)[reply]
I don't think bots should do the task suggested by Bobblewik, nor am I particularly convinced that Bobblewik bothers to think about the cases in which such links are definitely worth it, e.g. at Talk:Luxembourg_(city). -- User:Docu
There is almost never contextual reason to link a solitary year, if these can be detected by a bot they can be removed by one. Hence, the bot makes perfect sense and should be run. Neonumbers 09:46, 17 December 2005 (UTC)[reply]
Strong support as well for the above reasons. 99% of the time, the linked years are unnecessary. Gflores Talk 23:56, 17 December 2005 (UTC)[reply]
  • I'm not sure this is a good idea, since, as Martin points out, it says to generally not use them. In fact, in the section on dates of birth and death (which immediately follows), it does use bare years when giving the format for when only the year is known. So removing the links in all cases seems like a bad idea. --Mairi 00:14, 18 December 2005 (UTC)[reply]
  • Support. To Mairi: it should be possible to identify birth and death years from [[Category:xxxx births]] and [[Category:xxxx deaths]] and from their appearing in the first sentence. Susvolans 17:05, 18 December 2005 (UTC)[reply]

DFBot

I have created DFBot. Right now it has only one task. Once an hour it reads WP:RFA and creates a summary in my userspace of open noms. Maybe it will do more in the future, but that's it for now. Dragons flight 09:53, 18 December 2005 (UTC)[reply]
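A rough shape for that hourly run, in case anyone wants to adapt it: read the wikitext of WP:RFA, pull out the transcluded nomination subpages, and save a bulleted list to a page in the owner's userspace. This is a sketch only; the summary page title is made up, and the save goes through the pywikipedia framework.

import re
import wikipedia   # the pywikipedia framework's core module

NOM_RE = re.compile(r"\{\{(Wikipedia:Requests for adminship/[^}|]+)\}\}")

def update_summary():
    site = wikipedia.getSite()
    rfa_text = wikipedia.Page(site, "Wikipedia:Requests for adminship").get()
    noms = NOM_RE.findall(rfa_text)
    body = "Open nominations as of the last bot run:\n" + \
           "".join("* [[%s]]\n" % nom for nom in noms)
    summary_page = wikipedia.Page(site, "User:Dragons flight/RFA summary")   # hypothetical title
    summary_page.put(body, comment="Updating RfA summary (hourly bot run)")

if __name__ == "__main__":
    update_summary()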