Wikidata:Requests for permissions/Bot/SamoaBot 26
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- On hold. I've reviewed the discussion, and didn't see any complaints about the technical implementation (Ricordisamoa said he would avoid Russian dates/things). The main concern is about whether this information should be sourced, and the quality of that source. This is an important discussion to have, but I don't think this is the place for it. I would recommend a section be started on Wikidata:Requests for comment/References and sources (or maybe another RfC) that discusses what things need sources, as the current RfC guidelines are more about how to source things. So I'm closing this as on hold since a discussion does need to happen, just not here. Legoktm (talk) 07:03, 9 June 2013 (UTC)
- Update: Per Wikidata:Requests for comment/Sourcing requirements for bots, I'm marking this as Approved. Legoktm (talk) 06:19, 23 August 2013 (UTC)
SamoaBot
Operator: Ricordisamoa
Task/s: import future "date of birth" and "date of death" properties from Wikipedia
Function details: as Lydia Pintscher said, the "Point in time" datatype is getting closer, so it's time we began thinking about importing these data. I don't know the API model that will be used for this, but I expect the feature to be implemented in the next PWB release, so I'm getting ready. --Ricordisamoa 22:58, 23 May 2013 (UTC)
- Support --Tobias1984 (talk) 13:16, 27 May 2013 (UTC)
"Point in time" is now available, and date of birth (P569) and date of death (P570) have been created, so I'm going to start a test run soon. --Ricordisamoa 20:24, 29 May 2013 (UTC)
Oppose Dates of birth and death should be referenced using an actual reliable source and this bot can't do it. This isn't a case of me asking for too much: Wikidata:Introduction explicitly states that the notion of verifiability is important around here. Pichpich (talk) 04:36, 30 May 2013 (UTC)
- So you're saying we've had unreliable information for years? --Ricordisamoa 07:29, 30 May 2013 (UTC)
- I'm saying that non-trivial data on Wikidata should be referenced. Does your bot check that the statement being imported is referenced? Does it check that the data you import actually matches the reference? Clearly the answer is no and no. How is that not in direct contradiction with Wikidata:Introduction? Every time we've tried to convince Wikipedia editors to use Phase 2 features, the one consistent complaint has been reliability and referencing and our response has always been that we'd source statements carefully. This bot is basically saying "we were just kidding". Pichpich (talk) 13:39, 30 May 2013 (UTC)
- Besides, I think that date of birth and date of death are trivial data. I can't understand why Wikipedians never added sources for dates of birth and death, yet they all cry out when we simply copy their unreferenced data. All the authority control datasets I know have only the year of birth and the year of death. But because we want to be usable for Wikipedia, which has more precise dates, we can't simply copy them from the GND or LCCN datasets and reference those. I propose to run this bot because it adds more sources than most human editors do: it adds imported from Wikimedia project (P143) to all edits. We can't do this task manually, adding good references to all dates. --Pyfisch (talk) 07:35, 31 May 2013 (UTC)
- Most biographies include links to sources for the date of birth and date of death. It's true that this isn't always the case and many are working to change that. The people who cry out that these dates should be sourced are the same that are trying to add proper sources on en.wiki and they're the ones following Wikipedia policy. You, on the other hand, are on the side of people who believe that using sources gets in the way of adding data. That's against every policy on every Wikipedia and against the policy on Wikidata. Pichpich (talk) 13:29, 31 May 2013 (UTC)
- @Pyfisch What is trivial data? You can't source an element with imported from Wikimedia project (P143), because once this data is used in Wikipedia you will get Wikipedia infoboxes sourced by Wikipedia, and this is against Wikipedia rules. And if someone deletes some sources in Wikipedia articles, you won't know it in Wikidata and the whole system will be wrong. Data can't be separated from its sources: you can't have data in Wikidata and references in Wikipedia. Snipre (talk) 18:25, 31 May 2013 (UTC)
- Take the article w:en:Otto Hahn, for example. A bot can very easily import the birth date and the death date, but because Wikipedia has not added machine-detectable sources for these facts, a bot can't import the sources. At the moment Wikidata knows 1198423 persons (Source: WD:Database_reports/Constraint_violations/P107#Values_statistics). If dates were added manually with sources, and a human needed only five minutes per person on average, it would take 99868 h to do that. We would need over 33,000 editors working three hours every day on this topic to do it in one year. Since we don't have that many very active editors, and many Wikipedians don't contribute to Wikipedia and don't add better sources to Wikidata either, we can't do this. We need The Bots!
- @Snipre I define trivial data as standard, undisputed data: facts which are accepted by everyone, facts we don't think about very much. You say that data cannot be separated from its sources; so why doesn't Wikipedia delete all articles with no sources at all, and after that all facts with no sources? WP would be pretty empty afterwards. Maybe we should add a "citation needed" property to solve this issue for Wikidata. --Pyfisch (talk) 15:30, 4 June 2013 (UTC)
- That's misleading. The choice is not "forbidding bots to import any data" vs. "getting all data from Wikipedia". There are already proposed bots (SamoaBot 32, SLiuBot 2) that would import date of birth/death from reliable databases. We need to continue our search for such opportunities. Of course, harvesting data from these sources requires a bit more effort for the bot operators but it's doable. It will not get the data for 1.2M biographies overnight but we can get a lot done this way and it will quickly grow the databases around the more important items. Furthermore, it's absurd to claim that gathering data can't be done by human editors: after all these 1.2M biographical items correspond to many millions of biographical articles which were written by humans, not bots. Pichpich (talk) 18:03, 6 June 2013 (UTC)
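The workload estimate quoted in this thread can be checked with a few lines of arithmetic. This is only a minimal sketch using the figures given in the comment (1198423 persons, five minutes per person, three hours per editor per day); the 365-day year and the function name are assumptions made for illustration. On these figures, roughly 33,000 comes out as the number of three-hour editor-days, which would correspond to about 91 editors working every day for a year.

```python
# Figures quoted in the discussion; the 365-day year is an assumption.
PERSONS = 1_198_423
MINUTES_PER_PERSON = 5
HOURS_PER_EDITOR_PER_DAY = 3
DAYS_PER_YEAR = 365


def manual_sourcing_workload(persons, minutes_per_person, hours_per_day, days):
    """Return (total hours, three-hour editor-days, editors needed for one year)."""
    total_hours = persons * minutes_per_person / 60
    editor_days = total_hours / hours_per_day
    editors_for_one_year = editor_days / days
    return total_hours, editor_days, editors_for_one_year


total_hours, editor_days, editors = manual_sourcing_workload(
    PERSONS, MINUTES_PER_PERSON, HOURS_PER_EDITOR_PER_DAY, DAYS_PER_YEAR
)
print(f"total hours: {total_hours:.0f}")
print(f"three-hour editor-days: {editor_days:.0f}")
print(f"editors working daily for one year: {editors:.0f}")
```

The result depends entirely on the five-minute assumption; doubling the time per person doubles every figure.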
- Support --Ayack (talk) 08:18, 30 May 2013 (UTC)
- Support Tpt (talk) 11:22, 30 May 2013 (UTC)
- Support Magnus Manske (talk) 11:38, 30 May 2013 (UTC)
- Support Mushroom (talk) 12:40, 30 May 2013 (UTC)
- Support --Pyfisch (talk) 07:35, 31 May 2013 (UTC)
Some more test edits; I'm also going to check for references in articles. --Ricordisamoa 08:36, 31 May 2013 (UTC)
- Support.--CENNOXX (talk) 15:22, 31 May 2013 (UTC)
- I think the best source of birth/death date a bot can currently get is the one stored in Bibliothèque nationale de France ID (P268). Of course, this is not as good as providing the source that the BNF itself had used, but the quality standards are high, and it should be relatively easy to retrieve. It is in ISO format, and often with precision down to the day level. --Zolo (talk) 17:17, 31 May 2013 (UTC)
- Oppose This needs to be sourced, and Wikipedia is not a source. Snipre (talk) 18:25, 31 May 2013 (UTC)
- We do import information from Wikipedia. This isn't really specific to this request. -- Docu at 03:58, 2 June 2013 (UTC)
- Oppose – Cannot use Wikipedia as source for statements. That will make the data unusable in the Wikipedias. It is time to switch to independent databases to import from, and/or use individual relevant and reliable sources for each statement. Byrial (talk) 20:59, 31 May 2013 (UTC)
- Why should this be different for these properties? -- Docu at 03:58, 2 June 2013 (UTC)
- It should not be different for these properties. Statements should be sourced, and not just imported from Wikipedias without sources. I agree with Micru below. Byrial (talk) 04:30, 2 June 2013 (UTC)
- No doubt that we should source data. All data in Wikipedia should already be sourced. As such we can import it. Other than that, I disagree with Micru. Dbpedia is not a complement to Wikidata.
I suppose his vision means that he doesn't actively contribute to Wikidata. Oddly, his sole contribution to statements in Wikidata fails his own criteria: it is unsourced .. LOL. -- Docu at 10:21, 2 June 2013 (UTC)
- ^Maybe that's because, LOL, we don't yet have a proper referencing system. FallingGravity (talk) 20:29, 2 June 2013 (UTC)
- LOL .. so they wait till it's there while flooding us with theories? -- Docu at 04:57, 5 June 2013 (UTC)
- Surely you understand the difference between an unsourced statement and an authorization for a bot to add thousands if not hundreds of thousands of unreferenced statements. Pichpich (talk) 17:13, 5 June 2013 (UTC)
- Oppose for now, at least - Wikidata needs to grow up and realize we can't just pirate all our info from the Wikipedias. FallingGravity (talk) 23:53, 31 May 2013 (UTC)
- Comment The bot should take care of cultural differences. For example, dates for topics related to the Russian Empire (and probably some other countries) should be given according to the Julian calendar. Sure, the dates are the same, but marking them as Julian calendar could signal data users to handle such dates in a special way, as is done in Russian Wikipedia. Unfortunately the Wikidata interface doesn't provide a way to switch representation. --EugeneZelenko (talk) 03:28, 1 June 2013 (UTC)
- Anyway, I am not going to import dates from ru.wiki (at least until this problem is fixed). --Ricordisamoa 03:50, 1 June 2013 (UTC)
- It doesn't depend on language. Julian calendar (old style) dates are specified in English Wikipedia articles about Russian Empire related subjects too. Maybe not in infoboxes and not for all subjects. --EugeneZelenko (talk) 14:03, 1 June 2013 (UTC)
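The calendar concern raised in this thread can be illustrated with the shape of a Wikibase time value, which carries a `calendarmodel` URI alongside the timestamp. The following is only a sketch under stated assumptions: `make_time_value` is a hypothetical helper (not part of SamoaBot or pywikibot), it handles AD years only, and the second date is an arbitrary placeholder rather than data from any article. The field names and the two calendar item IDs follow the Wikibase JSON layout as I understand it.

```python
# Hypothetical helper for illustration; not the bot's actual code.
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"  # proleptic Gregorian calendar
JULIAN = "http://www.wikidata.org/entity/Q1985786"  # proleptic Julian calendar
DAY_PRECISION = 11  # Wikibase precision code for "day"


def make_time_value(year, month, day, calendarmodel=GREGORIAN):
    """Build a day-precision time value dict for an AD date."""
    return {
        "time": f"+{year:04d}-{month:02d}-{day:02d}T00:00:00Z",
        "timezone": 0,
        "before": 0,
        "after": 0,
        "precision": DAY_PRECISION,
        "calendarmodel": calendarmodel,
    }


# Otto Hahn's birth date, the article mentioned earlier in the discussion.
gregorian_value = make_time_value(1879, 3, 8)

# Placeholder date flagged as Julian, so that data users can treat it specially.
julian_value = make_time_value(1850, 1, 1, calendarmodel=JULIAN)

print(gregorian_value["time"], gregorian_value["calendarmodel"])
print(julian_value["time"], julian_value["calendarmodel"])
```

Keeping the calendar as an explicit field means the decision of how to display or convert the date is left to the consumer, which matches the "flag" idea in the comment above.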
- Oppose Wikidata is a collection of sourced statements, not a database of data from Wikipedia (dbpedia already does that). In my opinion Wikidata should strive to contain as many sourced statements as possible. If that means having less data, it doesn't matter; it is more important to keep the data quality high. Dates of birth and death are particularly tricky, and even if unsourced dates have been allowed in Wikipedia, that doesn't need to be the case for Wikidata. In fact Wikidata should set the bar higher in that regard, because in the past we could have some Wikipedias right and some others wrong; now we can have all of them wrong if we are not careful with things like this. I also recommend taking a look at the proposed guidelines for sourcing statements, where all statements would be required to be sourced except in some circumstances (common sense, basically).--Micru (talk) 05:00, 1 June 2013 (UTC)
- Comment Oppose unless the bot can import the source info as well as the date. This will need the web page datatype to be created first for a lot of these dates. Add P387 (P387) with text "web page needs to be checked by a human and quote added here" as a qualifier to the date. Filceolaire (talk) 08:48, 1 June 2013 (UTC)
- Comment I think using P387 (P387) for this function is an abuse of this property. --Pyfisch (talk) 16:04, 4 June 2013 (UTC) Please don't forget that this is a multilingual project; single-language strings are discriminatory and are not easy to search via the API. --Pyfisch (talk) 15:09, 6 June 2013 (UTC)
- Support -- Docu at 03:58, 2 June 2013 (UTC)
- Support --Amir (talk) 03:09, 3 June 2013 (UTC)
Update: I've also filed SamoaBot 32. --Ricordisamoa 08:48, 2 June 2013 (UTC)
- Comment Editors who support this request should explain how their support can be reconciled with the soon-to-be-adopted guideline for sourcing statements. Pichpich (talk) 13:46, 3 June 2013 (UTC)
- Answer In the summary table there is a line which says: "... the source requirement can be skipped: When a value is common knowledge, and it has not been disputed." This fits nearly all birth and death dates. --Pyfisch (talk) 16:00, 4 June 2013 (UTC)
- Absolutely not. Dates of birth are almost never common knowledge. Dates of death are somewhat better known although it's not uncommon to find uncertainty on the exact day, even for very recent deaths. Quite a few dates of birth/death on en.wiki are tagged with "citation needed" templates and a quick look at the very careful sourcing in en:Deaths in 2013 further shows that information about someone's death is considered (on en.wiki) as requiring a reference. Stryn also mentioned earlier that the Finnish wiki has very strict rules on referencing dates of birth/death. Pichpich (talk) 14:48, 5 June 2013 (UTC)