Wikidata:Requests for permissions/Bot/MatSuBot 7
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved--Ymblanter (talk) 19:23, 9 June 2017 (UTC)
MatSuBot
Operator: Matěj Suchánek
Task: Remove/deprecate less precise dates without reliable source.
Code: Pywikibot, SPARQL (not yet written).
Function details: The number of sourced statements is growing as we match our entities against other datasets and import their data. Sometimes the new data differs from what was previously imported from Wikipedia infoboxes. This violates constraints such as Single value and should be sorted out.
My proposal is a bot that removes less precise time values, such as date of birth (P569), that come from less reliable sources (or have no source). For example, Trần Nhân Tông (Q72083) has two dates of birth: 1258 and 11 November 1258. The former was imported from a Wikipedia edition and is less precise; the latter was imported from BnF authorities (Q20666306), uses day precision and falls within the former. The bot would either remove the less precise statement, make it deprecated or make the more precise statement preferred (which could be integrated into PreferentialBot).
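To make "falls within" concrete, here is a minimal Python sketch of the containment check, not the bot's actual code (which is not yet written). The function name `is_inside` and the use of `datetime.date` are assumptions for illustration; the precision codes 9, 10 and 11 are Wikidata's codes for year, month and day. The sketch only handles year and month as the coarser precision, and `datetime.date` cannot represent BCE dates.

```python
from datetime import date

# Wikidata time precision codes: 9 = year, 10 = month, 11 = day.
YEAR, MONTH, DAY = 9, 10, 11


def is_inside(coarse: date, coarse_prec: int, fine: date, fine_prec: int) -> bool:
    """Return True if the finer value lies within the period named by the coarser one.

    Hypothetical helper: only year and month are supported as the coarser precision.
    """
    if coarse_prec >= fine_prec:
        return False  # the first value is not actually less precise
    if coarse_prec not in (YEAR, MONTH):
        return False  # decades, centuries etc. are out of scope for this sketch
    if coarse.year != fine.year:
        return False
    # A year contains every date in that year; a month also needs a month match.
    return coarse_prec == YEAR or coarse.month == fine.month


# The example from the proposal: 1258 (year precision) vs 11 November 1258.
print(is_inside(date(1258, 1, 1), YEAR, date(1258, 11, 11), DAY))
```

For the Trần Nhân Tông example the check succeeds, so the year-only statement would be a candidate for cleanup.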
Sample query:
SELECT ?item ?val1 ?prec1 ?val2 ?prec2 {
  ?item p:P569 ?statement1 .
  ?item p:P569 ?statement2 FILTER( ?statement2 != ?statement1 ) . # more than one statement
  MINUS { ?item p:P569/wikibase:rank wikibase:PreferredRank } .
  ?statement1 psv:P569 [ wikibase:timeValue ?val1; wikibase:timePrecision ?prec1 ] .
  ?statement2 psv:P569 [ wikibase:timeValue ?val2; wikibase:timePrecision ?prec2 ] .
  FILTER( ?prec1 < ?prec2 ) . # different precision
  MINUS {
    ?statement1 prov:wasDerivedFrom ?ref1 .
    ?ref1 ?pr1 [] .
    FILTER( ?pr1 != pr:P143 ) . # the less precise statement is without real source
  } .
  ?statement2 prov:wasDerivedFrom ?ref2 .
  ?ref2 ?pr2 [] .
  FILTER( ?pr2 != pr:P143 ) . # the more precise statement does have it
  FILTER( YEAR( ?val1 ) = YEAR( ?val2 ) ) .
  FILTER( ?prec1 = 9 || MONTH( ?val1 ) = MONTH( ?val2 ) ) . # one time value is inside the other one
}
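The "real source" test in the query treats a reference as real when it uses at least one property other than P143 (imported from Wikimedia project). The same rule could be sketched in Python as below. The function name `has_real_source` and the input shape (each reference given as a collection of property IDs) are assumptions for illustration, not Pywikibot's data model.

```python
from typing import Iterable

IMPORTED_FROM = "P143"  # "imported from Wikimedia project"


def has_real_source(references: Iterable[Iterable[str]]) -> bool:
    """Return True if any reference uses a property other than P143.

    Hypothetical helper: each reference is given as the property IDs it contains.
    """
    return any(
        any(prop != IMPORTED_FROM for prop in reference)
        for reference in references
    )


# No references, or only "imported from" references: not a real source.
print(has_real_source([]))
print(has_real_source([["P143"]]))
# A reference with "stated in" (P248) counts as a real source.
print(has_real_source([["P143"], ["P248", "P813"]]))
```

As in the query, a reference that combines P143 with any other property passes this test, which may be more lenient than intended.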
--Matěj Suchánek (talk) 17:45, 25 March 2017 (UTC)
- "The bot would either remove the less precise statement, made it deprecated or made the more precise statement preferred" - I think you need to pick one. In the specific example given, I would just remove the less precise statement, since it adds nothing to the information available. If the less precise statement disagreed, I would recommend keeping it but deprecating it. I don't think we want to automate making statements preferred; that should be a human decision. ArthurPSmith (talk) 17:30, 27 March 2017 (UTC)
- Support ChristianKl (talk) 16:09, 23 April 2017 (UTC)
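The policy suggested in the discussion above (remove a redundant less precise statement, deprecate one that disagrees, never set preferred rank automatically) could be sketched as a small decision function. This is only an illustration under stated assumptions: the name `choose_action`, its boolean inputs and the string results are hypothetical, and the discussion does not record which option the bot finally implemented.

```python
def choose_action(coarse_is_sourced: bool, fine_is_inside: bool) -> str:
    """Decide what to do with the less precise statement (hypothetical sketch).

    coarse_is_sourced: the less precise statement has a real (non-P143) reference.
    fine_is_inside: the more precise value lies within the less precise one.
    """
    if coarse_is_sourced:
        return "skip"  # leave reliably sourced statements to human editors
    if fine_is_inside:
        return "remove"  # redundant: adds nothing to the precise statement
    return "deprecate"  # conflicting: keep it, but mark it deprecated


# The Q72083 example: unsourced year 1258 vs sourced 11 November 1258.
print(choose_action(coarse_is_sourced=False, fine_is_inside=True))
```

Setting preferred rank is deliberately absent from the possible results, in line with the comment that this should remain a human decision.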