Commons:Batch uploading/Wellcome Images
- Source to upload from:
- See http://blog.wellcomelibrary.org/2014/01/thousands-of-years-of-visual-culture-made-free-through-wellcome-images/
- I shall email the Images Team to see if an API is available. The standard web search does not seem to filter by licence.
- Describe the works to be uploaded in detail (audio files, images by …):
- Historic medical related photographs and illustrations.
- Which license tag(s) should be applied?
- CC-BY, possibly PD on a case by case or age basis.
- Fae, I've noticed that your ~1300 image test run uses the CC-BY-SA-3.0 license, inconsistent with the CC-BY-2.0 claim that appears on the source pages for these lithographs (or the CC-BY-2.0-UK license mentioned in the announcement). What prompted you to use CC-BY-SA-3.0? —RP88 00:22, 19 February 2014 (UTC)
- Oversight rather than design. I'm swapping these to CC-BY-2.0 and if a different interpretation comes out of our discussions later with the Wellcome, I'll apply that decision.
- Is there a template that could be used on the file description pages? Do you think a special template should be created?
- We probably should create a credit template in negotiation with Wellcome.
- Category:Files from Wellcome Images holds current related uploads.
- {{Wellcome Images}} is obviously a related template. The Haz talk 05:24, 17 February 2014 (UTC)
- I'm thinking that we should use {{Artwork}} instead of {{Information}} as this seems most appropriate. I've created Institution:Wellcome Collection. Considering the template contains probably every field we could desire, it might be the best template to use. The Haz talk 16:36, 18 February 2014 (UTC)
- The test run of 1,300 lithographs uses the artwork template. These will probably be overwritten with better information, drawn from the full library catalogue, when they go from low-res to high-res images. Some of the 100,000 images may not be artworks; that is a bridge to be crossed when we come to it. --Fæ (talk) 20:38, 18 February 2014 (UTC)
Opinions
CC vs. PD
It's great to have these images available digitally, and I support the proposal to upload them by bot, but Wellcome are claiming copyright over, and to be the original source of, artworks and images from books which are already in the public domain. The assertion of copyright, and the right to attribution, should be rejected. They have added a strapline underneath each image; this will need to be removed. The process of downloading high resolution versions of these public-domain works is tortuous, with a CAPTCHA, irrelevant terms & conditions, and zipped files. Andy Mabbett (talk) 13:27, 21 January 2014 (UTC)
- For the relevant images, the T&Cs are very simple:[1] they just say that CC-BY applies and we must use "Wellcome Library, London" as an attribution, which is the normal sort of credit we would give anyway. If I had to (if it turns out we can get no API access), I can automatically trim the bottom strapline before upload; I already have a handy bit of Python that can do it, and the strapline is not a requirement in the T&Cs. Note, the full high resolution version does not have a strapline.
- (After a bit more testing)
The download links are confusing. The first download link ("Download low-res images") guides you to download the "web quality" version on display; the second ("Download hi-res images") leads you through a CAPTCHA process to give you a download link for a zip file. I would have difficulty automating the CAPTCHA process. The zipped full quality download is brilliant archive quality, my test example being >7,000px across and showing beautiful detail of every figure in the painting; we definitely must have them. - I will see if my email gets suitable results before testing much more, or considering how the workflow for batch upload could work.
- Note If you examine my first example manual upload (thumbnail above) this is a good example of where {{PD-Old-70}} or an equivalent may not apply and the best licence we could justify may well be the CC-BY one. The painting is catalogued as being created in the 1920's even though of an event in 1911, further I can find no date of death for the particular painter and this may be made complex due to the copyright law in China that may apply. It is worth observing that the EXIF data includes their old conditions, so has the licence as "cc-by-nc"; this is not in agreement with the stated website terms. --Fæ (talk) 15:04, 21 January 2014 (UTC)
- Another bit of license confusion is that their announcement identifies the license for these images as CC BY 2.0 UK while the terms identify the license as the unported version of CC BY 2.0. With regard to the PD images, where appropriate, I think something like {{Licensed-PD-Art|PD-old-100|Cc-by-2.0-uk|attribution=Wellcome Library, London}} is a suitable compromise, and I see some uploads are already taking that approach. —RP88 18:01, 21 January 2014 (UTC)
- The Ts&Cs don't just impose an attribution on us, but on all re-users. We shouldn't be echoing that. And surely, if the Chinese image is not PD, then WT have no right to apply CC-by, or assert copyright in any other manner? Andy Mabbett (talk) 23:05, 21 January 2014 (UTC)
- Andy, you appear to be getting views on this in many channels right now. I would rather wait until I have an email back from the Wellcome to my first question, which may enable batch upload quite nicely, and this might then also give me a suitable single point of contact to discuss how best to interpret copyright licensing. As it happens I raised the release of these images around 2 years ago with the Wellcome head of publishing. I am relieved that we have got as far as allowing public reuse of the images, even if individual assessment of copyright on the 100,000 historic images, to establish which truly are PD and which may have concerns, has yet to be completed. One of the benefits of a release on Commons is that our community is interested in copyright and will tend to winkle out these issues, even for complex and changeable areas of international IP law. --Fæ (talk) 09:53, 22 January 2014 (UTC)
- Went ahead and created {{PD-Art-Wellcome Trust}}. Feel free to amend. Jean-Fred (talk) 09:43, 22 January 2014 (UTC)
- Thanks for setting this up. The assumption of PD may not be valid in some cases, until we start some test runs and have a better sense of how much of an issue this is, it is probably not worth engineering the solution much further at this moment. --Fæ (talk) 09:56, 22 January 2014 (UTC)
- Sure. What’s nice (and dangerous) with the template is that we can easily tweak the licensing information later based on a finer understanding of their terms (like cc-by Vs. cc-by-uk). :-) Jean-Fred (talk) 10:17, 22 January 2014 (UTC)
After handling a couple of these images, I believe that the batch upload project will need the support of a named contact within the Wellcome Library, or regular access for one or more Commons volunteers to be able to research the background of some of the collection. A good example of a file likely to be contested is included on the right as a thumbnail. It may well be that this was donated to the Wellcome Library as part of a set of archives, but there is potential for this to be questioned due to the copyright mark naming Wojnarowicz as a member of Act Up (unfortunately David Wojnarowicz died in 1992; it is a photograph of Wojnarowicz as a boy that is featured in the poster). If the Wellcome Library does have a relevant letter of release or similar from an Act Up representative or agent, then this would be the basis of an OTRS ticket, or a public clarification in the description on Commons. Act Up would have produced this poster as part of their public knowledge mission and I have no doubt that representatives of the organization would confirm this as public domain if approached. Should this need to be done, then volunteers in Wikimedia LGBT can assist.
I am not sure at the moment how many of the 100,000 might be questioned, this would be a nice bit of analysis to do early on in the project so that suitable workflows deal with questions and there is confidence in how the collection is assessed before uploading to Commons. Due to the volume of work, this may even be an area that we may want to propose funding for to ensure it is done consistently and in a timely way. --Fæ (talk) 13:50, 23 January 2014 (UTC)
With regard to ACT UP posters, I have sent off this email request for confirmation. --Fæ (talk) 09:11, 24 February 2014 (UTC)
- Excellent, Fae. It would be great if we can get permission to host these ACT UP posters under a CC-BY license. However, to be honest, I confess that I'd like to know whether or not ACT UP had already given these posters to Wellcome under terms that permit Wellcome to distribute them with a CC-BY license. I really hope so, since if they haven't done so, even if they are willing to do so now, this would kind of cast a pall over Wellcome's generous release of their collection (as that might indicate we'll have to give closer scrutiny to the validity of the CC-BY claim on individual images, if Wellcome hasn't been careful when applying this license). —RP88 10:28, 24 February 2014 (UTC)
NC in the EXIF
More worrying is the EXIF data for the image to the right: "Copyrighted work available under Creative Commons by-nc 2.0 UK" --AdmrBoltz 13:15, 27 January 2014 (UTC)
- As noted previously, this appears out of date compared to the terms on the site. Using NC was their *old* policy. During a batch upload we could change the EXIF, however it is better to keep the digital file identical to the original. --Fæ (talk) 13:22, 27 January 2014 (UTC)
- Must have missed that above. While I normally would agree that keeping original EXIF data is good, this could lead to confusion if someone were to reuse the content out of Wikimedia. --AdmrBoltz 14:07, 27 January 2014 (UTC)
- I have the skills to get Faebot to tweak the EXIFs with any agreed corrections, though I would suggest this only happens after the originals are uploaded so they appear in the file version history. I would look at this as part of the main upload project, once that gets under-way. --Fæ (talk) 14:16, 27 January 2014 (UTC)
Technical stuff
Just noting I've enlarged and transferred their logo to Commons - File:Wellcome Trust logo.svg Nick (talk) 16:09, 21 January 2014 (UTC)
Avoiding over-categorization
I am using the Keywords (when available) on the Wellcome Images library catalogue page as a starting list to then look on Commons for existing categories of the same name. This does lead to matches with "diffusion" categories such as China or Hospitals. After Roland zh and then Mark Marathon raised this as an issue on my talk page, I have created a housekeeping script that sniffs through all uploads, tests whether their categories use the {{Categorise}} template and trims those categories off. I have not integrated this into the upload itself, as it is already 40% done (so I want to avoid monkeying around with it, for consistency), and doing this a relatively short time after upload (possibly a few days) gives a moment for volunteers to spot the images appearing in their watched categories and "diffuse" them by hand; on balance this feels like a better option than not taking some value from all the available keywords. Note, the Wellcome keywords all stay in the description.
The script seems to process around 25,000 images per day, but gets stuck and needs a kick probably due to my home wifi, and something like 1% or 2% of images are affected. As this is going to be a one-off fix, it will run only for the upload and once all uploads are complete. There may be some residual issues which I could trap, such as where diffusion categories have been added by volunteers rather than me on upload, however as this is one-off I'm not currently planning for it to get this smart for the likely small number of images that might be this sort of fringe case. --Fæ (talk) 12:58, 16 October 2014 (UTC)
- I have integrated this test into the upload script. From today (i.e. ~25,000 more images) this will ensure that diffusion categories are not automatically applied just because they appear as a keyword on the Wellcome Images catalogue page. --Fæ (talk) 16:26, 30 October 2014 (UTC)
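The trimming step described above can be sketched roughly as follows. This is a minimal illustration, not Faebot's actual code: the function name `trim_diffusion_categories` is hypothetical, and the set of diffusion categories is assumed to have been worked out beforehand (for example by checking which category pages carry {{Categorise}}), which is not shown here.

```python
import re

# Matches [[Category:Name]] or [[Category:Name|sortkey]], plus one trailing newline.
CATEGORY_LINK = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]\n?",
    re.IGNORECASE,
)


def _normalise(name):
    """Treat underscores as spaces and capitalise the first letter, as MediaWiki does."""
    name = name.replace("_", " ").strip()
    return name[:1].upper() + name[1:]


def trim_diffusion_categories(wikitext, diffusion_categories):
    """Return (new_wikitext, removed_names) with diffusion categories stripped.

    `diffusion_categories` is an assumed, precomputed set of category names
    that should not be applied automatically.
    """
    diffusion = {_normalise(name) for name in diffusion_categories}
    removed = []

    def replace(match):
        name = _normalise(match.group(1))
        if name in diffusion:
            removed.append(name)
            return ""  # drop the whole category link and its newline
        return match.group(0)  # leave every other category untouched

    return CATEGORY_LINK.sub(replace, wikitext), removed
```

The Wellcome keywords in the description are not touched by this sketch; only category links are removed, and the list of removed names could be used for an edit summary.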
Metadata and conventions
name | data structure | conventions and notes |
---|---|---|
photo number | [A-Z]\d{7} | This number may be found in Wellcome catalogues as "photo no" or "image number" and may be shorter by having dropped leading zeroes. This number appears unique to the Wellcome Images collection but other identification numbers may be usefully included as references from other catalogues, such as the Wellcome Library reference number. |
source | "http://wellcomeimages.org/indexplus/image/" + <photo number> + ".html" | An alternative of wellcomeimages.org/ixbin/imageserv?MIROPAC=<photo number> will redirect to the same gallery page. |
(Draft!) Mapping these to Commons parameters:
- filename = <safe version of WI short catalogue description TBD> + " Wellcome " + <WI photo number> + ".jpg"
- source = <WI source>
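As a rough sketch of the draft mapping above: the photo number pattern and the source URL prefix come from the table, while the function names and, in particular, the "safe version" rule for the description are assumptions made for illustration, since that rule is still marked TBD.

```python
import re

PHOTO_NUMBER = re.compile(r"[A-Z]\d{7}")
SOURCE_PREFIX = "http://wellcomeimages.org/indexplus/image/"


def normalise_photo_number(raw):
    """Restore dropped leading zeroes, e.g. 'M16506' becomes 'M0016506'."""
    match = re.fullmatch(r"\s*([A-Za-z])\s*(\d{1,7})\s*", raw)
    if match is None:
        raise ValueError(f"not a Wellcome Images photo number: {raw!r}")
    letter, digits = match.groups()
    number = letter.upper() + digits.zfill(7)
    assert PHOTO_NUMBER.fullmatch(number)
    return number


def source_url(photo_number):
    """Build the gallery page URL for a photo number."""
    return SOURCE_PREFIX + normalise_photo_number(photo_number) + ".html"


def commons_filename(description, photo_number):
    """Build a Commons filename from a short description and a photo number.

    Assumption: the 'safe version' replaces characters that are awkward in
    page titles (# < > [ ] { } | : /) with spaces and collapses whitespace.
    """
    safe = re.sub(r"[#<>\[\]{}|:/]", " ", description)
    safe = re.sub(r"\s+", " ", safe).strip()
    return f"{safe} Wellcome {normalise_photo_number(photo_number)}.jpg"
```

For example, `commons_filename("Luigi Galvani. Lithograph.", "M16506")` gives a name in the same shape as the uploaded files listed on this page.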
Progress
Assigned to, task | Progress | Bot name | Category |
---|---|---|---|
Fæ, to email Wellcome for information on the API or licence filtering. | Status: Done
Meetings arranged |
||
Fæ, run single exemplar manually | Status: Done Exemplar 1 - battle at the Ta-ping gate, 1911 Exemplar 2 - revolutionary women's army attacks Nanking, 1911 |
See image | |
Fæ, run sample batch upload | Status: Done aiming for before 26th Feb to coincide with the Wikipedia:WikiProject Medicine/Wellcome Library editathon 2014 A set of 1,300 lithographs is being uploaded in low resolution as a temporary measure to support the editathon using a customized upload rather than the GWToolset. This test batch can be upgraded to high resolution at a later stage. |
Faebot | Files from Wellcome Images |
Fæ, review potential as a candidate for using the GLAMwiki toolset | Status: Done based on Wellcome/WMUK meeting held on 3rd Feb 2014 | ||
Fæ, agree access for Faebot avoiding the manual CAPTCHA (Wellcome Images uses Google's reCAPTCHA service as an anti-bot device).[2] | Status: Done
|
Faebot | |
Fæ
|
Status: Done
|
Faebot | |
Fæ
|
98.9% completed (estimate) |
- | Files from Wellcome Images |
Fæ
|
Status: Done | - | Files from Wellcome Images |
Fæ
|
Status: Started | - | - |
Reports
The tables below are maintained by Faebot, with updates running daily for up to a month after the batch upload completes, then weekly for several months depending on user request and activity levels. If you have suggestions for improving the reports please leave a note at User talk:Fæ.
Volunteers
- Contributors by number of edits to Wellcome Images:
- Volunteers can be recognized using the Wellcome Images project barnstar.
Most popular categories
- This report lists the top 100 Commons categories by the number of Wellcome Images added as a result of this project, giving a sense of the broad range of topics. Some large categories, such as Lithographs, will be under-represented as the files are sensibly sorted into sub-categories.
Usage
- Glamorous usage report: GLAMorous
- This table is extracted from Glamorous and shows the 24 images most reused on other Wikimedia projects and the number of pages they are used in:
Most edited images
Largest images
Oddities and investigations
- File:Michael Faraday. Coloured stipple engraving by J. Cochran, 1 Wellcome V0001852EL.jpg is a photograph of lockers rather than an engraving of Faraday. Note sent to W. library for follow-up on 2014-10-24.
- File:Medical Texts; Assyrian Wellcome M0019841.jpg Now in the British Museum, this is the only record of tablets from the ancient Ashurbanipal Library being owned by the Wellcome. Yet to find the record in the British Museum database to cross-link, request for advice put to an ancient historian on 2014-10-26.
- File:Luigi Galvani. Lithograph. Wellcome M0016506.jpg Ref:http://catalogue.wellcomelibrary.org/record=b1166582a Ref:http://wellcomeimages.org/indexplus/image/M0016506.html The media links include the title page of the book "The life of John Hunter". This appears to have been incorrectly identified as a lithograph of Luigi Galvani. Note sent to the W. library for follow-up on 2014-10-29. In the meantime, cross-links on Commons will be erroneous.
- File:Psychoanalysts including Freud. Photograph. Wellcome V0027600.jpg After description added on Commons, this feedback sent back for the Wellcome Library catalogue:
- Ref:http://catalogue.wellcomelibrary.org/record=b1162474a;
- After upload to Wikimedia Commons, one of our volunteers has identified the Psychoanalysts in the photograph as:
- Beginning with first row, left to right: Franz Boas, E.B. Titchener, William James, William Stern, Leo Burgerstein, G. Stanley Hall, Sigmund Freud, Carl G. Jung, Adolf Meyer, H.S. Jennings. Second row: C.E. Seashore, Joseph Jastrow, J. McK. Cattell, E.F. Buchner, E. Katzenellenbogen, Ernest Jones, A.A. Brill, Wm. H. Burnham, A.F. Chamberlain. Third row: Albert Schinz, J.A. Magni, B.T. Baldwin, F. Lyman Wells, G.M. Forbes, E.A. Kirkpatrick, Sandor Ferenczi, E.C. Sanford, J.P. Porter, Sakyo Kanda, Hikoso Kaksie. Fourth row: G.E. Dawson, S.P. Hayes, E.B. Holt, C.S. Berry, G.M. Whipple, Frank Drew, J.W. A. Young, L.N. Wilson, K.J. Karlson, H.H. Goddard, H.I. Klopp, S.C. Fuller.
- 2014-11-02.
- File:WMS 693, Rotulum hieroglyphicum G. Riplaei Wellcome L0032139.jpg hi-res appears to be corrupt at Wellcome Images and needs regenerating.
- File:David Livingstone memorial in Blantyre; Livingstone found de Wellcome V0018878.jpg doesn't relate to David Livingstone, but depicts a woodblock print of "Der Artz Louis Lobera d'Avila". Description also incorrect. Posted a comment to the library per Fæ's suggestion, updated file description and cats and moved file to new name File:Louis Lobera d'Avila in his study woodcut by H. Burgkmair.jpg. -- Deadstar (msg) 09:30, 2 December 2014 (UTC)
- File:Roderick Murchison resting his right hand on a short stick. Wellcome V0027580.jpg was previously a portrait of an "unidentified man" in the Wellcome catalogue. I sent them a little note on it. -- Deadstar (msg) 16:50, 2 December 2014 (UTC)
- Another note on the above: after Tineye indicated who this was, it appears that the library had the identity information available on the full biographical record. The gallery description, however, says "unidentified man". Same for another "unidentified man", now identified as John Ross (full biographical record vs. gallery description). -- Deadstar (msg) 09:46, 3 December 2014 (UTC)
- File:AIDS prevention advert from Lisbon Wellcome L0054385.jpg may not have a proper permission; see File talk:AIDS prevention advert from Lisbon Wellcome L0054385.jpg#Copyright. -- Tuválkin ✉ 18:28, 13 March 2015 (UTC)
Errors logged during maintenance
Entries are added at the bottom during automated checks that follow the link from each Wellcome Images catalogue entry to the Wellcome Library catalogue entry for the artefact. One artefact (such as a painting or rare book) may have many images (such as photographs of the painting or scans of pages of a book). Entries on this list are added when a link to the Wellcome Library catalogue given in the Wellcome Images catalogue fails to open, and presumably does not exist under that database identity.