User:Harej (WMF)/cloud

From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by Harej (WMF) (talk | contribs) at 08:13, 29 January 2019 (Rounding out). It may differ significantly from the current version.

Executive summary

Introduction

Wikimedia Cloud Services expands on the core technical infrastructure of the Wikimedia projects by providing technical resources for tool developers and others working on software that benefits the Wikimedia movement. The Technical Engagement team at the Wikimedia Foundation wants to understand who uses Cloud Services, what they use Cloud Services for, and why. Since 2015 we have surveyed developers on our Toolforge platform to learn how to improve that service; this year, we expanded our survey to include both Toolforge and Cloud VPS. We also included questions on tool development and MediaWiki development as major use cases for our platforms. With this additional feedback, we hope to get a better sense of how Wikimedia Cloud Services can contribute to the Wikimedia movement.

Methodology

We prepared a questionnaire containing 24 questions, covering topics including basic demographic information, use of the Toolforge and Cloud VPS platforms, specific use cases of Cloud Services, and feedback on Cloud Services. Respondents answered different parts of the survey depending on whether they had used Toolforge or Cloud VPS and whether they identified as a "tool developer."

The survey was distributed to 1,722 Wikitech users based on their membership in one or more Cloud VPS projects (including the Tools project). The survey was completed by 163 respondents, a response rate of 9.5%. Because this is a self-selected sample rather than a random one, the results may not be statistically representative of the Cloud Services user population as a whole; rather, they reflect the perspectives of those who were motivated enough to respond.
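The response rate quoted above follows directly from the two counts in that paragraph. As a minimal sketch of the arithmetic (the function name `response_rate` is introduced here for illustration and is not part of the survey tooling):

```python
def response_rate(completed: int, invited: int) -> float:
    """Return the share of invited users who completed the survey, in percent."""
    if invited <= 0:
        raise ValueError("invited must be positive")
    return 100.0 * completed / invited


# Both figures are taken from the methodology paragraph above.
rate = response_rate(completed=163, invited=1722)
print(f"{rate:.1f}%")  # prints "9.5%"
```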

The response set was divided into these demographic cohorts:

  • Tool developers vs. non-tool developers
  • Users of the predecessor Toolserver service vs. otherwise
  • Wikimedia Foundation staff vs. non-staff
  • Hours per week spent on tool development
  • Number of tools maintained
  • Number of years using Toolforge

For free-text responses, we tagged each response (where possible) with one or more categories, based on the different topics covered by the responses.

Findings

Demographics

Wikimedia Cloud Services provides computing resources free of charge to members of the community while also acting as an internal service provider for Wikimedia Foundation software engineers. Surveyed users of Toolforge and Cloud VPS are predominantly tool developers from the community: 85.9% of survey respondents identified as tool developers, while only 23.3% of survey respondents reported working for the Wikimedia Foundation as an employee, contractor, vendor, or intern.

Most users of Toolforge have been using the service for years. Of the 132 respondents who stated they used Toolforge, 20.5% reported using Toolforge for one year, 28.0% reported using Toolforge for 2-3 years, and 51.5% reported using Toolforge for four or more years.
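Because the base for these percentages is stated (132 Toolforge respondents), approximate head counts can be recovered from the shares. The sketch below does that; the resulting counts are estimates obtained by rounding, not figures published in the survey.

```python
toolforge_respondents = 132  # stated in the paragraph above

# Reported shares, in percent, keyed by length of Toolforge use.
shares = {"one year": 20.5, "2-3 years": 28.0, "four or more years": 51.5}

# Approximate head counts; rounding makes these estimates only.
counts = {
    label: round(toolforge_respondents * pct / 100)
    for label, pct in shares.items()
}

for label, count in counts.items():
    print(f"{label}: about {count} respondents")
```

Under this rounding the three estimates come to 27, 37, and 68, which add up to the stated 132.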

By number of tools developed (i.e. created), 5.3% of tool developers developed zero tools, 22.1% have developed one tool, 35.1% have developed 2-3 tools, and 37.4% have developed four or more tools.

By number of tools maintained (i.e. not created, but worked on), 11.5% of tool developers maintain zero tools, 31.3% maintain one tool, 26.0% maintain two tools, and 31.3% maintain three or more tools.

By number of hours/week spent developing/maintaining tools, 26.0% spend zero hours per week, 29.8% spend one hour per week, 34.4% spend between two and eight hours per week, and 10.0% spend nine or more hours per week.

Toolforge is a successor to the Toolserver platform run by Wikimedia Deutschland from 2005 to 2014. Only 33.3% of respondents reported being users of Toolserver, which is down from last year's 40.3%.

Motivation

Wikimedia Cloud Services is just one of many providers of cloud computing services. We asked respondents to describe why they chose Wikimedia Cloud Services over other options. The most significant factor is access to Wikimedia-specific resources, including the wiki replicas, with a plurality of 30.7% using Wikimedia Cloud Services for this reason. The next two biggest factors were cost at 22.6% and "philosophical or ideological reasons" at 19.7%. Ease of use, privacy, and security were also factors for some.

Respondent-submitted answers to this question vary. The most popular text response revolves around the idea that Wikimedia Cloud Services is an extension of the Wikimedia projects, making it a natural destination for Wikimedia-related code. One respondent wrote "my code belongs to the Wikimedia universe." Other comments focused on Cloud Services' collaborative environment, the use of only free software, low network latency to Wikimedia servers, software testing, discoverability of tools developed, as well as simply working for the same organization.

Support and satisfaction

In general, survey respondents reported having little contact with the Cloud Services team; 86.5% reported contacting the Cloud Services team once per month or less. It is not clear whether this is because support is not needed or because respondents cannot find it. Opinion is mixed as to whether people feel supported when they contact the Cloud Services team: while 55.2% strongly agreed or agreed with the statement that they "feel [they are] supported by the Cloud Services team," 39.2% neither agreed nor disagreed with that statement, and 5.5% disagreed or strongly disagreed. The large proportion of people who neither agreed nor disagreed could suggest a lack of familiarity with Cloud Services support options. Opinion is similarly mixed regarding how easy it is to run code, with 65.6% agreeing or strongly agreeing that it is easy, 23.3% neither agreeing nor disagreeing, and 11.1% disagreeing or strongly disagreeing. Only 36.8% agreed (or strongly agreed) that information from the Cloud/Cloud-Announce mailing lists is useful.

Though relatively few in number, staff possibly have a significantly different experience from the volunteer population. Affiliation with the Wikimedia Foundation as a member of staff was found to be associated with the frequency with which the Cloud Services team was contacted for support (p = 0.01), with feeling supported by the Cloud Services team (p < 0.01), and the amount of work done locally vs. remotely on Toolforge (p = 0.04). Similar relationships were found elsewhere:

  • The number of years as a Toolforge user was found to be associated positively with one's opinion on tech support on Wikimedia Cloud Services as opposed to the Toolserver (p < 0.01), as well as one's opinion on the usefulness of the mailing lists (p = 0.01).
  • The number of hours per week spent on developing or maintaining tools was found to be positively associated with how often respondents contact the Wikimedia Cloud Services team, how useful they find the mailing lists, whether they agree the documentation is clear, and the amount of work done locally vs. remotely. This suggests that those who spend more time working on tools tend to be more advanced users who are more comfortable with Wikimedia Cloud Services' operating environment.
  • The number of tools maintained was found to be positively associated with whether respondents considered information from the mailing lists to be useful, as well as with the amount of work done locally vs. remotely.

This may suggest that the user experience for Toolforge and Cloud VPS favors those with strong technical skills, connections to Wikimedia Foundation staff, or both.
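The p-values above come from tests of association between survey variables. The report does not say which test was used, so the following is only an illustrative sketch of one common choice, Pearson's chi-square test of independence on a 2x2 table, written with the standard library alone. The counts in `example` are hypothetical and invented for illustration; they are not survey data.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With one degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# HYPOTHETICAL counts, not taken from the survey.
# Rows: staff, non-staff. Columns: feels supported, does not feel supported.
example = [[18, 6], [34, 42]]
stat, p = chi_square_2x2(example)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

A small p-value here would indicate that the two variables are unlikely to be independent; it says nothing about the direction or size of the effect, and with ordinal survey answers a rank-based test may be a better fit.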

Survey respondents were broadly displeased with the state of Cloud Services documentation, with only 36.2% agreeing (or strongly agreeing) that they find Cloud Services documentation easy to find, 31.3% agreeing that it is comprehensive, and 36.8% agreeing that it is clear. Indeed, the subject of documentation came up frequently in comments, with complaints that the documentation is geared toward advanced users, poorly organized, out of date, and hard to find. One comment noted that documentation is spread between Wikitech, MediaWiki.org, and Meta, while another noted that it is difficult to distinguish between Cloud Services documentation and documentation intended for the production cluster. Documentation is a known issue and continues to be an area of focus for the Technical Engagement team.

There is broad agreement that Wikimedia Cloud Services has high uptime, with 89.0% agreeing or strongly agreeing with such a statement.

Four comments were submitted concerning access to support, including time zone issues that hinder online collaboration, complaints about Phabricator tasks taking too long to resolve, and one person feeling they were on their own. Miscellaneous comments concerned the ease of use of Cloud Services and issues with software versions.

Areas for improvement

We asked Toolforge users what we should do to improve Toolforge in the next year. Responses were clustered as follows:

  • Software support (10 responses)
    • Simpler deployment processes
    • Update software versions
    • Node.js, Java 8 support
    • Support for GNU Screen
  • Ease of use (7 responses)
    • Different respondents have suggested UIs for different facets of running/managing tools, as well as for managing git repos
  • Improve documentation (7 responses)
    • This includes documentation for bootstrapping projects, as well as other how-to tutorials
  • NFS speed (4 responses)
    • NFS as used on Wikimedia Cloud Services is very slow, and users notice

Other comments included recommendations for better deployment workflows, concerns about there not being enough resources, bring-your-own-Docker-container requests, improvements to Grid Engine, better monitoring support, better access to help, availability-related comments, backups for projects, better build processes, concerns about Cloud Services branding, cron for human users (and not just tool accounts), joins between user tables and wiki replicas, better metrics reporting, remote access to wiki replicas, deletion of tool accounts, a tool bootstrapping kit, and better tool discoverability.

We similarly asked Cloud VPS users what we should do to improve Cloud VPS in the next year. Responses were clustered as follows:

  • NFS speed (4 responses)
  • Access to wiki content (3 responses) (referring largely to things like the text of articles, not currently available in the replicas)
  • Better documentation (3 responses)
  • Puppet should be easier to use (3 responses)

Other comments included requested improvements to Horizon, support for monitoring, logical object storage, backups, git-based workflows, making it easier to understand database resource limits, more flexible VM resource allocation, and easier ways to install/provision MediaWiki.

Additional reports

Toolforge

Programming language use was reported as follows:

Python 3 27.9%
PHP 26.3%
Python 2 16.6%
NodeJS 8.5%
Java 4.9%
Perl 3.2%
Mono/.NET 2.8%
C# 2.0%
Ruby 1.6%
Other 6.1%

Other programming languages submitted include Rust, Bash, Make, C++, PGSQL, Haskell, and Awk.

Most Toolforge users reported doing all or most of their development work locally on their machine, as opposed to remotely on Toolforge:

Almost all of the work 56.8%
More than half of the work 14.4%
About half of the work 8.3%
Less than half of the work 10.6%
I don't know 9.9%

The vast majority of Toolforge users use some kind of source control, with Git being the most popular choice:

Git 81.8%
I do not use source control 16.7%
Mercurial 0.8%
Other 0.8%

Cloud VPS

This was the first annual survey to ask Cloud VPS users about their usage of the service. Responses were as follows:

Hosting one or more tools or other public services such as a web app, bot, dashboard, API, etc. 32.1%
Testing and experimenting with software 26.2%
Running one or more MediaWiki instances 17.3%
Conducting data analysis or other large computational tasks 12.5%
Running a backing service (database, cache, etc.) for a tool or other public service 11.9%

No additional write-in responses were provided.

Additionally, 20.3% of Cloud VPS users reported relying on NFS to access the same files across different servers. Notably, 20.3% were not sure, which could possibly indicate the question was not phrased clearly.

Tool developers

While past Toolforge surveys implicitly focused on tool developers, this year's survey was the first to survey tool developers specifically as a population. The types of tools respondents reported working on were as follows:

Web apps (tools that are accessed through a web browser) 31.6%
Bots, including pywikibot 29.0%
APIs 12.5%
On-wiki gadgets 8.9%
Dashboards and other data visualizations 7.9%
Microsites (small static websites) 7.6%
Other (please specify) 2.6%

Additional responses included: developer tooling, data processing tools, and maps-related tools.

We also asked about backing services used by tools. Responses were as follows:

MySQL/MariaDB (not including ToolsDB or the wiki replicas) 50.9%
SQLite 11.7%
Redis 8.6%
PostgreSQL 6.1%
Memcached 2.5%
MongoDB 1.8%
Other (please specify) 18.4%

Additional responses included: filesystem-based storage, AWS Lambda, Cron, Druid, ElasticSearch, Hadoop, Kafka, Kubernetes, LevelDB, and Memcached.

MediaWiki development

We are exploring how MediaWiki is used on Wikimedia Cloud Services. We asked respondents to describe how they set up MediaWiki on Cloud VPS. Five responses noted a preference for running the master branch of MediaWiki, while four preferred a stable branch and one preferred the WMF branch. Vagrant was the most popular mechanism for running MediaWiki, while Docker and Ansible were each mentioned once.

Discussion