Wikipedia:Wikipedia Signpost/2012-11-05/Technology report

Technology report

Hue, Sqoop, Oozie, Zookeeper, Hive, Pig and Kafka

October engineering report published

In October:
  • 110 unique committers contributed patchsets of code to MediaWiki (up 12 on September)
  • The total number of unresolved commits stayed at 440.
  • About 35 shell requests were processed (no change).
  • 57 developers received developer access to Git and Wikimedia Labs (up 16).
  • Wikimedia Labs now hosts 137 projects (up 6), has seen 268 instances created (up 27) and has 694 registered users (up 61).

—Adapted from Engineering metrics, Wikimedia blog

The Wikimedia Foundation's engineering report for October 2012 was published this week on the Wikimedia Techblog and on the MediaWiki wiki. It gives an overview of all Foundation-sponsored technical operations in that month, as well as brief coverage of progress on Wikimedia Deutschland's Wikidata project, phase 1 of which will soon be trialled on the Hungarian Wikipedia. Of the three headlines picked out for the report, two (the redesign of the mobile site "emphasizing readability and navigation" and the launch of a Wikipedia app for Windows RT and Windows 8 tablets) have already received Signpost coverage. The third drew attention to a proposed redesign of the signup page.

There was news that Parsoid, the new JavaScript-based parser due to be released in December, can "round-trip" wikitext to HTML and back perfectly on 75% of a sample of 100,000 pages. It does so imperfectly, but without serious errors, on a further 18%, and chokes on the remaining 7%. Parsoid will form the basis of the new Visual Editor and will be rewritten in C++ to improve its performance.

There was also news of improvements to both the "Page curation" and "Article Feedback" projects; version 5 of the latter is expected to go live on all English Wikipedia articles later this year. The "Wiki Loves Monuments" app was wound down during October. The most cryptic update came from the WMF's Analytics team, who reported that they had worked on "puppetizing ... Hue, Sqoop, Oozie, Zookeeper, Hive, Pig [and] Kafka" (that is, writing Puppet configuration to automate their deployment). All of these are services that support distributed, large-scale number crunching.

This month's Engineering report is the first for which a "friendly" summary version is also available.

Coding programs

Because only six mentors signed up to help with the Wikimedia Foundation's provisional Google Code-In programme, far fewer than the roughly twenty-one needed, the Foundation has decided not to participate (mailing list). "I know this disappoints some of you; we do want to encourage new participants, and we want some structured mentorship that isn't just Google Summer of Code," wrote WMF Engineering Community Manager Sumana Harihareswara, adding that she would "start a new thread about a more suitable program for us to participate in." That follow-up thread confirmed that the Foundation (on behalf of MediaWiki generally) would still like to participate in the separate Outreach Program for Women, a FOSS outreach program that offers women paid internships working on open source projects (mailing list). Applicants and mentors are welcome.

In brief

Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for several weeks. Though there is no poll this week, last week's question (about the utility of videos) is still open for opinions.

  • TimedMediaHandler goes live: As previewed in last week's "Technology report", the TimedMediaHandler extension (which overhauls MediaWiki's video handling capabilities) finally went live on its first production wiki, the English Wikipedia, this week. Given that this initial deployment proved successful, the extension will almost certainly go live on Wikimedia Commons next week, followed by other, smaller wikis later in the month.
  • GitHub replication: Gerrit repositories containing code from MediaWiki extensions are now being successfully replicated to the popular alternative code-hosting site GitHub, under the account "wikimedia". A separate "mediawiki" account, used in the past, has been closed to avoid confusion and duplication (mailing list). Code from extensions joins the so-called "core" MediaWiki code on the site, which began replicating last month. As with core code, the service is still read-only, and a way to transfer contributions made on GitHub back to Gerrit is still being worked on.
  • Bad week for the job queue: There were problems this week with the Wikimedia job queue, which handles delayed processing tasks such as updating category membership lists. The problems are now thought to be resolved.
    (Figure: "It was a bad week for the JobQueue", a chart of job queue activity with gaps during which it dropped to 0.)
  • WMF hires: Steven Bernardin joined the Operations team as Data Center Technician, working in the Tampa, Florida data center (no announcement was made, but the hire was mentioned in this month's engineering report, summarised above). Although Tampa has long been the Foundation's primary data center, its alternative facility in Ashburn, Virginia, has been scheduled to take over that role for the past year; unforeseen difficulties have so far prevented the switch.
  • Five bots approved: Five BRfAs were recently approved for use on the English Wikipedia:
    1. Legobot's 23rd and 25th BRfAs, tagging http://ap.google.* links with {{dead link}} and moving {{ODNBsub}} outside of {{cite xxx}};
    2. Italic title bot's 1st BRfA, adding {{Italic title}} to species/film/album/book articles that do not have the template;
    3. Snotbot's 12th BRfA, clerking and archiving on WP:RFPP;
    4. Makecat-bot's 3rd BRfA, maintaining {{Link FA}}, {{Link GA}} and {{Link FL}}.
At the time of writing, 15 BRfAs are active. As usual, community input is encouraged.