Wikidata:Requests for permissions/Bot/SamoaBot 33
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved--Ymblanter (talk) 18:21, 25 June 2013 (UTC)[reply]
SamoaBot
Operator: Ricordisamoa
Task/s: import sex or gender (P21) from full name (Q1071027)
Function details: it loops into this list by Magnus Manske, and adds sex or gender (P21) with the above "source". --Ricordisamoa 13:45, 9 June 2013 (UTC)[reply]
- I am skeptical about assuming the sex based on the name alone. Some people may have a name of the opposite sex for various reasons, and some names are used for different sexes in different cultures or languages. Byrial (talk) 13:59, 9 June 2013 (UTC)[reply]
- One of the names on the list, Chris, can be used as shortform for the normally female names Christina and Christine and possibly more. Byrial (talk) 14:06, 9 June 2013 (UTC)[reply]
- While there may be the odd instance of a woman assuming a man's name to the degree that it shows up as their Wikipedia title, I took great care to use only names that are used only for men across cultures and languages. I have given the list of names I used; if you think some of them are not used purely for men, I can remove them and update the list. That said, any bot action editing more than 130K items is bound to get a few wrong, but it will be significantly less work to fix the bot errors than to add all the statements manually, a process that is bound to be error-prone itself. UPDATE after edit conflict: I'll remove Chris from the list. Any others? --Magnus Manske (talk) 14:07, 9 June 2013 (UTC)[reply]
- "Chris" removed. 134,965 items remaining. Same location. --Magnus Manske (talk) 14:24, 9 June 2013 (UTC)[reply]
- Please also remove Jan, which may be a short form of Janice or Janet. Then I will Support, as the error rate will probably be low enough to justify the saved work. Besides, I note that the statements will be recognizable by the source indication, so users will be "warned". Byrial (talk) 14:49, 9 June 2013 (UTC)[reply]
- Done, 133,927 items remaining in the list. --Magnus Manske (talk) 15:21, 9 June 2013 (UTC)[reply]
- Support. Good idea. Mushroom (talk) 18:23, 9 June 2013 (UTC)[reply]
- Support. Even if there are one or two errors, this is a large step forward. -- Docu at 19:10, 9 June 2013 (UTC)[reply]
Zolo thinks "Michele" is a female name. --Ricordisamoa 19:04, 9 June 2013 (UTC)[reply]
- Maybe "Michelle" ? -- Docu at 19:09, 9 June 2013 (UTC)[reply]
- That may be a good idea once we have exhausted other solutions. But I think we should first try to fill in as much as we can from better sources (possibly the databases that are already linked from Wikidata). There is no real hurry for this, unless it is really important to have this property filled out for some other job. At the least we should remove Michele (mostly a female name in the US, like Michelle; you can find a few examples in en:Michele (given name)). I would also remove Joe (a nickname, sometimes used by women, I think) and José (apparently a female name in the Netherlands, see en:José#Female_form). --Zolo (talk) 19:43, 9 June 2013 (UTC)[reply]
If one is willing to accept a more complex process in exchange for a higher success rate, some of the errors could be mitigated by checking whether the article is categorized in some descendant of de:Kategorie:Frau, en:Category:Women and so on. 132.203.167.146 01:05, 10 June 2013 (UTC)[reply]
- I think that these categories are already used to add sex to items. Byrial (talk) 09:01, 10 June 2013 (UTC)[reply]
For biographies of a certain length in German, one would expect the ratio of occurrences of the pronoun "sie" to occurrences of the pronoun "er" to be very high for women and very low for men (of course the same is true for the English pronouns "she" and "he"). It's a fairly simple check to implement. Pichpich (talk) 15:36, 10 June 2013 (UTC)[reply]
- It is a good idea to count the occurrences of sex-specific pronouns. I will do that first in the Danish Wikipedia, and then also in other Wikipedias if it is a success. I will use the categorization to find all person articles, and I can use the articles that already have a known sex to see how reliable the method is. (But in the meantime I see no reason not to do the bot job this request is about.) Byrial (talk) 16:33, 11 June 2013 (UTC)[reply]
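The pronoun-ratio check suggested above could be sketched roughly as follows. This is only an illustration, not code used by any bot in this discussion: the function names, the minimum pronoun count and the ratio threshold are all assumptions, and only the pronoun pairs named in the thread ("she"/"he", "sie"/"er") are included. German "sie" can also mean "they", so real use would need the reliability test against items with a known sex described above.

```python
import re

# Pronoun pairs taken from the discussion: (female, male) per language.
PRONOUNS = {"en": ("she", "he"), "de": ("sie", "er")}


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def guess_sex(text, lang="en", min_total=5, ratio=3.0):
    """Return 'female', 'male', or None when the evidence is too weak.

    `min_total` and `ratio` are hypothetical thresholds chosen for the
    example; they are not values proposed in the discussion.
    """
    female_word, male_word = PRONOUNS[lang]
    female = count_word(text, female_word)
    male = count_word(text, male_word)
    if female + male < min_total:
        return None  # article too short to say anything
    if female >= ratio * max(male, 1):
        return "female"
    if male >= ratio * max(female, 1):
        return "male"
    return None  # mixed evidence: better no statement than a wrong one
```

Returning None for short or mixed texts follows the principle stated elsewhere in this discussion that no statement is better than a wrong statement.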
Comment I have removed Michele, Joe, and José; 128,552 items remain on the list. If no one is opposed, can we move forward with this? --Magnus Manske (talk) 19:49, 10 June 2013 (UTC)[reply]
- bump. --Magnus Manske (talk) 10:16, 13 June 2013 (UTC)[reply]
- still waiting for further comments, too... --Ricordisamoa 17:58, 13 June 2013 (UTC)[reply]
Please remove the items below from the list. They are either not persons (Christian Science!), more than one person, pen names for women, or just women with a typically male name:
- en:The Wachowskis - Q195719
- en:Vivian Stuart - Q260262
- en:Christian Science - Q624477
- en:Marie-Azélie Guérin Martin - Q2180087
- en:Ellen Wood (author) - Q2493772
- en:Damiete Charles-Granville - Q3666372
- en:Frank and Doris Hursley - Q5490567
- en:James Harmon Brown and Barbara Esensten - Q6135589
- en:Raymond Fernandez and Martha Beck - Q928926
- en:Jennell Jaquays - Q4117080
- en:Roger Arliner Young - Q7357808
- en:Dan D. Yang - Q5213337
- en:Steve Chadwick - Q7612185
- en:Ian Galliguez - Q5981606
- en:Michael Hyatt - Q281951
- en:Daniel Lesueur - Q5217915
- en:Gabriel Hayes - Q5515670
- en:Christian Beranek - Q5109329
- en:Dan Poncet - Q5214215
- en:Michael Field (author) - Q839369
- en:Henry Cow - Q1474555
- en:Arthur Loves Plastic - Q4799566
- en:Adam and Eve - Q58701
- en:Sergius and Bacchus - Q140013
- en:Thomas Dean Donnelly and Joshua Oppenheimer - Q281088
- en:James and Oliver Phelps - Q343954
- en:Tom and Ray Magliozzi - Q680922
- en:Jonathan Davis and the SFA - Q825290
- en:Epipodius and Alexander - Q934123
- en:Richard and Maurice McDonald - Q1029178
- en:Jimmy Jam and Terry Lewis - Q1063111
- en:Peter and Gordon - Q1256943
- en:John Morrison and The Miz - Q1321581
- en:Giacomo and Giovanni Battista Tocci - Q1525945
- en:William and Mary - Q1947603
- en:Billy and Chuck - Q2066327
- en:Tom Petty and the Heartbreakers - Q2117272
- en:Wendy and Richard Pini - Q2151017
- en:Luigi Beltrame Quattrocchi and Maria Corsini - Q2271829
- en:Daniel and Miguel Falcon Græsdal - Q2817299
- en:Billy and Bobby Mauch - Q2903680
- en:Paul London and Brian Kendrick - Q2943282
- en:George and Elizabeth Peckham - Q3051246
- en:Scott Alexander and Larry Karaszewski - Q3218068
- en:John Brancato and Michael Ferris - Q3308192
- en:Jeffrey Price and Peter S. Seaman - Q3376859
- en:Star Names: Their Lore and Meaning - Q3934810
- en:Peter and the Wolf (band) - Q4046719
- en:Christian and Joseph Cousins - Q4241280
- en:Charles and Lee-Lee Chan - Q4261937
- en:Adam and Joe - Q4680021
- en:Alan and Michael Perry - Q4708127
- en:Andrew Nicholls and Darrell Vickers - Q4758123
- en:Bill and Imelda Roche - Q4911491
- en:Christopher and Kevin Graves - Q5113486
- en:Dan and Frank Carney - Q5214633
- en:Fernando and Nefty Sallaberry - Q5444973
- en:James Berg and Stan Zimmerman - Q6129702
- en:Jim and Mary McCartney - Q6199106
- en:John Boy and Billy - Q6222736
- en:John Littleton and Kate Vogel - Q6244990
- en:John Lloyd Cruz's awards and recognitions - Q6245054
- en:John Whitfield Bunn and Jacob Bunn - Q6263868
- en:Jonathan Aibel and Glenn Berger - Q6272409
- en:Louis Alvarez and Andrew Kolker - Q6686622
- en:Mark Fergus and Hawk Ostby - Q6767587
- en:Paul and Gaëtan Brizzi - Q7154558
- en:Richard and Esther Shapiro - Q7330162
- en:Thomas Jefferson and slavery - Q7791294
- en:Christopher Markus and Stephen McFeely - Q12422250
- en:Paul and Mattheus Brill - Q12857485
- en:Baschet Brothers - Q2897501
- en:Farrelly brothers - Q262337
- en:Russo brothers - Q2853003
- en:Boulting brothers - Q3181105
- en:Robert brothers - Q5931812
- en:Ian Fairbrother - Q5981493
- en:La Villa brothers - Q6465811
- en:Spierig brothers - Q7577128
- en:The Dear & Departed - Q1154243
- en:The Crying Boy - Q2461258
Byrial (talk) 11:40, 14 June 2013 (UTC)[reply]
- I will not add sex or gender (P21) if the title contains \band\b. --Ricordisamoa 18:01, 14 June 2013 (UTC)[reply]
- I guess "brothers" and "sisters" should also be discarded as likely to pose problems. Pichpich (talk) 19:51, 14 June 2013 (UTC)[reply]
- (edit conflict) It is not as simple as that. Articles like en:Charles Wood (singer and actor), en:James Lynch (bishop of Kildare and Leighlin) and en:Raymond Asquith, 3rd Earl of Oxford and Asquith should have sex or gender (P21). Besides, there may be no English title, or articles about several people may be named in some other way (typically with brothers/sisters/twins/family or whatever). It is hard to find out whether an item with P107 (P107): person (Q215627) is about one or more persons, and we don't even have a property to indicate that as far as I know. The pairs can/should use has part(s) (P527), though, if it survives the current deletion request. Byrial (talk) 20:11, 14 June 2013 (UTC)[reply]
- But it's ok to skip entries that are potentially problematic. There are multiple bots adding P21 statements with completely different methods. If this one is likely to fail when "and" occurs in the title, it's much better to leave those to other bots rather than trying to take an educated guess. No statement is better than a wrong statement. Pichpich (talk) 13:29, 16 June 2013 (UTC)[reply]
- I agree that skipping a few "good" ones is OK in this case. Also, could you omit pages with titles containing \bthe\b? --Magnus Manske (talk) 20:43, 16 June 2013 (UTC)[reply]
- OK, I'll do that. Thanks, --Ricordisamoa 01:19, 17 June 2013 (UTC)[reply]
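The title filter agreed on in this thread might look something like the sketch below. The patterns \band\b and \bthe\b come from the comments above, and "brothers" and "sisters" from Pichpich's suggestion. The combined pattern, the case-insensitive flag and the name should_skip are assumptions made for illustration, not SamoaBot's actual code.

```python
import re

# Words whose presence in a title suggests a group, a work, or a non-person.
# \b anchors each word at word boundaries, so "Anderson" does not match "and".
# Case-insensitive matching is an assumption so that "The ..." is caught too.
SKIP_TITLE = re.compile(r"\b(?:and|the|brothers|sisters)\b", re.IGNORECASE)


def should_skip(title):
    """Return True if the bot should leave this title to other methods."""
    return SKIP_TITLE.search(title) is not None


for title in ("Peter and Gordon", "The Crying Boy", "Ian Fairbrother"):
    print(title, "->", "skip" if should_skip(title) else "process")
```

As Byrial points out, a filter like this also skips some valid single-person titles such as "Charles Wood (singer and actor)"; the thread accepts that trade-off because skipped items can be left to other bots.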
It's been two weeks today. Shall we get this started? --Magnus Manske (talk) 14:29, 25 June 2013 (UTC)[reply]