Wikipedia:Wikipedia Signpost/2015-04-08/Op-ed
We are drowning in promotional artspam
I used to think of myself as an inclusionist. I used to write articles. I still do, certainly. However, I recently came to the sad realization that I am spending less and less time creating new content, and more and more deleting things.[1] Let me tell you a slightly worrisome story of how this came to be. Since 2007 I have been regularly monitoring the list of new articles related to WikiProject Poland. This started as a (moderately successful) attempt to recruit people for WikiProjects I am involved in. Over time I sought to automate this process (reviewing all of those articles and reacting to them can take several hours each week). To this end I developed a few templates. At first, they were only invitations – to WikiProjects, DYKing, and such. But looking at them now, a big chunk of my tools are paste-in prods for "ARTicles that are merely SPAM" (aka "advertisements masquerading as articles", ADMASQ), most commonly in the categories for biographies and companies/products. It is a sad testament to the gap between what I thought I would need (rewards and words of encouragement) and what I ended up needing (in essence, words of discouragement). I haven't kept specific numbers, but for the past few years, at least, each week I have to prod/AfD articles, whereas I use my WikiProject/DYK invitations maybe once or twice.
Not all of my deletion nominations come from the new article reports. In fact, if I were limiting myself to those, I would not be here, calling for your attention. A few years ago I started to realize that many of the articles I prod/AfD share similar topics. What's common? Biographies (primarily artists failing WP:CREATIVE). Music bands, songs, and tours failing WP:MUSIC. But I can stomach them; perhaps it's what remains of my inclusionist sentiment – I will prod those articles with no mercy, but the poor fame-starved artists are not whom I want to draw your attention to. No, we have a bigger problem, or – perhaps – a fatter, juicier and more problematic target. Those of you who have followed The Signpost for a while know well the recurring theme of paid editors and promotional advertising of products and companies. I personally don't have a problem with paid editing if our policies and guidelines are respected. Unfortunately, they – namely, Wikipedia:Notability and its child guidelines – are not. I would go as far as to say that they are rampantly disregarded. They are disregarded by WP:VANITY-seeking individuals, but even more so by those creating articles about products and companies (and here I sadly have to concede that the majority of such articles are almost certainly the work of people who were paid to create them).
What I am looking at right now are several categories: Category:Business software, Category:Websites, Category:Law firms, Category:Internet marketing companies, Category:E-commerce. They are gateways to many related categories, and I estimate that they are filled with lots of spammy articles. Let me now define spam in the context of this op-ed as advertisements masquerading as articles (in short, artspam) rather than external links spamming. The latter is more easily identifiable through automated tools, and Wikipedia:WikiProject Spam and others seem to be managing it well enough, as far as I can tell. What I am concerned with is the former: articles that fail notability criteria, aiming to promote a certain topic, not (only) through biased wording, but through their very existence ("I/we/our product is/are on Wikipedia, hence we are important/respectable/famous/encyclopedic").
I said just now that those categories are filled with lots of artspam. By that I mean my estimate that between 25% and 75% of the entries in them would not survive PROD/AfD. And those are not the worst categories; I am afraid they represent an average of hundreds of categories related to companies and certain types of products (websites, software, etc.). After a while – having reviewed hundreds of such articles – you learn to recognize patterns. Few are created by editors active across numerous topics. Most are the work of single-purpose accounts, focused either on a single article or on a group of them. A small percentage are so bad they qualify for near-speedy deletion (zero references, for example) – but those are rare: as the proverbial low-hanging fruit for deletionists, they don't survive long. Though just a few days ago I stumbled upon an unreferenced product stub from 2005, so... Distressingly, in the last year or so I have noticed a significant proportion (<20%) of problematic entries as having passed through the Articles for Creation process or similar.
This leads me to the conclusion that (as observed by some prior research on the subject) many Wikipedians (myself included) often make quick assessments of articles by looking at the reference list. If there are many references (bonus points for being formatted), we check the article as "probably ok" and move on. This is understandable (we are all busy), but it is a problem: many sources fail the reliability requirements, while others mention the company just in passing (notability requires in-depth coverage). This is a trick that artspammers have learned to use against us – and, it appears, very successfully. Most of the company and product pages I nominate for deletion have several, if not dozens of, inline references. Many are to their own pages (in other words – self-published), but quite a few are masked better. It is quite common for slightly smarter artspammers to use other websites. Such services are cheaply offered by various PR companies, which maintain extensive portals, such as PRWeb, filled with dime-a-dozen press releases. Many of these releases are distributed through news sites and appear in search engine results, giving them a surface appearance of legitimacy.
Here's a case study. "www.reuters.com/article/" looks nice, until you notice the literal small print: "Reuters is not responsible for the content in this press release". The article about the associated product, Faircoin, has been deleted twice so far. Such articles are often rich mines of bad sources: I have seen everything from Twitter, YouTube, and Facebook to irrelevant awards (another PR trick), numerous blogs, and a myriad of low-key promotional websites masquerading as professional press. Such websites might sport names that imply reliability, but usually are quite WP:QUESTIONABLE. Those, at some point, shade into reputable sources (magazines maintained by professional associations), but with tens of thousands of websites out there, it's a pain to figure out which are good and which are bad (i.e. to tell ones that do fact checking and have editorial oversight from ones that will publish anything for a few dollars); we desperately need more initiatives like the few found in Category:WikiProject lists of online sources. For now, however, thousands of articles about organizations or products linger in the mainspace, sustained by nothing but a false appearance of being well referenced, where "well referenced" means nothing more than "numerous". Worse still, we have articles that clearly sport no reliable references (usually referencing only their own websites), or no references at all. Notability may be "just" a guideline, but Wikipedia:Verifiability is a policy. Yet it is a policy with patchy enforcement, and numerous artspam entries survive happily with no references to speak of.
What's the scope of this problem? Category:All articles with topics of unclear notability has about 63,000 entries, but less than 20% (and that's a generous estimate) of the articles I prod/AfD carry that tag; ditto for the nearly 18,000 articles with a promotional tone. Of course, not all categories have similar levels of artspam, but I am afraid that we are looking at a number of up to, maybe, 300,000 such articles. Now, this is a napkin-type calculation, based on extrapolating from the few, very rough, statistics presented here ("if out of five artspam articles, only one is tagged as such, and we have about 60,000 tagged ..."). Yes, I am well aware not everything with a notability tag on it will fail notability once some research is done, but if, let's say, just about a half will, then the napkin equation ends up with 150,000. That's something like 3% of our total articles. Even if I am grossly exaggerating this, and we just have a few thousand entries to clean up, this is a significant number – and there's no way the few of us working on this can make any sizeable dent in this amount of artspam. Worse, I am afraid we are losing – our backlog in just notability topics goes back seven years and the one for promotional tone is about the same. If you think that we are doing better with unreferenced content, the backlog for Category:Articles lacking sources goes back to 2006 and lists over 200,000 entries (including over 2,000 in Category:All unreferenced BLPs)![2]
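The napkin calculation above can be written out as a short sketch. The one-in-five tagging ratio, the roughly 60,000 tagged articles, and the one-half failure rate are the rough figures from the text; the total of 4.8 million articles is an assumption (an approximate size for the English Wikipedia in spring 2015), and all names in the code are purely illustrative.

```python
# Rough figures quoted in the op-ed.
TAGGED_ARTICLES = 60_000      # articles carrying a notability tag (rounded)
ARTICLES_PER_TAGGED = 5       # "out of five artspam articles, only one is tagged"
FAIL_RATE = 0.5               # "just about a half" would fail notability

# Assumption, not from the op-ed: approximate article count in spring 2015.
TOTAL_ARTICLES = 4_800_000


def napkin_estimate(tagged, articles_per_tagged, fail_rate):
    """Return (upper bound, adjusted estimate) of artspam articles."""
    upper_bound = tagged * articles_per_tagged
    adjusted = upper_bound * fail_rate
    return upper_bound, adjusted


upper_bound, likely = napkin_estimate(
    TAGGED_ARTICLES, ARTICLES_PER_TAGGED, FAIL_RATE
)
share = likely / TOTAL_ARTICLES

print(f"Upper bound: {upper_bound:,}")
print(f"Adjusted estimate: {likely:,.0f}")
print(f"Share of all articles: {share:.1%}")
```

Under these inputs the sketch reproduces the 300,000 and 150,000 figures and a share of roughly 3%. Every input is a guess, so the result is best read as an order of magnitude.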
This shouldn't come as a surprise. Artspam, by its very definition, is about things nobody else cares about; it is advertising. Neither experienced editors nor newbies visit such pages often. They are underlinked, hidden in the dusty corners of our project, with the scope of the issue only visible on a few cleanup backlogs, or during category reviews. Many die early, when they are spotted by recent changes patrollers, but those that survive the first few weeks can feel pretty secure, particularly if (counter-intuitively) they were created by a SPA whose further actions won't draw scrutiny to their prior creations. In short, through their lack of encyclopedic value and their obscurity they become the proverbial bugs not seen by many eyeballs. And so they linger, bloating numerous categories which are quietly becoming little but business and product listings with little concern for notability.
Enough is enough, I say. It is the spring of 2015. Wikipedia has been gravitating towards becoming a vehicle for business and product promotion for too long. We need a major artspam cleanup drive, a literal purge of promotional articles, and a push for development of tools and frameworks to stem the tide of such articles in the future. Perhaps something similar to Wikipedia:WikiProject Unreferenced Biographies of Living Persons, an effort which a few years back cut down the number of unreferenced BLPs from 50,000 or so by more than tenfold.
Either way, it is high time for some spring cleaning. Please help out, go to a category for a business type or product of your choice, and start enforcing notability, with fire. Prod an artspam each day, and save this project, before we become a Yellow Pages clone, with a small encyclopedia attached to it.
- Piotr Konieczny is a Polish sociologist at Hanyang University in South Korea, specializing in Internet studies and wikis. He edits Wikipedia as Piotrus.
- The views expressed in this op-ed are those of the author alone; responses and critical commentary are invited in the comments. Editors wishing to submit their own op-ed should look at our opinion desk.
Notes
- ^ In my last 500 edits to Wikipedia namespace, over 70% are in the AfD space. Going back in time, those numbers are 50% for 2014, 45% for 2013, 12.5% in 2012, 11% in 2011, 10% in 2010... well, that's enough, the trend should be clear, and few care about my editing patterns.
- ^ And no, it's not just from the recent weeks – I saw at least one dated to 2012 – but that failing of WP:BLP enforcement is probably a topic for its own opinion piece...
Discuss this story
Being driven away from Wikipedia
Artspam - bad neologism?
Various comments
Bad example - notable company?
See Article content does not determine notability:
Notability is a property of a subject and not of a Wikipedia article. If the subject has not been covered outside of Wikipedia, no amount of improvements to the Wikipedia content will suddenly make the subject notable. Conversely, if the source material exists, even very poor writing and referencing within a Wikipedia article will not decrease the subject's notability.
User:Fred Bauder Talk 10:22, 12 April 2015 (UTC)[reply]
It would help determine notability for some legal firms if we had better articles on the history of the legal profession in America. (I just looked at American Bar Association, & except for the date of its founding, & a few controversial events during its existence, there is nothing about its history: not about the challenges it faced getting established, not about its relationship with state bar associations, not even an explanation why the ABA was founded. So if a given legal firm was notable for, say, improving the quality of lawyer training & professionalism, there is no way to objectively determine it.) And I suspect this weakness exists for many other professions: accountants, doctors, engineers, etc. Having links from articles where notability has been proven (or is self-evident) helps disinterested editors to believe that a given article is notable. -- llywrch (talk) 20:09, 13 April 2015 (UTC)[reply]
Why shall we care? Let Capitalism do its job
Why don't we look at the issue from another point of view? Applying the WP:DGAF principle, why should we give a shit about all these pitiful businesses? By wasting our time on filtering for "notability" we create a problem for ourselves, by supporting the illusion that having a Wikipedia article will make you exclusive and rich. This is capitalism, right? Let the competitors kick each other's asses, nominate each other for deletion, clean up the hype, double-check the references, etc. If the competition does not give a rat's arse, then why must we care? Don't we have better things to do? How will deleting the article "Mike Bob's Banana Shack" save the world? Staszek Lem (talk) 22:21, 10 April 2015 (UTC)[reply]
Guys, I have done my share of deletionism myself and got scolded for it, because I was targeting software tools and companies, and stepped on many toes of the Wikipedians' statistical majority. My post here is an attempt to rationally question this approach, NOT to defend paid advocacy. We are severely understaffed and spending our best time policing Wikipedia, and I am just brainstorming new approaches to the problem. So instead of going ballistic and politically correct, why don't we discuss the suggestion in a civilized way? Staszek Lem (talk) 01:12, 11 April 2015 (UTC)[reply]
Staszek Lem, are you arguing for free market fundamentalism? I'm curious how that has ever solved a single problem on the planet. As far as I can tell from my reading of history, it creates more problems than it solves. Viriditas (talk) 02:47, 11 April 2015 (UTC)[reply]
different suggested directions
One way we could go is to include even small companies and all schools, all local churches, all people running for political office, all active researchers, all productive artists. We would need to remove the provision that WP is not a directory. The only valid reason for keeping this fundamental rule, besides that of looking like a traditional encyclopedia, is that we have not been able to do this while still having objective coverage. To me a provision for tightening the rules of inclusion is a very unfortunate compromise--I would rather we ourselves had the people to write proper entries for these organizations. We might perhaps do this by writing systematic basic articles in advance, as we do for places, but we would still have to monitor content. I do not think this a likely direction at present; the sympathy is in the other direction.
If we are not going to be all-inclusive, we need a basis for selection. The current basis is a combination of two factors. The first is what happens to interest people here and what people are willing and knowledgeable enough to argue for effectively, a factor which causes our well-known distortions in coverage. The second is the availability of easy-to-find online sources that meet the artificial requirements of WP:GNG, which do not match any normal concept of notability that makes sense to anyone outside WP. The result is very susceptible to manipulation by paid editors, who can write speciously sourced articles, which will survive through indifference unless we happen to catch the stupider among them as sockpuppets. This is not in my opinion an acceptable direction to pursue, and I suggest that we need to stop following it: any alternative would be better.
Personally, I would prefer to follow a direction by which we have true objective standards of real-world importance, based upon a checklist, or a quantitative factor. This would have to be specialized for each topic, and also modified at a geographical level: the size of a business that is notable in a small country would not be notable in a large one. I think this would be quite difficult to do with organizations, and though I can think of criteria for various businesses in the US, it would be quite difficult to specify in a way which would get general acceptance. I think the current mood is against this, but I think the real problem is that agreeing on a set of specifications would involve years of inconclusive debating between the people who like different subjects -- and let us not fool ourselves, Like It/Don't Like It is the basis of most AfD discussions, though anyone with experience here knows how to word it otherwise. The one potential merit of the GNG is that it is supposed to eliminate this effect, because the standards apply to everything. But they don't, in practice, because the amount of easy-to-find publicity is different in different areas, and, curiously enough, they just happen to favor those areas that most people here are interested in.
The increasing attractiveness of WP for paid editing has upset this balance. I think it essential to differentiate an encyclopedia from an advertising medium--if advertising is what one wants to find, Google et al. already give an extremely effective way to find it, especially taking into account the advertising placement that is the basis for their financial success.
The most effective way to address it, and the one I currently suggest, is essentially the same as the one Piotrus suggests also: to adjust the application of the key phrases in the GNG, "substantial coverage" and "third-party independent sources". I think the best approach is to say that material only about funding or acquisitions is non-substantial for the purposes of notability, though certainly reliable; and that local news stories and interviews are intrinsically non-independent; and that a trade interview with a person is not even reliable for anything other than what they would like us to believe. (As an example, we were able to get the principle accepted that local book reviews of local authors are unreliable for notability--no one really challenges that one.) If we were to do it formally, we need to think much more about the wording. Along with Piotrus, I don't think doing it formally is the best idea: we will spend more time trying to find a consensus than we do in removing the AdSpam.
The way to proceed is informal: to individually propose higher standards at the discussion of each article. This will only work if people pay a great deal more attention to AfD than any of us have been doing. There are all too many articles being passed because not enough people bother to comment. There are now about 100 AfDs a day: if 100 people make a point of commenting on 4 or 5 a day, we'll get a good representative opinion. (Like Piotrus, I hope the discussions will generally go the way we think they ought to--but if they do not, it is because not enough people want to support removing advertising, and against such indifference, WP is helpless.) DGG ( talk ) 04:48, 11 April 2015 (UTC)[reply]
Unreferenced BLPs
Just to clarify, the WP:URBLP project didn't reduce it "ten-fold", but it completely eliminated the known backlog at the time. The project was created and completed under threats of bulk deletion, regardless of notability, that were endorsed by admins and User:Jimbo Wales. Not a nice way to encourage the elimination of a backlog. And, whilst some of it was cleared by topic specialists or wikiprojects, a lot/majority was done by a very small number (10-20?) of extremely diligent editors who just kept going and going. The number of users willing to do that sort of thing is minimal, as can be seen by most of the huge backlogs we have.
All articles in that cat now have been tagged since it was cleared (some have been backdated), and similar to the discussions above about notability or quality of references, many of them are incorrectly tagged because some users don't recognise anything not in ref or cite tags as being a reference. As an example, most, if not all, of the 193 NFL related articles tagged as BLPunreferenced would have a link to the NFL or a similarly reliable stats website in their infobox or external links section. (and this isn't a place to argue about the low notability bar for sportspeople, that's WT:NSPORTS.)
But as it has risen from 0 to many thousand, it does show that the tagging of problem articles is only part of the solution, finding the "lost/hidden" articles is also critical. The-Pope (talk) 05:15, 15 April 2015 (UTC)[reply]
Appearance of articles
Four articles shown look different from anything I've ever seen. Two have their titles in red. I've also never seen the article described as "sub class" or "unassessed" at the top. – Vchimpanzee • talk • contributions • 18:14, 14 April 2015 (UTC)[reply]
Sources for company notability
I've been reflecting on business analysis/journalism lately and I think a big issue here is that the underlying culture of business coverage is antithetical to building a neutral encyclopaedia. Businesses generally have no interest in neutral coverage, and typically actively try to prevent that or block access to company materials that would form the basis of such coverage.
I actually think that many businesses (like the law firms in the article) are important parts of our culture and society that don't get enough neutral coverage to display their worth (or in a way that our encyclopaedia can interpret). Businesses of all sizes sit at the heart of our communities (whether we like that or not) and are of much greater importance than plenty of other topics we allow coverage of (such as footballers who played one professional game fifty years ago...). Sadly, the lack of good source material ultimately leads to company articles being a controlled expression of brand image, rather than a true reflection of how they affect our communities every day. SFB 00:13, 17 April 2015 (UTC)[reply]
Problem is communication