Networks of reader and country status: an analysis of Mendeley reader statistics

PeerJ Computer Science

Introduction

Bibliometrics is not only a mature research field that develops advanced indicators for research evaluation, but also a field that studies patterns in science. A key method for studying these patterns is bibliometric networking, or science mapping. Here, bibliometric data are used to generate networks of citation relations (e.g., between scholarly journals), networks of co-authorships (e.g., between highly-cited researchers in information science), or networks of co-occurrence relations between keywords, words in abstracts, and/or words in titles (e.g., co-occurrence relations between words in abstracts of papers published in information science) (Van Eck & Waltman, 2014). Powerful computers have made it possible to analyze large networks, which may include the whole Web of Science (WoS) database from Thomson Reuters (Milojević, 2014). Today, these networks are of interest not only to specialists in bibliometrics or networking, but also to stakeholders at publishers, research institutions, and funding agencies. According to Martin, Nightingale & Rafols (2014), “network and science-mapping visualizations have considerably enhanced the capacity to convey complex information to users. These tools are now sufficiently mature to be used not only available in academia but also in consultancy and funding organisations” (p. 4). Overviews of publications dealing with networking and mapping have been published, for example, by Börner, Sanyal & Vespignani (2007), Leydesdorff (2014), and Mingers & Leydesdorff (2015).

In recent years, altmetrics has developed into a popular research field in bibliometrics (Bornmann, 2014). Altmetrics counts and analyzes views, downloads, clicks, notes, saves, tweets, shares, likes, recommends, tags, posts, trackbacks, discussions, bookmarks, and comments on scholarly papers. Altmetric data reflect different kinds of research impact, as has been demonstrated, for example, for Mendeley readership data in the social sciences and humanities (Mohammadi & Thelwall, 2013; Mohammadi & Thelwall, 2014; Sud & Thelwall, in press). Mendeley readership data are essentially bookmarking data; for the sake of simplicity, we refer to the Mendeley data as reader counts. Because it is not clear what altmetric counts really measure, most studies in this field have calculated the correlation between altmetric counts and citation counts (Bornmann, 2015). A substantial positive correlation points to a certain, but otherwise undefined, meaning of altmetrics in a scientific context.

Similar to bibliometric data, altmetric data can not only be used for research evaluation purposes, but also for network analysis and science mapping. Kraker et al. (2014) presented a methodology and prototype for creating knowledge domain visualizations based on readership statistics (from Mendeley). Haunschild & Bornmann (2015) generated a readership network which is based on Mendeley readers per (sub-)discipline for a large dataset of biomedical papers.

In this study, we investigate Mendeley readership data for all articles and reviews published in 2012 and indexed in WoS for which a DOI (digital object identifier) was available, addressing the following research questions:

  1. Are there differences and similarities between disciplines in bookmarking papers?

  2. How do researchers in different career stages differ in terms of bookmarking papers? Which groups of researchers read similar or different papers?

  3. From which countries do the readers of papers come? Are there patterns of similar readership between specific countries?

We address these questions by studying the network nature of the Mendeley readership data. For this purpose, we generate three different networks: (1) the network of disciplinary affiliations shows similarities and differences in the readership of papers; (2) the status group network shows which status groups (e.g., students, lecturers, or professors) commonly read papers (or not); and (3) the country network focuses on global readership patterns, visualizing similar and different readings of papers at the country level.

Methods

Dataset used

During December 11–23, 2014, Mendeley readership statistics for nA = 1,133,224 articles and nR = 64,960 reviews were requested via the Mendeley Application Programming Interface (API), which was made available in 2014, using HTTP GET requests from R (http://www.r-project.org/). An example of the R script is available at http://dx.doi.org/10.6084/m9.figshare.1335688. All papers studied here were published in 2012. This publication year is a compromise between choosing a recent year (Mendeley was founded in 2009) and allowing enough time after publication for reader counts to accumulate. Because Mendeley reader counts are known to accumulate much faster than citation counts (Maflahi & Thelwall, in press), we consider 2012 an appropriate choice.
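The request step can be sketched as follows. This is a minimal Python sketch, not the authors' R script: the endpoint `https://api.mendeley.com/catalog`, the `view=stats` parameter, and the response field names (`reader_count`, `reader_count_by_academic_status`, `reader_count_by_country`, `reader_count_by_subdiscipline`) are assumptions based on Mendeley's catalog API as documented around the time of the study, and the API has changed since. A valid OAuth access token is also assumed; only the offline parsing part is exercised in the usage line.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: endpoint and field names as recalled from the Mendeley catalog API
# documentation of that period; verify against the current documentation.
CATALOG_URL = "https://api.mendeley.com/catalog"
STAT_FIELDS = (
    "reader_count_by_academic_status",
    "reader_count_by_country",
    "reader_count_by_subdiscipline",
)


def build_stats_url(doi: str) -> str:
    """Return a catalog lookup URL for one DOI, asking for the 'stats' view."""
    return CATALOG_URL + "?" + urllib.parse.urlencode({"doi": doi, "view": "stats"})


def fetch_records(doi: str, token: str) -> list:
    """Send one HTTP GET request (needs network access and an OAuth token)."""
    request = urllib.request.Request(
        build_stats_url(doi), headers={"Authorization": "Bearer " + token}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_reader_counts(records: list) -> dict:
    """Keep only the reader-count fields; an empty list means the DOI was not matched."""
    if not records:
        return {}
    document = records[0]
    counts = {"reader_count": document.get("reader_count", 0)}
    for field in STAT_FIELDS:
        # Location and discipline are optional for users, so fields may be missing.
        counts[field] = document.get(field, {})
    return counts


# Offline usage with a hand-made response of the assumed shape:
example = [{"reader_count": 5, "reader_count_by_country": {"Germany": 2}}]
print(extract_reader_counts(example))
```

Separating URL construction and parsing from the network call keeps the parsing logic testable without an API key.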

The DOIs of the papers in the samples were obtained from the in-house database of the Max Planck Society (MPG) based on the WoS and administered by the Max Planck Digital Library (MPDL). The DOI was used to identify the papers in the Mendeley API. The Mendeley reader counts of 1,074,407 articles (94.8%) and 62,771 reviews (96.6%) were retrieved via the Mendeley API. These percentages are higher than those reported in other studies (Haustein & Larivière, 2014b). The papers which were matched via their DOI in the Mendeley API (n = 1,137,178) are analyzed in the remainder of this study. In total, we recorded 9,352,424 reader counts for articles and 1,335,764 reader counts for reviews.

Mendeley users may optionally provide their disciplinary affiliation (selected from predefined sub-disciplines) and their location. However, the API does not list the possible values of country names¹. Therefore, we used the ISO (International Organization for Standardization) country names (see http://countrycode.org) as possible values. Of these 237 countries, we found no reader counts for 59. However, we cannot distinguish between a country value that is not accepted by Mendeley and a country that simply has no readers of the papers in our sample. For example, one is less surprised to find no reader counts for Holy See (Vatican City) than for Singapore.

We retrieved 1,572,240 reader counts (16.8%) for articles and 212,693 reader counts (15.9%) for reviews where the users shared their location information. Country-specific readership information was available for 558,221 (49.3%) articles and 42,935 (66.1%) reviews. The academic status seems to be a mandatory piece of information, as the total number of Mendeley readers found agrees with the status-specific readership information. The self-assigned sub-discipline is not mandatory, but most Mendeley users in our sample provide it: only 4,924 (0.05%) of the Mendeley article readers and 531 (0.04%) of the review readers did not share their (sub-)disciplinary affiliation.
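The percentages in the preceding paragraphs can be re-derived from the raw counts. The small Python check below uses only numbers stated in the text. It shows, for instance, that the 49.3% and 66.1% shares of papers with country information are relative to the requested papers (nA, nR) rather than the matched ones, and that the 237 ISO country names minus the 59 countries without readers leave the 178 countries analyzed in the country network.

```python
def share(part: int, whole: int, digits: int = 1) -> float:
    """Percentage of `part` in `whole`, rounded as in the text."""
    return round(100 * part / whole, digits)


requested = {"articles": 1_133_224, "reviews": 64_960}
matched = {"articles": 1_074_407, "reviews": 62_771}
reader_counts = {"articles": 9_352_424, "reviews": 1_335_764}
with_location = {"articles": 1_572_240, "reviews": 212_693}
papers_with_country = {"articles": 558_221, "reviews": 42_935}
without_discipline = {"articles": 4_924, "reviews": 531}

for kind in ("articles", "reviews"):
    print(
        kind,
        share(matched[kind], requested[kind]),                    # matched via DOI
        share(with_location[kind], reader_counts[kind]),          # reader counts with location
        share(papers_with_country[kind], requested[kind]),        # papers with country data
        share(without_discipline[kind], reader_counts[kind], 2),  # no sub-discipline given
    )

print(sum(matched.values()))  # papers analyzed in the study
print(237 - 59)               # ISO country names minus countries without readers
```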

Software and statistics

The data were organized at three levels of aggregation:

  1. groups of individual readers who bookmark the papers, in terms of disciplinary affiliations;

  2. groups of readers in terms of their professional status (Professor, PhD student, postdoc, etc.);

  3. groups of readers in terms of their countries as provided by Mendeley readers in their profile.

Mendeley bookmarking can be considered a form of referencing; the analysis of these data is then analogous to bibliographic coupling in bibliometrics (Kessler, 1963). Although analogous to bibliographic coupling, bookmark coupling provides different information. First, bibliographic coupling is based on the references in a paper, while Mendeley reader counts are similar to times-cited data and thus reflect the citing-side (reader-side) perspective. Second, bibliographic coupling captures only the authors of papers indexed in a citation index. There is no necessary relationship between authoring and reading papers: some people read a lot of literature but author few papers, or mainly write monographs.

Bookmark coupling also captures Mendeley users who author few papers or publish in journals that are not indexed in popular citation indices. However, bookmark coupling has its own bias, as not everyone uses Mendeley to bookmark papers. Both methods (bibliographic and bookmark coupling) are useful for analyzing networks of publications, and they complement each other.
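The coupling step can be illustrated with a small sketch. Assuming each paper is represented by the set of reader groups (sub-disciplines, status groups, or countries) that bookmarked it, two groups are coupled once for every paper they both bookmarked. This is a minimal Python illustration of the idea with hypothetical group names, not the Pajek procedure used in the study.

```python
from collections import Counter
from itertools import combinations


def bookmark_coupling(groups_by_paper: dict) -> Counter:
    """Count, for each pair of reader groups, the papers both groups bookmarked.

    groups_by_paper maps a paper identifier to the set of reader groups
    (e.g., sub-disciplines) with at least one reader of that paper.
    """
    weights = Counter()
    for groups in groups_by_paper.values():
        # Sorting gives each unordered pair a single canonical key.
        for pair in combinations(sorted(groups), 2):
            weights[pair] += 1
    return weights


# Hypothetical example: three papers and the groups that bookmarked them.
papers = {
    "paper-1": {"Biology", "Chemistry"},
    "paper-2": {"Biology", "Chemistry", "Physics"},
    "paper-3": {"Physics"},  # a single group produces no coupling link
}
print(bookmark_coupling(papers))
```

Because each paper contributes at most one to each pair, a weight counts co-bookmarked papers rather than individual readers; whether this matches the study's "unweighted" Pajek setting exactly is our assumption.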

In each of the three analyses, the largest component is extracted and further analyzed using the community-finding algorithm of Blondel et al. (2008). Pajek is used for the network analysis, with default values for the construction and analysis of all networks. All reader counts are weighted equally (Pajek option “unweighted”), and each network connection is counted as a single co-bookmarking event. The results are visualized using VOSviewer.
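Extracting the largest component amounts to a breadth-first search over the coupling network. The sketch below is a plain-Python stand-in for the corresponding Pajek step, with hypothetical node names; nodes without any coupling link (such as the five disconnected affiliations reported in the Results) are passed in separately so that they count as components of their own. The Louvain community finding itself is not re-implemented here; Python users could, for example, use `louvain_communities` in NetworkX 2.8 or later.

```python
from collections import defaultdict, deque


def largest_component(edges, nodes=()):
    """Return the node set of the largest connected component of an undirected graph."""
    adjacency = defaultdict(set)
    for node in nodes:  # isolated nodes would otherwise be missing
        adjacency.setdefault(node, set())
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, best = set(), set()
    for start in list(adjacency):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:  # breadth-first search from `start`
            for neighbor in adjacency[queue.popleft()]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        if len(component) > len(best):
            best = component
    return best


edges = [("A", "B"), ("B", "C"), ("D", "E")]
print(sorted(largest_component(edges, nodes=["Judaism"])))  # hypothetical isolate
```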

Results

Statistical parameters

Table 1 compares the three networks presented below (disciplinary affiliations, professional status, and countries) in terms of network statistics. The network among the 13 status groups is fully connected; below, we discuss the relative weights of its relations. The other two networks are very different in nature, despite the apparent similarity of some of their parameters.

Table 1:
Statistics of the full networks of disciplinary affiliations, countries, and status groups.
| Statistical parameter | Disciplinary affiliation | Country | Status group |
|---|---|---|---|
| Number of vertices | 465 | 178 | 13 |
| Average degree | 243.45 | 76.02 | 13.00 |
| Degree centralization | 0.47 | 0.54 | 0.00 |
| Density | 0.53 | 0.43 | 1.00 |
| Closure | 0.72 | 0.70 | 1.00 |
| Average distance | 1.48 | 1.58 | 1.00 |
| Standard deviation of average distance | 0.50 | 0.51 | 0.00 |
| Diameter | 3 | 3 | 1 |
| Compactness | 0.76 | 0.71 | 1.00 |
| Modularity | 0.25 | 0.02 | 0.00 |
DOI: 10.7717/peerj-cs.32/table-1
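Several parameters in Table 1 follow simple textbook definitions for an undirected, unweighted graph: average degree 2m/n, density 2m/(n(n − 1)), and average distance and diameter from shortest-path lengths. The sketch below computes them for toy graphs; Pajek's exact conventions (e.g., for loops or weights) may differ, so treat it as an illustration only. For the country network, for example, an average degree of 76.02 over 178 nodes gives 76.02/177 ≈ 0.43, the density reported in Table 1.

```python
from collections import deque


def network_statistics(adjacency: dict) -> dict:
    """Basic statistics of a connected, undirected, unweighted graph without loops.

    adjacency maps each node to the set of its neighbors.
    """
    n = len(adjacency)
    m = sum(len(neighbors) for neighbors in adjacency.values()) // 2
    lengths = []
    for source in adjacency:
        # Breadth-first search gives shortest-path lengths in unweighted graphs.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        lengths.extend(d for target, d in distance.items() if target != source)
    return {
        "average_degree": 2 * m / n,
        "density": 2 * m / (n * (n - 1)),
        "average_distance": sum(lengths) / len(lengths),
        "diameter": max(lengths),
    }


# Toy path graph A - B - C.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(network_statistics(path))
```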

Disciplinary affiliations

In these data, 470 disciplinary affiliations could be distinguished, of which 465 (98.94%) form the largest component. The five affiliations that were not connected are “Judaism”, “Catholicism”, “Transport Law”, “Entertainment, Sports and Gaming Law”, and “Air and Space Law”. These five affiliations belong to the humanities (theology and law). We found a total of three reader counts for “Judaism” and one reader count each for the other four disconnected affiliations. Only very few researchers in these disciplines seem to use Mendeley. Similarly, Jeng, He & Jiang (2015) reported that they “did not see many group users from the humanities and other related fields” (p. 898).

The community-finding algorithm of Blondel et al. (2008) sorts the 465 affiliations in the main component into four groups. The modularity of this partition, a measure of clustering quality that can reach at most one, is Q = 0.25 (cf. Table 1). The four groups are:

  1. 256 affiliations mainly in the social sciences and the humanities (Fig. 1);

  2. 71 affiliations in the bio-medical sciences (Fig. 2);

  3. 84 affiliations in the natural sciences and engineering (also included in Fig. 2);

  4. 54 affiliations in biology and the geo-sciences (not shown separately).
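For reference, the modularity optimized by the algorithm of Blondel et al. (2008) is usually written as follows (our notation, assuming the standard Newman-Girvan definition; Pajek's implementation may additionally use a resolution parameter):

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j),
\qquad k_i = \sum_{j} A_{ij},
\qquad m = \frac{1}{2} \sum_{i,j} A_{ij}
```

Here A_ij is the weight of the link between nodes i and j, c_i is the community of node i, and δ(c_i, c_j) equals 1 if both nodes belong to the same community and 0 otherwise. Q is at most 1; a partition that places all nodes in a single community always yields Q = 0, and values near 0 indicate that links within communities are hardly more frequent than expected at random.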

Zahedi & Van Eck (2014) found similar results: Mendeley users are most active in the biomedical sciences, life sciences, and social sciences. Figure 1 shows the 256 sub-discipline affiliations of Mendeley readers in group 1, together with their connections, in the social sciences and humanities. The network in Fig. 1 also includes some reading in computer science and mathematics; the connection seems to run via cognitive psychology, artificial intelligence, and related fields. The humanities are positioned more at the periphery of this set. The sub-disciplines “Taxation law” and “German language” are not directly connected to this sub-group, but are nevertheless assigned to it by the community-finding algorithm. Only a few readers in these two sub-disciplines provided bookmarks.


Figure 1: 256 affiliations, mainly in the social sciences and the humanities (group 1).

This figure can be web-started at http://www.vosviewer.com/vosviewer.php?map=http://www.leydesdorff.net/mendeley/fig1_map.txt&network=http://www.leydesdorff.net/mendeley/fig1_net.txt&n_lines=10000.

Figure 2: 71 affiliations in the bio-medical sciences (group 2 in yellow) and 84 affiliations in the natural sciences and engineering (group 3 in pink).

Figure 2 shows 71 sub-discipline affiliations in the bio-medical sciences (group 2) and 84 sub-discipline affiliations in the natural sciences and engineering (group 3). We omit the links so that the distinction between the two sets of nodes (shown in different colors) remains the focus of the visualization. A version with the network links visible can be web-started from http://www.vosviewer.com/vosviewer.php?map=http://www.leydesdorff.net/mendeley/fig2_map.txt&network=http://www.leydesdorff.net/mendeley/fig2_net.txt&n_lines=10000.

It is somewhat surprising to see the sub-disciplines “regional law” and “Latin” sorted into the mainly bio-medical network in Fig. 2. As the links in the web-started version show, these two sub-disciplines have many links to several sub-disciplines within the bio-medical network.

Figure 3 visualizes the entire network of sub-disciplinary affiliations. The core set is occupied by readers who characterize themselves as “miscellaneous” readers from different disciplines, such as “biology miscellaneous”, “environmental science miscellaneous”, etc. The social sciences (“miscellaneous”) form one of these reading communities. The humanities, however, are placed more in the periphery. The four algorithmically generated groups (Blondel et al., 2008) cannot be clearly distinguished in this projection, because the domains overlap when projected onto a two-dimensional plane. The layout of this figure is therefore based on the VOSviewer mapping. This figure can be web-started from http://www.vosviewer.com/vosviewer.php?map=http://www.leydesdorff.net/mendeley/fig3_map.txt&network=http://www.leydesdorff.net/mendeley/fig3_net.txt&n_lines=1000.


Figure 3: Four communities (colors) of affiliations among co-bookmarking readers.

Status hierarchy

Mendeley users have to assign one of 14 predefined status groups to themselves. Some of these professional status groups seem redundant, such as “Student PhD”, “Student Post-Graduate”, and “Doctoral Student”. Any merging or regrouping of these status groups, however, would be a rather arbitrary choice (Haustein & Larivière, 2014a). We analyze the Mendeley reader counts in the status groups as provided by Mendeley and discuss these issues in the light of the results.

Figure 4 shows that a common interest in papers is mainly shared among PhD students, Master’s students, and postdocs. Other studies confirm the dominant position of these groups (Zahedi, Costas & Wouters, 2014). In this study, they are followed by researchers at academic institutions and, to a lesser extent, by researchers at non-academic institutions. Lecturers and Senior Lecturers are less involved than Professors. Librarians hardly participate in this network. Note that this network is not modularized at all (Q = 0.00, cf. Table 1): each group is connected to all 13 other groups.


Figure 4: Network of co-readers in terms of professional status.

Table 2 shows the eigenvector centralities of the different status groups among networked Mendeley users (Bonacich, 1972). Groups with high eigenvector centrality (in this case, students) are more central because they share their interests in publications with many other groups, where the (eigenvector) centrality of these other groups is recursively taken into account (De Nooy, Mrvar & Batagelj, 2011). However, the very high eigenvector centrality is probably to some extent due to the fact that students (especially PhD and Master's students) and postdocs form by far the largest status groups. This agrees with previous studies, which also found that students and postdocs are the largest user status groups on Mendeley (Bornmann & Haunschild, 2015; Mohammadi et al., 2015). Senior Lecturers, the group with the lowest eigenvector centrality, seem to be interested in publications different from those read by the other status groups. However, the eigenvector centrality is strongly influenced by the absolute number of reader counts: the Spearman rank correlation coefficient between eigenvector centrality and reader counts is 0.986.
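Eigenvector centrality can be computed by power iteration: each node's score is repeatedly replaced by the weighted sum of its neighbors' scores and renormalized. The sketch below is a plain-Python illustration on a hypothetical four-node star network. The study itself used Pajek's Hubs & Authorities routine; we assume that, for a symmetric network, its scores agree with this computation up to normalization.

```python
import math


def eigenvector_centrality(weights: dict, tol: float = 1e-12, max_iter: int = 1000) -> dict:
    """Power iteration on a symmetric, non-negative weighted adjacency (dict of dicts).

    Iterating with (A + I) instead of A leaves the eigenvectors unchanged but
    avoids oscillation on bipartite graphs such as stars.
    """
    scores = {node: 1.0 for node in weights}
    for _ in range(max_iter):
        updated = {
            node: scores[node] + sum(w * scores[nb] for nb, w in weights[node].items())
            for node in weights
        }
        norm = math.sqrt(sum(value * value for value in updated.values()))
        updated = {node: value / norm for node, value in updated.items()}
        if max(abs(updated[node] - scores[node]) for node in weights) < tol:
            return updated
        scores = updated
    return scores


# Hypothetical star: one hub group coupled to three other groups.
star = {
    "hub": {"a": 1.0, "b": 1.0, "c": 1.0},
    "a": {"hub": 1.0},
    "b": {"hub": 1.0},
    "c": {"hub": 1.0},
}
print(eigenvector_centrality(star))
```

In this toy network the hub's score converges to 1/√2 and each of the other groups to 1/√6, mirroring how a group coupled to many others (such as PhD students) ends up most central.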

Table 2:
Eigenvector centralities and absolute number of reader counts (N) of different status groups among networked Mendeley users (using the Hubs & Authorities routine in Pajek).
| Status group | Eigenvector centrality | N (reader counts) |
|---|---|---|
| Student PhD | 0.45 | 3,364,144 |
| Student master | 0.39 | 1,514,606 |
| Post doc | 0.34 | 1,148,860 |
| Researcher at an academic institution | 0.30 | 667,995 |
| Doctoral student | 0.29 | 616,738 |
| Student bachelor | 0.28 | 678,839 |
| Student post-graduate | 0.25 | 482,784 |
| Assistant professor | 0.23 | 409,591 |
| Researcher at a non-academic institution | 0.23 | 444,874 |
| Full professor | 0.22 | 378,685 |
| Associate professor | 0.20 | 316,606 |
| Lecturer | 0.10 | 126,848 |
| Librarian | 0.06 | 77,046 |
| Senior lecturer | 0.05 | 63,345 |
DOI: 10.7717/peerj-cs.32/table-2
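The Spearman coefficient reported above can be re-computed from the rounded values in Table 2. The sketch below ranks both columns (tied values, such as the two centralities of 0.23, receive their average rank) and takes the Pearson correlation of the ranks; on the Table 2 data it yields 0.986 to three decimals. Whether the authors used the same tie handling is our assumption.

```python
def average_ranks(values):
    """Rank values from largest (rank 1) to smallest; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sum((a - mean_x) ** 2 for a in rx)
    spread_y = sum((b - mean_y) ** 2 for b in ry)
    return covariance / (spread_x * spread_y) ** 0.5


# Eigenvector centrality and reader counts N, in the row order of Table 2.
centrality = [0.45, 0.39, 0.34, 0.30, 0.29, 0.28, 0.25, 0.23, 0.23, 0.22, 0.20, 0.10, 0.06, 0.05]
readers = [3_364_144, 1_514_606, 1_148_860, 667_995, 616_738, 678_839, 482_784,
           409_591, 444_874, 378_685, 316_606, 126_848, 77_046, 63_345]
print(round(spearman(centrality, readers), 3))
```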

Note that status categories may differ between nations. For example, the ranks of “Assistant Professor” and “Lecturer” are virtually non-existent in some countries. Conversely, ranks such as “Reader” (sometimes different from “Lecturer”) and “Habilitand”² are not covered by the Mendeley classification system. The data suggest that Mendeley readers in the career stages “Reader” and “Habilitand” assign the status “Assistant professor” to themselves, as this is the most populated of the professorship categories. Furthermore, some status groups seem redundant, e.g., “Doctoral student”, “Student post-graduate”, and “Student PhD”. However, most Mendeley readers who are working on a doctoral thesis identify themselves as “Student PhD”.

Decomposition in terms of nations

Of the more than 200 countries in the world, 178 appear among the locations of Mendeley readers who bookmarked papers in this dataset. These countries form a single connected network with an average degree of 76.02, which means that each node is linked, on average, to 76 other nodes (42.7% of the 178 nodes in the network). The density of the network is 0.43 (cf. Table 1). The eigenvector centralities of the countries vary only between 0.055 and 0.077. This small variation is probably due to the high connectivity of the countries, despite a large variation in reader counts, from 1 (Liberia) to 396,198 (USA).

The community-finding algorithm distinguishes four groups. However, the modularity of this partition is low (Q = 0.02, cf. Table 1) because of the many connections between the groups. The four groups are:

  1. A group of 53 nations that are core to the scientific enterprise, including Russia and China as well as two thirds of the OECD countries (Fig. 5). The OECD member states Chile, Greece, Iceland, Mexico, New Zealand, Norway, Portugal, Slovak Republic, Slovenia, and Turkey are not part of this group. They are part of the second group.

  2. The largest group, comprising 115 nations centered around Brazil and India (Fig. 6).

  3. A group of ten small nations with “Niger” and “Nigeria” as the central core (not shown).

  4. The smallest group, with only “Guinea” and “Guinea Bissau” (not shown).
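Why cross-group connections depress modularity can be seen on two toy networks. The sketch below computes the standard (Newman-Girvan) modularity of a given partition of an unweighted graph, the quantity the Louvain method of Blondel et al. (2008) maximizes; it is an illustration under that assumption, not Pajek's implementation. Two triangles joined by a single link give Q = 5/14 ≈ 0.36, whereas the same two groups with every possible cross-group link give a negative Q of −0.1.

```python
from collections import defaultdict
from itertools import combinations


def modularity(edges, community):
    """Q = sum over communities c of (L_c / m) - (d_c / 2m)^2 for an unweighted graph.

    L_c: links inside community c; d_c: total degree of its nodes; m: number of links.
    """
    m = len(edges)
    internal_links = defaultdict(int)
    degree_sum = defaultdict(int)
    for a, b in edges:
        degree_sum[community[a]] += 1
        degree_sum[community[b]] += 1
        if community[a] == community[b]:
            internal_links[community[a]] += 1
    return sum(internal_links[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


groups = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
# Two triangles joined by one bridge (clear community structure).
bridged = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
# Same groups, but every node is linked to every other node.
dense = list(combinations(range(6), 2))
print(round(modularity(bridged, groups), 3), round(modularity(dense, groups), 3))
```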


Figure 5: Group of 53 nations.

The unlabeled circles next to the UK and the US indicate the Netherlands and Spain, respectively. The unlabeled circle between Russia and Hong Kong is the Czech Republic. A version with all labels visible can be web-started from http://www.vosviewer.com/vosviewer.php?map=http://www.leydesdorff.net/mendeley/fig5_map.txt&network=http://www.leydesdorff.net/mendeley/fig5_net.txt&n_lines=10000&label_size=1.0&label_size_variation=0.34.

Figure 6: 115 countries in the second group of nations.

Figures 5 and 6 show country groups 1 and 2. A version of Fig. 6 can be web-started at http://www.vosviewer.com/vosviewer.php?map=http://www.leydesdorff.net/mendeley/fig6_map.txt&network=http://www.leydesdorff.net/mendeley/fig6_net.txt&n_lines=10000&label_size=1.0&label_size_variation=0.34. As with Fig. 5, one can run mapping and clustering of the subsets in VOSviewer to obtain more details.

Discussion

Network analysis is one of the most important and popular methods for analyzing bibliometric data. In this study, we explored whether Mendeley data can also be used successfully as a data source for network analysis. Up to now, only a few attempts have been made to analyze the rich Mendeley data using network analysis techniques. A great advantage is that the data can be retrieved for comprehensive publication sets using an API, so readership data can be downloaded on a large scale, which makes them well suited to network analysis. We encourage other researchers to use Mendeley data for larger publication sets in order to inspect the usage structure of publications (Gunn, 2014).

The Mendeley readership networks were generated using different types of user information: (1) disciplinary affiliation, (2) professional status, and (3) country. All three information sources can be used to produce meaningful network results. First, in terms of disciplines, we found four groups: (1) biology, (2) social science and humanities (including relevant computer science), (3) bio-medical sciences, and (4) natural science and engineering. In all four groups, the category with the addition “miscellaneous” prevails. Probably, readers who identify themselves with cross-disciplinary research interests are more inclined to generate these “bookmark couplings” than more narrowly specialized readers. The pronounced position of the social sciences and the humanities was not expected. Some sub-disciplines, e.g., “Judaism” and “Catholicism”, are disconnected from the other sub-disciplines.

The decomposition in terms of status hierarchies within the network makes clear that this hierarchy is inverted in Mendeley: students working on theses lead among the users. Students have more time than established professionals to explore the literature beyond their specialization. Lecturers and Senior Lecturers show a different reading pattern, given their primary tasks in education. Librarians use Mendeley (and the scholarly literature) differently from researchers. Students, who have the highest absolute number of reader counts, also have the highest eigenvector centrality in the network, which indicates stronger bookmark coupling than that of other status groups (e.g., Lecturers or Librarians). The calculated eigenvector centralities correlate strongly with the absolute number of observed reader counts.

The decomposition in terms of nations highlights the worldwide divide between developed and less-developed nations. A similar divide was recently found in a portfolio analysis of journal literature by Leydesdorff, Heimeriks & Rotolo (in press). Some finer-grained delineations can be recognized as regional, but not all of them have an obvious interpretation.

The academic status information is provided by every Mendeley user, and nearly every Mendeley user provides (sub-)discipline information. Surprisingly, the vast majority of Mendeley readers assign the miscellaneous sub-discipline of their main discipline to themselves. Only a minority of Mendeley users seem to provide their location, which makes it more difficult to analyze reader counts broken down by country. Some Mendeley academic status groups seem redundant (e.g., Doctoral student and Student PhD), while others seem to be tailored to the British (e.g., Lecturer and Senior lecturer) or the US system (e.g., Assistant professor and Associate professor). It is not clear to what extent Mendeley users assign precise sub-discipline, status, and location information to themselves and whether they update this information regularly. Despite these shortcomings of the Mendeley classification system and of the information users provide, the network analyses of Mendeley reader counts from three different perspectives produced interesting insights into readership patterns. This shows that useful network analysis can be performed using Mendeley readership counts.

Conclusions

In this study, we analyzed Mendeley readership data for a set of 1,074,407 articles and 62,771 reviews published in 2012 to generate three different networks: (1) The network based on disciplinary affiliations of Mendeley readers contains four groups: (i) biology, (ii) social sciences and humanities (including relevant computer sciences), (iii) bio-medical sciences, and (iv) natural sciences and engineering. In all four groups, the category with the addition “miscellaneous” prevails. (2) The network of co-readers in terms of professional status shows that a common interest in papers is mainly shared among PhD students, Master’s students, and postdocs. (3) The country network focuses on global readership patterns: it identifies a group of 53 nations that are core to the scientific enterprise, including two thirds of the OECD countries as well as Russia and China.

¹ The country names in the Mendeley web frontend are standardized: the user provides a city name, and Mendeley proposes different city–country combinations from which the user can choose.
² “Habilitand” is a status in German-speaking countries for those working on a “Habilitation”, a kind of second PhD that confers the right to teach at a university.