
  • Published: 09 September 2021

The effects of remote work on collaboration among information workers

  • Longqi Yang   ORCID: orcid.org/0000-0002-6615-8615 1 ,
  • David Holtz   ORCID: orcid.org/0000-0002-0896-8628 2 , 3 ,
  • Sonia Jaffe   ORCID: orcid.org/0000-0001-8924-0294 1 ,
  • Siddharth Suri   ORCID: orcid.org/0000-0002-1318-8140 1 ,
  • Shilpi Sinha 1 ,
  • Jeffrey Weston 1 ,
  • Connor Joyce 1 ,
  • Neha Shah 1 ,
  • Kevin Sherman   ORCID: orcid.org/0000-0001-5793-3336 1 ,
  • Brent Hecht   ORCID: orcid.org/0000-0002-7955-0202 1 &
  • Jaime Teevan   ORCID: orcid.org/0000-0002-2786-0209 1  

Nature Human Behaviour volume 6, pages 43–54 (2022)

An Author Correction to this article was published on 05 October 2021

The coronavirus disease 2019 (COVID-19) pandemic caused a rapid shift to full-time remote work for many information workers. Viewing this shift as a natural experiment in which some workers were already working remotely before the pandemic enables us to separate the effects of firm-wide remote work from other pandemic-related confounding factors. Here, we use rich data on the emails, calendars, instant messages, video/audio calls and workweek hours of 61,182 US Microsoft employees over the first six months of 2020 to estimate the causal effects of firm-wide remote work on collaboration and communication. Our results show that firm-wide remote work caused the collaboration network of workers to become more static and siloed, with fewer bridges between disparate parts. Furthermore, there was a decrease in synchronous communication and an increase in asynchronous communication. Together, these effects may make it harder for employees to acquire and share new information across the network.

Before the COVID-19 pandemic, at most 5% of Americans worked from home for more than three days per week 1 , whereas it is estimated that, by April 2020, as many as 37% of Americans were working from home (WFH) full-time 2 , 3 . Thus, in a matter of weeks, the pandemic caused about one-third of US workers to shift to WFH, and nearly every American who was able to work from home did so 4 . Many technology companies, such as Twitter, Facebook, Square, Box, Slack and Quora, have taken this shift one step further by announcing longer-term and, in some cases, permanent remote work policies that will enable at least some employees to work remotely, even after the pandemic 5 , 6 . More generally, COVID-19 has accelerated the shift away from traditional office work, such that even firms that do not keep full-time remote work policies in place after the pandemic has ended are unlikely to fully return to their pre-COVID-19 work arrangements 7 . Instead, they are likely to switch to some type of hybrid work model, in which employees split their time between remote and office work, or a mixed-mode model, in which firms are composed of a mixture of full-time remote employees and full-time office employees. For example, some scholars predict a long-run equilibrium in which information workers will work from home approximately 20% of the time 1 . For long-term policy decisions regarding remote, hybrid and mixed-mode work to be well informed, decision makers need to understand how remote work would impact information work in the absence of the effects of COVID-19. To answer this question, we treat Microsoft’s company-wide WFH policy during the pandemic as a natural experiment that, subject to the validity of our identifying assumptions, enables us to causally identify the impact of firm-wide remote work on employees’ collaboration networks and communication practices.

Previous research has shown that network topology, including the strength of ties, has an important role in the success of both individuals and organizations. For individuals, it is beneficial to have access to new, non-redundant information through connections to different parts of an organization’s formal organizational chart and through connections to different parts of an organization’s informal communication network 8 . Furthermore, being a conduit through which such information flows by bridging ‘structural holes’ 9 in the organization can have additional benefits for individuals 10 . For firms, certain network configurations are associated with the production of high-quality creative output 11 , and there is a competitive advantage to successfully engaging in the practice of ‘knowledge transfer’, in which experiences from one set of people within an organization are transferred to and used by another set of people within that same organization 12 . Conditional on a given network position or configuration, the efficacy with which a given tie can transfer or provide access to novel information depends on its strength. Two people connected by a strong tie can often transfer information more easily, as they are more likely to share a common perspective, to trust one another, to cooperate with one another and to expend effort to ensure that recently transferred knowledge is well understood and can be utilized 10 , 13 , 14 , 15 . By contrast, weak ties require less time and energy to maintain 8 , 16 and are more likely to provide access to new, non-redundant information 8 , 17 , 18 .

Our results show that the shift to firm-wide remote work caused business groups within Microsoft to become less interconnected. It also reduced the number of ties bridging structural holes in the company’s informal collaboration network, and caused individuals to spend less time collaborating with the bridging ties that remained. Furthermore, the shift to firm-wide remote work caused employees to spend a greater share of their collaboration time with their stronger ties, which are better suited to information transfer, and a smaller share of their time with weak ties, which are more likely to provide access to new information.

Previous research has also shown that the performance of workers is affected not only by the structure of the network and the strength of their ties, but also by the temporal dynamics of the network. Not only do the benefits of different types of ties vary with their age 19 , but people also benefit from changing their network position 20 , 21 , 22 , adding new ties 23 , 24 and reconnecting with dormant ties 25 . We find that the shift to firm-wide remote work may have reduced these benefits by making the collaboration network of workers more static—individuals added and deleted fewer ties from month-to-month and spent less time with newly added ties.

Existing theoretical perspectives and empirical results suggest that knowledge transfer and collaboration are also affected by the modes of communication that workers use to collaborate with one another. On the theoretical front, media richness theory 26 , 27 posits that richer communication channels, such as in-person interaction, are best suited to communicating complex information and ideas. Moreover, media synchronicity theory 28 proposes that asynchronous communication channels (such as email) are better suited for conveying information and synchronous channels (such as video calls) are better suited for converging on the meaning of information. There is also a rich body of empirical research that documents the myriad implications of communication media choice for organizations. For example, previous research has shown that establishing a rapport, which is an important precursor to knowledge transfer, is impeded by email use 29 , and that in-person and phone/video communication are more strongly associated with positive team performance than email and instant message (IM) communication 30 .

Remote work obviously eliminates in-person communication; however, we found that people did not simply replace in-person interactions with video and/or voice calls. In fact, we found that shifting to firm-wide remote work caused an overall decrease in observed synchronous communication such as scheduled meetings and audio/video calls. By contrast, we found that remote work caused employees to communicate more through media that are more asynchronous—sending more emails and many more IMs. Media richness theory, media synchronicity theory and previous empirical studies all suggest that these communication media choices may make it more difficult for workers to convey and/or converge on the meaning of complex information.

There is a large body of academic research across multiple disciplines that has studied remote work, virtual teams and telecommuting (see ref. 31 for a review of much of this work), including previous research studies that examined the network structure of virtual teams and how individual network position in virtual teams correlates with performance 32 , 33 , 34 . During the COVID-19 pandemic, there has been renewed public and academic interest in how virtual teams function. Recent analyses of telemetry and survey data show that the pandemic has affected both the who and the how of collaboration in information firms—while working remotely during the pandemic, workers are spending less time in meetings 35 , communicating more by email 35 , collaborating more with their strong ties as opposed to their weak ties 36 , and exhibiting patterns of communication that are more siloed and less stable 37 . However, these analyses, like much of the previous research on remote work, virtual teams and telecommuting, are non-causal 31 and are therefore unable to separate the effects of remote work from the effects of pandemic-related confounding factors, such as reduced focus due to COVID-19-related stress or increased caregiving responsibilities while sheltering in place. Although previous research on the causal effects of remote work does exist, this work has mainly studied employees who volunteer to work remotely, and has focused on settings such as call centres and patent offices 38 , 39 where, relative to the majority of information work, tasks are more easily codifiable and are less likely to depend on collaboration or the transfer of complex knowledge.

In this article, we contribute to the research literatures on remote work, virtual teams and telecommuting by analysing the large-scale natural experiment created by Microsoft’s firm-wide WFH policy during the COVID-19 pandemic. As remote work was mandatory during the pandemic, we are able to quantify the effects of firm-wide remote work, which are most relevant for firms considering a transition to an all-remote workforce. Furthermore, as our model specification decomposes the overall effects of firm-wide remote work into ego remote work and collaborator remote work effects, our results also provide some insight into the possible impacts of remote work policies such as mixed-mode work and hybrid work.

We analysed anonymized individual-level data describing the communication practices of 61,182 US Microsoft employees from December 2019 to June 2020—data from before and after Microsoft’s shift to firm-wide remote work (our data on workers’ choice of communication media goes back only to February 2020). Our sample contains all US Microsoft employees except for those who hold senior leadership positions and/or are members of teams that routinely handle particularly sensitive data. Given the scope of our dataset, the workers in our sample perform a wide variety of tasks, including software and hardware development, marketing and business operations. For each employee, we observe (1) their remote work status before the COVID-19 pandemic, and what share of their colleagues were remote workers before the COVID-19 pandemic; (2) their managerial status, the business group they belong to, their role and the length of their tenure at Microsoft as of February 2020; (3) a weekly summary of the amount of time spent in scheduled meetings, time spent in unscheduled video/audio calls, emails sent and IMs sent, and the length of their workweek; and (4) a monthly summary of their collaboration network. Before the COVID-19 pandemic, managers at Microsoft used their own discretion in deciding whether an employee could work from home, which was the exception rather than the norm.

The natural experiment that we analysed came from the company-wide WFH mandate Microsoft enacted in response to COVID-19. On 4 March 2020, Microsoft mandated that all non-essential employees in their Puget Sound and Bay Area campuses shift to full-time WFH. Other locations followed suit and, by 1 April 2020, all non-essential US Microsoft employees were WFH full-time. Before the onset of the pandemic, 18% of US Microsoft employees were working remotely from their collaborators. For this subset of employees, the shift to firm-wide remote work did not cause a change in their own remote work status, but did induce variation in the share of their colleagues who were working remotely. For the remaining 82% of US Microsoft employees, the shift to firm-wide remote work induced variation in both their own remote work status and in the remote work status of their coworkers.

We analysed this natural experiment using a modified difference-in-differences (DiD) model. Standard DiD is an econometric approach that enables researchers to infer the causal effect of a treatment by comparing longitudinal data from at least two groups, some of which are ‘treated’ and some of which are not. Provided that the identifying assumptions of the DiD model are satisfied, the causal effect of the treatment is obtained by comparing the magnitude of the gap between the treated and untreated groups after the treatment is delivered with the magnitude of the gap between the groups before the treatment is delivered. Our modified DiD model extends the standard DiD model by estimating the causal effects of changes in two different treatment variables (one’s own remote work status and the remote work status of one’s colleagues) and by introducing additional identifying assumptions such that it is possible to draw causal inferences in the presence of an additional shock (in our case, the non-WFH-related aspects of COVID-19) that affects both treated and untreated units, and is concurrent with the exogenous shock(s) to our treatment variables. The time series trends shown in Fig. 1 suggest that the identifying assumptions of our modified DiD model are plausible; further details on the model are provided in the Methods .
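To make the two-treatment design concrete, the estimation strategy can be sketched as an ordinary-least-squares regression on a synthetic panel. This is only an illustration, not the paper's actual equation ( 1 ) or data: the worker counts, effect sizes and variable names below are all made up, and the sketch omits the fixed effects and clustered standard errors a real specification would include.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # synthetic employees (illustrative, not the study's sample)

# Hypothetical pre-pandemic states:
ego_remote_pre = (rng.random(n) < 0.18).astype(float)  # ~18% already remote
share_remote_pre = rng.random(n) * 0.3                 # remote share of colleagues

beta_true, delta_true = -0.09, -0.05  # made-up ego / colleague treatment effects

rows, y = [], []
for i in range(n):
    for post in (0.0, 1.0):  # pre- and post-mandate periods
        ego = max(post, ego_remote_pre[i])      # everyone is remote post-mandate
        share = 1.0 if post else share_remote_pre[i]
        # Outcome = worker-type effect + common pandemic shock + treatment effects:
        mu = 0.5 * ego_remote_pre[i] + 0.2 * post
        y.append(mu + beta_true * ego + delta_true * share
                 + 0.01 * rng.standard_normal())
        rows.append([1.0, ego_remote_pre[i], post, ego, share])

# OLS via least squares; identification comes from pre-existing remote workers
# and pre-existing variation in colleagues' remote share.
coef, *_ = np.linalg.lstsq(np.array(rows), np.array(y), rcond=None)
beta_hat, delta_hat = coef[3], coef[4]
total_effect = beta_hat + delta_hat  # analogue of the reported (beta + delta)
```

Because the common pandemic shock enters through the `post` term, it is absorbed by the period control rather than the treatment coefficients, which is the intuition behind separating remote work effects from other pandemic confounds.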

figure 1

a – d , The average number of bridging ties per month ( a , c ) and the average unscheduled video/audio call hours per week ( b , d ) for different groups of employees, relative to the overall average in February. These plots establish the plausibility of the ‘parallel trends’ assumption that is required by our modified DiD model. The error bars show the 95% CIs and are in some places thinner than the symbols in the figure; s.e. values are clustered at the team level. a , b , The graphs show employees who, before COVID-19, worked from the office (blue; n  = 50,268) and a matched sample of employees who worked remotely (orange; n  = 10,914). c , d , The graphs show two subgroups of the blue lines in a and b —employees who, before COVID-19, had less than 10% of their collaborators working remotely (dashed; n  = 36,008) and those who had more than 50% of their coworkers working remotely (dotted; n  = 1,861). Both variables were normalized by subtracting and dividing by the average across the entire sample of that variable in February. Most employees transitioned to WFH during the week of 1 March 2020, although our analysis omits the month of March as a transition period.

In all of the analyses that follow, we cannot report the actual level of our outcome variables due to confidentiality concerns. Instead, throughout the paper we report outcomes and effects in terms of February value (FV)—the average level of that variable (for example, number of bridging ties) for all US employees in February.
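As a small worked example of this normalization (with made-up numbers, since the actual levels are confidential):

```python
feb_avg = 20.0                # hypothetical FV: February average of some outcome
monthly = [20.0, 19.0, 17.0]  # hypothetical raw monthly averages, Feb-Apr

# Figures plot each value relative to FV: (x - FV) / FV.
relative = [(x - feb_avg) / feb_avg for x in monthly]  # 0%, -5%, -15% of FV

# Estimated effects are likewise reported in FV units: raw effect / FV.
raw_effect = -1.8
effect_in_fv = raw_effect / feb_avg  # -0.09 FV
```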

Effects of remote work on collaboration networks

We start by presenting the non-causal time-series trends for different collaboration network outcomes across our entire sample. These trends provide insights into how work practices have changed during the COVID-19 pandemic, and also represent the type of data that many executives may use when making decisions regarding their firm’s long-term remote work policy.

Descriptive statistics

Figure 2 shows the average monthly time series for various aspects of workers’ collaboration egocentric (ego) networks from December 2019 to June 2020: the number of connections, the number of groups interacted with, the number of and share of time with cross-group connections, the number and share of time with bridging connections, the clustering coefficient, the share of time with weak connections, the number of churned and added connections, and the share of time with added connections. Mathematical definitions for these measures are provided in the Methods . Although we did not find evidence of a clear pattern of change around the shift to firm-wide remote work for many of these measures, we did observe large changes in the average shares of monthly collaboration hours spent with cross-group ties, bridging ties, weak ties and added ties, which all decreased precipitously between February and June.

figure 2

a – k , The monthly averages for the collaboration network variables for all employees relative to the February average. Each variable was normalized by subtracting and dividing by the average FV for that variable. The vertical bars show the 95% CIs, but are in most places not much taller than the data points; s.e. values are clustered at the team level. The variables are employees’ average number of network ties ( a ), distinct business groups in which they have a collaborator ( b ), cross-group ties ( c ), ties that bridge structural holes in the network ( e ), individual clustering coefficient ( g ), collaborators from the previous month that they did not collaborate with that month ( i ) and added collaborators they did not collaborate with the previous month ( j ), as well as the share of time spent with cross-group ties ( d ), bridging ties ( f ), weak ties ( h ) and added ties ( k ). n  = 61,279 for each panel.

Causal analysis

We next used our modified DiD model to isolate the effects of firm-wide remote work on the collaboration network, which are shown in Fig. 3 . Although we found no effect on the number of collaborators that employees had (the size of their collaboration ego network), we did find that firm-wide remote work decreased the number of distinct business groups that an employee was connected to by 0.07 FV ( P  < 0.001, 95% confidence interval (CI) = 0.05–0.10). Firm-wide remote work also decreased the cross-group connections of workers by 0.04 FV ( P  = 0.008, 95% CI = 0.01–0.07) and the share of collaboration time workers spent with cross-group connections by 0.26 FV ( P  < 0.001, 95% CI = 0.23–0.29). In other words, firm-wide remote work caused an overall decrease in the number of cross-group interactions and the fraction of attention paid to groups other than one’s own.

figure 3

The estimated causal effects of both an employee and that employee’s colleagues switching to remote work on the number of collaborators an employee has, the number of distinct groups the employee collaborates with, the number of cross-group ties an employee has, the share of time an employee spends collaborating with cross-group ties, the number of bridging ties an employee has, the share of time an employee spends collaborating with bridging ties, the individual clustering coefficient of an employee’s ego network, the share of time an employee spent collaborating with weak ties, the number of churned collaborators, the number of added collaborators and the share of time spent with added collaborators. The reported effects are ( β  +  δ ) from equation ( 1 ), normalized by dividing by the average level of that variable in February. The symbols depict point estimates and the lines show the 95% CIs. n  = 61,182 for all variables. The full results are provided in Supplementary Tables 1 and 2 .

Although formal organizational boundaries shape informal interactions 40 , the formal organization of firms and their informal social structure are two distinct, interrelated concepts 41 . Connections that provide access to diverse teams may not bridge structural holes in the network sense 9 , and connections that bridge structural holes in the network sense may not provide access to different parts of the formal organizational chart. We therefore also analysed how the shift to firm-wide remote work affected the structural diversity of employees’ ego networks with respect to the firm’s observed communication network, as opposed to the formal organizational chart. We label each tie as ‘bridging’ or ‘non-bridging’ on the basis of its local network constraint, which is a measure of the extent to which a given tie bridges structural holes in a network 9 , 42 . We then measured the effect of firm-wide remote work on the number of bridging ties that each worker had and the amount of time that each worker spent with their bridging ties. We found that, on average, firm-wide remote work decreased the number of bridging ties by 0.09 FV ( P  < 0.001, 95% CI = 0.06–0.13) and the share of time with bridging ties by 0.41 FV ( P  < 0.001, 95% CI = 0.35–0.47). The fact that firm-wide remote work caused workers to have fewer bridging ties, and to spend less time with their remaining bridging ties, suggests that firm-wide remote work may have reduced the ability of workers to access new information in other parts of the network. These results, in conjunction with our finding that firm-wide remote work reduced workers’ cross-group interactions, also suggest that firm-wide remote work caused the collaboration network to become more siloed, both in a formal sense and in an informal sense.
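The tie-level measure used here, Burt's local network constraint, can be sketched on a toy weighted network. The graph, hours and cut-off comparison below are hypothetical; the study's actual computation on the firm's communication network may differ in details such as weighting and thresholds.

```python
from collections import defaultdict

# Toy weighted collaboration network (hours/month); all values hypothetical.
edges = {("a", "b"): 5.0, ("a", "c"): 5.0, ("b", "c"): 5.0,  # a tight triangle
         ("a", "d"): 2.0, ("d", "e"): 4.0, ("d", "f"): 4.0}  # d reaches elsewhere

w = defaultdict(float)
nbrs = defaultdict(set)
for (u, v), hours in edges.items():
    w[u, v] = w[v, u] = hours
    nbrs[u].add(v)
    nbrs[v].add(u)

def p(i, j):
    """Share of i's total collaboration hours spent with j."""
    return w[i, j] / sum(w[i, k] for k in nbrs[i])

def local_constraint(i, j):
    """Burt's local constraint of the tie i-j: high when i's other
    contacts also invest in j, i.e. when the tie bridges no hole."""
    return (p(i, j) + sum(p(i, q) * p(q, j) for q in nbrs[i] - {j})) ** 2

# a-d bridges a structural hole (none of a's other contacts reach d),
# so its constraint is far lower than that of the embedded a-b tie.
bridging, embedded = local_constraint("a", "d"), local_constraint("a", "b")
```

Labelling ties with low local constraint as 'bridging' then reduces to a simple threshold on these values.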

We also found that firm-wide remote work caused a 0.06 FV ( P  = 0.005, 95% CI = 0.02–0.10) increase in the individual clustering coefficient, which provides a measure of what proportion of an individual’s network connections are also connected to each other (the higher a person’s individual clustering coefficient, the more dense their ego network). Given the fact that we did not observe a statistically significant effect of remote work on the number of colleagues with whom workers collaborate, this result suggests that, on average, firm-wide remote work caused workers to substitute ties that were not connected to one another for those that were. In other words, different portions of the network, which became less interconnected, also became more intraconnected.
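The individual (local) clustering coefficient described above can be computed directly from an ego's neighbour sets; the toy ego network below is hypothetical:

```python
def local_clustering(node, nbrs):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    ns = sorted(nbrs[node])
    if len(ns) < 2:
        return 0.0
    pairs = [(u, v) for i, u in enumerate(ns) for v in ns[i + 1:]]
    return sum(1 for u, v in pairs if v in nbrs[u]) / len(pairs)

# Toy ego network: the ego collaborates with a, b and c, but only a and b
# also collaborate with each other -> 1 closed pair out of 3.
nbrs = {"ego": {"a", "b", "c"}, "a": {"ego", "b"},
        "b": {"ego", "a"}, "c": {"ego"}}
coefficient = local_clustering("ego", nbrs)  # 1/3
```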

The ability of a worker to effectively access knowledge from other parts of an organization is a function of not only the organizational and/or topological diversity of their connections, but also the strength of those connections. For each month, we classified ties as strong when they were in the top 50% of an employee’s ties in terms of hours spent communicating, and as weak otherwise. Although we have not seen strong and weak ties defined in this exact way elsewhere in the research literature on social networks, the research community has not, to our knowledge, converged on a standard way to measure tie strength. Our operationalization is similar to a common tie strength definition that simply counts the amount of contact between ties 43 , 44 , 45 and allows tie strength to vary over time on the basis of the relative amount of contact between two people 46 . Also, it is consistent with Granovetter’s original notion that tie strength is determined by a combination of “the amount of time, the emotional intensity, the intimacy (mutual confiding) and the reciprocal services which characterize the tie” 8 .
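A minimal sketch of this median-split operationalization, using hypothetical monthly hours for one employee's ties (tie names and values are invented; the paper's exact handling of ties at the median is not specified here, so this sketch classifies strictly-above-median ties as strong):

```python
import statistics

# Hypothetical monthly collaboration hours across one employee's six ties.
hours = {"t1": 9.0, "t2": 4.5, "t3": 1.0, "t4": 0.5, "t5": 6.0, "t6": 2.0}

cutoff = statistics.median(hours.values())  # 3.25 for these numbers
strength = {tie: ("strong" if h > cutoff else "weak")
            for tie, h in hours.items()}
# -> t1, t2 and t5 are strong; t3, t4 and t6 are weak
```

Because the split is recomputed each month, a tie's classification can change over time as the relative amount of contact changes, matching the time-varying notion of tie strength described above.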

Although weak ties by definition will always get less of an employee’s time than strong ties in a given month, we found that the shift to remote work reduced the share of time that workers spent collaborating with weak ties by 0.32 FV ( P  < 0.001, 95% CI = 0.29–0.35). As the median is just one possible cut-off to distinguish between strong and weak ties, we also analysed the entire distribution of collaboration time for each worker and confirmed that the average ego-level-normalized Herfindahl–Hirschman index (HHI) 47 of the collaboration time is increased by remote work, and that the average ego-level Shannon entropy 48 of collaboration time is decreased by remote work. The effects of firm-wide remote work on both of these outcomes are provided in Supplementary Table 2 . In total, these results indicate that, above and beyond the impact of firm-wide remote work on the organizational and structural diversity of workers’ ego networks, the shift to firm-wide remote work also made the allocation of workers’ time more heavily concentrated.
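Both distribution-level concentration measures can be sketched on hypothetical collaboration-hour vectors. Note that this sketch shows the plain HHI and entropy, whereas the paper uses an ego-level-normalized HHI; the inputs are invented:

```python
import math

def hhi(hours):
    """Herfindahl-Hirschman index of one ego's collaboration-time shares."""
    total = sum(hours)
    return sum((h / total) ** 2 for h in hours)

def shannon_entropy(hours):
    """Shannon entropy (in nats) of the same share distribution."""
    total = sum(hours)
    return -sum((h / total) * math.log(h / total) for h in hours if h > 0)

even = [2.0, 2.0, 2.0, 2.0]  # time spread evenly over four ties
skew = [6.0, 1.0, 0.5, 0.5]  # time concentrated on one strong tie

# Concentration raises the HHI and lowers the entropy, which is the
# direction of the effects reported for firm-wide remote work.
```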

We also found that the shift to firm-wide remote work caused workers’ ego networks to become more static; firm-wide remote work reduced the number of existing connections that churned from month-to-month by 0.05 FV ( P  = 0.006, 95% CI = 0.02–0.09), and decreased the number of connections workers added month-to-month by 0.04 FV ( P  = 0.015, 95% CI = 0.01–0.07). Furthermore, the shift to firm-wide remote work decreased the share of time that workers spent collaborating with the connections they did add by 0.29 FV ( P  < 0.001, 95% CI = 0.24–0.34). Of the added ties we observed in June 2020, 40% existed in at least one month between January 2020 and May 2020, whereas the remaining 60% did not. This suggests that the added ties that we observed are a mixture of dormant ties 25 and ties that are truly new. Overall, the changes that we observed in the temporal dynamics of ego networks may have made it more difficult for workers to capture the benefits associated with forming new connections 23 , 24 , reconnecting with dormant connections 25 and modulating their network position 20 , 21 , 22 . These results are robust to the use of alternative definitions of added and deleted ties (full details are provided in the Supplementary Information ).
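The month-to-month churned and added ties discussed above reduce to set differences between consecutive monthly tie sets; the employee and tie names below are hypothetical:

```python
# Hypothetical tie sets for one employee in two consecutive months.
may_ties = {"a", "b", "c", "d"}
june_ties = {"b", "c", "e"}

churned = may_ties - june_ties  # collaborators dropped month-to-month
added = june_ties - may_ties    # collaborators gained month-to-month
```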

In summary, our results suggest that firm-wide remote work ossified workers’ ego networks, made the network more fragmented and made each fragment more clustered. We tested for heterogeneity in the effects of the shift to firm-wide remote work on collaboration ego networks with respect to a worker’s managerial status (manager versus individual contributor), tenure at Microsoft (shorter tenure versus longer tenure) and role type (engineering versus non-engineering), and did not find meaningful heterogeneity across any of these dimensions (Supplementary Figs. 1 , 2 and 4 ).

The effects of remote work on the use of communication media

In addition to estimating the effects of firm-wide remote work on workers’ collaboration networks, we also estimated the impact of firm-wide remote work on workers’ choice of communication media.

Figure 4 shows the non-causal time-series trends for workweek hours and different communication media outcomes across our entire sample. Detailed definitions for each of these outcomes are provided in the Methods . For unscheduled call hours, meeting hours, total video/audio hours and IMs sent, we observed considerable increases around the time of the switch to firm-wide remote work; these increases were sustained through the end of our observation window. The change in email volume is much smaller and shorter-lived. Figure 4f shows the change in workweek hours, a metric that measures the total amount of time between the first observed work activity and the last observed work activity on each work day in a given week. Although there was a sustained increase in workweek hours, it was too small to account for the large increases that we observed in the use of various communication media without a simultaneous shift in the way that employees were conducting work.
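The workweek-hours metric described above (first-to-last-activity span per day, summed over workdays) can be sketched as follows; the timestamps are hypothetical stand-ins for the observed emails, meetings and calls:

```python
from datetime import datetime

# Hypothetical timestamps of observed work activity for one short week.
events = ["2020-06-01 08:30", "2020-06-01 12:00", "2020-06-01 17:45",
          "2020-06-02 09:15", "2020-06-02 18:15"]

span = {}  # day -> (first activity, last activity)
for s in events:
    t = datetime.strptime(s, "%Y-%m-%d %H:%M")
    lo, hi = span.get(t.date(), (t, t))
    span[t.date()] = (min(lo, t), max(hi, t))

# Workweek hours: first-to-last-activity span, summed over workdays.
week_hours = sum((hi - lo).total_seconds() / 3600 for lo, hi in span.values())
# 9.25 + 9.0 = 18.25 hours for this toy week
```

As the text notes, a span-based metric cannot distinguish continuous work from work interleaved with breaks, which is why the increase in workweek hours admits several interpretations.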

figure 4

a – f , The weekly averages for each variable, relative to the February average. Each variable was normalized by subtracting and dividing by the average FV for that variable. The vertical bars show the 95% CIs, but are in most places not much taller than the data points; s.e. values are clustered at the team level. The variables are the employees’ average number of unscheduled audio/video call hours ( a ), scheduled meeting hours ( b ), total hours in scheduled meetings and unscheduled calls (the sum of a and b ) ( c ), IMs sent ( d ), emails sent ( e ), and hours between the first and last activity (sent email, scheduled meeting, or Microsoft Teams call or chat) in a day, summed across the workdays ( f ). The dips in all six metrics during the weeks of 16 February, 24 May and 14 June were due to four-day workweeks, in observance of Presidents’ Day, Memorial Day and Juneteenth, respectively. n  = 61,279 for all variables.

Figure 5 shows the estimated causal effects of firm-wide remote work on the amount of communication conducted through different media, as well as the length of workers’ workweeks. Relative to the baseline case of all coworkers working in an office together, we found that firm-wide remote work decreased scheduled meeting hours by 0.16 FV ( P  < 0.001, 95% CI = 0.13–0.19) and increased unscheduled video/audio call hours by 1.6 FV ( P  < 0.001, 95% CI = 1.5–1.8). The increase in unscheduled calls was more than offset by the decrease in scheduled meeting hours. To see this, we defined the sum of unscheduled call hours and scheduled meeting hours as the synchronous video/audio communication hours. We estimate that firm-wide remote work caused a slight decrease of 0.05 FV ( P  = 0.006, 95% CI = 0.01–0.08) in the total amount of synchronous video/audio communication. Given that, by definition, a shift to firm-wide remote work causes in-person interactions to drop to zero and synchronous video/audio communication decreased overall, our results also indicate that firm-wide remote work led to a decrease in the total amount of synchronous collaboration, both in-person and through Microsoft Teams.

figure 5

The estimated causal effects of both an employee and their colleagues switching to remote work on the employee’s hours spent in scheduled meetings, hours spent in unscheduled calls, the sum of meetings and call hours, IMs sent, emails sent and estimated workweek hours. The reported effects are ( β  +  δ ) from equation ( 1 ), normalized by dividing by the average level of that variable in February. The symbols depict point estimates and lines depict 95% CIs. n  = 61,182 for all variables. The full results are provided in Supplementary Table 3 .

Although firm-wide remote work caused a decrease in synchronous communication, it also caused an increase in the amount of asynchronous communication. Firm-wide remote work increased the number of emails sent by workers by 0.08 FV ( P  < 0.001, 95% CI = 0.05–0.12) and the number of IMs sent by workers by 0.50 FV ( P  < 0.001, 95% CI = 0.46–0.55). Firm-wide remote work also increased the average number of workweek hours by 0.10 FV ( P  < 0.001, 95% CI = 0.09–0.11); however, this effect is small relative to the effect on IM volume. This suggests that the increase in IMs reflects a change in workers’ collaboration patterns while working, as opposed to changes in how much workers were working. The fact that shifting to firm-wide remote work increased the number of workweek hours also makes the negative effect of firm-wide remote work on synchronous collaboration more notable. The increase in workweek hours could be an indication that employees were less productive and required more time to complete their work, or that they replaced some of their commuting time with work time; however, as we are able to measure only the time between the first and last work activity in a day, it could also be that the same amount of working time is spread across a greater share of the calendar day due to breaks or interruptions for non-work activities.

Heterogeneous effects of firm-wide remote work on communication media choice

Although the effects of firm-wide remote work on collaboration networks did not exhibit heterogeneity across the worker attributes that we observed, the effects of firm-wide remote work on communication media were in some cases larger for managers and engineers. We found that the switch to firm-wide remote work caused larger increases for managers than individual contributors in IMs sent, emails sent and unscheduled video/audio call hours (Fig. 6 , left). This is probably because, relative to individual contributors, a larger share of managers’ time is dedicated to communicating with others, that is, their direct reports (for example, to address issues blocking progress or conduct performance reviews), and representatives of other groups within the organization (for example, to coordinate activity and goals across different groups). We also find that the shift to firm-wide remote work caused larger increases for engineers than non-engineers in the number of IMs sent and the number of unscheduled call hours (Fig. 6 , right). This may be reflective of the fact that software development teams are particularly reliant on informal communication 49 , 50 , 51 , much of which may have taken place in-person before the shift to firm-wide remote work. We did not find meaningful heterogeneity with respect to employee tenure at Microsoft.

Fig. 6: The causal effects, estimated separately for managers (n = 9,715) and individual contributors (ICs) (n = 51,467) (left) and engineers (n = 29,510) and non-engineers (n = 31,672) (right), of an employee and their colleagues switching to remote work on hours spent in scheduled meetings, the sum of scheduled meeting and unscheduled call hours, IMs sent, emails sent and estimated workweek hours (a), and hours spent in unscheduled calls (b). The reported effects are (β + δ) from equation (1), normalized by dividing by the average level of that variable for all employees in February. The symbols depict point estimates and the lines show the 95% CIs. The full results are provided in Supplementary Tables 8, 9, 22 and 23.

Decomposing the effects of firm-wide remote work

One benefit of our empirical approach is that it enables us to decompose the causal effects of firm-wide remote work into two components: the direct effect of an employee working remotely on their own work practices (ego effects) and the indirect effect of all an employee’s colleagues working remotely on that employee’s work practices (collaborator effects). The model is linear, so the predicted effects from having half of one’s collaborators switch to remote work would be half as large.

Figure 7 shows the ego and collaborator effects of firm-wide remote work on people’s collaboration networks. Notably, the remote work status of an employee and that of the employee’s collaborators both contributed to the total effect of firm-wide remote work for most network outcomes. An employee’s collaborators switching to remote work seems to have had a particularly large impact on the amount of time that workers spent with ties that are most likely to provide access to new information, that is, cross-group ties, bridging ties, weak ties and added ties. As seen in Fig. 8, collaborator effects also dominated ego effects when we decomposed the effects of firm-wide remote work on communication media usage. More than half of the increase in IMs sent and emails sent was due to collaborators switching to remote work, and approximately 90% (+0.09 FV, P < 0.001, 95% CI = 0.07–0.10) of the increase in workweek hours was due to collaborators switching to remote work. Overall, we found that collaborators switching to remote work caused workers to spend less time attending to sources of new information, communicate more through asynchronous media and work longer hours. Looking to the future, these findings suggest that remote work policies such as mixed-mode and hybrid work may have substantial effects not only on those working remotely but also on those remaining in the office.

Fig. 7: The estimated causal effects of either an employee (δ from equation (1)) or their colleagues (β from equation (1)) switching to remote work on the number of collaborators that an employee has, the number of distinct groups the employee collaborates with, the number of cross-group ties an employee has, the share of time an employee spends collaborating with cross-group ties, the number of bridging ties an employee has, the share of time an employee spends collaborating with bridging ties, the individual clustering coefficient of an employee’s ego network, the share of time an employee spends collaborating with weak ties, the number of churned collaborators, the number of added collaborators and the share of time spent with added collaborators. All effects were normalized by dividing by the average level of that variable in February. The symbols depict point estimates and the lines show the 95% CIs. n = 61,182 for all variables. The full results are provided in Supplementary Tables 1 and 2.

Fig. 8: The estimated causal effects of either an employee (δ from equation (1)) or their colleagues (β from equation (1)) switching to remote work on hours spent in scheduled meetings, the sum of scheduled meeting and unscheduled call hours, IMs sent, emails sent and estimated workweek hours (a), and hours spent in unscheduled calls (b). All effects were normalized by dividing by the average level of that variable in February. The symbols depict point estimates and the lines show the 95% CIs. n = 61,182 for all variables. The full results are provided in Supplementary Table 3.

Our results suggest that shifting to firm-wide remote work caused the collaboration network to become more heavily siloed—with fewer ties that cut across formal business units or bridge structural holes in Microsoft’s informal collaboration network—and that those silos became more densely connected. Furthermore, the network became more static, with fewer ties added and deleted per month. Previous research suggests that these changes in collaboration patterns may impede the transfer of knowledge 10 , 12 , 13 and reduce the quality of workers’ output 11 , 23 . Our results also indicate that the shift to firm-wide remote work caused synchronous communication to decrease and asynchronous communication to increase. Not only were the communication media that workers used less synchronous, but they were also less ‘rich’ (for example, email and IM). These changes in communication media may have made it more difficult for workers to convey and process complex information 26 , 27 , 28 .

We expect that the effects we observe on workers’ collaboration and communication patterns will impact productivity and, in the long term, innovation. Yet, across many sectors, firms are making decisions to adopt permanent remote work policies based only on short-term data 52 . Importantly, the causal estimates that we report differ substantially from the effects suggested by the observational trends shown in Figs. 2 and 4 . Thus, firms making decisions on the basis of non-causal analyses may set suboptimal policies. For example, some firms that choose a permanent remote work policy may put themselves at a disadvantage by making it more difficult for workers to collaborate and exchange information.

Beyond estimating the causal effects of firm-wide remote work, our results also provide preliminary insights into the effects of remote work policies such as mixed-mode and hybrid work. Specifically, the non-trivial collaborator effects that we estimate suggest that hybrid and mixed-mode work arrangements may not work as firms expect. The most effective implementations of hybrid and mixed-mode work might be those that deliberately attempt to minimize the impact of collaborator effects on those employees that are not working remotely; for example, firms might consider implementations of hybrid work in which certain teams come into the office on certain days, or in which most or all workers come into the office on some days and work remotely otherwise. Firms might also consider arrangements in which only certain types of workers (for example, individual contributors) are able to work remotely.

Although we believe these early insights are helpful, firms and academics will need to undertake a combination of quantitative and qualitative research once the COVID-19 pandemic has ended to better measure both the benefits and the downsides of different remote work policies. Large firms with the ability to collect rich telemetry data will be particularly well-positioned to build on the quantitative insights presented in this work by conducting large-scale internal field experiments. If published externally, these experiments could have the capacity to greatly further our collective understanding of the causal effects of both firm-wide remote work and other work arrangements such as hybrid work and mixed-mode work. Our results, which report both direct effects and indirect effects of remote work, suggest that such experimentation needs to be conducted carefully. Simply comparing the work practices and/or productivity levels of remote workers and office workers will likely yield biased estimates of the global treatment effects of different remote work policies, due to the causal effects of one’s colleagues working remotely. In conducting these experiments, it is crucial that firms use experiment designs that are optimized for capturing the overall effects of remote work policies, for example, graph cluster randomization 53 , 54 or switchback randomization 55 . Ideally, such field experiments would be complemented with high-quality qualitative research that can describe emergent processes and workers’ perceptions and, more generally, uncover insights beyond those that can be obtained through quantitative methods.

Our research is not without its limitations. First, our study characterizes the impacts of firm-wide remote work on the US employees of one major technology firm. Although we expect our results to generalize to other technology firms, this may not be the case. Caution should also be exercised in generalizing our results to other sectors and other countries. Second, the period of time over which we measured the causal effects of remote work is quite short (three months), and it is possible that the long-term effects of firm-wide remote work are different. For example, at the beginning of the pandemic, workers were able to leverage existing network connections, many of which were built in person. This may not be possible if firm-wide remote work were implemented long-term. Third, our analysis treats the effects of firm-wide remote work on people’s collaboration networks and communication media usage as separate, whereas these two types of effects may interact and exacerbate one another. Fourth, although we believe that changes to workers’ communication networks and media will affect productivity and innovation, we were unable to measure these outcomes directly. Even if we were able to measure productivity and innovation, the impacts of network structure and communication media choice on performance are likely contingent on a number of factors, including the type of task a given team/organization is trying to complete 56 , 57 , 58 , 59 . Finally, our ability to make causal claims is predicated on the validity of our modified DiD framework’s identifying assumptions: parallel trends, conditional exogeneity after matching and additively separable effects. Although we have taken steps to verify the plausibility of these assumptions and tested the robustness of our results to an alternative matching procedure 60 (details of which are provided in the Methods ), they are assumptions nonetheless.

There are multiple high-profile cases of firms such as IBM and Yahoo! enacting, but ultimately rescinding, flexible remote work policies before COVID-19, presumably due to the impacts of these policies on communication and collaboration 61 , 62 . On the basis of these examples, one might conclude that the current enthusiasm for remote work may not ultimately translate into a long-lasting shift to remote work for the majority of firms. However, during the COVID-19 pandemic, workers and firms have invested in the physical and human capital required to support remote work 63 and innovation has shifted toward new technologies that support remote work 64 . Both of these factors make it more likely that for many firms, some version of remote work will persist beyond the pandemic. In light of this fact, the importance of deepening our understanding of remote work and its impacts has never been greater.

Ethical review

This research was reviewed and classified as exempt by the Massachusetts Institute of Technology (MIT) Committee on the Use of Humans as Experimental Subjects (that is, MIT’s Institutional Review Board), because the research was secondary-use research involving de-identified data.

Our data were passively collected and anonymized by Microsoft’s Workplace Analytics product 65 , which logs activity that takes place in employees’ work email accounts and in Microsoft Teams using de-identified IDs. Microsoft Teams is collaboration software that enables employees to video/audio call, video/audio teleconference, IM and share files. The use of the data is compliant with US employee privacy laws. Employee privacy restrictions in many countries prevent us from reporting on workers outside the US. However, an employee’s communication and collaboration with international coworkers is still included in the data and those employees are still counted as part of each employee’s network. No information on international coworkers except for counting interactions with US employees was obtained for research purposes or analysed. Microsoft provides employees with appropriate notice of its use of Workplace Analytics, and sets strict controls over the collection and use of such data.

In our collaboration network, each worker is a node. For a tie to exist between two workers in a given month, those two workers must have had at least one meaningful interaction through two out of the following four communication media: email, IM, scheduled meeting and unscheduled video/audio call. A meaningful interaction is an email, IM, scheduled meeting or unscheduled video/audio call with a group of size no more than eight.
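As a concrete sketch of this tie definition (the record format here is a hypothetical simplification; the paper's telemetry pipeline is not public), monthly ties can be derived from a list of interaction records as follows:

```python
from collections import defaultdict
from itertools import combinations

MEDIA = {"email", "im", "meeting", "call"}

def build_ties(interactions, max_group=8, min_media=2):
    """Return the set of ties for one month.

    `interactions` is a list of (participants, medium) tuples, where
    `participants` is the set of people involved. An interaction is
    'meaningful' only if the group size is at most `max_group`; a tie
    requires meaningful interactions through at least `min_media` of
    the four media.
    """
    media_used = defaultdict(set)  # (person_a, person_b) -> media with a meaningful interaction
    for participants, medium in interactions:
        if medium not in MEDIA or len(participants) > max_group:
            continue  # skip unknown media and large-group interactions
        for a, b in combinations(sorted(participants), 2):
            media_used[(a, b)].add(medium)
    return {pair for pair, media in media_used.items() if len(media) >= min_media}
```

For example, a pair who exchanged both email and IMs in a month forms a tie, while a pair connected through email alone does not.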

In our analysis, we classify a worker as working remotely if more than 80% of their collaboration hours in a given month are with colleagues remote to them. For employees WFH, all of their colleagues are considered to be remote from them, whereas, for those in an office, colleagues are remote to them if those colleagues are WFH or are located on a Microsoft campus in a different city. After March 2020, all US Microsoft employees are by definition working remotely, as they are WFH.
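This classification rule can be sketched as a simple function; the input format is a hypothetical simplification of the underlying telemetry:

```python
def is_working_remotely(collab_hours, threshold=0.8):
    """Classify a worker as remote for a given month.

    `collab_hours` is a list of (hours, colleague_is_remote) pairs, one per
    collaborator. A worker is classified as remote if strictly more than
    `threshold` of their collaboration hours are with remote colleagues.
    """
    total = sum(hours for hours, _ in collab_hours)
    if total == 0:
        return False  # no collaboration recorded this month
    remote = sum(hours for hours, colleague_remote in collab_hours if colleague_remote)
    return remote / total > threshold
```

Note that exactly 80% does not qualify: the rule requires more than 80% of collaboration hours to be with remote colleagues.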

Modified DiD model

Our modified DiD model extends the standard DiD model in two ways. First, rather than measuring the effect of changes in one treatment variable, our model measures the effects of changes in two different treatment variables—(1) whether an employee is working remotely and (2) whether that employee’s colleagues are working remotely—and assumes that these two effects are additively separable. Second, our model allows the variation in our treatment variables to be induced by one exogenous shock that affects all workers in our sample, but affects some workers differently compared with others. More specifically, although all Microsoft employees were affected by COVID-19, only some employees experienced changes in their remote work status and/or the share of their collaborators that were working remotely due to Microsoft’s company-wide WFH mandate during the pandemic.

We estimate the average treatment effect for the treated (ATT) of ego remote work and collaborator remote work on all outcome measures using the following specification:

\({Y}_{it}={\alpha }_{i}+{\tau }_{t}+\delta {D}_{it}+\beta {s}_{it}+{\epsilon }_{it}\)  (1)

where \({Y}_{it}\) denotes the work outcome, \({\alpha }_{i}\) is an employee fixed effect, \({\tau }_{t}\) is a month fixed effect, \({D}_{it}\) indicates whether employee i was a treated employee forced to work remotely in month t, \({s}_{it}\) is the share of employee i’s coworkers who were working remotely in month t and \({\epsilon }_{it}\) denotes the error term. Observations are weighted using coarsened exact matching (CEM) weights, and standard errors are clustered at the level of an employee’s manager. We estimate this model using data from February, April, May and June 2020. We omitted March because workers were transitioning from office work to WFH beginning in the first week of the month.
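As an illustrative sketch (not the authors' code), a two-way fixed-effects specification of this form can be estimated by ordinary least squares on a toy panel; the CEM weighting and manager-clustered standard errors used in the actual analysis are omitted here for brevity. Because the synthetic outcome is generated without noise, the regression recovers the true δ and β:

```python
import numpy as np

rng = np.random.default_rng(0)
workers, months = 4, 4
alpha = np.array([1.0, -2.0, 0.5, 3.0])   # worker fixed effects, alpha_i
tau = np.array([0.0, 1.0, -1.0, 2.0])     # month fixed effects, tau_t
delta_true, beta_true = 2.0, 3.0          # ego and collaborator effects
s = rng.uniform(size=(workers, months))   # share of remote collaborators, s_it

X_rows, y = [], []
for i in range(workers):
    for t in range(months):
        D = 1.0 if (i < 2 and t >= 2) else 0.0  # workers 0-1 switch to remote at t = 2
        y.append(alpha[i] + tau[t] + delta_true * D + beta_true * s[i, t])
        # Regressors: intercept, D, s, worker dummies (drop worker 0), month dummies (drop month 0).
        X_rows.append([1.0, D, s[i, t]]
                      + [float(i == k) for k in range(1, workers)]
                      + [float(t == k) for k in range(1, months)])

coef, *_ = np.linalg.lstsq(np.array(X_rows), np.array(y), rcond=None)
delta_hat, beta_hat = coef[1], coef[2]  # recover delta = 2.0 and beta = 3.0
```

Including both sets of dummies absorbs the worker and month fixed effects, so δ and β are identified from within-worker, within-month variation in the two treatment variables.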

Our ability to causally identify both ATTs is predicated on a number of identifying assumptions, some of which are standard in DiD analyses and some of which are specific to our research setting. First, we assume that, for both of our ‘treatment’ variables, the time series for ‘treated’ and ‘untreated’ workers would have evolved in parallel absent the treatment. Time-series trends for different subsets of the matched sample are compared in Fig. 1 . These comparisons suggest that, for both of our treatment variables, the DiD model’s parallel trends assumption is plausible, both when measuring the effect of the treatment on network measures (Fig. 1a,c) and when measuring the effect of the treatment on communication media measures (Fig. 1b,d). Analogous figures for our full set of outcome variables are provided in Supplementary Figs. 5–19. In all cases, the time series appear to move in parallel both before the transition to remote work, and once the transition to remote work concluded, suggesting that this identifying assumption is reasonable.

Second, we assume strict exogeneity, that is, that the timing of the switch to remote work must be independent of employees’ outcomes. As the ‘treatment group’ was all switched to WFH due to COVID-19, we are less concerned about endogeneity of treatment than we might be in other settings. However, we do need to assume that workers’ remote work status before the pandemic and the percentage of workers’ colleagues that work remotely before the pandemic are independent of how they are affected by the pandemic. This assumption would be violated if, for example, those who worked remotely before the pandemic were less likely to have unforeseen childcare responsibilities from school closures caused by the pandemic. To make this identifying assumption more plausible, we use the CEM procedure described below. If we wanted to interpret the ATTs that we estimate from those employees that started WFH due to the pandemic as average treatment effects, we would also need to assume that, conditional on the CEM procedure described below, employees’ pre-pandemic remote work status and the percentage of colleagues working remotely were independent of the effects of ego remote work and collaborator remote work on their work outcomes.

Finally, we assume that ego remote work effects, collaborator remote work effects and non-remote-work-related COVID-19 effects are additively separable. More precisely, we assume that \({Y}_{it}\) can be written as

\({Y}_{it}({\mathrm{RW}}_{it},{s}_{it},{C}_{it})={Y}_{it}(0,0,0)+\delta \,{\mathrm{RW}}_{it}+\beta \,{s}_{it}+\gamma \,{C}_{it}\)

where \({\mathrm{RW}}_{it}\) is a binary variable that indicates whether employee i is working remotely at time t, \({s}_{it}\) is the share of employee i’s collaborators working remotely in month t, \({C}_{it}\) is a binary variable indicating whether employee i was subject to the COVID-19 pandemic at time t, γ is the non-remote-work-related effect of the pandemic and \({Y}_{it}(0,0,0)\) is worker i’s outcome at time t if all three variables were equal to 0. This assumption is an extension of the standard DiD assumption that treatment effects, cross-group differences and time-effects are additively separable and would be violated if, for example, the effects of ego remote work and/or collaborator remote work were amplified in a multiplicative manner due to other aspects of the COVID-19 pandemic (for example, childcare responsibilities or pandemic-induced changes to Microsoft’s product roadmaps). With our data, we are unable to validate the plausibility of this important identifying assumption; however, it is worth noting that causal estimates produced by standard DiD models also rely on the validity of parametric assumptions 66 .

The results from our modified DiD specification for the full set of outcomes are provided in Supplementary Tables 1 – 3 . Throughout the main text, we refer to results as insignificant when two-sided P > 0.05.

We make our results more robust by estimating our DiD model using weights generated using CEM 67 . This reweighting means that we can relax the parallel trends and exogeneity assumptions described above to only be required conditional on employee characteristics. In other words, provided that any differences in how the two groups would have evolved in the absence of the pandemic or how they are affected by the pandemic are entirely explained by the employee characteristics we match on, then the CEM-based results are valid.

The CEM procedure works as follows. Each US Microsoft employee is assigned to a stratum on the basis of their role, managerial status, seniority level and tenure at Microsoft as of February 2020. For each employee i in a stratum s that contains a mixture of employees that were and were not remote before the COVID-19 pandemic, we construct a CEM weight according to the following formula:

\({w}_{i}=1\) if \(i\in {O}_{s}\), and \({w}_{i}=\frac{{n}_{R}}{{n}_{O}}\times \frac{{n}_{O}^{s}}{{n}_{R}^{s}}\) if \(i\in {R}_{s}\)

where \({n}_{O}\) (\({n}_{R}\)) is the total number of non-remote (remote) employees before the COVID-19 pandemic, \({n}_{O}^{s}\) (\({n}_{R}^{s}\)) is the total number of non-remote (remote) employees before the COVID-19 pandemic in stratum s and \({O}_{s}\) (\({R}_{s}\)) is the set of non-remote (remote) employees before the COVID-19 pandemic in stratum s. The 97 (<0.2%) employees in strata without both non-remote and remote employees before the COVID-19 pandemic were discarded from our sample. The final remote:non-remote sample ratio is 1:4.6.
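A minimal sketch of this weighting scheme, assuming the standard CEM convention that the pre-pandemic non-remote ('treated') group keeps weight 1 while pre-pandemic remote employees in stratum s are reweighted by (n_R/n_O) × (n_O^s/n_R^s); the input format is hypothetical:

```python
from collections import defaultdict

def cem_weights(employees):
    """Compute CEM weights for a matched DiD sample.

    `employees` is a list of (employee_id, stratum, was_remote_pre_pandemic)
    tuples. Non-remote employees keep weight 1; remote employees in stratum s
    get (n_R / n_O) * (n_O^s / n_R^s). Employees in strata lacking both
    groups are discarded.
    """
    n_o_s, n_r_s = defaultdict(int), defaultdict(int)
    for _, stratum, remote in employees:
        (n_r_s if remote else n_o_s)[stratum] += 1
    n_o, n_r = sum(n_o_s.values()), sum(n_r_s.values())
    weights = {}
    for emp, stratum, remote in employees:
        if n_o_s[stratum] == 0 or n_r_s[stratum] == 0:
            continue  # stratum contains only one group: drop the employee
        weights[emp] = (n_r / n_o) * (n_o_s[stratum] / n_r_s[stratum]) if remote else 1.0
    return weights
```

Within each retained stratum, the reweighting scales the remote group so that its distribution across strata matches that of the non-remote group.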

Treatment effect heterogeneity

We measured treatment effect heterogeneity with respect to tenure at Microsoft (shorter tenure versus longer tenure), managerial status (manager versus individual contributor) and role type (engineering versus non-engineering). To do so, we estimated the DiD model separately for each subgroup. Our treatment effect estimates for each combination of outcome and subgroup are provided in Supplementary Tables 4 – 23 .

Alternative matching procedure

To test the robustness of our analysis, we re-estimate our main DiD specification on an alternate matched sample of employees who worked remotely before the COVID-19 pandemic, which is constructed using a more extensive matching procedure introduced in ref. 60 . In this matching procedure, we augment the set of observables that we match on to include not only time-invariant employee attributes (that is, role, managerial status, seniority and new-hire status as of February), but also time-varying behavioural attributes (that is, number of scheduled meeting hours, unscheduled call hours, IMs sent, emails sent, workweek hours, network ties, business groups connected to, cross-group connections, bridging ties, churned ties and added ties, share of time with cross-group ties, bridging ties, weak ties and added ties, and the individual clustering coefficient) as measured in June 2020. As we are matching on many more variables, there are more employees who cannot be matched, and our matched sample includes only 43,576 employees.

The motivation for this matching procedure is as follows. In a standard matched DiD analysis, control and treatment units would be matched on the basis of pretreatment behaviour. This type of matching is not appropriate in our context, given that employees who did and did not work remotely before the COVID-19 pandemic are by definition in different potential outcome states in February. Assuming that there is a treatment effect to detect, matching on pretreatment behavioural outcomes would actually make our identifying assumptions less likely to hold. However, in June 2020, both employees who were and were not working remotely before the COVID-19 pandemic were in the same potential outcome state (firm-wide remote work), and therefore matching on time-varying behavioural outcomes improves the credibility of our identifying assumptions.

Supplementary Figs. 20 and 21 show the results of our DiD model as estimated on this alternative sample. The results are qualitatively similar to those we present in our main analysis.

Collaboration network outcome definitions

Number of connections: The number of people with whom one had a meaningful interaction through at least two out of four possible communication media (email, IM, scheduled meeting and unscheduled video/audio call) in a given month. A meaningful interaction is an email, meeting, video/audio call or IM with a group of size no more than eight.

Number of business groups and cross-group connections: A business group is a collection of typically fewer than ten employees who report to the same manager and share a common purpose. We look at the number of distinct business groups that one’s immediate collaborators belong to, and the number of one’s collaborators that belong to a different business group than one’s own.

Bridging connections: Bridging connections are connections with a low value of the local constraint 9 , 18 , 42 in that period. To calculate the local constraint, we first calculate the normalized mutual weight, \({\mathrm{NMW}}_{ijt}\), between each pair of people i and j in each period t. If there is no connection between i and j in period t, then \({\mathrm{NMW}}_{ijt}=0\); otherwise, \({\mathrm{NMW}}_{ijt}=\frac{2}{{n}_{it}+{n}_{jt}}\), where \({n}_{it}\) is the number of connections i has in period t. Then, for each i, j, t, we calculate the local constraint \({\mathrm{LC}}_{ijt}={\mathrm{NMW}}_{ijt}+{\sum }_{k}{\mathrm{NMW}}_{ikt}\times {\mathrm{NMW}}_{kjt}\). We define a global cut-off \(\widehat{\mathrm{LC}}\) on the basis of the median value of the constraint across all directed ties in February and categorize a connection as bridging if its local constraint is below that cut-off. We calculate the local constraint for each tie using the matricial formulae described in ref. 68 .
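The normalized mutual weight and local constraint defined above can be computed directly from an edge list; a minimal sketch for a single period:

```python
def local_constraints(edges):
    """Local constraint LC_ij = NMW_ij + sum_k NMW_ik * NMW_kj for each tie,
    where NMW_ij = 2 / (n_i + n_j) for connected pairs and 0 otherwise."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    degree = {v: len(ns) for v, ns in neighbors.items()}

    def nmw(i, j):
        # Normalized mutual weight: 2 / (n_i + n_j) if i and j are connected.
        return 2.0 / (degree[i] + degree[j]) if j in neighbors[i] else 0.0

    lc = {}
    for i, j in edges:
        # Only common neighbours k contribute, since NMW is 0 for non-ties.
        indirect = sum(nmw(i, k) * nmw(k, j) for k in neighbors[i] & neighbors[j])
        lc[(i, j)] = nmw(i, j) + indirect
    return lc
```

A tie would then be classed as bridging if its local constraint falls below the February median cut-off.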

Individual clustering coefficient: The number of triads (group of three people who are all connected to each other) a person is a part of as a share of the number of triads they could possibly be part of given their degree. If a i j t is a dummy that equals 1 if and only if there is a connection between i and j in period t and n i t is the number of connections i has in period t , then individual i ’s clustering coefficient in period t is \({\mathrm{CC}}_{it}=\frac{2}{{n}_{it}({n}_{it}-1)}\mathop{\sum}\limits_{j,k}{a}_{ijt}\times {a}_{jkt}\times {a}_{kit}\) .
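Assuming the sum in the formula above runs over unordered pairs {j, k} (so that the coefficient 2/(n(n − 1)) yields the standard local clustering coefficient), the measure can be sketched as:

```python
def clustering_coefficient(neighbors, i):
    """Share of pairs of i's collaborators that are themselves connected."""
    ns = sorted(neighbors[i])
    n = len(ns)
    if n < 2:
        return 0.0  # fewer than two collaborators: no possible triads
    closed = sum(1
                 for a in range(n)
                 for b in range(a + 1, n)        # unordered pairs {j, k}
                 if ns[b] in neighbors[ns[a]])   # j and k are connected
    return 2.0 * closed / (n * (n - 1))
```

For a worker whose three collaborators include one connected pair, the coefficient is 1/3.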

Number of churned connections: The number of people with whom a worker had a connection with in month t  − 1, but does not have a connection in month t .

Number of added connections: The number of people with whom a worker has a connection in month t , but did not have a connection in month t  − 1.

Distribution of collaboration time: In addition to unweighted network ties, we also measured the share of collaboration time that an individual spent with each of their collaborators. The number of collaboration hours is calculated by summing up the number of hours spent communicating by email or IM, in meetings and in video/audio calls. If h i j t is the number of hours that individual i spent with collaborator j in month t , then the share of collaboration time i spent with j is \({P}_{ijt}=\frac{{h}_{ijt}}{{\sum }_{k}{h}_{ikt}}\) , from which we can define the following metrics:

Share of time with own-group connections: The share of time spent with collaborators in the same business group (see the above definition), \({\mathrm{SG}}_{it}=\mathop{\sum}\limits_{j| {g}_{j}={g}_{i}}{P}_{ijt}\) , where g i is the business group that individual i belongs to.

Share of time with bridging connections: The share of collaboration time spent with collaborators with whom the local constraint (as defined under ‘bridging connections’) is below the February median \({\mathrm{BC}}_{it}=\mathop{\sum}\limits_{j| {\mathrm{LC}}_{ijt} < \widehat{\mathrm{LC}}}{P}_{ijt}\) .

Share of time with weak ties: The share of a person’s collaboration hours spent with the half of the people that they collaborate with the least during month t , \({\mathrm{ST}}_{it}=\mathop{\sum}\limits_{j| {P}_{ijt} < {P}_{it}^{m}}{P}_{ijt}\) , where \({P}_{it}^{m}\) is the time that i spends with their median connection in period t . We do not analyse the number of weak ties a person has in a given month as, by this definition, it is equal to half the number of ties they have in that month.

Share of time with added connections: The share of a person’s collaboration hours spent with people with whom they did not have a connection in the previous month, \({\mathrm{SA}}_{it}=\mathop{\sum}\limits_{j\notin {n}_{i,t-1}}{P}_{ijt}\) , where n i , t  − 1 is the set of i ’s collaborators in period t  − 1.

Entropy of an individual’s collaboration time (network entropy): The entropy 48 of the distribution of the hours spent with one’s collaborators, \({E}_{it}=-{\sum }_{j}{P}_{ijt}\times {{\mathrm{log}}}\,{P}_{ijt}\) .

Concentration of an individual’s collaboration time: A normalized version of the HHI 47 of the hours spent with one’s collaborators, \({\mathrm{HHI}}_{it}=\frac{1}{{n}_{it}-1}\left({n}_{it}\times {\sum }_{j}{P}_{ijt}^{2}-1\right)\) , where n i t is the number of i ’s collaborators in period t . The normalization ensures that HHI i t always falls between 0 and 1.
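The time-share, entropy and normalized HHI definitions above translate directly into code; a minimal sketch:

```python
import math

def time_shares(hours):
    """Convert a collaborator -> hours mapping into shares P_ij."""
    total = sum(hours.values())
    return {j: h / total for j, h in hours.items()}

def network_entropy(shares):
    """Entropy of the distribution of collaboration time, -sum_j P_j log P_j."""
    return -sum(p * math.log(p) for p in shares.values() if p > 0)

def normalized_hhi(shares):
    """Normalized HHI: 0 when time is spread evenly, 1 when fully concentrated."""
    n = len(shares)
    if n < 2:
        return 1.0  # a single collaborator is maximally concentrated
    return (n * sum(p * p for p in shares.values()) - 1) / (n - 1)
```

Time spread evenly across n collaborators gives entropy log n and normalized HHI 0; all time with one collaborator gives entropy 0 and normalized HHI 1.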

Communication media outcome definitions

Scheduled meeting hours: The number of hours that a person spent in meetings scheduled through Teams or Outlook calendar with at least one other person. Before firm-wide remote work, employees were able to participate in meetings both in-person and by video/audio call. After the shift to firm-wide remote work, all meetings take place entirely by video/audio call.

Unscheduled call hours: The number of hours a person spent in unscheduled video/audio calls through Microsoft Teams with at least one other person.

Emails sent: The number of emails a person sent through their work email account.

IMs sent: The number of IMs a person sent through Microsoft Teams.

Workweek hours: The sum across every day in the workweek of the time between a person’s first sent email or IM, scheduled meeting or Microsoft Teams video/audio call, and the last sent email or IM, scheduled meeting or Microsoft Teams video/audio call. A day is part of the workweek if it is a ‘working day’ for a given employee based on their work calendar.
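A minimal sketch of this workweek-hours measure (the event format is a hypothetical simplification of the telemetry):

```python
from datetime import datetime

def workweek_hours(events, working_days):
    """Sum, over working days, of the span between the first and last
    work activity (sent email/IM, scheduled meeting or call) on that day.

    `events` is a list of activity datetimes; `working_days` is the set
    of dates that count as working days for this employee.
    """
    span = {}  # date -> (first activity, last activity)
    for ts in events:
        day = ts.date()
        if day not in working_days:
            continue  # e.g. weekends or holidays
        first, last = span.get(day, (ts, ts))
        span[day] = (min(first, ts), max(last, ts))
    return sum((last - first).total_seconds() / 3600.0
               for first, last in span.values())
```

As noted in the main text, this is a first-to-last-activity span, not a measure of time actively worked.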

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

An anonymized version of the data supporting this study is retained indefinitely for scientific and academic purposes. The data are not publicly available due to employee privacy and other legal restrictions. The data are available from the authors on reasonable request and with permission from Microsoft Corporation.

Code availability

The code supporting this study is retained indefinitely for scientific and academic purposes. The code is not publicly available due to employee privacy and other legal restrictions. The code is available from the authors on reasonable request and with permission from Microsoft Corporation.

Change history

05 October 2021

A Correction to this paper has been published: https://doi.org/10.1038/s41562-021-01228-z

Bloom, N. A. Working From Home and the Future of U.S. Economic Growth Under COVID (2020); https://www.youtube.com/watch?v=jtdFIZx3hyk

Brynjolfsson, E. et al. COVID-19 and Remote Work: An Early Look at US Data. Technical Report (National Bureau of Economic Research, 2020).

Barrero, J. M., Bloom, N. & Davis, S. 60 million fewer commuting hours per day: how Americans use time saved by working from home. Working Paper (Univ. Chicago, Becker Friedman Institute for Economics, 2020); https://bfi.uchicago.edu/wp-content/uploads/2020/09/BFI_WP_2020132.pdf

Dingel, J. I. & Neiman, B. How many jobs can be done at home? J. Public Econ. 189, 104235 (2020).

Benveniste, A. These companies’ workers may never go back to the office. CNN (18 October 2020); https://cnn.it/3jIobzJ

McLean, R. These companies plan to make working from home the new normal. As in forever. CNN (25 June 2020); https://cnn.it/3ebJU27

Lund, S., Cheng, W.-L., Dua, A., De Smet, A., Robinson, O. & Sanghvi, S. What 800 executives envision for the postpandemic workforce. McKinsey Global Institute (23 September 2020); https://www.mckinsey.com/featured-insights/future-of-work/what-800-executives-envision-for-the-postpandemic-workforce

Granovetter, M. The strength of weak ties. Am. J. Sociol. 78, 1360–1380 (1973).

Burt, R. S. Structural holes and good ideas. Am. J. Sociol. 110, 349–399 (2004).

Reagans, R. & McEvily, B. Network structure and knowledge transfer: the effects of cohesion and range. Admin. Sci. Q. 48, 240–267 (2003).

Uzzi, B. & Spiro, J. Collaboration and creativity: the small world problem. Am. J. Sociol. 111, 447–504 (2005).

Argote, L. & Ingram, P. Knowledge transfer: a basis for competitive advantage in firms. Organ. Behav. Hum. Dec. Process. 82, 150–169 (2000).

Hansen, M. T. The search-transfer problem: the role of weak ties in sharing knowledge across organization subunits. Admin. Sci. Q. 44, 82–111 (1999).

Krackhardt, D. The strength of strong ties. In Networks in the Knowledge Economy (Oxford Univ. Press, 2003).

Levin, D. Z. & Cross, R. The strength of weak ties you can trust: the mediating role of trust in effective knowledge transfer. Manage. Sci. 50, 1477–1490 (2004).

McFadyen, M. A. & Cannella Jr, A. A. Social capital and knowledge creation: diminishing returns of the number and strength of exchange relationships. Acad. Manage. J. 47, 735–746 (2004).

Granovetter, M. The strength of weak ties: a network theory revisited. In Social Structure and Network Analysis 105–130 (Sage, 1982).

Burt, R. S. Structural Holes: The Social Structure of Competition (Harvard Univ. Press, 2009).

Baum, J. A., McEvily, B. & Rowley, T. J. Better with age? Tie longevity and the performance implications of bridging and closure. Organ. Sci. 23, 529–546 (2012).

Kneeland, M. K. Network Churn: A Theoretical and Empirical Consideration of a Dynamic Process on Performance. PhD thesis, New York Univ. (2019).

Kumar, P. & Zaheer, A. Ego-network stability and innovation in alliances. Acad. Manage. J. 62, 691–716 (2019).

Burt, R. S. & Merluzzi, J. Network oscillation. Acad. Manage. Discov. 2, 368–391 (2016).

Soda, G. B., Mannucci, P. V. & Burt, R. Networks, creativity, and time: staying creative through brokerage and network rejuvenation. Acad. Manage. J. https://doi.org/10.5465/amj.2019.1209 (2021).

Zeng, A., Fan, Y., Di, Z., Wang, Y. & Havlin, S. Fresh teams are associated with original and multidisciplinary research. Nat. Hum. Behav. https://doi.org/10.1038/s41562-021-01084-x (2021).

Levin, D. Z., Walter, J. & Murnighan, J. K. Dormant ties: the value of reconnecting. Organ. Sci. 22, 923–939 (2011).

Lengel, R. H. & Daft, R. L. An Exploratory Analysis of the Relationship Between Media Richness and Managerial Information Processing. Technical Report (Texas A&M Univ. Department of Management, 1984).

Daft, R. L. & Lengel, R. H. Organizational information requirements, media richness and structural design. Manage. Sci. 32, 554–571 (1986).

Dennis, A. R., Fuller, R. M. & Valacich, J. S. Media, tasks, and communication processes: a theory of media synchronicity. MIS Q. 32, 575–600 (2008).

Morris, M., Nadler, J., Kurtzberg, T. & Thompson, L. Schmooze or lose: social friction and lubrication in e-mail negotiations. Group Dyn. Theor. Res. Pract. 6, 89–100 (2002).

Pentland, A. The new science of building great teams. Harvard Bus. Rev. 90, 60–69 (2012).

Allen, T. D., Golden, T. D. & Shockley, K. M. How effective is telecommuting? Assessing the status of our scientific findings. Psychol. Sci. Publ. Int. 16, 40–68 (2015).

Ahuja, M. K. & Carley, K. M. Network structure in virtual organizations. Organ. Sci. 10, 741–757 (1999).

Ahuja, M. K., Galletta, D. F. & Carley, K. M. Individual centrality and performance in virtual R&D groups: an empirical study. Manage. Sci. 49, 21–38 (2003).

Suh, A., Shin, K.-S., Ahuja, M. & Kim, M. S. The influence of virtuality on social networks within and across work groups: a multilevel approach. J. Manage. Inform. Syst. 28, 351–386 (2011).

DeFilippis, E., Impink, S., Singell, M., Polzer, J. T. & Sadun, R. Collaborating During Coronavirus: The Impact of COVID-19 on the Nature of Work. Working Paper 21-006 (Harvard Business School Organizational Behavior Unit, 2020).

Bernstein, E., Blunden, H., Brodsky, A., Sohn, W. & Waber, B. The implications of working without an office. Harvard Business Review (15 July 2020); https://hbr.org/2020/07/the-implications-of-working-without-an-office

Larson, J. et al. Dynamic silos: modularity in intra-organizational communication networks before and during the COVID-19 pandemic. Preprint at https://arxiv.org/abs/2104.00641 (2021).

Bloom, N., Liang, J., Roberts, J. & Ying, Z. J. Does working from home work? Evidence from a Chinese experiment. Q. J. Econ. 130, 165–218 (2015).

Choudhury, P., Foroughi, C. & Larson, B. Z. Work-from-anywhere: the productivity effects of geographic flexibility. Acad. Manage. Proc. 2020, 21199 (2020).

Kleinbaum, A. M., Stuart, T. & Tushman, M. Communication (and Coordination?) in a Modern, Complex Organization (Harvard Business School, 2008).

McEvily, B., Soda, G. & Tortoriello, M. More formally: rediscovering the missing link between formal organization and informal social structure. Acad. Manage. Ann. 8, 299–345 (2014).

Everett, M. G. & Borgatti, S. P. Unpacking Burt’s constraint measure. Social Netw. 62, 50–57 (2020).

Onnela, J.-P. et al. Structure and tie strengths in mobile communication networks. Proc. Natl Acad. Sci. USA 104, 7332–7336 (2007).

Aral, S. & Van Alstyne, M. The diversity-bandwidth trade-off. Am. J. Sociol. 117, 90–171 (2011).

Brashears, M. E. & Quintane, E. The weakness of tie strength. Social Netw. 55, 104–115 (2018).

Burke, M. & Kraut, R. E. Growing closer on Facebook: changes in tie strength through social network site use. In Proc. SIGCHI Conference on Human Factors in Computing Systems 4187–4196 (ACM, 2014); https://dl.acm.org/doi/10.1145/2556288.2557094

Herfindahl, O. C. Concentration in the Steel Industry. PhD thesis, Columbia Univ. (1950).

Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948).

Herbsleb, J. D. & Mockus, A. An empirical study of speed and communication in globally distributed software development. IEEE Trans. Softw. Eng. 29, 481–494 (2003).

Ehrlich, K. & Cataldo, M. All-for-one and one-for-all? A multi-level analysis of communication patterns and individual performance in geographically distributed software development. In Proc. ACM 2012 Conf. Computer Supported Cooperative Work 945–954 (ACM, 2012); https://doi.org/10.1145/2145204.2145345

Cataldo, M. & Herbsleb, J. D. Communication networks in geographically distributed software development. In Proc. 2008 ACM Conf. Computer Supported Cooperative Work 579–588 (ACM, 2008).

Kolko, J. Remote job postings double during coronavirus and keep rising. Indeed Hiring Lab (16 March 2021); https://www.hiringlab.org/2021/03/16/remote-job-postings-double/

Ugander, J., Karrer, B., Backstrom, L. & Kleinberg, J. Graph cluster randomization: network exposure to multiple universes. In Proc. 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 329–337 (ACM, 2013); https://doi.org/10.1145/2487575.2487695

Eckles, D., Karrer, B. & Ugander, J. Design and analysis of experiments in networks: reducing bias from interference. J. Causal Inference https://doi.org/10.1515/jci-2015-0021 (2016).

Bojinov, I., Simchi-Levi, D. & Zhao, J. Design and analysis of switchback experiments. Preprint at SSRN https://doi.org/10.2139/ssrn.3684168 (2020).

Lechner, C., Frankenberger, K. & Floyd, S. W. Task contingencies in the curvilinear relationships between intergroup networks and initiative performance. Acad. Manage. J. 53, 865–889 (2010).

Chung, Y. & Jackson, S. E. The internal and external networks of knowledge-intensive teams: the role of task routineness. J. Manage. 39, 442–468 (2013).

Dennis, A. R., Wixom, B. H. & Vandenberg, R. J. Understanding fit and appropriation effects in group support systems via meta-analysis. MIS Q. 25, 167–193 (2001).

Fuller, R. M. & Dennis, A. R. Does fit matter? The impact of task-technology fit and appropriation on team performance in repeated tasks. Inform. Syst. Res. 20, 2–17 (2009).

Athey, S., Mobius, M. M. & Pál, J. The impact of aggregators on Internet news consumption. Preprint at SSRN https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2897960 (2017).

Swisher, K. Physically together: here’s the internal Yahoo no-work-from-home memo for remote workers and maybe more. AllThingsD (22 February 2013).

Simons, J. IBM, a pioneer of remote work, calls workers back to the office. Wall Street Journal (18 May 2017).

Barrero, J. M., Bloom, N. & Davis, S. J. Why working from home will stick. Working Paper (Univ. Chicago, Becker Friedman Institute for Economics, 2020).

Bloom, N., Davis, S. J. & Zhestkova, Y. COVID-19 shifted patent applications toward technologies that support working from home. Working Paper (Univ. Chicago, Becker Friedman Institute for Economics, 2020).

Workplace Analytics https://docs.microsoft.com/en-us/workplace-analytics/use/metric-definitions (Microsoft, 2021).

Athey, S. & Imbens, G. W. Identification and inference in nonlinear difference-in-differences models. Econometrica 74, 431–497 (2006).

Iacus, S. M., King, G. & Porro, G. Causal inference without balance checking: coarsened exact matching. Polit. Anal. 20, 1–24 (2012).

Muscillo, A. A note on (matricial and fast) ways to compute Burt’s structural holes. Preprint at https://arxiv.org/abs/2102.05114 (2021).


Acknowledgements

This work was a part of Microsoft’s New Future of Work Initiative. We thank D. Eckles for assistance; N. Baym for illuminating discussions regarding social capital; and the attendees of the Berkeley Haas MORS Macro Research Lunch and the organizers and attendees of the NYU Stern Future of Work seminar for their comments and feedback. The authors received no specific funding for this work.

Author information

Authors and affiliations

Microsoft Corporation, Redmond, WA, USA

Longqi Yang, Sonia Jaffe, Siddharth Suri, Shilpi Sinha, Jeffrey Weston, Connor Joyce, Neha Shah, Kevin Sherman, Brent Hecht & Jaime Teevan

Haas School of Business, University of California, Berkeley, CA, USA

David Holtz

MIT Initiative on the Digital Economy, Cambridge, MA, USA


Contributions

L.Y. analysed the data. L.Y., D.H., S.J. and S. Suri performed the research design, interpretation and writing. S. Sinha, J.W., C.J., N.S. and K.S. provided data access and expertise. B.H. and J.T. advised and sponsored the project.

Corresponding author

Correspondence to Longqi Yang.

Ethics declarations

Competing interests

L.Y., S.J., S. Suri, S. Sinha, J.W., C.J., N.S., K.S., B.H. and J.T. are employees of and have a financial interest in Microsoft. D.H. was previously a Microsoft intern. All of the authors are listed as inventors on a pending patent application by Microsoft Corporation (16/942,375) related to this work.

Additional information

Peer review information Nature Human Behaviour thanks Nick Bloom, Yvette Blount and Sandy Staples for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–21 and Supplementary Tables 1–25.

Peer Review Information

Rights and permissions

Reprints and permissions

About this article

Cite this article

Yang, L., Holtz, D., Jaffe, S. et al. The effects of remote work on collaboration among information workers. Nat. Hum. Behav. 6, 43–54 (2022). https://doi.org/10.1038/s41562-021-01196-4

Download citation

Received: 02 November 2020

Accepted: 16 August 2021

Published: 09 September 2021

Issue Date: January 2022

DOI: https://doi.org/10.1038/s41562-021-01196-4


This article is cited by

  • What science says about hybrid working — and how to make it a success. Nature (2024)

  • Evans, D., Mason, C. & Reeson, A. Post-pandemic acceleration of demand for interpersonal skills. Nature Human Behaviour (2024)

  • Angafor, G. N., Yevseyeva, I. & Maglaras, L. Securing the remote office: reducing cyber risks to remote working through regular security awareness education campaigns. International Journal of Information Security (2024)

  • Frączek, B. Challenges for Inclusive Organizational Behavior (IOB) in terms of supporting the employment of people with disabilities by enhancing remote working. Social Indicators Research (2024)


microsoft research papers

Work Trend Index

Research and data on the trends reshaping the world of work

A colorful illustration of people being launched or boosted into the sky with capes that remind the viewer of cursor pointers.

What Can Copilot’s Earliest Users Teach Us About Generative AI at Work?

A first look at the impact on productivity, creativity, and time.

About Work Trend Index

31,000 people. 31 countries. Trillions of productivity signals.

The Work Trend Index conducts global, industry-spanning surveys as well as observational studies to offer unique insights on the trends reshaping work for every employee and leader.

A digital illustration of objects representing work and communication in the foreground leading to a figure moving past them and looking out on a peaceful landscape.

Annual Report · May 9, 2023

Will AI Fix Work?

The pace of work is outpacing our ability to keep up. AI is poised to create a whole new way of working.

Illustration of four people laying down a large puzzle piece, standing on a floor of other puzzle pieces. Two figures hold on to the blue side of the puzzle while another two hold on to the purple side.

Special Report · April 20, 2023

The New Performance Equation in the Age of AI

New research shows that employee engagement matters to the bottom line—especially amid economic uncertainty

Illustration of three people in a sailboat working together to navigate choppy waters. In the distance, several sailboats on placid waters are visible.

Special Report · September 22, 2022

Hybrid Work Is Just Work. Are We Doing It Wrong?

In choppy economic waters, new data points to three urgent pivots for leaders to help employees and organizations thrive

Illustration of hot air balloons ascending into the sky. A person running toward a balloon that is taking off receives a helping hand from another person who is already in the basket of the balloon.

Annual Report · March 16, 2022

Great Expectations: Making Hybrid Work Work

From when to go to the office to why work in the first place, employees have a new “worth it” equation. And there’s no going back.

Three frontline workers, one in an apron, one in a hard hat, one with a stethoscope, in front of a colorful illustrated background.

Special Report · January 12, 2022

Technology Can Help Unlock a New Future for Frontline Workers

New data shows that now is the time to empower the frontline with the right digital tools

A figure chisels a stone sculpture. Another paints a landscape. Between them, a figure hands each the tools they need to do their jobs.

Special Report · September 9, 2021

To Thrive in Hybrid Work, Build a Culture of Trust and Flexibility

Microsoft employee survey data shows the importance of embracing different work styles—and the power of simple conversations

An illustration of a person resting between two meetings.

Special Report · April 20, 2021

Research Proves Your Brain Needs Breaks

New options help you carve out downtime between meetings

A giant hand draws lines that connect a series of tiny people. The lines form an arrow pointing the way forward.

Special Report · March 30, 2021

In Hybrid Work, Managers Keep Teams Connected

Researchers found that feelings of connection among Microsoft’s teams diminished during the pandemic. They also discovered the remedy.

Abstract paper cut illustration of concentric woman's silhouette in profile with moon over water at the center

Annual Report · March 22, 2021

The Next Great Disruption Is Hybrid Work—Are We Ready?

Exclusive research and expert insights into a year of work like no other reveal urgent trends leaders should consider as hybrid work unfolds.

A person on a stepladder replacing the yellow light on a stoplight.

Special Report · 2020-09-22

A Checkup on Employee Wellbeing

Explore how the pandemic is impacting wellbeing at work around the world.

People surfing on a wavy ocean made up of striated data charts.,

Special Report · 2020-07-08

The Knowns and Unknowns of the Future of Work

Learn how a sudden shift to remote work may have lasting effects around the world.

People climbing out of a video conferencing icon and walking around freely.,

Special Report · 2020-04-09

Remote Work Trend Report: Meetings

See how global meeting habits changed during the world’s largest work-from-home mandate.

WorkLab Newsletter art

The WorkLab Newsletter: Science-based insights on the future of work, direct to your inbox

Discover more from WorkLab

A colorful photo-illustration of McKinsey’s Global Talent Head Bryan Hancock.

Additional research on the future of work

Privacy Approach

Microsoft takes privacy seriously. We remove all personal and organization-identifying information, such as company name, from the data before analyzing it and creating reports. We never use customer content—such as information within an email, chat, document, or meeting—to produce reports. Our goal is to discover and share broad workplace trends that are anonymized by aggregating the data broadly from those trillions of signals that make up the Microsoft Graph.

  • Open supplemental data
  • Reference Manager
  • Simple TEXT file

People also looked at

Original research article, a review of microsoft academic services for science of science studies.

microsoft research papers

  • Microsoft Research, Redmond, WA, United States

Since the relaunch of Microsoft Academic Services (MAS) 4 years ago, scholarly communications have undergone dramatic changes: more ideas are being exchanged online, more authors are sharing their data, and more software tools used to make discoveries and reproduce the results are being distributed openly. The sheer amount of information available is overwhelming for individual humans to keep up and digest. In the meantime, artificial intelligence (AI) technologies have made great strides and the cost of computing has plummeted to the extent that it has become practical to employ intelligent agents to comprehensively collect and analyze scholarly communications. MAS is one such effort and this paper describes its recent progresses since the last disclosure. As there are plenty of independent studies affirming the effectiveness of MAS, this paper focuses on the use of three key AI technologies that underlies its prowess in capturing scholarly communications with adequate quality and broad coverage: (1) natural language understanding in extracting factoids from individual articles at the web scale, (2) knowledge assisted inference and reasoning in assembling the factoids into a knowledge graph, and (3) a reinforcement learning approach to assessing scholarly importance for entities participating in scholarly communications, called the saliency, that serves both as an analytic and a predictive metric in MAS. These elements enhance the capabilities of MAS in supporting the studies of science of science based on the GOTO principle, i.e., good and open data with transparent and objective methodologies. The current direction of development and how to access the regularly updated data and tools from MAS, including the knowledge graph, a REST API and a website, are also described.

Introduction

Centuries of scientific advancements have been a result of a virtuous cycle where scientists meticulously collect observation data to deduce a theoretical model and then use the model to predict new experimental outcomes as a means to validate the theory. This scientific principle has been applied to study the science of science, namely, the development of science itself, a field that sees pioneers like Eugene Garfield at the Institute for Scientific Information (ISI, now part of Clarivate Analytics) ( Garfield, 1955 , 1964 , 1972 ). Driven by the insights that scientific advancements inevitably leave abundant traces in the scholarly communications that often manifest themselves in the form of citations, a central topic in the science of science has been deriving quantitative models from citations for the purpose of analyzing and understanding the impacts of scientific work. Historically, citations made in the main body of an article have been difficult to collect so the bibliography has been used in their stead. Implicitly, this practice assumes the relations among publications can be approximated by the pairwise Boolean measures between the citing and the cited articles. Such an approximation is found to be too reductive in contrast to peer reviews for article-level assessments ( Wilsdon, 2015 ), although there is evidence suggesting noises in such a simplified model may be “canceled out” through aggregations at a level higher than individual articles ( Traag and Waltman, 2019 ). Indeed, the most widely used bibliometrics, such as the journal impact factor (JIF) or the h-index, are by design aggregate measures at the journal or the author level. However, the demands for article-level metrics are so strong that they make popular a practice assuming articles in the same journal are equal in quality and the aggregate metrics for the journal can serve as a proxy for the articles published therein. 
Its adverse effects are so profound and misuses so pervasive that renowned institutions and thought leaders have found it necessary to proclaim the San Francisco Declaration of Research Assessment (DORA) 1 to publicize a strong stance against using journal-level metrics for research assessments. A widely accepted good model to understand the impacts of individual publications has yet to be found.

Another challenge in the study of science of science is the explosive growth in the volume of scientific reports and the diversity of research topics. These have outstripped the cognitive capacity of human beings to properly digest and catch up. This cognitive overload ostensibly impacts everyone, including those employed by vendors to curate data and develop commercial platforms for science of science studies. As a result, errors and omissions in manually curated data are abundant, eroding the trustworthiness of studies conducted on those platforms. Most frustratingly, the proprietary and opaque nature in the commercial systems prevent recourses when obvious errors are spotted. As data-driven decision-making processes have become more prevalent in recent years, the platform quality has become a serious issue that prompts the Computing Research Association (CRA) to release a stern statement on the worsening state of commercial data and call for actions against unscientific practices based on or leading to flawed data 2 . In their report ( Berger et al., 2019 ), a CRA working group illustrates faulty data from Clarivate Analytics and surveys from humans no longer up to date in their fields may have led US News & World Report to produce abhorrent rankings on research universities that can severely mislead students in making school choices and funders in allocating resources. Similar to DORA, the CRA working group publishes a set of guidelines urging the adoption of Good and Open data with Transparent and Objective methodology, known as the GOTO principle, in conducting and publishing the results of quantitative science of science studies.

This article describes Microsoft Academic Services (MAS), a project in Microsoft Research with an aim to support researchers to follow the GOTO principle. Having evolved from the initially disclosed in ( Sinha et al., 2015 ), MAS now consists of three parts: an open dataset known as Microsoft Academic Graph (MAG) 3 , a freely available inference engine called Microsoft Academic Knowledge Exploration Service (MAKES), and a website called Microsoft Academic 4 that provides a more human friendly interface to MAKES. MAS is a part of an ongoing research that explores the nature of cognition, a topic in artificial intelligence (AI) that studies the mental capacity in acquiring, reasoning and inferencing with knowledge. The research is motivated by the observation that cognition involves the capabilities of memorizing, computing, being attentive, and staying focused on the task at hand, all of which can be programmed to the modern computer to outperform humans. Particularly for MAS, the project explores the boundary within which the power of machines can be harnessed to understand the scholarly communications observable on the web. In other words, MAS aims at developing AI agents that are well-read in all scientific fields and hopefully can become trustable consultants to human researchers on matters of scholarly activities taking place on the web. In this sense, the MAG component in MAS is the outcome of the knowledge acquisition and reasoning and MAKES, the capability of machine inferencing with the knowledge in MAG. The dataset MAG is distributed and frequently updated under an open data license and the inference algorithms in MAKES are published in relevant peer-review venues and summarized later in this article.

Aside from being open in data and transparent in algorithm as per the GOTO principle, MAS actively uses technologies to capture scholarly communication activities with adequate quality and coverage to strive for a good platform. To address the explosive growth in scientific research, MAS employs the state-of-the-art AI technologies, such as natural language understanding, to extract the knowledge from the text of these publications. This allows MAS to always take a data-driven approach in providing consistent data quality and avoid manual efforts that are often the source of subjective controversies or errors. Knowledge extraction in MAS goes beyond simply indexing key phrases to recognize and disambiguate the entities underpinning scholarly communications. MAS currently includes entities that describe who supported by which institutions have made what claims in which publication at which instance of which venue , as illustrated in Figure 1 . With more scholarly communications being conducted online with data and software tools, the definition of publication in MAS has been expanded. Aside from the traditional forms such as books, journals and conference papers, MAS has recognized datasets and software packages as additional forms of publications. Additionally, as plenty of scholarly work exerts impacts through commercial exploitation preceded by patent applications, MAS has also included them as publications. These new resources fit well into the model of publication entity in Figure 1 because they all have authors, affiliations, topical contents, etc., and can receive citations. In addition to extracting these entities, a key mission of knowledge extraction is to recognize the relations among the entities, such as the citation contexts characterizing how the work in one publication is received by others citing it. 
As schematized in Figure 1 , these entities and their relations are represented in a graph structure as the nodes and edges, respectively, leading to the name of MAG. Note that the entity recognition and disambiguation (ERD), as reported in ( Carmel et al., 2014 ), is far from a solved problem. However, the key here is the AI technologies employed in MAS are designed to learn and improve by itself by repeatedly reading more materials than any human can possibly do in a lifetime. After years of self-improving, many independent studies have suggested that MAG data are in many aspects as accurate, if not more, than manually curated data ( Herrmannova and Knoth, 2016 ; Harzing and Alakangas, 2017 ; Hug and Brändle, 2017 ; Hug et al., 2017 ; Thelwall, 2017 , 2018a , b , c ; Kousha et al., 2018 ).

www.frontiersin.org

Figure 1 . The data model of scholarly communications in MAS where the nodes represent the entity types modeled in MAG, and the simple and block arrows depict one-to-one and one-to-many relations among the entities, respectively.

Secondly, MAS uses technologies for scale, particularly when the lack of coverage in many datasets is becoming ever more concerning. While it might be appropriate in the last century for human experts to manually select only some of the scholarly communications into a database, this practice may have finally lived out its usefulness as the case studies in the CRA report have shown. Furthermore, with the advancements in information technology, online publishing has become a widely adopted medium for scientists to communicate with one another. Important activities, including self-archiving, data and software sharing, and community efforts dedicated to reproducing previously published results [e.g., Papers with Code 5 , ReScience ( Rougier et al., 2017 )] are taking place exclusively on the web. A modern dataset therefore must be able to capture all these web-only activities to properly reflect the current state of the reality, and it is hard to fathom how all these capturing efforts can be accomplished by hand. MAS provides an encouraging example that technologies can help in this area.

The key to MAS is large-scale deployment of AI agents in understanding scholarly communications. Therefore, the rest of the article is devoted to describing the methodologies so that the characteristics of MAS can be better understood. The AI technologies used in MAS, as illustrated in Figure 2 , encompass three areas: (1) natural language understanding, including ERD and concept detection to extract factoids from individual publication and to fulfill queries in MAKES, (2) knowledge reasoning to organize the factoids into MAG, and (3) a reinforcement learning system to learn a probabilistic measure called the saliency that facilitates the statistical learning and inferences in the above two areas.

www.frontiersin.org

Figure 2 . AI and service components in MAS are comprised of two feedback loops, one to grow the power of acquiring knowledge in MAG and the other to assess the saliency of each entity in MAG. In the first loop, each publication on the web is first processed by the MAG assisted entity recognition and disambiguation as described in (1). As the raw entities and their relations are extracted from individual publications, semantic reasoning algorithms are then applied to conflate them into a revised graph, including the concept hierarchy from all the publications. The revised MAG is then used in the next run to better extract entities from publication. The second loop utilizes the citation behaviors as the rewarding target for a reinforcement learning algorithm to assess the importance of each entity on MAG based on the network topology. The quantitative measure, called the saliency, serves as a ranking factor in MAKES, a search and recommendation engine for MAG.

Entity Recognition and Disambiguation

Central to MAS is the quest to harness the power of machines to acquire knowledge from written text. As alluded to previously, the knowledge acquisition task amounts to recognizing the lexical constructs of the semantic objects representing either entities or relations. More precisely, the task of natural language understanding in MAS is formulated as a maximum a posteriori (MAP) decision problem:
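In standard notation, and consistent with the definitions that follow, the MAP decision rule can be written as:

```latex
\hat{y} = \operatorname*{argmax}_{y} P(y \mid x, K) \tag{1}
```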

where the input x = (w1, w2, ⋯) is a word sequence of a natural language expression, K is a knowledge base, and the task is to find the best output ŷ = (e1, e2, ⋯), ei ∈ K, that is, a sequence of semantic objects. For example, suppose the input is the sentence “HIV causes AIDS.” The ideal output should consist of two entities, “HIV” and “AIDS,” and a relation, “causing,” between them.

The MAP decision is known to be optimal provided the posterior probability distribution in (1) can be accurately estimated. While this can be done directly, MAS uses a mathematically equivalent approach, known as generative modeling, in which the Bayes rule is applied to (1) to rewrite the MAP decision as:
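Applying the Bayes rule, and dropping the denominator P(x | K) that does not depend on y, gives:

```latex
\hat{y} = \operatorname*{argmax}_{y} P(x \mid y, K)\, P(y \mid K) \tag{2}
```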

with P(x | y, K) and P(y | K) being the semantic language model and the prior model, respectively. The semantic language model characterizes how frequently a sequence of semantic objects y is expressed through the word sequence x . Typically, an entity is lexicalized by a noun phrase and a relation by a verb phrase. MAS, however, does not utilize the syntactic structure of natural language but, rather, simply assumes that the lexical realization of each semantic object is statistically independent of the others, namely:
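Under this independence assumption, the semantic language model factorizes over the semantic objects as:

```latex
P(x \mid y, K) = \prod_{i} P(x_i \mid e_i, K) \tag{3}
```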

where xi denotes the i -th phrase segment in x corresponding to ei. Essentially, the semantic language model characterizes the synonymous expressions for each semantic object ei and how likely each of them is to be used. For example, the journal “Physical Review Letters” can be referred to by its full name, a common abbreviation “Phys Rev Lett,” or simply the acronym “PRL,” and an author can be mentioned using the last name and the first name, or just its initial, with an optional middle initial. The bibliography section, the text body and the web pages of a paper all provide abundant material from which to harvest synonymous expressions. With large enough data samples, it appears adequate in MAS to use simple maximum likelihood estimation, i.e., frequency counts with statistical smoothing, for the synonym model P(· | ei, K).

The semantic prior model P(y | K) assesses the likelihood of a certain combination of semantic objects that can be derived from the knowledge base. In a way, the brunt of the statistical independence assumption in (3) is lessened because the contextual dependencies leading to a viable semantic interpretation are strictly enforced here. This can be seen by applying the chain rule of conditional probability to further decompose the semantic prior model as:
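By the chain rule:

```latex
P(y \mid K) = P(e_1 \mid K) \prod_{i > 1} P(e_i \mid e_{i-1}, \cdots, e_1, K) \tag{4}
```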

where P ( e 1 | K ) is the saliency of the entity e 1 and P ( e i | e i −1 , ⋯ e 1 , K ) is the semantic cohesion model according to the knowledge K . In conjunction with the synonym model, the semantic cohesion model can be estimated directly from data with an additional constraint that assigns zero probability to implausible semantic object combinations. This constraint plays a critical role in reducing the degree of ambiguities in understanding the input. For example, “Michael Evans” with a missing middle initial is a very confusable name, and “WWW” can mean a conference organized by IW3C2, a journal (ISSN: 1386-145X or 1573-1413), or even as a key word in the title of a paper. However, there are only two authors, a “Michael P. Evans” and a “Michael S. Evans” that have ever published any papers in the WWW conference, in 2002 and the other in 2017, respectively, and never in the namesake journal or any paper containing “WWW” as a key term in all other publication venues. If the publication year is also present, the apparently ambiguous input “Michael Evans ( Evans and Furnell, 2002 )” can be precisely resolved into the entity referring to the author named “Michael P. Evans” that has published a paper in “the eleventh International World Wide Web Conference” held in Honolulu Hawaii in the year of 2002. Using the knowledge-imposed constraints is particularly effective for author disambiguation when the technique is applied to understand curricula vitae or author homepages posted on the web. Assuming each such web page belongs to a single author, the publications listed therein are often high-quality signals to ascertain the identity of the author from the namesakes.
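The constraint-driven disambiguation above can be sketched as follows. This is a toy illustration only, not MAS code; the entities, fields, and helper names are hypothetical.

```python
# Toy illustration: knowledge-imposed constraints shrink the candidate set
# for an ambiguous author mention such as "Michael Evans". Inconsistent
# combinations are simply dropped, mirroring the zero-probability constraint
# applied to the semantic cohesion model.

knowledge_base = [
    {"author": "Michael P. Evans", "venue": "WWW conference", "year": 2002},
    {"author": "Michael S. Evans", "venue": "WWW conference", "year": 2017},
    {"author": "Michael Evans",    "venue": "WWW journal",    "year": 2005},
]

def resolve(mention, venue=None, year=None):
    """Keep only entities whose known facts are consistent with the context."""
    candidates = [e for e in knowledge_base if mention in e["author"]]
    if venue is not None:
        candidates = [e for e in candidates if e["venue"] == venue]
    if year is not None:
        candidates = [e for e in candidates if e["year"] == year]
    return candidates

# "Evans" alone is ambiguous (3 candidates) ...
print(len(resolve("Evans")))
# ... but venue + year context resolves it uniquely.
print(resolve("Evans", venue="WWW conference", year=2002))
```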

The manner in which the knowledge is utilized in (4) also allows MAS to identify and acquire new synonymous expressions for existing entities and, often, new entities. This capability of acquiring new knowledge without human intervention is the key to MAS enriching itself gradually. Mathematically, letting Kt denote the knowledge base used in (1) leading to the understanding of the scholarly materials yt at time t , the knowledge enrichment in MAS at an interval of Δt is also formulated as a MAP decision:
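A natural way to write this enrichment step with the notation above (a plausible form; the exact original formulation may differ slightly) is:

```latex
\hat{K}_{t+\Delta t} = \operatorname*{argmax}_{K} P\!\left(K \mid \hat{y}_t, K_t\right) \tag{5}
```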

The iterative process of (5) in MAS can be better appreciated through a common task in parsing the bibliography, where the author's intent is to refer to a publication with a sequence of references to authors, followed by an optional publication title and a reference to a publication venue. The manner in which a reference is made, however, is highly inconsistent. When the semantic knowledge is applied to parse an input such as “Zhihong Shen, Hao Ma, and Kuansan Wang, ACL-2018,” it allows MAS to recognize fragments of the text, say, “Hao Ma” or “Kuansan Wang,” as authors because they are frequently seen in the knowledge base. With these anchors, MAS can use (4) to infer that “Zhihong Shen” and “ACL-2018” are likely references to another author and the venue, respectively. These inferences can be made even before the publication records of ACL-2018 are included in the knowledge base and can be used with (5) to grow new entities in MAS.

While MAG only publishes the canonical expression for each entity, MAKES includes the probabilistic models that we derive from all the raw materials mentioned above. A step-by-step examination of (2) can be conducted in the query input box at the Microsoft Academic website where, upon each character entered, an API call is made to MAKES to analyze the semantic intent of the typed input with the MAP decision rule described in (2). Top interpretations manifest themselves as query completions or suggestions and are displayed to the user as a means of query intent disambiguation or confirmation. More details are described in the FAQ page of the website 6 .

Concept Detection and Taxonomy Learning

Like those of many complex systems, the relations among entities in scholarly communications ( Figure 1 ) cannot fully capture the activities, because the semantics of the communications are encoded not in the topology but in the natural language contents of the publications. To address this issue, MAS adopts an entity type, called concepts [called “fields of study” in Sinha et al. (2015) ], to represent the semantic contents of a document. Unlike physical entities such as authors and affiliations, concepts are abstract and hence have no concrete way to be defined. Furthermore, concepts are hierarchical in nature. For example, “machine learning” is a concept frequently associated with “artificial intelligence” that, in turn, is a branch of “computer science” but often intersects with “cognitive science” in “psychology.” Accordingly, a taxonomy must allow a concept to have multiple parents and organize all concepts into a directed acyclic graph (DAG). While concepts can be associated with all types of physical entities, say, to describe the topics of interest of a journal or the fields of expertise of a scholar, MAS only infers the relations between a publication and its concepts directly, and leaves all others to be indirectly aggregated through publications.

A survey of the concept taxonomies used in major library systems, presumably developed by human experts, suggests that few of them are compatible with each other. The low agreement among human experts led MAS to create a concept taxonomy by itself, solely from the document collection. As close to 1 million new publications a month have been added in recent months, the machine-learned taxonomy is dynamically adjusted on a regular basis so that new concepts can be added and obsolete concepts can be retired or merged with others.

Concept detection is a natural language understanding problem and, therefore, its mathematical foundation is also governed by (1). Unlike the ERD problem, however, the ideal output ŷ in this case is an unordered collection of DAGs of concepts rather than a sequence of semantic objects, and the textual boundaries of a concept in x are intrinsically soft, i.e., phrase segments can overlap. MAS therefore employs the approach of directly estimating the probabilistic distribution in (1) from the text rather than going through the generative model of (2). As detailed in the recent publication ( Shen et al., 2018 ), the key idea underlying the MAS approach here is the distributional similarity hypothesis proposed in the 1950s ( Harris, 1954 ), which observes that semantically similar phrases tend to occur in similar contexts. Plenty of methods reported in the literature demonstrate the efficacy of applying distributional similarity to concept detection, either by training a hierarchical classifier mapping a sequence of discrete words directly into concepts, or by the embedding method that first converts the text into a vector representation with which learning and inference can be conducted in a vector space ( Turney and Pantel, 2010 ). When properly executed, semantically similar phrases are transformed into vectors close to one another, simplifying the synonymous expression detection needed for (3) into a nearest-neighbor search. In other words, the probabilistic distribution of synonyms P(·| ei, K) can be estimated by the distance in the vector space. Recently, the embedding methods have produced many surprising results, starting with ( Mikolov et al., 2013 ; Berger et al., 2019 ), that contribute to a renaissance of the vector space model thanks to the availability of big data and powerful computational resources. The current practice in MAS, however, has found it more powerful to combine both the discrete and the vector space approaches into a mixture model for concept learning ( Shen et al., 2018 ).
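The reduction of synonym detection to a nearest-neighbor search can be sketched in a few lines. The vectors below are hypothetical stand-ins for trained embeddings, chosen only so that synonymous journal names land close together.

```python
import math

# Toy sketch of distributional similarity: synonymous expressions such as
# "Physical Review Letters" and "Phys Rev Lett" should map to nearby vectors,
# so synonym detection reduces to a nearest-neighbor search in vector space.

embeddings = {
    "Physical Review Letters": [0.90, 0.10, 0.20],
    "Phys Rev Lett":           [0.88, 0.12, 0.21],
    "PRL":                     [0.85, 0.15, 0.19],
    "machine learning":        [0.10, 0.90, 0.30],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def nearest(phrase, k=2):
    """Rank the other phrases by cosine similarity to `phrase`."""
    query = embeddings[phrase]
    others = [(p, cosine(query, v)) for p, v in embeddings.items() if p != phrase]
    return sorted(others, key=lambda t: t[1], reverse=True)[:k]

print(nearest("Physical Review Letters"))  # the two synonyms rank on top
```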

The concept detection software in MAS has been released as part of the MAG distribution. The package, called Language Similarity 7 , provides a function with which the semantic similarity of two text paragraphs can be quantified using the embedding models trained from the publications in the corresponding MAG version. This function in turn serves as a mixture component for another function that, for any paragraph, returns a collection of top concepts detected in the paragraph that exceed a given threshold. Again, interested readers are referred to the recent article ( Shen et al., 2018 ) for technical details.

Network Semantics Reasoning

As MAS sources its materials from the web, which is notorious for its uneven data quality, duplicate, erroneous and missing information abounds. Critical to MAS is therefore a process, called conflation, that can reason over partial and noisy information to assemble the semantic objects extracted from individual documents into a cohesive knowledge graph. A key capability in conflation is to recognize and merge identical factoids while adjudicating any inconsistencies among multiple sources. Conflation therefore requires reasoning over the semantics of the network topology, and many of the techniques in MAS described in Sinha et al. (2015) are still in practice today.

Recently, a budding research area has focused on extending the notion of distributional similarity from its natural language root to the network environment. The postulation is straightforward: similar nodes tend to have similar types of edges connecting them to similar nodes. As in the natural language use case of representing entities and relations as vectors, the goal of this approach is to transform the nodes and edges of a network into vectors so that reasoning over a network can be simplified and carried out in the vector space with algebraic mathematics. Network semantics, however, is more complicated than natural language, whose contextual relations are single-dimensional in nature: a phrase is either to the left or the right of another. A network has a higher-order topology because a node can simultaneously connect to a wide variety of others with edges representing distinctive relations. The citation network is a simple example, where one paper can be cited by two others that also have a citation relation between them. The citation network is considered simple because it has only a single type of node, publications, and a single type of relation, citing. In reality, scholarly communications also involve people, organizations, locations, etc., and are best described by a heterogeneous network in which multiple types of nodes are connected by multiple types of edges, making the notion of distributional similarity more sophisticated. The research in heterogeneous network semantics reasoning, especially in its subfields of network and knowledge graph embedding, is ongoing and highly active.

MAS has been testing the network embedding techniques on related entity recommendation and has found it essential for each entity to have multiple embeddings based on the types of relations involved in the inferences. In other words, an embedding is sensitive to the sense defining similarity. For example, two institutions can be regarded as similar because their publications share a lot in common in contents, in authorships, or in venues, or because they are cited together by the same publications or authors. The multitude of senses of similarity leads to multiple sets of embeddings, the results of which are included in MAG distributions. As the research in this area is still ongoing and the techniques are by no means mature, MAS applications can achieve better results by combining the embedding and the discrete inference techniques. One such example is reported in a recent paper ( Kanakia et al., 2019 ) that describes the method behind the current related publication recommendation in MAS. The user studies in this application show that the best system uses both the distance between the text embeddings and the frequency of being cited together.
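Combining an embedding distance with a discrete co-citation signal might look like the sketch below. This is a hedged illustration in the spirit of the description above, not the Kanakia et al. (2019) method; the weights, the squashing of co-citation counts, and all data are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def hybrid_score(emb_a, emb_b, cocitations, alpha=0.5, scale=10.0):
    """Mixture of semantic similarity and co-citation evidence.

    alpha weighs the embedding term; co-citation counts are squashed
    into [0, 1) so the two terms live on comparable scales."""
    semantic = cosine(emb_a, emb_b)
    cocite = cocitations / (cocitations + scale)
    return alpha * semantic + (1 - alpha) * cocite

# Candidate B is textually similar to A; candidate C is frequently co-cited.
paper_a, paper_b, paper_c = [1.0, 0.0], [0.9, 0.1], [0.2, 0.8]
print(hybrid_score(paper_a, paper_b, cocitations=0))
print(hybrid_score(paper_a, paper_c, cocitations=40))
```

Either signal alone would rank only one of the two candidates highly; the mixture lets both kinds of relatedness surface.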

Assessing Entity Importance With Saliency

As the MAP decision in (1) also drives MAKES to rank the results y in response to a query x , the entity prior P(e | K) in (4) is a critical component for MAKES. The way the entity prior is estimated determines in which sense the ranking is optimized. Ideally, the prior should be the importance of the entity as perceived by the scholarly community in general. Recently, a new area of research, lumped under the name altmetrics ( Piwowar, 2013 ), has been advocating that the searching, viewing, downloading, and endorsement activities in social media around a publication should be included in estimating the importance of the scholarly work. Having monitored these activities for the past few years, we have found altmetrics a good indicator of how a publication has gained awareness in social media. Although being known is a necessary step toward being perceived as important, our observations cannot exclude the possibility that a publication is searched and viewed more because it is repeatedly mentioned in another highly regarded work, authored by influential scholars, or simply from reputable organizations. Based on our observations and concerns about altmetrics in the community (e.g., Cheung, 2013 ), the current focus in MAS is on exploiting the heterogeneity of scholarly communications mentioned above to estimate the entity prior, by first computing the importance of a node relative to others of the same type and then weighting it by the importance of its entity type.

Saliency: An Eigencentrality Measure for Heterogeneous Dynamic Network

The eigenvector centrality measure, or simply eigencentrality, has long been known as a powerful method to assess the relative importance of nodes in a network ( Franceschet, 2011 ). Developed in the early twentieth century, eigencentrality measures the importance of a node relative to others by examining how strongly the node is referred to by other important nodes. Often normalized as a probabilistic measure, eigencentrality can be understood as the likelihood of a node being named as most important in a survey conducted of all members of the network. The method was made prominent by Google in its successful adaptation of eigencentrality into the PageRank algorithm: the PageRank of a webpage is measured by the proportional frequency of the incoming hyperlinks weighted by the PageRank of the respective sources. In distinct contrast to simple citation counts, PageRank adds two important considerations: the frequency of mentions in the citing article counts, and the importance of the citing source matters. Google has demonstrated that PageRank can successfully assess the importance of each web document.
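A minimal power-iteration sketch of this idea, on a hypothetical 4-node citation graph, makes the two considerations concrete: edge weights stand in for mention frequencies, and a node's score flows from the scores of its citers.

```python
# PageRank-style eigencentrality with teleportation (toy data, not MAS code).
links = {  # citing -> {cited: mention count}
    "A": {"B": 2, "C": 1},
    "B": {"C": 1},
    "C": {"A": 1},
    "D": {"C": 3},
}
nodes = sorted(links)
damping = 0.85  # the remaining 15% is the teleportation probability

rank = {n: 1.0 / len(nodes) for n in nodes}
for _ in range(100):
    new = {n: (1 - damping) / len(nodes) for n in nodes}
    for src, outs in links.items():
        total = sum(outs.values())
        for dst, mentions in outs.items():
            # each source spreads its own rank, weighted by mention frequency
            new[dst] += damping * rank[src] * mentions / total
    rank = new

print(max(rank, key=rank.get))  # "C": cited by three nodes, including important ones
```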

There are, however, two major challenges in using eigencentrality as an article-level metric in general. First, eigencentrality is mathematically well-defined only if the underlying network is well connected. This mathematical requirement is often not met in real life, neither in citation networks nor in the web graph. To tackle this problem, Google introduced a “teleportation” mechanism into PageRank in which the connection between two web pages is only 85% dependent on the hyperlinks between them. The remaining 15%, called the teleportation probability, is reserved for the assumption that all webpages are connected to each other intrinsically and uniformly. While the teleportation mechanism serves Google well, it has been found to be fragile and implausible for the citation network ( Walker et al., 2007 ; Maslov and Redner, 2008 ): the ranking of scholarly publications is overly sensitive to the choice of the teleportation probability, and the best choice suggests scientists follow the bibliography only half the time, with the other half spent randomly discovering articles from the entire research literature following a uniform distribution. Many PageRank-inspired studies, as recently reviewed in ( Waltman and Yan, 2014 ), have made the same observation and proposed remedies utilizing the heterogeneity of the scholarly communication network. Most of them, however, are at an early exploratory stage, as the manners of modeling the heterogeneous interactions still contain many heuristics that need further validation. Secondly, even if the well-connectedness issue can be addressed through a heterogeneous model, another challenge, as pointed out by many (e.g., Walker et al., 2007 ), is how to avoid treating eigencentrality as a static measure, so that the time differences in citations can be taken into account. It is undesirable to treat an article that received its last citations long ago as equal to one that has just received the same number of citations today, because results without a proper temporal adjustment exhibit a favorable bias toward older publications that have had more time to collect citations.

MAS attacks these two challenges with a unified framework called saliency, based on the following considerations. First, to address the fact that the underlying network changes in time, saliency is defined as the stochastic process characterizing the temporal evolution of the individual eigencentrality computed from a snapshot of the network. Without making assumptions on its form, the autoregressive moving-average (ARMA) process, mathematically known to be able to approximate a non-stationary distribution to any precision with enough orders, is used to model the temporal characteristics of saliency. Surprisingly for MAS, a simple first-order autoregressive (AR) process seems sufficient for the model to reach an ergodic solution (to be shown below), suggesting that the endorsement power of a citation can be treated as simply as an exponential decay with a constant half-life interval. This finding validates the observation first reported in ( Walker et al., 2007 ).

Secondly, to account for the heterogeneity of the network, MAS uses a mixture model in which the saliency of a publication is a weighted sum of the saliencies of the entities related to the publication. By considering the heterogeneity of scholarly communications, MAS allows one publication to be connected to another through shared authors, affiliations, publication venues and even concepts, effectively ensuring the well-connectedness requirement is met without introducing a random teleportation mechanism. Mathematically, let s x ( t ) denote the saliency vector of the entities of type x at time t , with x = p specifically for the publication, the heterogeneous mixture model coupled with an AR process leads to:
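One form consistent with the definitions given below (a reconstruction; the exact original formula may arrange the decay and mixture terms differently) is:

```latex
s_p(t + \Delta t) \;=\; \tau \sum_{x} w_{p,x}\, A_{p,x}\, s_x(t) \tag{6}
```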

where Δ t is the interval between the two successive network snapshots are taken, w p, x the (non-negative) weight of a type x node on the publication, τ the time decaying factor in the AR process, and A p, x the adjacency matrix characterizing the connection strength between a publication to any entity of type x . Currently, MAS considers all nodes of types x ≠ p to have equal connection to the publication, e.g., given a publication all of its authors and affiliations are treated as contributed equally to the saliency of the publication. In the meantime for publications citing one another, A p, p is set proportional to the number of mentions in the text body of the citing article to the cited work.

Because the heterogeneous model treats the saliency of a publication as the combined saliencies of all entities related to it, sp( t ) is a joint probabilistic distribution. Accordingly, the saliency of a non-publication entity can be obtained by marginalizing the joint distribution, i.e.,
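Consistent with the definition of Ax, p given immediately below, the marginalization reads:

```latex
s_x(t) \;=\; A_{x,p}\; s_p(t) \tag{7}
```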

where A x, p = [δ ij ] and
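given the remark below that all contributors are treated as equal, δij is presumably the simple indicator (a reconstruction):

```latex
\delta_{ij} =
\begin{cases}
1, & \text{if entity } i \text{ is connected to publication } j,\\[2pt]
0, & \text{otherwise.}
\end{cases}
\tag{8}
```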

Again, the current MAS implementation does not address how the credit of a publication should be assigned unevenly to its authors based on the author order, as (8) implies all authors contribute equally; a side effect, however, is that each institution associated with a publication receives credit proportional to the number of authors affiliated with the institution. Ostensibly, a more sophisticated model than (8) could be used where, for instance, the author sequence plays a role in determining δij. MAS reports the author sequence, as well as the affiliation sequence for authors with multiple affiliations, but has not yet used them for the purpose of computing saliency.

Estimating Saliency With Reinforcement Learning

To avoid making the strong assumption that the latent variables τ and wp, x are constant, MAS uses reinforcement learning (RL) to dynamically choose the best values based on the reinforcement signals streaming in through the observations. The choice is motivated by the fact that the RL technique is known to be effective in tackling the exploitation vs. exploration tradeoff, which in MAS means a balanced treatment between the older and newer publications or authors that have had unequal time to collect their due recognition. Often, the challenge of applying RL is that the reinforcement signals are hard to obtain. This is fortunately not the case in MAS, because approximately half a million new publications with tens of millions of citations are discovered every 2 weeks (= Δt ), and these new observations provide ample material for reinforcement signals. Assuming scholarly communications are eventually just, namely, that more important publications will receive more citations in the long run ( NΔt, N ≫ 1), the goal of the RL in MAS is to maximize the agreement between the saliencies of today and the citations accumulated NΔt into the future. Currently, MAS uses the maximum mutual information (MMI) as the quantitative measurement of this agreement, namely, if c ( t ) denotes the vector of citation mention counts for all publications, the objective of the RL in MAS is to find:
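One plausible way to write this objective with the inner-product notation explained below (a reconstruction; the exact MMI formulation in the original may differ) is:

```latex
(\hat{\tau}, \hat{w}_{p,x}) \;=\; \operatorname*{argmax}_{\tau,\, w_{p,x}} \big\langle\, c(t + N\Delta t),\; \log s_p(t)\, \big\rangle \tag{9}
```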

where < ·, · > denotes the inner product. The choice of MMI allows (9) to be a convex function so that it can be iteratively solved with a quasi-Newton method. An off-the-shelf software package implementing the L-BFGS algorithm is used in MAS.
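The fitting loop can be caricatured as follows. This toy stand-in is NOT the MAS formulation: it replaces the MMI criterion and L-BFGS with a squared-error score and a grid search, and all signals are hypothetical, but it shows the shape of the problem, i.e., choosing a citation weight w and decay τ so that "today's" saliency best anticipates the citations observed NΔt later.

```python
# Toy sketch of the reinforcement step (hypothetical data and criterion).
past_citations   = [10.0, 5.0, 1.0]   # per-paper citation signal at time t
author_signal    = [2.0, 6.0, 1.5]    # per-paper non-citation signal at time t
future_citations = [8.0, 4.5, 1.0]    # the reinforcement signal, N*dt later

def loss(w, tau):
    """Squared error between decayed mixture saliency and future citations."""
    saliency = [tau * (w * c + (1 - w) * a)
                for c, a in zip(past_citations, author_signal)]
    return sum((s - f) ** 2 for s, f in zip(saliency, future_citations))

# Grid search in place of the quasi-Newton solver used in MAS.
grid = [(w / 10, t / 10) for w in range(1, 11) for t in range(1, 11)]
w_best, tau_best = min(grid, key=lambda p: loss(*p))
print(w_best, tau_best)
```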

It is a surprise that, by choosing a long enough future N , the solutions to the latent variables τ and wp, x appear to be quite steady over time with the simplest form of ARMA process: first-order autoregression and no moving average. This apparent ergodicity allows MAS to administer the RL with a delay of NΔt ≈ 5 years, namely, the latent variables in (6) can be obtained by using the data observed up to 5 years ago to predict the citations of the most recent 5 years. The results, as shown in Figure 3 , suggest that a publication generally accrues its saliency from citations at a weight slightly more than 92%, although the factors of its authors, affiliations, publication venues and even topics are non-trivial. Along the time domain, the value of τ, hovering around 0.9, corresponds to a temporal decay in saliency with a half-life of 7.5 years. In contrast to previous studies where the citations only account for 50% of the weight ( Walker et al., 2007 ; Maslov and Redner, 2008 ) or with a very short decay of 1 to 2.6 years ( Walker et al., 2007 ), the RL results in MAS are a much less dramatic departure from the common practice of using citation counts as an article-level metric where, effectively, the metric is computed with a 100% weight on citations that do not decay over time.


Figure 3 . The longitudinal values of the latent variables underlying the saliency, as obtained by the reinforcement learning (RL) algorithm. These latent variables correspond to the weights the algorithm has to place on each entity type in order to predict future citation behaviors optimally in the sense of maximum mutual information, as described in (9). The model shows citations remain the dominant factor in achieving a high saliency. Despite a relatively simple configuration, the model exhibits remarkable stability over the 35 months shown in this figure, other than the two instances, in April 2017 and July 2018, when MAG changed its treatment of affiliations dramatically.

As also shown in Figure 3 , the RL is not impervious to major changes in the underlying data, such as the treatment of author affiliations. In May 2017, a so-called “inferred affiliation” feature was introduced to MAG, whereby authors with unknown affiliations were associated with the “most likely” institutions inferred from their recent publication records. An overly optimistic threshold led to many inaccurate projections, and the RL responded to the degradation in quality by lowering the affiliation weight and shifting it to citations. In July of the following year, the MAG data schema was altered to allow a single author to have multiple affiliations, all of which receive equal attribution from the publications by the author. Such a more faithful characterization of author affiliations led to a boost in the affiliation weight from 1.5 to 3.5%, suggesting the RL mechanism finds the affiliation information more useful.

Properties of Saliency

The saliencies obtained with (6) and (7) are reported in MAG at each update interval for all entities, in the quantized form −1000 ln s ( t ), and are used as the entity prior in the MAP decision (1) in MAKES, which can be examined through the search and analytics results at the Microsoft Academic website. All these tools can be valuable for deeper investigations to fully understand the properties of saliency as a potential metric. For example, by design sp( t ) further discriminates the following three citation behaviors not considered in the simple citation count: the number of mentions in the citing article, the age of the citations received, and the non-citation factors that can alleviate the disadvantages for newer publications. The combined effects of these three aspects on the article-level assessment can be further studied by inspecting the results from (1) with synthesized queries. Figure 4 shows a typical outcome of a 20% disagreement in the ranking positions between saliency-based and citation-count-based rankings using the query set ( Supplementary Material S1 ). A quick examination of the disagreements confirms that a publication can have a higher saliency, albeit lower citation counts, because it is cited by more prestigious or more recent work, as designed. Whether these disagreements are desirable, however, is a question worth exploring.


Figure 4 . Histogram of ranking positions by citation counts (CC) and saliencies (top) and their differences (bottom). Although future citation counts are the target for best estimating the saliencies, the two agree on the publication rankings only roughly 80 percent of the time, demonstrating the effects of the non-citation factors ( Figure 3 ) in the design of saliency. In contrast to citation counts, saliencies are sensitive to the venues, the authors, the concepts, and the recency of the citing sources.

The design decision to unshackle the reliance on overly reductive citation counts may also make the saliency less susceptible to manipulations, ranging from citation coercion ( Wilhite and Fong, 2012 ) to malicious cheating ( López-Cózar et al., 2014 ) targeting metrics like the h-index. By using the citation contexts in saliencies, these manipulations are, in theory, less effective and easier to detect, as demonstrated by PageRank for link spam detection in the web graph ( Gyöngyi and Garcia-Molina, 2005 ). The extent to which the gains of the eigenvector-based method can be transported from the web graph to the scholarly network, however, awaits further quantification.

Another area in which MAS can be useful is studying the effectiveness of the saliencies of non-publication entities that, as described in (7), are aggregated from publication saliencies. This design gives rise to at least two intriguing properties. First, an entity can achieve high saliency with lots of publications, not all of which are important. As a result, saliency appears to measure both productivity and impact simultaneously, just like the h-index. Indeed, a comparison between the h-index and the saliency of Microsoft authors ( Supplementary Material S2 ) shows an overall trend whereby individuals with a higher h-index tend to also have a higher saliency, but notable disagreements between the two abound. The author with the highest h-index, 134, in this set has the most publications, at 619 articles receiving 60,157 citations in total, but ranks only in 4th place by saliency. Conversely, the highest ranked author by saliency has published only 138 papers receiving 82,293 citations, with an h-index of 76. Most notably, the second highest ranked author by saliency has an h-index of only 31. This is because the author has published only 39 papers, which limits the h-index, but they are all well received, with a total citation count of 58,268, which buoys the saliency. The drawbacks of the h-index, e.g., being capped at the publication count in this example, are well-known ( Waltman and Eck, 2012 ). By considering more factors and not being limited to overly reductive raw signals, saliency appears to be better equipped to avoid mischaracterizing researchers who strive for the quality rather than the quantity of their publications.
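The capping behavior of the h-index is easy to see in code; this is a textbook sketch of the metric, not MAS code:

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    while h < len(ranked) and ranked[h] >= h + 1:
        h += 1
    return h

print(h_index([10, 8, 5, 4, 3]))   # 4
print(h_index([1000, 900, 800]))   # 3: capped by having only 3 papers
```

No matter how heavily those three papers are cited, the h-index cannot exceed 3, which is exactly why a selective author with 39 uniformly well-cited papers tops out at h = 39.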

Secondly, because the underlying foundation of an aggregated saliency is article-level analysis, interdisciplinary work seems to be better captured. One such example is the journal ranking on a given subject, say, library science. As shown for a while at the Microsoft Academic website 8 , journals like Nature and Science are among the top 10 for this field when ranked by saliency. This may be a surprise to many human experts because these two journals are seldom considered publication venues for the field of library science. Indeed, if the journals are ranked by h-index, these two journals appear in much lower positions because they carry fewer articles in the field. However, a closer investigation shows that these two journals have published influential articles in the field, such as the Leiden Manifesto ( Hicks et al., 2015 ) in Nature and the coercive citation studies ( Wilhite and Fong, 2012 ) in Science. If one were to survey the most impactful papers in this field, excluding these two journals from consideration would lead to unacceptable omissions and incomplete work. Again, this example highlights the known problem of using journals as the unit for quantitative scientific studies, and the sharp focus on article-level analysis, demonstrated feasible by saliency, appears to be a better option.

Prestige: Size-Normalized Saliency

A known issue with aggregate measurements is that the sheer number of data points being considered can play an outsized role. This can be seen in Figure 5, where the author saliency largely agrees, especially for prolific authors, with the h-index, a metric designed to measure impact as well as productivity. As implied by (7), an author can reach a high saliency by having a large number of publications even if most of them receive only moderate recognition. Given that hyper-prolific authors have been observed to exist (Ioannidis et al., 2018) and that their publications seem to be of uneven quality (Bornmann and Tekles, 2019), it can be helpful to juxtapose the saliency with a corresponding size-normalized version, which we call prestige, to further discern the two aspects. To be specific, the prestige of a non-publication entity can be derived from (7) as:

prestige_x = A̅_x,p s_p,

where A̅_x,p = [ δ_ij / ∑_j δ_ij ], in contrast to (9). In short, the prestige of an entity is the average of the saliencies of its publications. Figure 6 illustrates the effect of the size normalization through the rankings of the world's research institutions in the field of computer science, based on the saliencies and the prestiges of their research papers published during the 5-year window between 2012 and 2016. The institutions that publish with consistent impact line up along the main diagonal, where the size normalization has a negligible effect on their rankings. The majority of institutions appear to be in this category. Scattered to the upper left of the diagonal, however, are institutions that are not the most prolific but whose publications, when they do appear, tend to be highly recognized by the research community. Size normalization, as expected, significantly boosts their rankings, as in the cases of Princeton University and Google. On the other hand, clustered to the lower right of the diagonal are the institutions that achieve high saliencies by publishing a large body of literature, as reflected in the relatively large bubble sizes in Figure 6, and their rankings are hence negatively impacted by the size normalization.
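
The size normalization can be sketched numerically. The toy example below (entities, memberships, and saliency values are all invented for illustration; this is not the MAS computation) assumes, per the text, that an entity's aggregated saliency sums its publications' saliencies while its prestige averages them via the row-normalized membership matrix:

```python
import numpy as np

# Four hypothetical publications with known saliencies.
pub_saliency = np.array([9.0, 1.0, 0.5, 0.5])

# Membership matrix: row i marks which publications belong to entity i.
membership = np.array([[1, 1, 0, 0],    # entity 0 wrote papers 0 and 1
                       [0, 0, 1, 1]],   # entity 1 wrote papers 2 and 3
                      dtype=float)

# Saliency aggregates by summation, so sheer volume pays off.
saliency = membership @ pub_saliency

# Prestige uses the row-normalized matrix, i.e., the per-publication average.
row_norm = membership / membership.sum(axis=1, keepdims=True)
prestige = row_norm @ pub_saliency
```

Entity 0's total saliency (10.0) is ten times entity 1's (1.0), but dividing by publication counts (5.0 vs. 0.5) removes the volume advantage, which is exactly the diagonal-vs.-off-diagonal effect seen in Figure 6.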

Figure 5 . A scatter plot comparing the h-index and the saliency where each dot corresponds to the h-index and the saliency of a Microsoft author. Although the two metrics largely agree, the saliency measure is able to overcome a known limitation of the h-index and highlight authors who have published widely recognized work but not in large quantity.

Figure 6 . Saliency (horizontal) vs. prestige rankings of the top 50 research institutions in the field of computer science. The size of each bubble corresponds to the number of publications included in computing the saliency and prestige measurements.

As many GOTO-compliant ranking systems (Berger et al., 2019) have discovered, it cannot be over-emphasized that institution ranking is a highly sophisticated task that necessitates multiple perspectives at varying degrees of granularity, for which commercial rankings such as US News & World Report are typically ill-equipped. To illustrate the point, Figure 7 shows the saliency-prestige rankings of institutions in artificial intelligence, a subfield of computer science, and in its own subfields of machine learning, computer vision, and natural language processing. The high variance in the ranking results and the significant differences among the top institutions strongly suggest that the ranking of a field is a very poor predictor of the rankings of its subfields. This is consistent with our observation that, within the subfields of computer science, the spectrum of research topics is so broad that institutions can choose to specialize in a select few and still have a strong and highly impactful research program. Consequently, ranking institutions in too broad a category amounts to comparing research in notably different fields with distinct publication cultures and citation behaviors, i.e., an apples-to-oranges comparison. With new resources like MAS that can pinpoint each publication to very fine-grained fields of study, such a deeply flawed methodology, previously tolerated due to data scarcity, should no longer be deemed acceptable and must be soundly rejected by the community.

Figure 7 . Institution Rankings, by Saliency (horizontal) vs. Prestige, for the field of Artificial Intelligence and its subfields.

The explosive growth in scholarly communications has made it more difficult for individual humans to keep track of the latest achievements and trends in scientific research. The warning signs are visible in the worsening quality of research assessments involving expert opinions, as a recent CRA study showed. This article describes how MAS utilizes advancements in AI to curate a good and open dataset and to enable transparent and objective methodologies (GOTO) for scientific studies on science. The AI components in MAS, spanning natural language understanding, knowledge reasoning and inference, and reinforcement learning for estimating the saliencies of entities in scholarly communications, have been described. There are early indications that saliency, an objective measure harvested from peer-reviewed citation contexts, avoids many drawbacks of existing academic metrics.

Data Availability Statement

All datasets generated for this study are included in the article/ Supplementary Material .

Author Contributions

KW drafted the manuscript and coordinated the research project. ZS, RR, and DE supervised the MAG, MAS, and MAKES portions of the work. CH reviewed the experimental setups and software, while the rest of the authors contributed equally to the data collected in the work.

Funding

The authors declare that this study received funding from Microsoft Research. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.

Conflict of Interest

All authors are employed by the company Microsoft Research.

Acknowledgments

Dr. Hao Ma led the efforts in creating many advanced features in MAG. Dr. Bo-June Paul Hsu led the team to develop the inference engine in MAKES, and, with the assistance of Dr. Rong Xiao, implemented the first version of the reinforcement learning to compute saliency. The work would not be possible without the strong support from Microsoft Bing engineering teams and colleagues in Microsoft Research labs around the globe.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fdata.2019.00045/full#supplementary-material

Supplementary Material S1. A randomly synthesized query set to study the differences between citation-count-based and saliency-based ranking behaviors.

Supplementary Material S2. Analytical script and the data to study the difference between the h-index and saliencies for authors.

1. ^ https://sfdora.org/

2. ^ See https://cra.org/cra-statement-us-news-world-report-rankings-computer-science-universities/

3. ^ https://docs.microsoft.com/en-us/academic-services/

4. ^ https://academic.microsoft.com

5. ^ https://paperswithcode.com/

6. ^ https://academic.microsoft.com/faq

7. ^ https://www.microsoft.com/en-us/research/project/academic/articles/understanding-documents-by-using-semantics/

8. ^ See the analytic page at https://academic.microsoft.com/journals/41008148,161191863

Berger, E., Blackburn, S. M., Brodley, C., Jagadish, H. V., McKinley, K. S., Nascimento, M. A., et al. (2019). GOTO rankings considered helpful. Commun. ACM 62, 29–30. doi: 10.1145/3332803

Bornmann, L., and Tekles, A. (2019). Productivity does not equal usefulness. Scientometrics 118, 705–707. doi: 10.1007/s11192-018-2982-5

Carmel, D., Chang, M. W., Gabrilovich, E., Hsu, B. J. P., and Wang, K. (2014). “ERD'14: entity recognition and disambiguation challenge,” in Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval (Gold Coast, QLD). doi: 10.1145/2600428.2600734

Cheung, M. K. (2013). Altmetrics: too soon for use in assessment. Nature 494, 176. doi: 10.1038/494176d

Evans, M., and Furnell, S. (2002). “A web-based resource migration protocol using WebDav,” in Proceedings of the WWW-2002 (Honolulu, HI).

Franceschet, M. (2011). PageRank: standing on the shoulders of giants. Commun. ACM 54, 92–101. doi: 10.1145/1953122.1953146

Garfield, E. (1955). Citation indexes for science: a new dimension in documentation through association of ideas. Science 122, 108–111. doi: 10.1126/science.122.3159.108

Garfield, E. (1964). Science citation index- a new dimension in indexing. Science 144, 649–654. doi: 10.1126/science.144.3619.649

Garfield, E. (1972). Citation analysis as a tool in journal evaluation journals can be ranked by frequency and impact of citations for science policy studies. Science 178, 471–479. doi: 10.1126/science.178.4060.471

Gyöngyi, Z., and Garcia-Molina, H. (2005). Web Spam Taxonomy . Chiba: AIRWeb.

Harris, Z. S. (1954). Distributional structure. WORD 10, 146–162. doi: 10.1080/00437956.1954.11659520

Harzing, A. W., and Alakangas, S. (2017). Microsoft Academic: is the phoenix getting wings? Scientometrics 110, 371–383. doi: 10.1007/s11192-016-2185-x

Herrmannova, D., and Knoth, P. (2016). An Analysis of the Microsoft Academic Graph, D-lib Magazine 22, 6. doi: 10.1045/september2016-herrmannova

Hicks, D., Wouters, P., Waltman, L., Rijcke, S. D., and Rafols, I. (2015). Bibliometrics: the leiden manifesto for research metrics. Nature 520, 429–431. doi: 10.1038/520429a

Hug, S. E., and Brändle, M. P. (2017). The coverage of Microsoft academic: analyzing the publication output of a university. Scientometrics 113, 1551–1571. doi: 10.1007/s11192-017-2535-3

Hug, S. E., Ochsner, M., and Brändle, M. P. (2017). Citation analysis with microsoft academic. Scientometrics 111, 371–378. doi: 10.1007/s11192-017-2247-8

Ioannidis, J. P. A., Klavans, R., and Boyack, K. W. (2018). Thousands of scientists publish a paper every five days. Nature 561, 167–169. doi: 10.1038/d41586-018-06185-8

Kanakia, A., Shen, Z., Eide, D., and Wang, K. (2019). “A scalable hybrid research paper recommender system for microsoft academic,” in WWW '19 The World Wide Web Conference (New York, NY: ACM). doi: 10.1145/3308558.3313700

Kousha, K., Thelwall, M., and Abdoli, M. (2018). Can Microsoft Academic assess the early citation impact of in-press articles? A multi-discipline exploratory analysis. J. Informet. 12, 287–298. doi: 10.1016/j.joi.2018.01.009

López-Cózar, E. D., Robinson-García, N., and Torres-Salinas, D. (2014). The Google scholar experiment: How to index false papers and manipulate bibliometric indicators. J. Assoc. Inform. Sci. Technol. 65, 446–454. doi: 10.1002/asi.23056

Maslov, S., and Redner, S. (2008). Promise and pitfalls of extending Google's pagerank algorithm to citation networks. J. Neurosci. 28, 11103–11105. doi: 10.1523/JNEUROSCI.0002-08.2008

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). “Distributed representations of words and phrases and their compositionality,”in NIPS'13 Proceedings of the 26th International Conference on Neural Information Processing Systems , Vol. 2 (Lake Tahoe, NV), 3111–3119.

Piwowar, H. (2013). Altmetrics: Value all research products. Nature 493, 159. doi: 10.1038/493159a

Rougier, N. P., Hinsen, K., Alexandre, F., Arildsen, T., Barba, L. A., Benureau, F. C. Y., et al. (2017). Sustainable computational science: the ReScience initiative. PeerJ 3, 1–8. doi: 10.7717/peerj-cs.142

Shen, Z., Ma, H., and Wang, K. (2018). “A web-scale system for scientific knowledge exploration,” in Meeting of the Association for Computational linguistics (Melbourne, VIC), 87–92. doi: 10.18653/v1/P18-4015

Sinha, A., Shen, Z., Song, Y., Ma, H., Eide, D., Hsu, B. J. P., and Wang, K. (2015). “An overview of Microsoft Academic Service (MAS) and applications,” in Proceedings of the 24th International Conference on World Wide Web (Florence). doi: 10.1145/2740908.2742839

Thelwall, M. (2017). Microsoft Academic: a multidisciplinary comparison of citation counts with Scopus and Mendeley for 29 journals. J. Informet. 11, 1201–1212. doi: 10.1016/j.joi.2017.10.006

Thelwall, M. (2018a). Can Microsoft Academic be used for citation analysis of preprint archives? The case of the social science research network. Scientometrics 115, 913–928. doi: 10.1007/s11192-018-2704-z

Thelwall, M. (2018b). Does Microsoft Academic find early citations. Scientometrics 114, 325–334. doi: 10.1007/s11192-017-2558-9

Thelwall, M. (2018c). Microsoft Academic automatic document searches: accuracy for journal articles and suitability for citation analysis. J. Informet. 12, 1–9. doi: 10.1016/j.joi.2017.11.001

Traag, V. A., and Waltman, L. (2019). Systematic analysis of agreement between metrics and peer review in the UK REF. Palgrave Commun. 5:29. doi: 10.1057/s41599-019-0233-x

Turney, P. D., and Pantel, P. (2010). From frequency to meaning: vector space models of semantics. J. Art. Intell. Res. 37, 141–188. doi: 10.1613/jair.2934

Walker, D., Xie, H., Yan, K. K., and Maslov, S. (2007). Ranking scientific publications using a model of network traffic. J. Statist. Mech. 2007:6010. doi: 10.1088/1742-5468/2007/06/P06010

Waltman, L., and Eck, N. J. V. (2012). The inconsistency of the h-index. J. Assoc. Informat. Sci. Technol. 63, 406–415. doi: 10.1002/asi.21678

Waltman, L., and Yan, E. (2014). “PageRank-related methods for analyzing citation networks,” in Measuring Scholarly Impact , eds L, Waltman and E. Yan (Cham: Springer), 83–100. doi: 10.1007/978-3-319-10377-8_4

Wilhite, A. W., and Fong, E. A. (2012). Coercive citation in academic publishing. Science 335, 542–543. doi: 10.1126/science.1212540

Wilsdon, J. (2015). We need a measured approach to metrics. Nature 523, 129. doi: 10.1038/523129a

Keywords: microsoft academic services, microsoft academic graph, knowledge graph (KG), machine cognition, academic search, artificial intelligence (AI)

Citation: Wang K, Shen Z, Huang C, Wu C-H, Eide D, Dong Y, Qian J, Kanakia A, Chen A and Rogahn R (2019) A Review of Microsoft Academic Services for Science of Science Studies. Front. Big Data 2:45. doi: 10.3389/fdata.2019.00045

Received: 28 August 2019; Accepted: 18 November 2019; Published: 03 December 2019.

Copyright © 2019 Wang, Shen, Huang, Wu, Eide, Dong, Qian, Kanakia, Chen and Rogahn. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Kuansan Wang, kuansanw@microsoft.com

Microsoft Quantum researchers make algorithmic advances to tackle intractable problems in physics and materials science


Wim van Dam

Principal Researcher

In a paper recently published in PRX Quantum, Microsoft Azure Quantum researchers Guang Hao Low and Yuan Su, with collaborators Yu Tong and Minh Tran, have developed faster algorithms for quantum simulation. One of the most promising applications of quantum computers is to simulate systems governed by the laws of quantum mechanics. Efficient quantum simulations have the potential to revolutionize many fields, including materials science and chemistry, where problems with high industrial relevance can be intractable using today’s supercomputers. Realizing this promise will require not only experimental progress, but also algorithmic advances that reduce the required quantum hardware resources. Doing so helps prepare our future scaled quantum computers to tackle challenging computational problems in the real world.

In their paper, Complexity of Implementing Trotter Steps , the authors improve upon pre-existing algorithms that rely on so-called product formula methods, which date back to the 1990s when the first quantum simulation algorithm was proposed. The underlying idea is quite straightforward: we can simulate a general Hamiltonian system by simulating its component terms one at a time. In most situations this yields only an approximate quantum simulation, but the overall accuracy can be made arbitrarily high by repeating such Trotter steps sufficiently many times.
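
The idea can be sketched with a minimal numerical example (a toy two-level system on a classical computer, not the paper's algorithm): approximate exp(-i(A+B)t) by alternating exponentials of the non-commuting terms A and B, and watch the error shrink as the number of Trotter steps grows.

```python
import numpy as np

def expm_hermitian(H, t):
    """Unitary exp(-i*H*t) for a Hermitian matrix H, via eigendecomposition."""
    vals, vecs = np.linalg.eigh(H)
    return (vecs * np.exp(-1j * vals * t)) @ vecs.conj().T

# Toy Hamiltonian H = A + B with non-commuting terms (Pauli X and Pauli Z).
A = np.array([[0, 1], [1, 0]], dtype=complex)
B = np.array([[1, 0], [0, -1]], dtype=complex)
t = 1.0
exact = expm_hermitian(A + B, t)

def trotter(n):
    """First-order product formula: (exp(-iA t/n) exp(-iB t/n))^n."""
    step = expm_hermitian(A, t / n) @ expm_hermitian(B, t / n)
    return np.linalg.matrix_power(step, n)

# The approximation error shrinks as the number of Trotter steps n grows
# (roughly as O(t^2 / n) for the first-order formula).
errors = [np.linalg.norm(trotter(n) - exact, 2) for n in (1, 10, 100)]
```

Each Trotter step stays exactly unitary, so accuracy is bought purely with repetitions, which is why the repetition count times the per-step cost governs the total complexity discussed below.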

Overcoming the complexity barrier

So, what are the resources needed to run this algorithm on a quantum computer? The algorithm repeats an elementary Trotter step multiple times, so the total complexity is the number of repetitions multiplied by the cost per step, the latter of which is in turn determined by the number of terms in the Hamiltonian. Unfortunately, this is not very attractive for long-range quantum systems, as the number of terms involved can be too big to be practical. Consider, for instance, a system with all-to-all interactions. If the size of the system is N, then the number of terms is N^2, which also quantifies the asymptotic cost of the Trotter steps. As a result, we are essentially paying a quadratically higher cost to solve a simulation problem of only linear size. This issue becomes even worse for more general systems with many-body interactions. The question to ask, then, is: is there a better implementation whose cost does not scale with the total number of Hamiltonian terms, overcoming this complexity barrier?

The answer to this question, as the paper shows, is twofold. If terms in the Hamiltonian are combined with arbitrary coefficients, then this high degree of freedom must be captured by any accurate quantum simulation, implying a cost proportional to the total term number. However, when the target Hamiltonian is structured with a lower degree of freedom, the paper provides a host of recursive techniques to lower the complexity of quantum simulation. In particular, this leads to an efficient quantum algorithm to simulate the electronic structure Hamiltonian, which models various important systems in materials science and quantum chemistry.

Recursive techniques have played an essential role in speeding up classical algorithms, such as those for sorting, searching, large integer and matrix multiplication, modular exponentiation, and Fourier transformations. Specifically, given a problem of size N, we do not aim to solve it directly; instead, we divide the target problem into M subproblems, each of which can be seen as an instance of the original one with size N/M and can be solved recursively using the same approach. This implies that the overall complexity C(N) satisfies the relation: C(N) = M C(N/M) + f(N), with f(N) denoting the additional cost to combine solutions of the subproblems. Mathematical analysis yields that, under certain realistic assumptions, the overall complexity C(N) has the same scaling as the combination cost f(N) up to a logarithmic factor—a powerful result sometimes known as “the master theorem.” However, combining solutions can be much easier to handle than solving the full problem, so recursions essentially allow us to simplify the target problem almost for free!
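
The recursion C(N) = M C(N/M) + f(N) can be made concrete with a classic divide-and-conquer example, merge sort, where M = 2 and the combine cost f(N) is a linear merge (a generic illustration of the pattern, not code from the paper):

```python
import random

def merge_sort(xs, stats):
    """Divide and conquer: split into M = 2 subproblems of size N/2, recurse,
    then combine. Comparisons in the merge are the f(N) = O(N) combine cost."""
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left = merge_sort(xs[:mid], stats)
    right = merge_sort(xs[mid:], stats)
    # Combine step: linear merge, i.e., the f(N) term in C(N) = 2 C(N/2) + f(N).
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        stats["comparisons"] += 1
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

stats = {"comparisons": 0}
data = random.sample(range(10_000), 1024)
out = merge_sort(data, stats)
# Per the master theorem, total comparisons are O(N log N):
# for N = 1024, at most N * log2(N) = 10,240.
```

Because f(N) here is linear and the master theorem adds only a logarithmic factor, the total work stays close to the combine cost, which is the same leverage the recursive Trotter-step construction exploits.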

Given the ubiquitous nature of recursions in classical computing, it is somewhat surprising that there were not many recursive quantum algorithms available. The paper from Low, Su, and collaborators develops recursive Trotter steps with a much lower implementation cost, suggesting the use of recursion as a promising new way to reduce the complexity of simulating many-body Hamiltonians.

Quantum solutions

The paper's result applies to a variety of long-range interacting Hamiltonians, including the Coulomb interaction between charged particles and the dipole-dipole interaction between molecules, both of which are ubiquitous in materials science and quantum chemistry, a primary target application of quantum computers. In physics, impressive control in recent experiments with trapped ions, Rydberg atoms, and ultracold atoms and polar molecules has made it possible to study new phases of matter, contributing to a growing interest in simulating such systems.

This research is part of the larger quantum computing effort at Microsoft. Microsoft has long been at the forefront of the quantum industry, serving as a pioneering force in the development of quantum algorithms tailored for simulating materials science and chemistry. This includes earlier efforts using quantum computers to elucidate reaction mechanisms in complex chemical systems targeting the open problem of biological nitrogen fixation in nitrogenase, as well as more recent quantum solutions to a carbon dioxide fixation catalyst with more than one order of magnitude savings in the computational cost.

The new results from the current work represent Microsoft’s continuing progress to develop solutions for classically intractable problems on a future quantum machine with Azure Quantum .


  • Want to learn more about Azure Quantum? We invite you to check out our Microsoft Quantum Innovator Series .
  • If you are interested in connecting with our Azure Quantum team, please reach out at [email protected] .



A Review of Microsoft Academic Services for Science of Science Studies


Since the relaunch of Microsoft Academic Services (MAS) 4 years ago, scholarly communications have undergone dramatic changes: more ideas are being exchanged online, more authors are sharing their data, and more software tools used to make discoveries and reproduce the results are being distributed openly. The sheer amount of information available is overwhelming for individual humans to keep up with and digest. In the meantime, artificial intelligence (AI) technologies have made great strides and the cost of computing has plummeted to the extent that it has become practical to employ intelligent agents to comprehensively collect and analyze scholarly communications. MAS is one such effort, and this paper describes its recent progress since the last disclosure. As there are plenty of independent studies affirming the effectiveness of MAS, this paper focuses on the three key AI technologies that underlie its prowess in capturing scholarly communications with adequate quality and broad coverage: (1) natural language understanding for extracting factoids from individual articles at the web scale, (2) knowledge-assisted inference and reasoning for assembling the factoids into a knowledge graph, and (3) a reinforcement learning approach to assessing the scholarly importance of entities participating in scholarly communications, called saliency, which serves as both an analytic and a predictive metric in MAS. These elements enhance the capabilities of MAS in supporting studies of the science of science based on the GOTO principle, i.e., good and open data with transparent and objective methodologies. The current direction of development and how to access the regularly updated data and tools from MAS, including the knowledge graph, a REST API and a website, are also described.

Introduction

Centuries of scientific advancement have resulted from a virtuous cycle in which scientists meticulously collect observational data to deduce a theoretical model and then use the model to predict new experimental outcomes as a means to validate the theory. This scientific principle has been applied to study the science of science, namely, the development of science itself, a field with pioneers like Eugene Garfield at the Institute for Scientific Information (ISI, now part of Clarivate Analytics) (Garfield, 1955 , 1964 , 1972 ). Driven by the insight that scientific advancements inevitably leave abundant traces in scholarly communications, often in the form of citations, a central topic in the science of science has been deriving quantitative models from citations for the purpose of analyzing and understanding the impact of scientific work. Historically, citations made in the main body of an article have been difficult to collect, so the bibliography has been used in their stead. Implicitly, this practice assumes that the relations among publications can be approximated by pairwise Boolean measures between the citing and the cited articles. Such an approximation has been found too reductive compared with peer review for article-level assessments (Wilsdon, 2015 ), although there is evidence suggesting that noise in such a simplified model may be "canceled out" through aggregation at a level higher than individual articles (Traag and Waltman, 2019 ). Indeed, the most widely used bibliometrics, such as the journal impact factor (JIF) or the h-index, are by design aggregate measures at the journal or the author level. However, the demand for article-level metrics is so strong that it has popularized a practice that assumes articles in the same journal are equal in quality, so that the aggregate metrics for the journal can serve as a proxy for the articles published therein. The adverse effects of this practice are so profound and its misuses so pervasive that renowned institutions and thought leaders have found it necessary to proclaim the San Francisco Declaration on Research Assessment (DORA) 1 to publicize a strong stance against using journal-level metrics for research assessments. A widely accepted good model for understanding the impacts of individual publications has yet to be found.

Another challenge in the study of the science of science is the explosive growth in the volume of scientific reports and the diversity of research topics, which have outstripped the cognitive capacity of human beings to properly digest them and keep up. This cognitive overload ostensibly impacts everyone, including those employed by vendors to curate data and develop commercial platforms for science of science studies. As a result, errors and omissions in manually curated data are abundant, eroding the trustworthiness of studies conducted on those platforms. Most frustratingly, the proprietary and opaque nature of the commercial systems prevents recourse when obvious errors are spotted. As data-driven decision-making processes have become more prevalent in recent years, platform quality has become a serious issue, prompting the Computing Research Association (CRA) to release a stern statement on the worsening state of commercial data and to call for action against unscientific practices based on or leading to flawed data 2 . In their report (Berger et al., 2019 ), a CRA working group illustrates how faulty data from Clarivate Analytics and surveys of humans no longer up to date in their fields may have led US News & World Report to produce abhorrent rankings of research universities that can severely mislead students in making school choices and funders in allocating resources. Similar to DORA, the CRA working group has published a set of guidelines urging the adoption of Good and Open data with Transparent and Objective methodology, known as the GOTO principle, in conducting and publishing the results of quantitative science of science studies.

This article describes Microsoft Academic Services (MAS), a project in Microsoft Research that aims to support researchers in following the GOTO principle. Having evolved from the system initially disclosed in (Sinha et al., 2015 ), MAS now consists of three parts: an open dataset known as the Microsoft Academic Graph (MAG) 3 , a freely available inference engine called Microsoft Academic Knowledge Exploration Service (MAKES), and a website called Microsoft Academic 4 that provides a more human-friendly interface to MAKES. MAS is part of ongoing research that explores the nature of cognition, a topic in artificial intelligence (AI) that studies the mental capacity for acquiring, reasoning, and inferencing with knowledge. The research is motivated by the observation that cognition involves the capabilities of memorizing, computing, being attentive, and staying focused on the task at hand, all of which can be programmed into modern computers to outperform humans. Particularly for MAS, the project explores the boundary within which the power of machines can be harnessed to understand the scholarly communications observable on the web. In other words, MAS aims at developing AI agents that are well-read in all scientific fields and can hopefully become trusted consultants to human researchers on matters of scholarly activities taking place on the web. In this sense, the MAG component of MAS is the outcome of knowledge acquisition and reasoning, and MAKES is the capability of machine inferencing with the knowledge in MAG. The MAG dataset is distributed and frequently updated under an open data license, and the inference algorithms in MAKES are published in relevant peer-reviewed venues and summarized later in this article.

Aside from being open in data and transparent in algorithm as per the GOTO principle, MAS actively uses technologies to capture scholarly communication activities with adequate quality and coverage to strive for a good platform. To address the explosive growth in scientific research, MAS employs the state-of-the-art AI technologies, such as natural language understanding, to extract the knowledge from the text of these publications. This allows MAS to always take a data-driven approach in providing consistent data quality and avoid manual efforts that are often the source of subjective controversies or errors. Knowledge extraction in MAS goes beyond simply indexing key phrases to recognize and disambiguate the entities underpinning scholarly communications. MAS currently includes entities that describe who supported by which institutions have made what claims in which publication at which instance of which venue , as illustrated in Figure 1 . With more scholarly communications being conducted online with data and software tools, the definition of publication in MAS has been expanded. Aside from the traditional forms such as books, journals and conference papers, MAS has recognized datasets and software packages as additional forms of publications. Additionally, as plenty of scholarly work exerts impacts through commercial exploitation preceded by patent applications, MAS has also included them as publications. These new resources fit well into the model of publication entity in Figure 1 because they all have authors, affiliations, topical contents, etc., and can receive citations. In addition to extracting these entities, a key mission of knowledge extraction is to recognize the relations among the entities, such as the citation contexts characterizing how the work in one publication is received by others citing it. 
As schematized in Figure 1 , these entities and their relations are represented in a graph structure as the nodes and edges, respectively, leading to the name MAG. Note that entity recognition and disambiguation (ERD), as reported in (Carmel et al., 2014), is far from a solved problem. However, the key here is that the AI technologies employed in MAS are designed to learn and improve by themselves, repeatedly reading more materials than any human possibly could in a lifetime. After years of self-improvement, many independent studies have suggested that MAG data are in many respects as accurate as, if not more accurate than, manually curated data (Herrmannova and Knoth, 2016; Harzing and Alakangas, 2017; Hug and Brändle, 2017; Hug et al., 2017; Thelwall, 2017, 2018a, b, c; Kousha et al., 2018).

Figure 1. The data model of scholarly communications in MAS, where the nodes represent the entity types modeled in MAG, and the simple and block arrows depict one-to-one and one-to-many relations among the entities, respectively.

Secondly, MAS uses technologies for scale, particularly as the lack of coverage in many datasets is becoming ever more concerning. While it might have been appropriate in the last century for human experts to manually select only some of the scholarly communications into a database, this practice may have finally outlived its usefulness, as the case studies in the CRA report have shown. Furthermore, with the advancements in information technology, online publishing has become a widely adopted medium for scientists to communicate with one another. Important activities, including self-archiving, data and software sharing, and community efforts dedicated to reproducing previously published results [e.g., Papers with Code 5 , ReScience (Rougier et al., 2017)] are taking place exclusively on the web. A modern dataset therefore must capture all these web-only activities to properly reflect the current state of reality, and it is hard to fathom how all this capturing could be accomplished by hand. MAS provides an encouraging example that technologies can help in this area.

The key to MAS is the large-scale deployment of AI agents to understand scholarly communications. The rest of the article is therefore devoted to describing the methodologies so that the characteristics of MAS can be better understood. The AI technologies used in MAS, as illustrated in Figure 2 , encompass three areas: (1) natural language understanding, including ERD and concept detection, to extract factoids from individual publications and to fulfill queries in MAKES, (2) knowledge reasoning to organize the factoids into MAG, and (3) a reinforcement learning system to learn a probabilistic measure called the saliency that facilitates the statistical learning and inferences in the above two areas.

Figure 2. AI and service components in MAS comprise two feedback loops: one grows the power of acquiring knowledge in MAG and the other assesses the saliency of each entity in MAG. In the first loop, each publication on the web is first processed by MAG-assisted entity recognition and disambiguation as described in (1). As the raw entities and their relations are extracted from individual publications, semantic reasoning algorithms are then applied to conflate them into a revised graph, including the concept hierarchy from all the publications. The revised MAG is then used in the next run to better extract entities from publications. The second loop utilizes citation behaviors as the rewarding target for a reinforcement learning algorithm that assesses the importance of each entity in MAG based on the network topology. The quantitative measure, called the saliency, serves as a ranking factor in MAKES, a search and recommendation engine for MAG.

Entity Recognition and Disambiguation

Central to MAS is the quest to harness the power of machines to acquire knowledge from written text. As alluded to previously, the knowledge acquisition task amounts to recognizing the lexical constructs of the semantic objects representing either entities or relations. To be more precise, the task of natural language understanding in MAS is formulated as a maximum a posteriori (MAP) decision problem:
ŷ = argmax y P ( y | x, K )     (1)
where the input x = ( w 1 , w 2 , ⋯ ) is a word sequence of a natural language expression, K is a knowledge base, and the task is to find the best output ŷ = ( e 1 , e 2 , ⋯), e i ∈ K , that is a sequence of semantic objects. For example, suppose the input is a sentence “HIV causes AIDS.” The ideal output should consist of two entities “HIV” and “AIDS,” and a relation “causing” between them.
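The decision rule can be made concrete with a toy sketch: score a handful of candidate interpretations of the input and keep the most probable one. The candidate tuples and posterior scores below are invented for illustration and are not MAS's actual models.

```python
# Toy sketch of the MAP decision rule: among candidate interpretations y of
# an input x, keep the one maximizing P(y | x, K). All values are made up.

def map_decode(x, candidates, posterior):
    """Return the candidate interpretation maximizing the posterior."""
    return max(candidates, key=lambda y: posterior(x, y))

# Two hypothetical readings of "HIV causes AIDS": two entities plus a relation.
toy_candidates = [
    ("HIV:virus", "causes", "AIDS:disease"),
    ("HIV:band", "causes", "AIDS:charity"),
]

def toy_posterior(x, y):
    scores = {
        ("HIV:virus", "causes", "AIDS:disease"): 0.97,
        ("HIV:band", "causes", "AIDS:charity"): 0.03,
    }
    return scores[y]

best = map_decode("HIV causes AIDS", toy_candidates, toy_posterior)
print(best)  # ('HIV:virus', 'causes', 'AIDS:disease')
```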

The MAP decision is known to be optimal provided the posterior probability distribution in (1) can be accurately estimated. While this can be done directly, MAS uses a mathematically equivalent approach, known as the generative modeling, where the Bayes rule is applied to (1) to rewrite the MAP decision as:
ŷ = argmax y P ( x | y, K ) P ( y | K )     (2)
with P ( x | y, K ) and P ( y | K ) the semantic language and the prior models, respectively. The semantic language model characterizes how frequently a sequence of semantic objects y is expressed through the word sequence x . Typically, an entity is lexicalized by a noun phrase and a relation by a verb phrase. MAS, however, does not utilize the syntactic structure of natural language but, rather, assumes that the lexical realization of each semantic object is statistically independent of the others, namely:
P ( x | y, K ) = ∏ i P ( x i | e i , K )     (3)
where x i denotes the i -th phrase segment in x corresponding to e i . Essentially, the semantic language model characterizes the synonymous expressions for each semantic object e i and how likely each of them is used. For example, the journal "Physical Review Letters" can be referred to by its full name, a common abbreviation "Phys Rev Lett," or simply the acronym "PRL," and an author can be mentioned using the last name, the first name, or just initials with an optional middle initial. The bibliography section, the text body and the web pages of a paper all provide abundant materials from which to harvest synonymous expressions. With large enough data samples, it appears adequate in MAS to use simple maximum likelihood estimation, i.e., frequency counts with statistical smoothing, for the synonym model P (· | e i , K ).
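The synonym model can be sketched as smoothed relative-frequency estimation. The snippet below is a minimal illustration using made-up mention counts for the "Physical Review Letters" example; the add-α smoothing is an assumption, as the article does not specify MAS's actual smoothing scheme.

```python
from collections import Counter

def synonym_model(mentions, alpha=1.0):
    """Estimate P(surface form | entity) by smoothed relative frequency.
    `mentions` is a list of surface strings observed for one entity."""
    counts = Counter(mentions)
    vocab = len(counts)
    total = sum(counts.values())
    return {form: (c + alpha) / (total + alpha * vocab)
            for form, c in counts.items()}

# Toy mention counts for the journal entity "Physical Review Letters".
mentions = (["Physical Review Letters"] * 60
            + ["Phys Rev Lett"] * 30
            + ["PRL"] * 10)
model = synonym_model(mentions)
```

With large counts the smoothing term barely matters, matching the article's observation that plain frequency counts suffice when enough raw material is harvested.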

The semantic prior model P ( y | K ) assesses the likelihood of a certain combination of semantic objects that can be derived from the knowledge base. In a way, the brunt of the statistical independence assumption in (3) is lessened because the contextual dependencies leading to a viable semantic interpretation are strictly enforced here. This can be seen by applying the chain rule of conditional probability to further decompose the semantic prior model as:
P ( y | K ) = P ( e 1 | K ) ∏ i≥2 P ( e i | e i−1 , ⋯ e 1 , K )     (4)
where P ( e 1 | K ) is the saliency of the entity e 1 and P ( e i | e i−1 , ⋯ e 1 , K ) is the semantic cohesion model according to the knowledge K . In conjunction with the synonym model, the semantic cohesion model can be estimated directly from data with an additional constraint that assigns zero probability to implausible semantic object combinations. This constraint plays a critical role in reducing the degree of ambiguity in understanding the input. For example, "Michael Evans" with a missing middle initial is a very confusable name, and "WWW" can mean a conference organized by IW3C2, a journal (ISSN: 1386-145X or 1573-1413), or even a key word in the title of a paper. However, there are only two authors, a "Michael P. Evans" and a "Michael S. Evans," who have ever published any papers in the WWW conference, one in 2002 and the other in 2017, respectively, and never in the namesake journal or in any paper containing "WWW" as a key term in any other publication venue. If the publication year is also present, the apparently ambiguous input "Michael Evans (Evans and Furnell, 2002)" can be precisely resolved into the entity referring to the author named "Michael P. Evans" who published a paper in "the eleventh International World Wide Web Conference" held in Honolulu, Hawaii in the year 2002. Using knowledge-imposed constraints is particularly effective for author disambiguation when the technique is applied to understand curricula vitae or author homepages posted on the web. Assuming each such web page belongs to a single author, the publications listed therein are often high-quality signals to distinguish the identity of the author from namesakes.
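The zero-probability constraint can be illustrated as a filter over a toy knowledge base: only (author, venue, year) combinations that actually exist survive. The knowledge base contents and the name-matching rule below are invented to mirror the "Michael Evans" example, not MAS's actual disambiguation code.

```python
# Toy knowledge base: which (author, venue, year) triples actually exist.
KB = {
    ("Michael P. Evans", "WWW", 2002),
    ("Michael S. Evans", "WWW", 2017),
}

def matches(surface, full_name):
    """A surface form matches a full name when its first and last tokens
    agree (so 'Michael Evans' matches 'Michael P. Evans')."""
    tokens = full_name.split()
    parts = surface.split()
    return parts[0] == tokens[0] and parts[-1] == tokens[-1]

def disambiguate(surface, venue, year, kb):
    """Keep only authors whose (author, venue, year) triple is in the
    knowledge base; implausible combinations get zero probability."""
    return [a for (a, v, y) in kb
            if v == venue and y == year and matches(surface, a)]

print(disambiguate("Michael Evans", "WWW", 2002, KB))  # ['Michael P. Evans']
```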

The manner in which the knowledge is utilized in (4) also allows MAS to identify and acquire new synonymous expressions for existing entities and, often, new entities. This capability of acquiring new knowledge without human intervention is key for MAS to enrich itself gradually. Mathematically, letting K t denote the knowledge base used in (1) leading to the understanding of the scholarly materials y t at time t , the knowledge enrichment in MAS at an interval of Δ t is also formulated as a MAP decision:
K t+Δt = argmax K P ( K | y t , K t )     (5)
The iterative process of (5) in MAS can be better appreciated through a common task in parsing the bibliography, where the author intent is to refer to a publication with a sequence of references to authors, followed by an optional publication title and a reference to a publication venue. The manner in which a reference is made, however, is highly inconsistent. When the semantic knowledge is applied to parse an input such as "Zhihong Shen, Hao Ma, and Kuansan Wang, ACL-2018," it allows MAS to recognize fragments of the text, say, "Hao Ma" or "Kuansan Wang," as authors because they are frequently seen in the knowledge base. With these anchors, MAS can use (4) to infer that "Zhihong Shen" and "ACL-2018" are likely references to another author and the venue, respectively. These inferences can be made even before the publication records of ACL-2018 are included in the knowledge base and can be used with (5) to grow new entities in MAS.

While MAG only publishes the canonical expression for each entity, MAKES includes the probabilistic models derived from all the raw materials mentioned above. A step-by-step examination of (2) can be conducted in the query input box at the Microsoft Academic website where, upon each character entered, an API call is made into MAKES to analyze the semantic intent of the typed input with the MAP decision rule described in (2). Top interpretations, manifesting themselves as query completions or suggestions, are displayed to the user as a means of query intent disambiguation or confirmation. More details are described in the FAQ page of the website 6 .

Concept Detection and Taxonomy Learning

As with many complex systems, the relations among the entities in scholarly communications ( Figure 1 ) cannot fully capture the activities, because the semantics of the communications is encoded not in the topology but in the natural language contents of the publications. To address this issue, MAS adopts an entity type, called concepts [called "fields of study" in Sinha et al. ( 2015 )], to represent the semantic contents of a document. Unlike physical entities such as authors and affiliations, concepts are abstract and hence there is no concrete way to define them. Furthermore, concepts are hierarchical in nature. For example, "machine learning" is a concept frequently associated with "artificial intelligence" that, in turn, is a branch of "computer science" but often intersects with "cognitive science" in "psychology." Accordingly, a taxonomy must allow a concept to have multiple parents and organize all concepts into a directed acyclic graph (DAG). While concepts can be associated with all types of physical entities, say, to describe the topics of interest of a journal or the fields of expertise of a scholar, MAS only infers the relations between a publication and its concepts directly and leaves all others to be indirectly aggregated through publications.
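A multiple-parent taxonomy of this kind can be sketched as a parent-list DAG; the concept names below mirror the "machine learning" example above, and the traversal is a plain reachability search rather than anything specific to MAS.

```python
# Minimal concept DAG: each concept maps to its (possibly several) parents,
# so "machine learning" sits under both AI and cognitive science.
parents = {
    "machine learning": ["artificial intelligence", "cognitive science"],
    "artificial intelligence": ["computer science"],
    "cognitive science": ["psychology"],
    "computer science": [],
    "psychology": [],
}

def ancestors(concept, dag):
    """All concepts reachable by following parent edges upward."""
    seen = set()
    stack = [concept]
    while stack:
        for p in dag[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

print(sorted(ancestors("machine learning", parents)))
```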

A survey of the concept taxonomies used in major library systems, presumably developed by human experts, suggests that few of them are compatible with each other. The low agreement among human experts leads MAS to create a concept taxonomy by itself, solely from the document collection. As close to 1 million new publications have been added per month in recent months, the machine-learned taxonomy is dynamically adjusted on a regular basis so that new concepts can be added and obsolete concepts retired or merged with others.

Concept detection is a natural language understanding problem and, therefore, its mathematical foundation is also governed by (1). Unlike the ERD problem, however, the ideal output y ^ in this case is an unordered collection of DAGs of concepts rather than a sequence of semantic objects, and the textual boundaries of a concept in x are intrinsically soft, i.e., phrase segments can overlap. MAS therefore directly estimates the probabilistic distribution in (1) from the text rather than going through the generative model of (2). As detailed in the recent publication (Shen et al., 2018), the key concept underlying the MAS approach here is the distributional similarity hypothesis proposed in the 1950s (Harris, 1954), which observes that semantically similar phrases tend to occur in similar contexts. Plenty of methods reported in the literature demonstrate the efficacy of applying distributional similarity to concept detection, either by training a hierarchical classifier mapping a sequence of discrete words directly into concepts, or by the embedding method that first converts the text into a vector representation with which learning and inference can be conducted in a vector space (Turney and Pantel, 2010). When properly executed, semantically similar phrases are transformed into vectors close to one another, simplifying the synonymous expression detection needed for (3) into a nearest neighbor search. In other words, the probabilistic distribution of synonyms P (·| e i , K ) can be estimated by distance in the vector space. Recently, the embedding methods have produced many surprising results, starting with (Mikolov et al., 2013; Berger et al., 2019), that contribute to a renaissance of the vector space model thanks to the availability of big data and powerful computational resources. The current practice in MAS, however, has found it more powerful to combine both the discrete and the vector space approaches into a mixture model for concept learning (Shen et al., 2018).
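The nearest-neighbor view of synonym detection can be sketched with cosine similarity over toy vectors. The three-dimensional embeddings below are invented for illustration; real models use hundreds of dimensions learned from text.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy 3-dimensional embeddings standing in for learned phrase vectors.
embeddings = {
    "machine learning": [0.9, 0.1, 0.0],
    "statistical learning": [0.85, 0.15, 0.05],
    "ornithology": [0.0, 0.2, 0.95],
}

def nearest(phrase, table):
    """Nearest-neighbor search standing in for synonym detection."""
    return max((p for p in table if p != phrase),
               key=lambda p: cosine(table[phrase], table[p]))

print(nearest("machine learning", embeddings))  # statistical learning
```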

The concept detection software in MAS has been released as part of the MAG distribution. The package, called Language Similarity 7 , provides a function with which the semantic similarity of two text paragraphs can be quantified using the embedding models trained from the publications in the corresponding MAG version. This function in turn serves as a mixture component for another function that, for any paragraph, returns a collection of top concepts detected in the paragraph that exceed a given threshold. Again, interested readers are referred to the recent article (Shen et al., 2018 ) for technical details.

Network Semantics Reasoning

As MAS sources its materials from the web, which is notorious for its uneven data quality, duplicate, erroneous and missing information abounds. Critical to MAS is therefore a process, called conflation, that can reason over partial and noisy information to assemble the semantic objects extracted from individual documents into a cohesive knowledge graph. A key capability in conflation is to recognize and merge identical factoids while adjudicating any inconsistencies from multiple sources. Conflation therefore requires reasoning over the semantics of the network topology, and many of the techniques in MAS described in Sinha et al. ( 2015 ) are still in practice today.

Recently, a budding research area has focused on extending the notion of distributional similarity from its natural language root to the network environment. The postulation is straightforward: similar nodes tend to have similar types of edges connecting to similar nodes. As in the natural language use case of representing entities and relations as vectors, the goal of this approach is to transform the nodes and edges of a network into vectors so that reasoning over a network can be simplified and carried out in the vector space with algebraic mathematics. Network semantics, however, is more complicated than natural language, whose contextual relations are single-dimensional in nature: a phrase is either to the left or to the right of another. A network has a higher-order topology because a node can simultaneously connect to a wide variety of others with edges representing distinctive relations. A citation network is a simple example, where one paper can be cited by two others that also have a citation relation between them. A citation network is considered simple because it has only a single type of node, the publication, and a single type of relation, citing. In reality, scholarly communications also involve people, organizations, locations, etc., and are best described by a heterogeneous network where multiple types of nodes are connected by multiple types of edges, making the notion of distributional similarity more sophisticated. The research in heterogeneous network semantics reasoning, especially in its subfields of network and knowledge graph embedding, is ongoing and highly active.

MAS has been testing network embedding techniques on related entity recommendation and has found it essential for each entity to have multiple embeddings based on the types of relations involved in the inferences. In other words, an embedding is sensitive to the sense defining similarity. For example, two institutions can be regarded as similar because their publications share a lot in common in contents, in authorships, in venues, or in being cited together by the same publications or authors. The multitude of senses of similarity leads to multiple sets of embeddings, the results of which are included in MAG distributions. As the research in this area is still ongoing and the techniques are by no means mature, MAS applications can achieve better results by combining the embedding and the discrete inference techniques. One such example is reported in a recent paper (Kanakia et al., 2019) that describes the method behind the current related publication recommendation in MAS. The user studies in this application show that the best system uses both the distance of the text embeddings and the frequency of being cited together.
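A hedged sketch of such a combination: blend an embedding similarity with a normalized co-citation count via a convex weight. The equal weight, the linear form, and all the numbers below are illustrative stand-ins, not the tuned model of (Kanakia et al., 2019).

```python
def hybrid_score(emb_sim, cocit, max_cocit, w=0.5):
    """Convex combination of embedding similarity and normalized
    co-citation frequency; w=0.5 is an arbitrary illustrative weight."""
    return w * emb_sim + (1 - w) * (cocit / max_cocit)

# Hypothetical candidates: (embedding similarity, co-citation count).
candidates = {
    "paper A": (0.91, 12),
    "paper B": (0.95, 1),
    "paper C": (0.40, 14),
}
max_cocit = max(c for _, c in candidates.values())
ranked = sorted(candidates,
                key=lambda p: hybrid_score(*candidates[p], max_cocit),
                reverse=True)
print(ranked[0])  # paper A
```

Note that neither signal alone would rank "paper A" first: "paper B" wins on embedding similarity and "paper C" on co-citations, illustrating why the mixture outperforms either component.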

Assessing Entity Importance With Saliency

As the MAP decision in (1) also drives MAKES to rank the results y in response to a query x , the entity prior P ( e | K ) in (4) is a critical component of MAKES. The way the entity prior is estimated determines in which sense the ranking is optimized. Ideally, the prior should reflect the importance of the entity as perceived by the scholarly community in general. Recently, a new area of research, lumped under the name altmetrics (Piwowar, 2013), has been advocating that the searching, viewing, downloading, or endorsement activities in social media for a publication should be included in estimating the importance of the scholarly work. Having monitored these activities for the past few years, we have found altmetrics a good indicator of how a publication has gained awareness in social media. Although being known is a necessary step toward being perceived as important, our observations cannot exclude the possibility that a publication is searched and viewed more because it is repeatedly mentioned in another highly regarded work, or is authored by influential scholars or even just from reputable organizations. Based on our observations and concerns about altmetrics in the community (e.g., Cheung, 2013), the current focus in MAS is on exploiting the heterogeneity of scholarly communications mentioned above to estimate the entity prior, by first computing the importance of a node relative to others of the same type and then weighting it by the importance of its entity type.

Saliency: An Eigencentrality Measure for Heterogeneous Dynamic Network

The eigenvector centrality measure, or simply eigencentrality, has long been known as a powerful method to assess the relative importance of nodes in a network (Franceschet, 2011). Developed in the early twentieth century, eigencentrality measures the importance of a node relative to others by examining how strongly this node is referred to by other important nodes. Often normalized as a probabilistic measure, eigencentrality can be understood as the likelihood of a node being named as most important in a survey conducted on all members of the network. The method was made prominent by Google through its successful adaptation of eigencentrality in its PageRank algorithm: the PageRank of a webpage is measured by the proportional frequency of the incoming hyperlinks weighted by the PageRank of the respective sources. In distinct contrast to simple citation counts, two important considerations in PageRank are that the frequency of mentions in the citing article counts, and that the importance of the citing source matters. Google has demonstrated that PageRank can be successfully used to assess the importance of each web document.
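The computation can be sketched with plain power iteration; this is a generic textbook rendering of PageRank, not Google's production algorithm. The damping factor 0.85 matches the commonly cited teleportation setting, and dangling nodes spread their rank uniformly.

```python
def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank on an adjacency list {node: [outlinks]}."""
    n = len(adj)
    rank = {u: 1.0 / n for u in adj}
    for _ in range(iters):
        # Teleportation: every node gets a uniform (1 - damping) share.
        new = {u: (1 - damping) / n for u in adj}
        for u, outs in adj.items():
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:  # dangling node: spread its rank uniformly
                for v in adj:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

# Tiny graph: both "a" and "b" endorse "c"; "c" endorses "a" back.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

As expected, "c" ranks highest (endorsed by two nodes, one of them itself well-endorsed) and "b", endorsed by nobody, ranks lowest.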

There are, however, two major challenges in using eigencentrality as an article-level metric in general. First, eigencentrality is mathematically well-defined only if the underlying network is well connected. This mathematical requirement is often not met in real life, neither in citation networks nor in the web graph. To tackle this problem, Google introduced a "teleportation" mechanism in PageRank in which the connection between two web pages is only 85% dependent on the hyperlinks between them. The remaining 15%, called the teleportation probability, is reserved for the assumption that all webpages are connected to each other intrinsically and uniformly. While the teleportation mechanism serves Google well, it has been found fragile and implausible for the citation network (Walker et al., 2007; Maslov and Redner, 2008): the ranking of scholarly publications is overly sensitive to the choice of the teleportation probability, and the best choice suggests scientists only follow the bibliography half the time, with the other half randomly discovering articles from the entire research literature following a uniform distribution. Many PageRank-inspired studies, as recently reviewed in (Waltman and Yan, 2014), have made the same observation and proposed remedies utilizing the heterogeneity of the scholarly communication network. They are mostly, however, in an early exploratory stage, as the manners of modeling the heterogeneous interactions still contain many heuristics that need further validation. Secondly, even if the well-connectedness issue can be addressed through a heterogeneous model, another challenge, as pointed out by many (e.g., Walker et al., 2007), is how to avoid treating eigencentrality as a static measure so that the time differences in citations can be taken into account. It is undesirable to treat an article that received its last citations long ago as equal to one that has just received the same number of citations today, because results without a proper temporal adjustment exhibit a bias toward older publications that have had more time to collect citations.

MAS attacks these two challenges with a unified framework called saliency, based on the following considerations. First, to model the underlying network as changing in time, saliency is defined as the stochastic process characterizing the temporal evolution of the individual eigencentralities computed from snapshots of the network. Without making assumptions on its form, the autoregressive moving-average (ARMA) process, mathematically known to be able to approximate a non-stationary distribution to any precision with high enough orders, is used to model the temporal characteristics of saliency. Surprisingly for MAS, a simple first-order autoregressive (AR) process seems sufficient for the model to reach an ergodic solution (shown below), suggesting that the endorsement power of a citation can be treated as simply as an exponential decay with a constant half-life interval. This finding validates the observation first reported in (Walker et al., 2007).

Secondly, to account for the heterogeneity of the network, MAS uses a mixture model in which the saliency of a publication is a weighted sum of the saliencies of the entities related to the publication. By considering the heterogeneity of scholarly communications, MAS allows one publication to be connected to another through shared authors, affiliations, publication venues and even concepts, effectively ensuring the well-connectedness requirement is met without introducing a random teleportation mechanism. Mathematically, let s x ( t ) denote the saliency vector of the entities of type x at time t , with x = p specifically for the publication, the heterogeneous mixture model coupled with an AR process leads to:
s p ( t + Δ t ) = τ^Δt ∑ x w p, x A p, x s x ( t )     (6)
where Δ t is the interval between two successive network snapshots, w p, x the (non-negative) weight of a type- x node on the publication, τ the time-decaying factor in the AR process, and A p, x the adjacency matrix characterizing the connection strength between a publication and any entity of type x . Currently, MAS considers all nodes of types x ≠ p to have equal connection to the publication; e.g., given a publication, all of its authors and affiliations are treated as contributing equally to the saliency of the publication. Meanwhile, for publications citing one another, A p, p is set proportional to the number of mentions of the cited work in the text body of the citing article.
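Assuming the first-order AR form sketched here (decay of the previous saliencies by a per-step factor plus weighted adjacency propagation; the exact discretization in MAS may differ), one saliency update step can be written as:

```python
def saliency_step(s_pub, s_by_type, A, w, tau):
    """One update of publication saliencies: decay previous values by the
    per-step factor tau, add the weighted adjacency-propagated saliencies
    of each related entity type, then renormalize to a probability measure.
    A[x][i][j] is the connection strength of type-x entity j to publication i.
    This is an illustrative sketch, not the exact MAS discretization."""
    n = len(s_pub)
    new = [tau * s for s in s_pub]
    for x, s_x in s_by_type.items():
        for i in range(n):
            new[i] += w[x] * sum(A[x][i][j] * s_x[j] for j in range(len(s_x)))
    total = sum(new)
    return [v / total for v in new]

# Toy case: two publications, one author connected only to the first.
result = saliency_step([0.5, 0.5], {"author": [1.0]},
                       {"author": [[1.0], [0.0]]}, {"author": 0.1}, 0.9)
```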

As the heterogeneous model treats the saliency of a publication as the combined saliencies of all entities related to it, s p ( t ) is therefore a joint probabilistic distribution. Accordingly, the saliency of a non-publication entity can be obtained by marginalizing the joint distribution, i.e.,
s x ( t ) = A x, p s p ( t )     (7)
where A x, p = [δ ij ] and
δ ij = 1/ n j if entity i is associated with publication j , and 0 otherwise, where n j is the number of type- x entities associated with publication j     (8)
Again, the current MAS implementation does not address how the credit of a publication should be assigned unevenly to its authors based on author order, as (8) implies that all authors contribute equally; a side effect is that each institution associated with a publication receives credit proportional to the number of authors affiliated with it. Ostensibly, a more sophisticated model than (8) could be used where, for instance, the author sequence plays a role in determining δ ij . MAS reports the author sequence, as well as the affiliation sequence for authors with multiple affiliations, but has not yet used them for the purpose of computing saliency.
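The equal-credit aggregation of (7) and (8) can be sketched directly: each publication's saliency is split evenly among its authors. The publication saliencies and author lists below are toy values.

```python
def author_saliency(pub_saliency, authors_of):
    """Aggregate publication saliencies onto authors, splitting each
    publication's saliency equally among its authors as (8) implies."""
    out = {}
    for pub, s in pub_saliency.items():
        share = s / len(authors_of[pub])
        for a in authors_of[pub]:
            out[a] = out.get(a, 0.0) + share
    return out

# Toy data: author A shares one paper with B and has one solo paper.
scores = author_saliency({"p1": 0.6, "p2": 0.4},
                         {"p1": ["A", "B"], "p2": ["A"]})
```

The same routine, run with affiliation lists instead of author lists, reproduces the side effect noted above: an institution's credit grows with the number of its affiliated authors on a paper.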

Estimating Saliency With Reinforcement Learning

To avoid the strong assumption that the latent variables τ and w p, x are constant, MAS uses reinforcement learning (RL) to dynamically choose the best values based on the reinforcement signals streaming in through the observations. The choice is motivated by the fact that RL techniques are known to be effective in tackling the exploitation vs. exploration tradeoff, which in MAS means a balanced treatment of older and newer publications or authors that have had unequal time to collect their due recognition. Often, the challenge of applying RL is that reinforcement signals are hard to obtain. This is fortunately not the case in MAS, because approximately half a million new publications with tens of millions of citations are discovered every 2 weeks (= Δ t ), and these new observations provide ample material for reinforcement signals. Assuming scholarly communications are eventually just, namely, that more important publications will receive more citations in the long run ( NΔt, N ≫ 1), the goal of the RL in MAS is to maximize the agreement between the saliencies of today and the citations accumulated NΔt into the future. Currently, MAS uses the maximum mutual information (MMI) as the quantitative measurement of this agreement; namely, if c ( t ) denotes the vector of citation mention counts for all publications, the objective of the RL in MAS is to find:
( τ, w p, x ) = argmax log < s p ( t ) , c ( t + NΔt ) >     (9)
where < ·, · > denotes the inner product. The choice of MMI makes (9) a convex function so that it can be iteratively solved with a quasi-Newton method. Off-the-shelf software implementing an L-BFGS algorithm is used in MAS.
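As a toy stand-in for the quasi-Newton fit, the sketch below scans a small grid of (τ, w) pairs and keeps the pair whose one-step prediction best agrees with a "future" citation vector. Cosine similarity replaces the MMI objective and grid search replaces L-BFGS, purely for illustration; the vectors are invented.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u))
                  * math.sqrt(sum(b * b for b in v)))

def fit_latents(s0, cites, future, grid):
    """Pick the (tau, w) pair whose one-step prediction
    tau * s0 + w * cites best matches the future citation vector."""
    return max(grid, key=lambda tw: cosine(
        [tw[0] * s + tw[1] * c for s, c in zip(s0, cites)], future))

# Toy fit: the future vector [1, 3] is exactly tau=1, w=1 applied to the data.
best = fit_latents([1.0, 1.0], [0.0, 2.0], [1.0, 3.0],
                   [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0)])
print(best)  # (1.0, 1.0)
```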

Surprisingly, by choosing a long enough future N , the solutions for the latent variables τ and w p, x appear to be quite steady over time with the simplest form of ARMA process: first-order autoregression and no moving average. This apparent ergodicity allows MAS to administer the RL with a delay of NΔt ≈ 5 years; namely, the latent variables in (6) can be obtained by using the data observed up to 5 years ago to predict the citations of the most recent 5 years. The results, as shown in Figure 3 , suggest that a publication generally accrues its saliency from citations at a weight slightly more than 92%, although the factors of its authors, affiliations, publication venues and even topics are non-trivial. Along the time dimension, the value of τ, hovering around 0.9, corresponds to a temporal decay in saliency with a half-life of 7.5 years. In contrast to previous studies where citations only account for 50% of the weight (Walker et al., 2007; Maslov and Redner, 2008) or decay much faster, with half-lives from 1 to 2.6 years (Walker et al., 2007), the RL results in MAS are a much less dramatic departure from the common practice of using citation counts as an article-level metric where, effectively, the metric is computed with a 100% weight on citations that do not decay over time.

Figure 3. The longitudinal values of the latent variables underlying the saliency as obtained by the reinforcement learning (RL) algorithm. These latent variables correspond to the weight the algorithm has to place on each entity type in order to best predict future citation behaviors in the sense of maximum mutual information, as described in (9). The model shows citations remain the dominant factor for high saliency. Despite a relatively simple configuration, the model exhibits remarkable stability over the 35 months shown in this figure, other than two instances, in April 2017 and July 2018, when MAG changed its treatment of affiliations dramatically.

As also shown in Figure 3 , the RL is not impervious to major changes in the underlying data, such as the treatment of author affiliations. In May 2017, a so-called "inferred affiliation" feature was introduced to MAG, in which authors with unknown affiliations were associated with the "most likely" institutions inferred from their recent publication records. An overly optimistic threshold led to many inaccurate projections, and the RL responded to the degradation in quality by lowering the affiliation weight and shifting it to citations. In July of the following year, the MAG data schema was altered to allow a single author to have multiple affiliations, all of which receive equal attribution from the publications by the author. This more faithful characterization of author affiliations led to a boost of the affiliation weight from 1.5 to 3.5%, suggesting the RL mechanism finds the affiliation information more useful.

Properties of Saliency

The saliencies obtained with (6) and (7) are reported in MAG at each update interval for all entities, in the quantized form of −1000 ln s ( t ), and are used as the entity prior in the MAP decision (1) in MAKES, which can be examined through the search and analytics results at the Microsoft Academic website. All these tools can be valuable for more and deeper investigations to fully understand the properties of saliency as a potential metric. For example, by design s p ( t ) discriminates among three citation behaviors not considered in the simple citation count: the number of mentions in the citing article, the age of the citations received, and the non-citation factors that can alleviate the disadvantages for newer publications. The combined effects of these three aspects on article-level assessment can be further studied by inspecting the results from (1) with synthesized queries. Figure 4 shows a typical outcome: a 20% disagreement in ranking positions between saliency-based and citation-count-based rankings using the query set ( Supplementary Material S1 ). A quick examination of the disagreements confirms that a publication can have a higher saliency, albeit lower citation counts, because it is cited by more prestigious or more recent work, as designed. Whether these disagreements are desirable, however, is a question worth exploring.
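The quantized form is straightforward to reproduce. Note that because of the negative logarithm, smaller quantized values correspond to more salient entities.

```python
import math

def quantize_saliency(s):
    """MAG reports a saliency s(t) in the quantized form -1000 * ln s(t),
    so smaller quantized values mean more salient entities."""
    return round(-1000 * math.log(s))

print(quantize_saliency(1.0))    # 0
print(quantize_saliency(0.001))  # 6908
```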


Histogram of ranking positions by citation counts (CC) and saliencies (top) and their differences (bottom). Although future citation counts are the target for estimating the saliencies, the two agree on publication rankings only roughly 80 percent of the time, demonstrating the effects of the non-citation factors ( Figure 3 ) in the design of saliency. In contrast to citation counts, saliencies are sensitive to the venues, the authors, the concepts, and the recency of the citing sources.
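The rank disagreement reported in Figure 4 can be reproduced in miniature by ranking the same items under two scores and counting the positions that differ (a sketch with made-up scores, not the study's query set):

```python
def rank_positions(scores: dict) -> dict:
    """Map each item to its rank (1 = best) under the given
    item -> score mapping; ties are broken by item id."""
    ordered = sorted(scores, key=lambda k: (-scores[k], k))
    return {item: pos for pos, item in enumerate(ordered, start=1)}

def disagreement_rate(citations: dict, saliencies: dict) -> float:
    """Fraction of items ranked differently by citation count
    and by saliency."""
    by_cc, by_sal = rank_positions(citations), rank_positions(saliencies)
    return sum(by_cc[k] != by_sal[k] for k in citations) / len(citations)

# Hypothetical scores: p3 has fewer citations than p2 but is cited
# by more prestigious or more recent work, so its saliency is higher.
cc = {"p1": 120, "p2": 80, "p3": 40, "p4": 10}
sal = {"p1": 0.40, "p2": 0.10, "p3": 0.30, "p4": 0.20}
```

Here three of the four items change position between the two rankings, so the disagreement rate is 0.75.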

By design, unshackling the reliance on overly reductive citation counts may also make saliency less susceptible to manipulations, ranging from citation coercion (Wilhite and Fong, 2012 ) to malicious cheating (López-Cózar et al., 2014 ) targeting metrics like the h-index. Because saliencies use citation contexts, these manipulations are, in theory, less effective and easier to detect, as demonstrated by PageRank for link spam detection in the web graph (Gyöngyi and Garcia-Molina, 2005 ). The extent to which the gains of the eigenvector-based method transfer from the web graph to the scholarly network, however, awaits further quantification.

Another research topic for which MAS can be useful is the effectiveness of the saliencies of non-publication entities that, as described in (7), are aggregated from publication saliencies. This design gives rise to at least two intriguing properties. First, an entity can achieve high saliency with many publications, not all of which are important. As a result, saliency appears to measure both productivity and impact simultaneously, much like the h-index. Indeed, Figure 5 shows a comparison between the h-index and the saliency of Microsoft authors ( Supplementary Material S2 ). Overall, the trend suggests that individuals with a higher h-index tend to also have a higher saliency, but notable disagreements between the two abound. The author with the highest h-index in this set, 134, also has the most publications, 619 articles receiving 60,157 citations in total, yet ranks only fourth by saliency. Conversely, the highest-ranked author by saliency has published only 138 papers receiving 82,293 citations, with an h-index of 76. Most notably, the second-highest-ranked author by saliency has an h-index of only 31: the author has published only 39 papers, which caps the h-index, but they are all well received, with a total citation count of 58,268, which buoys the saliency. The drawbacks of the h-index, e.g., being capped by the publication count in this example, are well known (Waltman and Eck, 2012 ). By considering more factors and not being limited to overly reductive raw signals, saliency appears better equipped to avoid mischaracterizing researchers who strive for the quality rather than the quantity of their publications.
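The capping behavior just described (an author with 39 papers can never exceed h = 39) is visible in a minimal h-index computation; the citation profiles below are illustrative, not the actual data from Supplementary Material S2:

```python
def h_index(citation_counts: list) -> int:
    """Largest h such that at least h papers have >= h citations each."""
    h = 0
    for i, c in enumerate(sorted(citation_counts, reverse=True), start=1):
        if c < i:
            break
        h = i
    return h

# A focused author: few papers, each heavily cited.
focused = [1500] * 39
# A prolific author: many papers, each moderately cited.
prolific = [60] * 619
```

The focused profile is capped at h = 39 despite a far larger citation total, while the prolific profile reaches h = 60; a measure that sums recognition over all papers, as saliency does, would order the two differently.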

Second, because an aggregated saliency is founded on article-level analysis, interdisciplinary work appears to be better captured. One example is the journal ranking for a given subject, say, library science. As shown at the Microsoft Academic website 8 , journals like Nature and Science are among the top 10 for this field when ranked by saliency. This may surprise many human experts because these two journals are seldom considered publication venues for library science. Indeed, if the journals are ranked by h-index, both appear in much lower positions because they carry fewer articles in the field. A closer investigation, however, shows that these two journals have published influential articles in the field, such as the Leiden Manifesto (Hicks et al., 2015 ) in Nature and the coercive citation study (Wilhite and Fong, 2012 ) in Science. Anyone seeking the most impactful papers in this field would commit unacceptable omissions by excluding these two journals from consideration. Again, this example highlights the known problem of using journals as the unit of quantitative scientific studies; a sharp focus on article-level analysis, shown feasible by saliency, appears to be a better option.

Prestige: Size-Normalized Saliency

A known issue with aggregate measurements is that the sheer number of data points being considered can play an outsized role. This can be seen in Figure 5 , where the author saliency largely agrees, especially for prolific authors, with the h-index, a metric designed to measure impact as well as productivity. As implied by (7), an author can reach a high saliency by having a large number of publications even if most of them receive only moderate recognition. Given that hyper-prolific authors have been observed to exist (Ioannidis et al., 2018 ) and their publications appear to be of uneven quality (Bornmann and Tekles, 2019 ), it may be helpful to juxtapose the saliency with a corresponding size-normalized version, which we call prestige, to further discern the two aspects. Specifically, the prestige of a non-publication entity can be derived from (7) as:


A scatter plot comparing the h-index and the saliency where each dot corresponds to the h-index and the saliency of a Microsoft author. Although the two metrics largely agree, the saliency measure is able to overcome a known limitation of the h-index and highlight authors who have published widely recognized work but not in large quantity.

where Ā_{x,p} = [ δ_{ij} / Σ_j δ_{ij} ], in contrast to (9). In short, the prestige of an entity is the average of the saliencies of its publications. Figure 6 illustrates the effect of the size normalization through the rankings of world research institutions in the field of computer science, based on the saliencies and the prestiges of their research papers published during the 5-year window between 2012 and 2016. The institutions that publish with consistent impact line up along the main diagonal, where the size normalization has negligible effect on their rankings; the majority of institutions appear to be in this category. Scattered to the upper left of the diagonal, however, are institutions that are not the most prolific but whose publications, when they do appear, tend to be highly recognized by the research community. Size normalization, as expected, significantly boosts their rankings, as in the cases of Princeton University and Google. On the other hand, clustered to the lower right of the diagonal are the institutions that achieve high saliencies by publishing a large body of literature, as reflected in the relatively large bubble sizes in Figure 6 , and whose rankings are therefore negatively impacted by the size normalization.
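In code, the difference between the aggregate and its size-normalized variant amounts to a sum versus a mean over publication saliencies (a sketch with hypothetical numbers, not Eq. (7) verbatim):

```python
def entity_saliency(pub_saliencies: list) -> float:
    """Unnormalized aggregate: the sum of the entity's publication
    saliencies, which grows with the volume of output."""
    return sum(pub_saliencies)

def entity_prestige(pub_saliencies: list) -> float:
    """Size-normalized saliency: the average publication saliency,
    insensitive to how many papers the entity has."""
    return sum(pub_saliencies) / len(pub_saliencies)

# A prolific institution vs. a selective one (made-up values):
prolific = [0.02] * 500   # many moderately recognized papers
selective = [0.40] * 20   # few highly recognized papers
```

The prolific profile wins on aggregate saliency while the selective one wins on prestige, mirroring the movement above and below the diagonal in Figure 6.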


Saliency (horizontal) vs. prestige rankings of the top 50 research institutions in the computer science area. The size of each bubble corresponds to the number of publications included in computing the saliency and prestige measurements.

As many GOTO-compliant ranking systems (Berger et al., 2019 ) have discovered, one cannot over-emphasize that institution ranking is a highly sophisticated task: it requires multiple perspectives and varying degrees of granularity that commercial rankings such as US News & World Report are typically ill-equipped to provide. To illustrate the point, Figure 7 shows the saliency-prestige rankings of institutions in the computer science subfield of artificial intelligence and its subfields of machine learning, computer vision, and natural language processing. The high variance in the ranking results and the significant differences among the top institutions strongly suggest that the ranking of a field is a very poor predictor of the rankings of its subfields. This is consistent with our observation that, within the subfields of computer science, the spectrum of research topics is so broad that institutions can specialize in a selective few and still run strong, highly impactful research programs. Consequently, ranking institutions at too broad a category amounts to comparing research in notably different fields that can have distinct publication cultures and citation behaviors, i.e., an apples-vs.-oranges comparison. With new resources like MAS that can pinpoint each publication to very fine-grained fields of study, such a deeply flawed methodology, previously tolerated due to data scarcity, should no longer be deemed acceptable and must be soundly rejected by the community.


Institution Rankings, by Saliency (horizontal) vs. Prestige, for the field of Artificial Intelligence and its subfields.

The explosive growth in scholarly communications has made it more difficult for individual humans to keep track of the latest achievements and trends in scientific research. The warning signs are visible in the worsening quality of research assessments involving expert opinions, as a recent CRA study showed. This article describes how MAS utilizes advancements in AI to curate a good and open data set and to enable transparent and objective methodologies (GOTO) for scientific studies of science. The AI components in MAS, in natural language understanding, in knowledge reasoning and inference, and in reinforcement learning for estimating the saliencies of entities in scholarly communications, have been described. There are early indications that saliency, an objective measure harvested from peer-reviewed citation contexts, avoids many drawbacks of existing academic metrics.

Data Availability Statement

Author Contributions

KW drafted the manuscript and coordinated the research project. ZS, RR, and DE supervised the MAG, MAS, and MAKES portions of the work, respectively. CH reviewed the experimental setups and software, while the rest of the authors made equal contributions to the data collected in this work.

Conflict of Interest

All authors are employed by the company Microsoft Research.

Acknowledgments

Dr. Hao Ma led the efforts in creating many advanced features in MAG. Dr. Bo-June Paul Hsu led the team to develop the inference engine in MAKES, and, with the assistance of Dr. Rong Xiao, implemented the first version of the reinforcement learning to compute saliency. The work would not be possible without the strong support from Microsoft Bing engineering teams and colleagues in Microsoft Research labs around the globe.

1 https://sfdora.org/

2 See https://cra.org/cra-statement-us-news-world-report-rankings-computer-science-universities/

3 https://docs.microsoft.com/en-us/academic-services/

4 https://academic.microsoft.com

5 https://paperswithcode.com/

6 https://academic.microsoft.com/faq

7 https://www.microsoft.com/en-us/research/project/academic/articles/understanding-documents-by-using-semantics/

8 See the analytic page at https://academic.microsoft.com/journals/41008148,161191863

Funding. The authors declare that this study received funding from Microsoft Research. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fdata.2019.00045/full#supplementary-material

Supplementary Material S1

A randomly synthesized query set to study the differences between citation-count-based and saliency-based ranking behaviors.

Supplementary Material S2

Analytical script and data to study the differences between the h-index and saliency for authors.

  • Berger E., Blackburn S. M., Brodley C., Jagadish H. V., McKinley K. S., Nascimento M. A., et al. (2019). GOTO rankings considered helpful. Commun. ACM 62, 29–30. doi: 10.1145/3332803
  • Bornmann L., Tekles A. (2019). Productivity does not equal usefulness. Scientometrics 118, 705–707. doi: 10.1007/s11192-018-2982-5
  • Carmel D., Chang M. W., Gabrilovich E., Hsu B. J. P., Wang K. (2014). ERD'14: entity recognition and disambiguation challenge, in Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval (Gold Coast, QLD). doi: 10.1145/2600428.2600734
  • Cheung M. K. (2013). Altmetrics: too soon for use in assessment. Nature 494, 176. doi: 10.1038/494176d
  • Evans M., Furnell S. (2002). A web-based resource migration protocol using WebDAV, in Proceedings of WWW-2002 (Honolulu, HI).
  • Franceschet M. (2011). PageRank: standing on the shoulders of giants. Commun. ACM 54, 92–101. doi: 10.1145/1953122.1953146
  • Garfield E. (1955). Citation indexes for science: a new dimension in documentation through association of ideas. Science 122, 108–111. doi: 10.1126/science.122.3159.108
  • Garfield E. (1964). Science Citation Index: a new dimension in indexing. Science 144, 649–654. doi: 10.1126/science.144.3619.649
  • Garfield E. (1972). Citation analysis as a tool in journal evaluation: journals can be ranked by frequency and impact of citations for science policy studies. Science 178, 471–479. doi: 10.1126/science.178.4060.471
  • Gyöngyi Z., Garcia-Molina H. (2005). Web spam taxonomy, in AIRWeb (Chiba).
  • Harris Z. S. (1954). Distributional structure. WORD 10, 146–162. doi: 10.1080/00437956.1954.11659520
  • Harzing A. W., Alakangas S. (2017). Microsoft Academic: is the phoenix getting wings? Scientometrics 110, 371–383. doi: 10.1007/s11192-016-2185-x
  • Herrmannova D., Knoth P. (2016). An analysis of the Microsoft Academic Graph. D-Lib Magazine 22, 6. doi: 10.1045/september2016-herrmannova
  • Hicks D., Wouters P., Waltman L., Rijcke S. D., Rafols I. (2015). Bibliometrics: the Leiden Manifesto for research metrics. Nature 520, 429–431. doi: 10.1038/520429a
  • Hug S. E., Brändle M. P. (2017). The coverage of Microsoft Academic: analyzing the publication output of a university. Scientometrics 113, 1551–1571. doi: 10.1007/s11192-017-2535-3
  • Hug S. E., Ochsner M., Brändle M. P. (2017). Citation analysis with Microsoft Academic. Scientometrics 111, 371–378. doi: 10.1007/s11192-017-2247-8
  • Ioannidis J. P. A., Klavans R., Boyack K. W. (2018). Thousands of scientists publish a paper every five days. Nature 561, 167–169. doi: 10.1038/d41586-018-06185-8
  • Kanakia A., Shen Z., Eide D., Wang K. (2019). A scalable hybrid research paper recommender system for Microsoft Academic, in The World Wide Web Conference (WWW '19) (New York, NY: ACM). doi: 10.1145/3308558.3313700
  • Kousha K., Thelwall M., Abdoli M. (2018). Can Microsoft Academic assess the early citation impact of in-press articles? A multi-discipline exploratory analysis. J. Informetr. 12, 287–298. doi: 10.1016/j.joi.2018.01.009
  • López-Cózar E. D., Robinson-García N., Torres-Salinas D. (2014). The Google Scholar experiment: how to index false papers and manipulate bibliometric indicators. J. Assoc. Inform. Sci. Technol. 65, 446–454. doi: 10.1002/asi.23056
  • Maslov S., Redner S. (2008). Promise and pitfalls of extending Google's PageRank algorithm to citation networks. J. Neurosci. 28, 11103–11105. doi: 10.1523/JNEUROSCI.0002-08.2008
  • Mikolov T., Sutskever I., Chen K., Corrado G. S., Dean J. (2013). Distributed representations of words and phrases and their compositionality, in NIPS'13: Proceedings of the 26th International Conference on Neural Information Processing Systems, Vol. 2 (Lake Tahoe, NV), 3111–3119.
  • Piwowar H. (2013). Altmetrics: value all research products. Nature 493, 159. doi: 10.1038/493159a
  • Rougier N. P., Hinsen K., Alexandre F., Arildsen T., Barba L. A., Benureau F. C. Y., et al. (2017). Sustainable computational science: the ReScience initiative. PeerJ 3, 1–8. doi: 10.7717/peerj-cs.142
  • Shen Z., Ma H., Wang K. (2018). A web-scale system for scientific knowledge exploration, in Meeting of the Association for Computational Linguistics (Melbourne, VIC), 87–92. doi: 10.18653/v1/P18-4015
  • Sinha A., Shen Z., Song Y., Ma H., Eide D., Hsu B. J. P., Wang K. (2015). An overview of Microsoft Academic Service (MAS) and applications, in Proceedings of the 24th International Conference on World Wide Web (Florence). doi: 10.1145/2740908.2742839
  • Thelwall M. (2017). Microsoft Academic: a multidisciplinary comparison of citation counts with Scopus and Mendeley for 29 journals. J. Informetr. 11, 1201–1212. doi: 10.1016/j.joi.2017.10.006
  • Thelwall M. (2018a). Can Microsoft Academic be used for citation analysis of preprint archives? The case of the Social Science Research Network. Scientometrics 115, 913–928. doi: 10.1007/s11192-018-2704-z
  • Thelwall M. (2018b). Does Microsoft Academic find early citations? Scientometrics 114, 325–334. doi: 10.1007/s11192-017-2558-9
  • Thelwall M. (2018c). Microsoft Academic automatic document searches: accuracy for journal articles and suitability for citation analysis. J. Informetr. 12, 1–9. doi: 10.1016/j.joi.2017.11.001
  • Traag V. A., Waltman L. (2019). Systematic analysis of agreement between metrics and peer review in the UK REF. Palgrave Commun. 5, 29. doi: 10.1057/s41599-019-0233-x
  • Turney P. D., Pantel P. (2010). From frequency to meaning: vector space models of semantics. J. Artif. Intell. Res. 37, 141–188. doi: 10.1613/jair.2934
  • Walker D., Xie H., Yan K. K., Maslov S. (2007). Ranking scientific publications using a model of network traffic. J. Stat. Mech. 2007, P06010. doi: 10.1088/1742-5468/2007/06/P06010
  • Waltman L., Eck N. J. V. (2012). The inconsistency of the h-index. J. Assoc. Inform. Sci. Technol. 63, 406–415. doi: 10.1002/asi.21678
  • Waltman L., Yan E. (2014). PageRank-related methods for analyzing citation networks, in Measuring Scholarly Impact (Cham: Springer), 83–100. doi: 10.1007/978-3-319-10377-8_4
  • Wilhite W., Fong E. A. (2012). Coercive citation in academic publishing. Science 335, 542–543. doi: 10.1126/science.1212540
  • Wilsdon J. (2015). We need a measured approach to metrics. Nature 523, 129. doi: 10.1038/523129a

How-To Geek

How to Use Researcher in Microsoft Word for Essays and Papers

With Researcher in Microsoft Word, you can reduce the time you spend researching your school essay or research paper. Close your web browser and use Word’s built-in tool.


Microsoft wants to make your research easier. With the Word Researcher tool, you can close your web browser and get sources for school essays, research papers, and similar documents in a few clicks.

The Researcher feature, powered by Bing, gives you a handy search box to find people, events, places, and concepts. The results of your search provide you with relevant topics and top sources including books, journals, websites, and images.

When you select the source you want, you can see an overview, history, location, images, and other important details. And the best part is, you never leave your Microsoft Word document.

In addition to viewing the details for your topic, you can start an outline for your paper as well as add and cite text. Click the main subject or one of the information sections to add it directly to your document.

Here, we'll show you how to reduce the time you spend researching and speed up the creation of your paper with the Researcher tool in Microsoft Word.

At the time of writing,  Researcher is available with Word for Microsoft 365, Word for Microsoft 365 for Mac, and Word 2016. It is available to Microsoft 365 subscribers for Windows desktop clients.

To use the Researcher tool, open the "References" tab of your Word document. Click "Researcher" from the "Research" section of the ribbon.

When the pane opens on the right, type a term into the Search box and you're on your way!

You'll receive results for your search with Relevant Topics at the top and Top Sources beneath.

Relevant Topics

Some topics may only give you a couple of Relevant Topics. Click "More Topics" below that section to see additional sources.

If you click one of the Relevant Topics, you'll see a nice overview of the subject. At the end of the "Overview" section, click "Read More" for full details.

Depending on your topic, you'll then see several block sections packed with details. This structure comes in handy for starting your outline, which we'll describe below.

If the subject and Relevant Topic have images, you can click "See All Images" for a neat grid of photos and illustrations. Click one to open your browser and view the image online. Plus, you can add these to your document, which we'll also show you below.

Top Sources

For even more options, the "Top Sources" area offers books, journals, and websites. Select any one of those for its details.

If you choose a Relevant Topic at the top first, you can then filter your Top Sources by subtopic. Click the drop-down box for "All Topics" and pick one.

While most of the material is contained within Word, you may come across a source here and there that you must open in your browser. Click the link to open the source site in your default web browser.

Along with viewing information on your topic, you can add headings, text, and images directly to your document using Researcher.

Add Headings

On the top right of each source's section, you'll see a plus sign. Click the "+" icon to add that section as a collapsible heading for your document outline. Remember, this only adds the heading, not the text, within the section.

If you want to add a snippet of text to your document, you can do this as well. Select the text from the source by dragging your cursor through it. When you release, you'll see a small box appear with options for "Add and Cite" and "Add."

When you choose "Add and Cite," the text will pop into your document with the source cited at the end of the snippet. The citation is formatted automatically, so you can add it to a bibliography easily.

When you choose "Add," the text will still appear in your document, but without the citation.

If your topic offers images, and you click "See All Images," you have the option to add one or more of those, too. This is super convenient because you don't have to hunt them down yourself.

Click the "+" icon in the corner of the image to add it to your paper.

It will appear in your document with the source cited beneath it.

Be sure to respect copyrights when using the available images for your purpose. If you're unsure whether you can use an image, click "Learn More" above the image grid. This takes you to the Microsoft legal webpage explaining copyright and offering FAQs. You can also check our article on images with a Creative Commons License for those sources from Creative Commons.

College essays and research papers are enough work in themselves. By using Researcher in Microsoft Word, you can ease the burden of the research for your document and get a jumpstart on its contents.

Help | Advanced Search

Quantum Physics

Title: Demonstration of logical qubits and repeated error correction with better-than-physical error rates

Abstract: The promise of quantum computers hinges on the ability to scale to large system sizes, e.g., to run quantum computations consisting of more than 100 million operations fault-tolerantly. This in turn requires suppressing errors to levels inversely proportional to the size of the computation. As a step towards this ambitious goal, we present experiments on a trapped-ion QCCD processor where, through the use of fault-tolerant encoding and error correction, we are able to suppress logical error rates to levels below the physical error rates. In particular, we entangled logical qubits encoded in the [[7,1,3]] code with error rates 9.8 times to 500 times lower than at the physical level, and entangled logical qubits encoded in a [[12,2,4]] code with error rates 4.7 times to 800 times lower than at the physical level, depending on the judicious use of post-selection. Moreover, we demonstrate repeated error correction with the [[12,2,4]] code, with logical error rates below physical circuit baselines corresponding to repeated CNOTs, and show evidence that the error rate per error correction cycle, which consists of over 100 physical CNOTs, approaches the error rate of two physical CNOTs. These results signify an important transition from noisy intermediate scale quantum computing to reliable quantum computing, and demonstrate advanced capabilities toward large-scale fault-tolerant quantum computing.
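The suppression figures quoted in the abstract (e.g., "9.8 times to 500 times lower") are ratios of physical to logical error rates; as a trivial illustration (the rates below are hypothetical, not the experiment's measured values):

```python
def suppression_factor(physical_rate: float, logical_rate: float) -> float:
    """How many times lower the logical error rate is than the
    physical error rate."""
    return physical_rate / logical_rate

# E.g., a hypothetical physical two-qubit error rate of 2e-3
# suppressed to a logical rate of 2e-4 is a 10x suppression.
```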


Apple claims its on-device AI system ReaLM 'substantially outperforms' GPT-4

Maria Diaz

A smarter Siri could be coming to the iPhone. 

We know Apple is working on a series of AI announcements for WWDC 2024 in June, but we don't yet know exactly what these will entail. Enhancing Siri is one of Apple's main priorities, as iPhone users regularly complain about the assistant. Apple's AI researchers this week published a  research paper that may shed new light on Apple's AI plans for Siri, maybe even in time for WWDC.

The paper introduces Reference Resolution As Language Modeling (ReALM), a conversational AI system with a novel approach to improving reference resolution. The hope is that ReALM could improve Siri's ability to understand context in a conversation, process onscreen content, and detect background activities. 


Treating reference resolution as a language modeling problem breaks from traditional methods focused on conversational context. ReALM converts conversational, onscreen, and background processes into a text format that can then be processed by large language models (LLMs), leveraging their semantic understanding capabilities.
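The conversion described above (flattening conversational, onscreen, and background state into one text block an LLM can consume) might look roughly like the following; the tags and layout here are purely illustrative, not the encoding ReALM actually uses:

```python
def encode_context(conversation, onscreen_entities, background):
    """Flatten three context sources into a single prompt string
    for a language model (illustrative format only)."""
    lines = ["[conversation]"]
    lines.extend(conversation)
    lines.append("[onscreen]")
    # Number onscreen entities so a reply can refer to them by index.
    lines.extend(f"{i}. {e}" for i, e in enumerate(onscreen_entities, 1))
    lines.append("[background]")
    lines.extend(background)
    return "\n".join(lines)

prompt = encode_context(
    ["User: call the second one"],
    ["Mario's Pizza 555-0134", "Luigi's Deli 555-0178"],
    ["music playing"],
)
```

Once the context is plain text, resolving "the second one" becomes an ordinary language modeling task over the numbered entities.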

The researchers benchmarked ReALM models against GPT-3.5 and GPT-4, OpenAI's LLMs that currently power the free ChatGPT and the paid ChatGPT Plus. In the paper, the researchers said their smallest model performed comparably to GPT-4, while their largest models did even better.

"We demonstrate large improvements over an existing system with similar functionality across different types of references, with our smallest model obtaining absolute gains of over 5% for onscreen references," the researchers explained in the paper. "We also benchmark against GPT-3.5 and GPT-4, with our smallest model achieving performance comparable to that of GPT-4, and our larger models substantially outperforming it."


The paper lists four sizes of the ReALM model: ReALM-80M, ReALM-250M, ReALM-1B, and ReALM-3B. The "M" and "B" indicate the number of parameters in millions and billions, respectively. GPT-3.5 has 175 billion parameters while GPT-4 reportedly boasts about 1.5 trillion parameters. 

"We show that ReaLM outperforms previous approaches, and performs roughly as well as the state of the art LLM today, GPT-4, despite consisting of far fewer parameters," the paper states.

Apple has yet to confirm whether this research will play a role in iOS 18 or its latest devices.



    A scalable hybrid research paper recommender system for microsoft academic, in WWW '19 The World Wide Web Conference (New York, NY: ACM; ). 10.1145/3308558.3313700 [Google Scholar] Kousha K., Thelwall M., Abdoli M. (2018). Can Microsoft Academic assess the early citation impact of in-press articles? A multi-discipline exploratory analysis. J.

  13. How to Use Researcher in Microsoft Word for Essays and Papers

    Open Researcher in Microsoft Word. To use the Researcher tool, open the "References" tab of your Word document. Click "Researcher" from the "Research" section of the ribbon. When the pane opens on the right, type a term into the Search box and you're on your way!

  14. [2404.02280] Demonstration of logical qubits and repeated error

    The promise of quantum computers hinges on the ability to scale to large system sizes, e.g., to run quantum computations consisting of more than 100 million operations fault-tolerantly. This in turn requires suppressing errors to levels inversely proportional to the size of the computation. As a step towards this ambitious goal, we present experiments on a trapped-ion QCCD processor where ...

  15. Applied Sciences

    Since Microsoft HoloLens first appeared in 2016, HoloLens has been used in various industries, over the past five years. This study aims to review academic papers on the applications of HoloLens in several industries. A review was performed to summarize the results of 44 papers (dated between January 2016 and December 2020) and to outline the research trends of applying HoloLens to different ...

  16. Apple claims its on-device AI system ReaLM 'substantially ...

    GPT-3.5 has 175 billion parameters while GPT-4 reportedly boasts about 1.5 trillion parameters. "We show that ReaLM outperforms previous approaches, and performs roughly as well as the state of ...