SCI Journal

15 Best Academic Networking and Collaboration Platforms


Want to break free from scholarly isolation? Uncover the best academic networking and collaboration platforms to transform your research journey.

Have you ever found yourself submerged in heaps of academic papers, isolated in the pursuit of a research question that burns within your curious mind? Although occasionally necessary, the seclusion of academic research can stifle innovation and create echo chambers.

The antidote lies in robust academic networking and collaboration – the highway to diverse knowledge exchange.


Academic networking and collaboration platforms are a global gathering of brilliant minds convening to discuss, debate, share, and collaborate in a virtual world. 

These platforms are fertile grounds that nurture academic growth, streamline collaboration, and enhance scholarly visibility. They offer diverse benefits, from tracking citations and managing references to facilitating interdisciplinary conversations.

Let’s embark on an enlightening journey through some of the top networking and collaboration platforms.

Best Academic Networking and Collaboration Platforms

#1. Academia.edu – Best for Sharing Research Papers


  • Global hub connecting millions of researchers
  • Streamlines the process of sharing papers
  • Keeps you abreast with state-of-the-art research in your field

Academia.edu is a bustling online city square solely dedicated to the academic world. This platform provides an extensive environment for scholars to share papers, receive feedback, and stay updated with the latest research in their areas of interest.

What are the benefits of Academia.edu?

  • Direct Communication: Academia.edu offers a platform to directly connect with researchers around the world, facilitating communication and collaboration.
  • Paper Sharing and Discoverability: Users can share their own academic papers and discover those written by others, enhancing exposure and learning.
  • Analytics: Provides detailed statistics on who is reading and citing your work, thus helping to track the impact of your research.

If your research papers crave a global platform and you’re keen on networking with academicians, Academia.edu has a lot to offer. However, the platform’s push towards premium memberships might be a concern for some.

How much does it cost?

  • Premium: $9 per month

Source: https://www.academia.edu/

#2. Google Scholar – Best for Broad Literature Search and Citation Tracking


  • Comprehensive database for literature search
  • Efficient citation tracking system
  • Authoritative profile management

Google Scholar operates like a relentless detective in the realm of academia. With its wide reach, it helps you navigate the vast sea of literature and keeps a sharp eye on who cites your work. 

This platform ensures you’re not just studying, but also effectively weaving your research into the global academic network.

What are the benefits of Google Scholar?

  • Broad Scope: Google Scholar gives access to a vast array of academic articles, theses, books, conference papers, and patents across many subject areas.
  • Citation Tracking: This platform enables easy tracking of citations to articles, which helps measure the impact of research.
  • Personal Scholar Profiles: Users can create profiles displaying their publications, facilitating professional visibility.

Google Scholar is a formidable tool for literature search and citation tracking. But, if you’re seeking a platform that offers active networking or community building, you may need to explore further.
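One of the impact measures a Scholar profile displays is the h-index. As a rough illustration (a helper of our own, not anything from Google Scholar itself), the h-index can be computed in a few lines of Python:

```python
def h_index(citations):
    """Return the h-index: the largest h such that the author has
    h papers with at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(counts, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# A paper list with citation counts [10, 8, 5, 4, 3] yields h = 4:
# four papers have at least 4 citations each, but not five with 5.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

Citation trackers apply essentially this calculation to every profile, which is why keeping your publication list clean matters for your reported metrics.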

Source: https://scholar.google.com/

#3. LinkedIn – Best for Professional Networking Across All Fields


  • Premier platform for professional networking
  • Provides job listings and career opportunities
  • Facilitates industry-academia interactions

Like an interactive professional directory on steroids, LinkedIn goes beyond traditional networking. It bridges the gap between academia and industry, fostering connections and conversations that could ignite your next career-defining opportunity.

What are the benefits of LinkedIn?

  • Professional Networking: Allows for networking with professionals not only from academia but also from various industries.
  • Job Market Insight: This platform provides information about job opportunities, trends, and professional development resources.
  • Group Discussions: LinkedIn enables engagement in professional group discussions, offering space to share insights and gain knowledge from peers.

LinkedIn is your go-to if you’re seeking to extend your network beyond academia. While it might not be the primary choice for academic research, it’s an invaluable platform for career development and industry insights.

  • Premium: from $39.99 to $149.99 per month

Source: https://www.techtarget.com/

#4. Mendeley – Best for Reference Management and Discovery of New Research


  • Robust tool for reference management
  • Curates personalized research recommendations
  • Facilitates collaborative work on papers

Mendeley is like the diligent research assistant you always needed. This tool handles your references with deftness, suggests new research tailored to your interests, and allows you to collaborate on papers with your team.

What are the benefits of Mendeley?

  • Reference Management: This tool helps to organize, read, annotate, and cite literature effectively, easing the research process.
  • Collaborative Work: Mendeley enables sharing and collaborating on documents with others privately or in public groups.
  • Research Network: It also connects with a global research community, providing updates from the fields of interest.

Mendeley is a handy tool for handling references and discovering new research. However, its networking capabilities are limited compared to other platforms.

  • Basic: $4.99 per month
  • Pro: $9.99 per month
  • Max: $14.99 per month

Source: https://www.mendeley.com

#5. ResearchGate – Best for Interdisciplinary Networking and Collaboration


  • Dedicated platform for researchers across disciplines
  • Helps share and discover scholarly content
  • Fosters collaboration and discussion among peers

ResearchGate is a cross-disciplinary academic hub, buzzing with intellectual dialogue, paper sharing, and collaboration. The platform stands out by facilitating academic networking across disciplines, fostering a rich scholarly exchange.

What are the benefits of ResearchGate?

  • Collaborative Projects: ResearchGate enables sharing and following of research projects, helping foster collaborations.
  • Q&A Forum: A platform to ask research-related questions and receive answers from professionals in the field.
  • Open Reviews: It also offers a space for open peer review, enhancing the transparency and rigor of research.

ResearchGate offers an excellent platform for interdisciplinary networking and collaboration. However, its utility may be limited if you’re not involved in active research or are uncomfortable with content gated behind memberships.

Source: https://www.researchgate.net/

#6. ORCID – Best for Ensuring Researcher Uniqueness


  • Provides unique identifiers for researchers
  • Prevents identity confusion in academic work
  • Allows easy tracking of individual research contributions

ORCID is like your unique academic fingerprint, ensuring your work never gets mixed up with someone else’s. It grants researchers unique identifiers, making it easier to track and attribute your academic contributions.

What are the benefits of ORCID?

  • Unique Identifier: It provides a unique digital identifier that distinguishes a researcher and ensures their work is correctly attributed.
  • Integration: This tool is broadly integrated with many publishers, funders, and institutions, allowing for ease of workflow.
  • Record Keeping: ORCID keeps track of all professional activities (publications, grants, patents, etc.) in one centralized place, keeping the researcher’s profile up to date.

ORCID is a valuable tool for ensuring researcher uniqueness, but it’s not a full-fledged networking platform. It’s a must-have for academics, though its importance might be underestimated by those with less common names.
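An ORCID iD even carries a built-in safeguard against typos: its final character is a checksum computed with the ISO 7064 MOD 11-2 algorithm. A minimal sketch in Python (the function name is ours, not part of any ORCID library):

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check digit for the first
    15 digits of an ORCID iD (hyphens removed)."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    return "X" if result == 10 else str(result)

# The well-known sample iD 0000-0002-1825-0097 ends in 7:
print(orcid_check_digit("000000021825009"))  # 7
```

A check value of 10 is written as “X”, which is why some ORCID iDs end in that letter.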

Source: https://orcid.org/

#7. Publons – Best for Tracking Peer Reviews and Editorships


  • Tracks and validates peer review contributions
  • Supports discovery of editorial opportunities
  • Facilitates open recognition for reviewers

Picture a stage where your often unnoticed work as a peer reviewer or editor gets the spotlight. Publons does just that, giving recognition to your behind-the-scenes contributions to the scholarly world.

What are the benefits of Publons?

  • Recognition of Review Work: Provides credit for peer review and editorial work which is often overlooked in academia.
  • Review History: Maintains a verifiable record of a researcher’s contribution to peer review and editorial work.
  • Training: Offers resources and training for peer reviewing, improving review quality and skills.

Publons excels at recognizing the often-invisible labor of peer review and editorship. It may have limited scope for broader academic networking, but its unique focus makes it stand out.

Source: https://publons.com/

#8. OSF (Open Science Framework) – Best for Full Project Lifecycle Management


  • Manages project lifecycle from planning to publishing
  • Supports collaboration and data sharing
  • Champions the cause of open science

OSF is like your research’s trustworthy custodian, guiding it from its infancy (planning) to maturity (publishing). Besides collaboration, it offers comprehensive tools to manage your project’s life cycle while championing the cause of open science.

What are the benefits of OSF?

  • Project Management: Assists in managing projects with a suite of collaborative tools, aiding the organization of research.
  • Open Science: Promotes transparency and reproducibility by enabling public sharing of datasets, protocols, and research outputs.
  • Cross-Platform Integration: Supports integration with many other tools like GitHub, Google Drive, and Mendeley.

For academics seeking a comprehensive platform to manage research projects from start to finish, OSF is an excellent tool. Its learning curve may be a hurdle, but the benefits it offers for project management and open science make it worth the effort.

Source: https://osf.io/

#9. GitHub – Best for Collaborative Coding and Version Control


  • Ideal for coding collaborations and version control
  • Provides open-source platforms for various projects
  • Hosts a vibrant community of developers

GitHub is like a beehive for coding enthusiasts. The platform thrives with collaborative projects, offering unparalleled version control. It’s a dynamic platform where coders and researchers converge to build and refine code.

What are the benefits of GitHub?

  • Version Control: GitHub offers a robust version control system, facilitating collaborative coding and data analysis projects.
  • Repository: This tool allows the hosting of programming projects and open access of code, promoting open source development.
  • Community: Large community of users contributes to learning, problem-solving, and improvement of code.

If your research involves coding, GitHub is a must-have tool. While it might be intimidating for beginners, the benefits it offers for collaborative coding and version control are unmatched.
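Under the hood, version control boils down to recording line-level differences between file revisions. As a toy sketch of that diff concept (using Python’s standard difflib rather than Git itself; the file names and contents are made up):

```python
import difflib

old = ["results = load('run1.csv')\n", "plot(results)\n"]
new = ["results = load('run2.csv')\n", "normalize(results)\n", "plot(results)\n"]

# unified_diff lists, line by line, what changed between two versions --
# the same information a Git commit records for each tracked file.
diff = list(difflib.unified_diff(old, new, fromfile="analysis.py (v1)",
                                 tofile="analysis.py (v2)"))
print("".join(diff))
```

Git stores essentially this change history for every tracked file, which is what makes collaborative review, merging, and rollback possible.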

  • Starts from $4 per user

Source: https://github.com/

#10. StackExchange (Academia) – Best for Question-and-Answer Format Discussions Related to Academia


  • Q&A platform specifically for academia
  • Facilitates sharing of knowledge and advice
  • Hosts a diverse community of researchers and academics

StackExchange (Academia) is like your reliable academic counsel. This Q&A platform fuels robust discussions on academic issues, connecting you with a community eager to share knowledge and insights.

What are the benefits of Stack Exchange?

  • Expertise: Provides a forum for asking and answering questions related to academic life, drawing from a wide pool of experiences and expertise.
  • Categorized Discussions: Allows for categorization of discussions by topics, making it easier to find relevant information.
  • Reputation Points: Users earn reputation points for quality questions and answers, encouraging a high standard of contributions.

StackExchange (Academia) shines as a Q&A platform specifically for academic discussions. While it’s not suitable for document sharing or formal networking, it’s an excellent resource for tapping into collective academic wisdom.

Source: https://academia.stackexchange.com/

#11. Reddit – Best for Informal Academic Discussions and Advice


  • Platform for informal academic discussions
  • Wide variety of topic-specific communities
  • Enables anonymity in conversation

Reddit is a bustling online tavern where academic conversations flow as freely as the drinks. With its countless communities and casual tone, it opens up a world of informal academic discussions and advice.

What are the benefits of Reddit?

  • Subreddit Communities: Offers numerous academic subreddits, enabling discussion and advice on niche topics.
  • Anonymous Interaction: Allows users to maintain anonymity, encouraging open and honest discussion.
  • Global Perspectives: Provides a platform to interact with a diverse, worldwide user base, broadening perspectives on various topics.

Reddit is a treasure trove for those seeking informal academic discussions. While it’s not ideal for formal networking, the depth and breadth of its communities make it a unique platform for candid conversations.

Source: https://www.redditinc.com/

#12. Quora – Best for Receiving Expert Answers and Sharing Knowledge


  • Q&A platform with diverse range of topics
  • Hosts experts across various fields
  • Great for knowledge sharing and learning

Quora is like a dynamic global seminar, buzzing with questions and brimming with expert answers. This social networking site is an invaluable platform for sharing your knowledge and quenching your thirst for insights from various fields.

What are the benefits of Quora?

  • Diverse Topics: Covers a wide range of topics, enabling users to ask and answer questions on virtually any subject.
  • Expert Answers: Often features responses from industry and academic experts, providing authoritative answers.
  • Personalized Feed: Users can follow topics of interest to customize their feed, staying updated on their preferred subjects.

Quora is a versatile platform for knowledge sharing and gaining expert insights. While it may not be a traditional academic networking tool, its strength lies in the diversity of its topics and the expertise of its contributors.

Source: https://www.quora.com/

#13. Facebook – Best for Building Community and Networking in Field-Specific Groups


  • Popular platform for building communities
  • Houses a multitude of field-specific groups
  • Allows for event organization and announcement sharing

Facebook might be known for connecting friends and families, but it’s also a melting pot for academic networking. With its diverse field-specific groups and community-building features, it’s a trove of academic possibilities.

What are the benefits of Facebook?

  • Social Networking: Connects researchers on a social level, enabling informal discussions and relationship building.
  • Academic Groups: Hosts numerous academic and research-focused groups for collaboration, discussion, and sharing of resources.
  • Events: Provides a platform for promoting and discovering academic events, lectures, and webinars.

Facebook is one of the most useful social networking sites for academic networking due to its wide reach and diverse groups. However, it may not cater to every academic need and its data handling practices might give privacy-conscious users pause.

Source: https://edu.gcfglobal.org/

#14. Twitter – Best for Sharing Quick Research Updates and Engaging in Academic Discussions


  • Microblogging social media platform ideal for quick updates
  • Connects academics and researchers globally
  • Hashtag system enables focused conversations

Twitter is the academic equivalent of the town crier, broadcasting research updates in quick, digestible bites. Its global reach and hashtag system offer unique ways to engage with the academic community.

What are the benefits of Twitter?

  • Real-Time Updates: Provides real-time updates from conferences, symposia, and fellow academics, keeping users up to date.
  • Networking: Enables networking with a broad audience, allowing for the sharing and promotion of research.
  • Hashtag Use: Allows for the organization of content using hashtags, aiding discoverability of research topics.

If brevity is your thing, Twitter excels at disseminating quick research updates. Its potential for distraction and information overload should be considered, but its wide reach and hashtag-driven discussions make it one of the best academic social networks.

Source: https://about.twitter.com/

#15. Scopus – Best for Abstract and Citation Searching


  • Specializes in abstract and citation searching
  • Houses extensive database of research literature
  • Provides analytical tools to track citation impact

Scopus is akin to a scholarly lighthouse, guiding researchers through the dense sea of abstracts and citations. Its expansive database and analytical tools make it a valuable asset for any academic.

What are the benefits of Scopus?

  • Comprehensive Database: Offers access to a large database of peer-reviewed literature from various fields.
  • Analytic Tools: Provides various tools for analyzing research output and trends, supporting academic decision-making.
  • Author Profiles: This tool facilitates the tracking of an author’s work and citation impact, aiding reputation management.

Scopus is a powerhouse for abstract and citation searching, though it doesn’t offer much in the way of networking or collaboration. While it is a subscription-based service, the depth and breadth of its tools and database can justify the cost for many researchers.

  • Custom pricing

Source: https://www.scopus.com/

Academic networking and collaboration platforms are the lifeblood of today’s scholarly community. They offer many opportunities for sharing research, connecting with fellow academics, and advancing your academic career. 

Whether you are looking for strictly academic platforms or broader social media, this list provides the tools and resources to take your academic journey to new heights.

There is more.

Check out our other articles on the Best Academic Tools Series for Research below.

  • Learn how to get more done with these Academic Writing Tools  
  • Learn how to proofread your work with these Proofreading Tools
  • Learn how to broaden your research landscape with these Academic Search Engines
  • Learn how to manage multiple research projects with these Project Management Tools
  • Learn how to run effective survey research with these Survey Tools for Research
  • Learn how to get more insights from important conversations and interviews with these Transcription Tools
  • Learn how to manage the ever-growing list of references with these Reference Management Software
  • Learn how to double your productivity with literature reviews with these AI-Based Summary Generators
  • Learn how to build and develop your audience with these Academic Social Network Sites
  • Learn how to make sure your content is original and trustworthy with these Plagiarism Checkers
  • Learn how to talk about your work effectively with these Science Communication Tools


2012-2024 © scijournal.org




AMiner: Search and Mining of Academic Social Networks


Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Information School, Renmin University of China, Beijing 100872, China


Huaiyu Wan, Yutao Zhang, Jing Zhang, Jie Tang. AMiner: Search and Mining of Academic Social Networks. Data Intelligence 2019; 1 (1): 58–76. doi: https://doi.org/10.1162/dint_a_00006


AMiner is a novel online academic search and mining system that aims to provide a systematic modeling approach to help researchers and scientists gain a deeper understanding of the large and heterogeneous networks formed by authors, papers, conferences, journals and organizations. The system automatically extracts researchers’ profiles from the Web and integrates them with published papers by way of a process that first performs name disambiguation. A generative probabilistic model is then devised to simultaneously model the different entities while providing topic-level expertise search. In addition, AMiner offers a set of researcher-centered functions, including social influence analysis, relationship mining, collaboration recommendation, similarity analysis, and community evolution. The system has been in operation since 2006 and has been accessed from more than 8 million independent IP addresses residing in more than 200 countries and regions.

A variety of academic social networking websites, including Google Scholar, Microsoft Academic, Semantic Scholar, ResearchGate and Academia.edu, have gained great popularity over the past decade. The common purpose of these academic social networking systems is to provide researchers with an integrated platform to query academic information and resources, share their own achievements, and connect with other researchers.

Several issues within academic social networks have been investigated in these systems. However, most of the issues are investigated separately through independent processes. As such, there is not a congruent process or series of methods for mining the whole of disparate academic social networks. The lack of such methods can be attributed to two reasons:

Lack of semantic-based information. The user profile information, whether entered by the user or extracted by heuristics, is sometimes incomplete or inconsistent; users often leave personal information blank simply because they are unwilling to provide it;

Lack of a unified modeling approach for effective mining of the social network. Traditionally, different types of information sources in the academic social network were modeled individually, so dependencies between them could not be captured. Yet such dependencies do exist in social data, and high-quality search services need to consider the intrinsic dependencies between the different heterogeneous information sources.

AMiner [1], the second generation of the ArnetMiner system, is designed to search and perform data mining operations against academic publications on the Internet, using social network analysis to identify connections between researchers, conferences, and publications. In AMiner, our objective is to answer four questions:

How to automatically extract the researcher profile from the existing Web?

How to integrate the extracted information (i.e., researchers’ profiles and publications) from different sources?

How to model the different types of information sources in a unified model?

How to provide powered search services in a constructed network?

To answer the above questions, a series of novel approaches are implemented within the AMiner system. The overall architecture of the system is shown in Figure 1.

Figure 1. The architecture of AMiner.

The system mainly consists of five components:

Extraction. Focus is on automatically extracting researchers’ profiles from the Web. The service first collects and identifies one's relevant pages (e.g., homepages or introducing pages) from the Web, then uses a unified approach [ 2 , 3 ] to extract data from the identified documents. It also extracts publications from online digital libraries using heuristic rules. In addition, a simple but very effective approach is taken for profiling Web users by leveraging the power of big data [ 4 ].

Integration. Joins and integrates the extracted researchers’ profiles and the extracted publications. The application employs the researcher name as the identifier. A probabilistic model [ 5 ] and a comprehensive framework [ 6 ] have been developed to deal with the name ambiguity problem in the integration. The integrated data are then stored, sorted and indexed into a research network knowledge base.

Storage and Access. Provides storage and indexing for the extracted and integrated data in the researcher network knowledge base. Specifically, for storage it employs Jena [ 7 ], a tool to store and retrieve ontological data; for indexing, it employs the inverted file indexing method, an established method for facilitating information retrieval [ 8 ].

Modeling. Utilizes a generative probabilistic model [ 1 ] to simultaneously model the different types of information sources. The system estimates a mixture of topic distribution associated with the different information sources.

Services. Provides several powered services based on the modeling results: profile search, expert finding, conference analysis, course search, sub-graph search, topic browser, academic ranks, and user management.
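As an illustration of the inverted-file indexing used by the Storage and Access component, the sketch below maps each term to the documents containing it; the documents and IDs are hypothetical, and this is a minimal sketch rather than the deployed implementation.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {t: sorted(ids) for t, ids in index.items()}

def search(index, query):
    """Return doc ids containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for t in terms[1:]:
        result &= set(index.get(t, []))
    return sorted(result)

# Hypothetical toy corpus
docs = {1: "topic models for expert finding",
        2: "expert search in academic networks",
        3: "topic evolution in citation networks"}
index = build_inverted_index(docs)
```

At query time, only the posting lists of the query terms are touched, which is what makes the inverted file an established structure for retrieval at scale.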

For several features of the system, e.g., profile extraction, name disambiguation, academic topic modeling, expertise search and academic social network mining, we propose new approaches that overcome drawbacks of the conventional methods.

The rest of this paper is organized as follows. Section 2 discusses related work, and Section 3 presents the approaches we propose in the system. Section 4 shows some applications of AMiner. Section 5 lists the data sets we constructed. Finally, Section 6 concludes the paper.

Several issues in academic social networks have previously been investigated, and a number of systems have been developed.

Google Scholar provides a search engine to identify the hyperlinks of publications that are publicly available or may be obtained through institutional libraries. Google Scholar is not a social networking website in the general sense, yet it has become an important platform for searching academic resources, keeping up with the latest research, promoting one's own achievements, and tracking academic impact. Registered users can create a personal Google Scholar profile to post their research interests, manage their publications, correct their co-authors, and view their per-year citation metrics. The social part of Google Scholar is very simple: a user can follow a researcher to receive an email whenever he or she has a new publication or citation; the user can also set up alerts based on his or her own research field.

Microsoft Academic [ 9 ] employs technologies of machine learning, semantic analysis and data mining to help users explore academic information more powerfully. A user can create an account and a public profile by claiming the publications he or she authored. Microsoft Academic provides more extensive “follow” functions. Users can follow researchers, publications, journals, conferences, organizations and research topics. Based on a user's publication history and the events the user is following, Microsoft Academic will show the most relevant items and news on his or her personalized homepage. In addition, rather than providing a simple keyword-based search engine, Microsoft Academic presents relevant results and recommendations to help users discover more academic information resources of interest to support a more expansive learning and research experience.

Semantic Scholar is designed to be a “smart” search engine to help researchers find better academic publications faster. It uses a combination of machine learning, natural language processing, and machine vision to analyze publications and extract important features, adding a supplementary layer of semantic analysis to the traditional methods of citation analysis. In comparison to Google Scholar and Microsoft Academic, Semantic Scholar can quickly highlight the most important papers and identify the connections between them. The influential citations, images and key phrases that the engine extracts help users quickly identify the material most relevant to their work.

ResearchGate's aim is to connect geographically distant researchers and allow them to communicate continuously. Registered users of the site each have a user profile and can share their research output including papers, data, book chapters, patents, research proposals, algorithms, presentations and software source code. Users can also follow the activities of others and engage in discussions with them. ResearchGate organizes itself mainly around research topics and maintains its own index, the ResearchGate Score, based on the user's contribution of content, profile details and participation in interactions on the site, such as asking questions and offering answers.

Academia.edu is a for-profit academic social networking website. It allows its users to create a profile, share their works, monitor their academic impact, select areas of interest and follow the research evolving in particular fields. Users can browse the networks of people with similar interests from around the world on the website. Academia.edu includes an analytics dashboard where users can see the influence and diffusion of their works in real time. In addition, Academia.edu has an alert service that sends registered users an email whenever a person whom they are following publishes a new paper. It also alerts anyone who is following a certain topic; in this way, the alert system can raise a paper's visibility among potential citers.

Although most of the above systems have integrated a vast amount of academic resources and provide abundant search, query and social networking functions, they do not perform systematic semantic-level analysis or mining. Consequently, the primary objective of our AMiner system is to provide a unified modeling approach for gaining a deeper understanding of the semantic connections in large, heterogeneous academic networks consisting of authors, papers, conferences, journals and organizations. As a result, our system can provide topic-level expertise search and researcher-centered functions.

In this section we introduce in detail the challenges we are addressing with our AMiner system, and we present our methods and solutions.

3.1 Profile Extraction

We define the schema of the researcher profile by extending the FOAF ontology [ 10 ], as shown in Figure 2 . In the schema, 24 properties and two relations are defined [ 2 , 3 ].

Figure 2. The schema of the researcher profile.

Extracting the research network from the Web is certainly not a trivial task. Researchers at different universities, institutes and companies use disparate page and profile templates and data feeds, so an ideal extraction method must handle all kinds of templates and formats. The approach we propose consists of three steps:

Relevant page identification. Given a researcher name, we first obtain a list of Web pages from a search engine (the Google API is used) and then identify the homepage or introductory page using a classifier. We define a set of features, such as whether the title of the page contains the person name and whether the URL address (partly) contains the person name, and employ an SVM [ 11 ] for the classification.

Preprocessing. We separate the text into tokens and assign possible tags to each token. The tokens form the basic units and the pages form the sequences of units in the following tagging step.

Tagging. Given a sequence of units, we determine the most likely corresponding sequence of tags by using a trained tagging model. The type of tag corresponds with the property defined in Figure 2 . We define five types of tokens (i.e., standard word, special word, image token, term, and punctuation mark) and use heuristics to identify tokens on the Web. After that, we assign several possible tags to each token based on the token type, and then a trained CRF model [ 12 ] is used to find the best tag assignment having the highest likelihood.
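The classifier of step 1 can be illustrated with a couple of the features mentioned above. The hand-set weights below merely stand in for a trained SVM, and the names and URLs are hypothetical; this is a sketch of the feature engineering, not the deployed model.

```python
def page_features(name, title, url):
    """Binary features for deciding whether a page is a researcher's
    homepage (illustrative subset; the real system trains an SVM)."""
    name_l = name.lower()
    parts = name_l.split()
    return [
        int(name_l in title.lower()),               # full name appears in page title
        int(any(p in url.lower() for p in parts)),  # a name fragment appears in URL
        int("home" in url.lower() or "~" in url),   # URL looks like a personal page
    ]

def is_homepage(feats, weights=(1.0, 1.0, 0.5), threshold=1.5):
    """Stand-in linear decision; an actual SVM would learn the weights."""
    return sum(w * f for w, f in zip(weights, feats)) >= threshold
```

In practice the trained SVM combines many more such features, but the decision it makes has exactly this thresholded linear form.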

Recently, we revisited the problem of Web user profiling in the big data setting and proposed a simple but very effective approach, referred to as MagicFG [ 4 ], for profiling Web users by leveraging the power of big data. To avoid error propagation, the approach integrates page identification and profile extraction in a unified framework. To improve profiling performance, we introduce the concept of contextual credibility. The proposed framework also supports the incorporation of human knowledge: it expresses human knowledge as Markov logic statements and formalizes them in a factor graph model. The MagicFG method has been deployed in the AMiner system to profile millions of researchers.

Figure 3 gives an example of a researcher profile.

Figure 3. An example of a researcher profile.

3.2 Name Disambiguation

We have collected more than 200 million publications from existing online digital libraries, including DBLP ⑦ , ACM DL ⑧ , CiteSeerX ⑨ and others. In each data source, authors are identified by their names. To integrate the researcher profiles and the publication data, we use the researcher name and the author name as identifiers. This process inevitably suffers from the name ambiguity problem.

A few years ago, we proposed a probabilistic framework [ 5 ] based on Hidden Markov Random Fields (HMRF) [ 13 ] that is able to capture dependencies between observations (here, each paper is viewed as an observation). The disambiguation problem is cast as assigning a tag to each paper, with each tag representing an actual researcher.

More recently, we proposed a comprehensive framework [ 6 ] to address the name disambiguation problem. An overview of the framework is shown in Figure 4. A novel representation learning method is proposed that incorporates both global and local information, and an end-to-end cluster size estimation method is presented within the framework. To improve accuracy, we involve human annotators in the disambiguation process. The method has been deployed in AMiner to deal with the name disambiguation problem at the billion scale, which demonstrates its effectiveness and efficiency.

Figure 4. An overview of the name disambiguation framework in AMiner.
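As a drastically simplified illustration of casting disambiguation as clustering, the sketch below groups same-name papers by co-author overlap. The greedy single-link strategy, Jaccard measure and threshold are stand-ins for the HMRF and representation learning machinery described above; the names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap between two co-author sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def disambiguate(papers, threshold=0.2):
    """Greedy single-link clustering of papers sharing an author name:
    a paper joins the cluster containing its most similar paper if the
    co-author overlap exceeds the threshold, else starts a new cluster."""
    clusters = []  # each cluster is a list of paper indices
    for i, p in enumerate(papers):
        best, best_sim = None, threshold
        for c in clusters:
            sim = max(jaccard(p["coauthors"], papers[j]["coauthors"]) for j in c)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters
```

Each resulting cluster then corresponds to one hypothesized real-world researcher, which is exactly the tag assignment the probabilistic framework optimizes globally instead of greedily.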

3.3 Topic Modeling

In academic search, how to represent the content of text documents, authors' interests and conferences' themes is a critical issue for any approach. Traditionally, documents are represented under the “bag of words” (BOW) assumption. However, this representation cannot capture the “semantic” dependencies between words. In addition, academic search involves different types of information sources, so capturing the dependencies between them becomes a challenging issue. Unfortunately, existing topic models such as probabilistic Latent Semantic Indexing (pLSI) [ 14 ], Latent Dirichlet Allocation (LDA) [ 15 ] and the Author-Topic model [ 16 , 17 ] cannot be directly applied in the context of academic search, because they cannot capture the intrinsic dependencies between, for example, papers and conferences.

A unified topic modeling approach [ 1 ] is proposed for simultaneously modeling the characteristics of documents, authors and conferences and the dependencies among them. (For simplicity, we use “conference” to denote conference, journal and book in the model.) The proposed model is called the Author-Conference-Topic (ACT) model. More specifically, different strategies can be employed to model the topic distributions (as shown in Figure 5 ), so the implemented models have different knowledge representation capacities. In Figure 5 (a), each author is associated with a mixture of weights over topics; each word token of a paper, and likewise the conference stamp associated with each word token, is generated from a sampled topic. In Figure 5 (b), each author-conference pair is associated with a mixture of weights over the topics, and word tokens are then generated from the sampled topics. In Figure 5 (c), each author is associated with topics, each word token is generated from a sampled topic, and the conference is then generated from the sampled topics of all word tokens in a paper.

Figure 5. Graphical representation of the three Author-Conference-Topic (ACT) models.
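ACT extends LDA with author and conference variables. Since the full model is involved, the sketch below implements only plain LDA with collapsed Gibbs sampling, to illustrate the counting-and-resampling machinery that ACT builds on; the toy corpus and hyperparameters are illustrative.

```python
import numpy as np

def gibbs_lda(docs, n_topics, vocab_size, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for plain LDA. docs is a list of word-id
    lists; returns the per-document topic distribution theta."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))   # doc-topic counts
    nkw = np.zeros((n_topics, vocab_size))  # topic-word counts
    nk = np.zeros(n_topics)                 # topic totals
    z = []                                  # topic assignment per token
    for d, doc in enumerate(docs):          # random initialization
        zd = []
        for w in doc:
            k = int(rng.integers(n_topics))
            zd.append(k)
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                 # remove token from counts
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # full conditional: p(k) ∝ (ndk+alpha) * (nkw+beta)/(nk+V*beta)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = int(rng.choice(n_topics, p=p / p.sum()))
                z[d][i] = k                 # add token back with new topic
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
```

ACT's samplers have the same shape but additionally sample an author (and, depending on the variant, a conference) for each token, so the count matrices are indexed by authors and conferences as well as documents.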

3.4 Expertise Search

When searching for academic resources and formulating a query, a user endeavors to find authors with specific expertise, and papers and conferences related to the research areas of interest.

In the AMiner system, we present a topic-level expertise search framework [ 18 ]. Unlike traditional Web search engines, which perform retrieval and ranking at the document level, we study the expertise search problem at the topic level over heterogeneous networks. A unified topic model, Citation-Tracing-Topic (CTT), is proposed to simultaneously model topical aspects of the different objects in the academic network. Based on the learned topic models, we investigate the expertise search problem along three dimensions: ranking, citation tracing analysis and topic graph search. Specifically, we propose a topic-level random walk method for ranking the different objects. In citation tracing analysis, we seek to uncover how a study influences its follow-up studies. Finally, we have developed a topical graph search function based on the topic modeling and citation tracing analysis.
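The topic-level random walk generalizes PageRank-style power iteration. A plain, topic-free version over a single adjacency matrix can be sketched as follows; the damping factor and toy graph are illustrative, not AMiner's actual transition design.

```python
import numpy as np

def random_walk_rank(adj, damping=0.85, n_iter=100):
    """Power-iteration random walk over an adjacency matrix: with
    probability `damping` follow an out-edge, else teleport uniformly."""
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    # row-normalise; dangling nodes jump uniformly
    trans = np.where(out > 0, adj / np.where(out == 0, 1, out), 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        rank = (1 - damping) / n + damping * (rank @ trans)
    return rank
```

In the topic-level variant, a separate walk (or a walk with topic-conditioned transition probabilities) is run per topic, so that an author can rank highly for "data mining" without ranking highly overall.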

Figure 6 gives an example result of experts found for the query “Data Mining”.

Figure 6. An example result of experts found for “Data Mining”.

3.5 Academic Social Network Mining

Based on the AMiner system, we provide a set of researcher-centric academic social network mining functions, including social influence analysis, relationship mining, collaboration recommendation, similarity analysis and community evolution analysis.

Social Influence Analysis. In large social networks, people influence each other for various reasons. We propose a Topic Affinity Propagation (TAP) model [ 19 ] to differentiate and quantify social influence. TAP can take the results of any topic modeling together with the existing network structure and perform topic-level influence propagation. More recently, we designed an end-to-end framework, called DeepInf, for feature representation learning and social influence prediction [ 20 ]. Each user is represented by the local sub-network in which he or she is embedded. A graph neural network is used to learn the representation of the sub-network, which effectively integrates the user-specific features and the network structure. The framework of DeepInf is shown in Figure 7 .

Figure 7. Model framework of DeepInf.

Social Relationship Mining. Inferring the type of social relationships between two users is a very important task in social relationship mining. We propose a two-stage framework named Time-constrained Probabilistic Factor Graph model (TPFG) [ 21 ] for inferring advisor-advisee relationships in the co-author network. The main idea is to leverage a time-constrained probabilistic factor graph model to decompose the joint probability of the unknown advisors over all the authors. Furthermore, we develop a framework named TranFG for classifying the type of social relationships across disparate heterogeneous resources [ 22 ]. The framework incorporates social theories into a factor graph model, which effectively improves the accuracy of predicting the types of social relationships in a target network by borrowing knowledge from another source network.

Similarity Analysis. Estimating the similarity between vertices is a fundamental issue in social network analysis. We propose a sampling-based method, known as Panther, to estimate the top- k similar vertices [ 23 ]. The method is based on the novel idea of random path sampling: given a network, Panther randomly generates a number of paths of a pre-defined length, and the similarity between two vertices is then estimated as the probability that the two vertices appear on the same path.
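A bare-bones version of the random-path idea fits in a few lines. The path counts and toy graph below are illustrative, and the published method's accuracy guarantees and weighting are omitted; this is only a sketch of the co-occurrence estimate.

```python
import random
from collections import defaultdict

def panther_similarity(edges, n_paths=2000, path_len=4, seed=1):
    """Sample many short random walks and score each vertex pair by the
    fraction of sampled paths on which both vertices appear."""
    random.seed(seed)
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v); nbrs[v].append(u)
    nodes = list(nbrs)
    co = defaultdict(int)
    for _ in range(n_paths):
        u = random.choice(nodes)
        path = [u]
        for _ in range(path_len - 1):
            u = random.choice(nbrs[u])
            path.append(u)
        seen = sorted(set(path))
        for i, a in enumerate(seen):          # count every co-occurring pair once
            for b in seen[i + 1:]:
                co[(a, b)] += 1
    return {pair: c / n_paths for pair, c in co.items()}
```

Because each sample is a short local walk, the estimate concentrates on nearby, densely connected vertices, which is what makes the method fast on large graphs.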

Collaboration Recommendation. Interdisciplinary collaborations have generated a huge impact on society. However, it is usually hard for researchers to establish such cross-domain collaborations. We analyze the cross-domain collaboration data from research publications and propose a Cross-domain Topic Learning (CTL) model [ 24 ] for collaboration recommendation. For handling sparse connections, CTL consolidates the existing cross-domain collaborations through topic layers as opposed to utilizing author layers. This alleviates the sparseness issue. For handling complementary expertise, CTL models topic distributions from source and target domains separately, as well as the correlation across domains. For handling topic skewness, CTL only models relevant topics to the cross-domain collaboration.

Community Evolution. Since social networks are highly dynamic, it is interesting to study how people in a network form different clusters and how the various clusters evolve over time. We study the co-evolution of multi-typed objects in a special type of heterogeneous network, called a star network, and examine how the multi-typed objects influence each other in the network evolution [ 25 ]. A Hierarchical Dirichlet Process Mixture Model-based evolution model is proposed, which detects the co-evolution of multi-typed objects in the form of multi-typed cluster evolution in dynamic star networks. An efficient inference algorithm is provided to learn the proposed model.

AMiner is developed to provide comprehensive search and mining services for researcher social networks. In this system we focus on: (1) creating a semantic-based profile for each researcher by extracting information from the distributed Web; (2) integrating academic data (e.g., the bibliographic data and the researcher profiles) from multiple sources; (3) accurately searching the heterogeneous network; (4) analyzing and discovering interesting patterns from the built researcher social network. The main search and analysis functions in AMiner are summarized in the following section.

Profile Search. Input a researcher name (e.g., Jie Tang). The system will return the semantic-based profile created for the researcher using information extraction techniques. In the profile page, the extracted and integrated information include: contact information, photo, citation statistics, academic achievement evaluation, (temporal) research interest, educational history, personal social graph, research funding (currently only US and CN) and publication records (including citation information and the papers that are automatically assigned to several different domains).

Expert Finding. Input a query (e.g., data mining). The system will return experts on this topic. In addition, the system will suggest the top conferences and the top-ranked papers on this topic. There are two ranking algorithms: VSM and ACT. The former is similar to a conventional language model, and the latter is based on our Author-Conference-Topic (ACT) model. Users can also provide feedback on the search results.
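The VSM-style baseline can be approximated by a standard TF-IDF cosine ranking; the sketch below is a generic stand-in for that family of rankers, not AMiner's exact weighting formula, and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Return document indices ordered by TF-IDF cosine similarity
    to the query (most similar first)."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for d in docs:
        df.update(set(d.lower().split()))

    def vec(text):
        tf = Counter(text.lower().split())
        return {t: c * math.log((1 + n) / (1 + df[t])) for t, c in tf.items()}

    def cos(a, b):
        dot = sum(a[t] * b.get(t, 0.0) for t in a)
        na = math.sqrt(sum(x * x for x in a.values()))
        nb = math.sqrt(sum(x * x for x in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    qv = vec(query)
    scores = [(cos(qv, vec(d)), i) for i, d in enumerate(docs)]
    return [i for s, i in sorted(scores, reverse=True)]
```

The ACT-based ranker replaces these surface word overlaps with similarity in the learned topic space, which is what lets it match "data mining" experts who never use that exact phrase.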

Conference Analysis. Input a conference name (e.g., KDD). The system returns the most active researchers at this conference, as well as its top-ranked papers.

Course Search. Input a query (e.g., data mining). The system will return those who are teaching courses relevant to the query.

Sub-graph Search. Input a query (e.g., data mining). The system first tells you which topics are relevant to the query (e.g., the five topics “Data mining”, “XML Data”, “Data Mining/Query Processing”, “Web Data/Database design” and “Web Mining”) and subsequently displays the most important sub-graph discovered for each relevant topic, augmented with a summary of the sub-graph.

Topic Browser. Based on our Author-Conference-Topic (ACT) model, we automatically discover 200 hot topics from the publications. For each topic we automatically assign a label to represent its meanings. Furthermore, the browser presents the most active researchers, the most relevant conferences/papers and the evolution trend of the topics that are discovered.

Academic Ranks. We define eight measures to evaluate the researcher's achievement. The measures include “h-index”, “Citation”, “Uptrend”, “Activity”, “Longevity”, “Diversity”, “Sociability” and “New Star”. For each measure, we output a ranking list in different domains. For example, one can search those who have the highest citation numbers in the “data mining” domain. Figure 8 gives an example of researcher ranking by sociability index.

Figure 8. An example of researcher ranking by sociability index.
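Among these measures, the h-index has a simple direct definition, and a minimal computation can be sketched as follows.

```python
def h_index(citations):
    """h-index: the largest h such that the researcher has at least
    h papers with at least h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cites, start=1):
        if c >= i:
            h = i
        else:
            break
    return h
```

For example, citation counts [10, 8, 5, 4, 3] give h = 4: four papers have at least four citations, but not five papers with at least five.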

User Management. One can register as a user to: (1) modify the extracted profile information; (2) provide feedback on the search results; (3) follow researchers in AMiner; and (4) create an AMiner page (which can be used to advertise conferences and workshops, or to recruit students).

AMiner has collected a large scholar data set with more than 130,000,000 researcher profiles and 233,000,000 publications from the Internet as of June 2018, along with a number of subsets constructed for different research purposes. The details of these subsets are as follows; the data can be found at https://www.aminer.cn/data .

Citation Network. The citation data are extracted from DBLP, ACM DL and other sources. The data set contains 1,572,277 papers and 2,084,019 citation relationships. Each paper is associated with abstract, authors, year, venue, and title. The data set can be used for clustering with network and side information, studying influence in the citation network, finding the most influential papers, topic modeling analysis, etc.

Academic Social Network. These data include papers, paper citations, author information and author collaborations. The data set contains 1,712,433 authors, 2,092,356 papers, 8,024,869 citation relationships and 4,258,615 collaboration relationships between authors.

Advisor-advisee. The data set is comprised of 815,946 authors and 2,792,833 co-author relationships. To evaluate the performance of inferring advisor-advisee relationships between co-authors, we created a smaller ground-truth data set using the following method: (1) collecting advisor-advisee information from the Mathematics Genealogy project and the AI Genealogy project; (2) manually crawling advisor-advisee information from researchers’ homepages. In total, we labeled 1,534 co-author relationships, of which 514 are advisor-advisee relationships.

Topic-co-author. It is a topic-based co-author network, which contains 640,134 authors of 8 topics and 1,554,643 co-author relationships. The eight topics are: Data Mining/Association Rules, Web Services, Bayesian Networks/Belief Function, Web Mining/Information Fusion, Semantic Web/Description Logics, Machine Learning, Database Systems/XML Data and Information Retrieval.

Topic-paper-author. The data set was collected for the purpose of cross-domain recommendation and contains 33,739 authors associated with five topics, as well as 139,278 co-author relationships. The five topics are Data Mining (with 6,282 authors and 22,862 co-author relationships), Medical Informatics (with 9,150 authors and 31,851 co-author relationships), Theory (with 5,449 authors and 27,712 co-author relationships), Visualization (with 5,268 authors and 19,261 co-author relationships) and Database (with 7,590 authors and 37,592 co-author relationships).

Topic-citation. It is a topic-based citation network which contains 2,329,760 papers of 10 topics and 12,710,347 citation relationships. The 10 topics are: Data Mining/Association Rules, Web Services, Bayesian Networks/Belief Function, Web Mining/Information Fusion, Semantic Web/Description Logics, Machine Learning, Database Systems/XML Data, Pattern Recognition/Image Analysis, Information Retrieval, and Natural Language System/Statistical Machine Translation.

Kernel Community. It is a co-authorship network with 822,415 nodes and 2,928,360 undirected edges. Each vertex represents an author and each edge represents a co-author relationship.

Dynamic Co-author. The data set contains 1,768,776 papers published during the time period from 1986 to 2012 with 1,629,217 authors involved. Each year is regarded as a time stamp and there are 27 time stamps in total. At each time stamp, we create an edge between two authors if they have co-authored at least one paper in the most recent three years (including the current year). We convert the undirected co-author network into a directed network by regarding each undirected edge as two symmetric directed edges.
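The snapshot construction described above (an edge at time stamp t if two authors co-authored a paper within the most recent three years) can be sketched as follows; the field names are hypothetical and the undirected-to-directed conversion is omitted.

```python
from itertools import combinations

def coauthor_snapshots(papers, window=3):
    """Build one undirected co-author edge set per year: authors are
    linked at year t if they co-authored a paper in years (t-window, t]."""
    years = sorted({p["year"] for p in papers})
    snapshots = {}
    for t in years:
        edges = set()
        for p in papers:
            if t - window < p["year"] <= t:
                for a, b in combinations(sorted(p["authors"]), 2):
                    edges.add((a, b))
        snapshots[t] = edges
    return snapshots
```

Converting each snapshot to the directed form used in the data set amounts to replacing each undirected edge (a, b) with the two directed edges (a, b) and (b, a).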

Expert Finding. This data set is a benchmark for expert finding which contains 1,781 experts of 13 topics.

Association Search. This data set is used to evaluate the effectiveness of association search approaches; it contains 8,369 author pairs specific to nine topics. Each author pair contains a source author and a target author.

Topic Model Results for the AMiner Data Set. These are the results of the ACT model on the AMiner data set, covering the top 1,000,000 papers and authors across 200 topics.

Co-author. This is a co-author network on the AMiner system which contains 1,560,640 authors and 4,258,946 co-author relationships.

Disambiguation. This data set is used for studying name disambiguation in a digital library. It contains 110 authors and their affiliations as well as their disambiguation results (ground truth).

In this paper we present a novel online academic search and mining system, AMiner, the second generation of the ArnetMiner system. We first present the overall architecture of the system, which consists of five main components: extraction, integration, storage and access, modeling and services. We then introduce the important methodologies proposed in the system, including the profile extraction and user profiling methods, name disambiguation algorithms, topic modeling methods, expertise search strategies and a series of academic social network mining methods. Furthermore, we introduce the typical applications as well as the broad offering of data sets already available on the platform.

We acknowledge that AMiner is still at a developmental stage with respect to both the scale of its resources and the quality of its services. In the future, we will exploit additional intelligent methods for mining deep knowledge from scientific networks and deploy a more convenient and personalized framework for delivering academic search and mining services.

This work was a collaboration between all of the authors. J. Tang ([email protected], corresponding author) is the leader of the AMiner project, who drew the whole picture of the system. Y.T. Zhang ([email protected]) and J. Zhang ([email protected]) summarized the methodology part of this paper. H.Y. Wan ([email protected]) summarized the applications and data sets in the AMiner system and drafted the paper. All the authors have made meaningful and valuable contributions in revising and proofreading the resulting manuscript.

① https://scholar.google.com/
② https://academic.microsoft.com/
③ https://www.semanticscholar.org/
④ https://www.researchgate.net/
⑤ https://www.academia.edu/
⑥ https://www.aminer.cn/
⑦ https://dblp.uni-trier.de/
⑧ https://dl.acm.org/
⑨ http://citeseerx.ist.psu.edu/


Online ISSN 2641-435X
A product of The MIT Press


Academic Networks

In the academic world, networking was, and often still is, mediated through a mentor or other superior. Collaborative opportunities often come through mutual projects or existing networks already established by your mentor. These networks, constructed slowly over time through common work, can be narrowly focused on subject matter that is limited in diversity and scope. They may miss the work and collaborative potential of a key professional right around the corner, simply because that person was not in your mentor's network, perhaps because their subject matter was not obviously related to your mentor's. Lost opportunities for innovative and productive collaborations can have a significant impact on your research and career.

To address these lost opportunities, academia has begun to encourage networking among faculty and trainees to enhance innovation and collaboration to advance a new research paradigm – Team Science. Maintaining a personal network is extremely important to engage in the vibrant nature of Team Science. Academic relationships are dynamic, reflecting the diverse expertise and influence of all individuals, which in turn are always in flux. These factors are often rapidly changing, and in an unpredictable fashion. To manage this ever-changing academic environment, personal network management is a crucial aspect of one’s professional information management. It is the practice of managing multiple collaborative contacts and connections for social and professional benefits. If you can successfully tap key influential networks within your institution and wider academic discipline, you will be more likely to be nurtured toward success. In other words, it is not just what you know, but also who you know and can work with that matters in achieving academic success. Who you know may facilitate knowledge of essential methods and processes ranging from the science itself to accessing funding to “grantsmanship” to policy impact.

Building and Nurturing a Network

Although it may be easy to build a network by simply accruing a list of contacts, the real challenge is maintaining and leveraging those connections effectively. Information fragmentation makes it difficult to ensure cooperation and to keep track of different personal information assets (e.g., Facebook, Twitter). Maintaining a contact list with accurate contact information (office phone, cell phone, email address, etc.) is a simple yet essential part of maintaining a network. Devising and committing to a sustainable, organized approach to personal networking, including the use of social media resources, is becoming increasingly important to academic efficiency and success.

Being Engaged in Interdisciplinary Team Research: Is Your Network Working for You?

If building and nurturing a professional network is requisite to being a successful academic researcher, then to catalyze and seize opportunities for interdisciplinary team-based research, the argument for Academic Networking couldn’t be stronger. Each academic has a professional network. Some build a deep network of people mostly in the same field, which can unwittingly limit potential interdisciplinary opportunities. Some build widely diverse networks of carefully selected individuals who can optimize their chances for interdisciplinary research. Others make it a numbers game, focusing on the quantity of people and a long list of mentees in their network. But have you taken a moment to look carefully at what kind of network you have? What strategies do you use to connect with others in a way that helps you harness opportunities for doing research in interdisciplinary teams? How many interdisciplinary research projects are you working on now?

Having a targeted and effective professional network can make the difference between working hard and working smart in the Team Science paradigm. Effective professional networking online can provide access to quick conversations, expert opinions, and scans of issues or systems. It can lead to new ideas and new connections and provide real-time insights about your research or your discipline. It can be an efficient way to find out what people in your network are doing and whether to reconnect with them. It can facilitate connections at conferences and meetings, open doors, and build relationships with experts, influencers, and other key individuals.

If you don’t know the answers to or have never thought about these questions, consider taking some time to review, enhance and nurture your network. Your self-reflection should focus on “who” should be in your network – identifying those individuals who best facilitate your participation in team science. When considering your personal network, keep in mind that to be effective, your core connections and relationships should bridge smaller, more-diverse groups and geography. These relationships should also result in more learning, less bias, and greater personal growth by modeling positive behaviors: generosity, authenticity, and enthusiasm. Once you have defined your core network and how they relate to you and others within your network, consider who in your core can help with your professional and academic challenges. Is your core network group diverse enough and are you generating new ideas from this core? Are there people who take but don’t give? Should you continue your affiliation to them? Are there gaps in expertise, skill, support or availability?

The benefits of an effective, well-curated network that facilitates active engagement in team science include:

  • Communicating with peers and colleagues to keep informed and up to date about who is doing what;
  • Learning about new methods and tools people are using;
  • Creating visibility for yourself that helps you develop a reputation (your brand);
  • Building career stability for yourself;
  • Moving new ideas through the network and testing them out;
  • Seeking placement opportunities for your trainees, to name just a few.

Remember that the type, degree and targets of academic networking will evolve throughout your career depending on your professional and academic needs. The academic network required by a trainee during transitions (e.g. a new GMS student or Post-Doc) will vary, but what all trainees have in common is the absolute need to establish and nurture a well-curated network of supporters and collaborators as they proceed within their academic field. Networking continues to be important even in mid to late career, as one’s needs and capacity to support others evolve. Each collaborator plays a different but critical role in the scientific enterprise. At times, special networks may be important based on other important commonalities. For example, for women, networking can be particularly challenging because it requires self-promotion (which can be unfamiliar or uncomfortable for some) and can be misunderstood by others. Similar issues may exist for underrepresented minority researchers, and having a robust and supportive network can be invaluable to their success.

Effective networking is a critical yet often underestimated factor in establishing and sustaining a successful academic career in an ever-changing, increasingly collaborative and competitive research environment. In a future blog post we will address a key personal networking activity: networking at a professional conference.

By  C. Shanahan



CitNetExplorer



Welcome to CitNetExplorer

CitNetExplorer is a software tool for visualizing and analyzing citation networks of scientific publications. The tool allows citation networks to be imported directly from the Web of Science database. Citation networks can be explored interactively, for instance by drilling down into a network and by identifying clusters of closely related publications.

Why use CitNetExplorer?

Examples of applications of CitNetExplorer include:

Analyzing the development of a research field over time. CitNetExplorer visualizes the most important publications in a field and shows the citation relations between these publications to indicate how publications build on each other.

Identifying the literature on a research topic. CitNetExplorer delineates the literature on a research topic by identifying publications that are closely connected to each other in terms of citation relations.

Exploring the publication oeuvre of a researcher. CitNetExplorer visualizes the citation network of the publications of a researcher and shows how the work of a researcher has influenced the publications of other researchers.

Supporting literature reviewing. CitNetExplorer facilitates systematic literature reviewing by identifying publications cited by or citing one or more selected publications.

Download CitNetExplorer


Problems opening Web of Science files

Users of CitNetExplorer may experience problems when opening files downloaded from Web of Science. Opening these files may result in a so-called null pointer exception. This problem can be avoided by excluding publications of the document type 'early access' from the search results in Web of Science.


You may also be interested in our VOSviewer tool

VOSviewer is a software tool for constructing and visualizing bibliometric networks. Networks can be constructed based on citation relations. Examples of such networks are bibliographic coupling and co-citation networks of journals, researchers, and individual publications. VOSviewer also offers text mining functionality that can be used to construct and visualize co-occurrence networks of important terms extracted from a body of scientific literature.

VOSviewer website


Elsevier - PMC COVID-19 Collection


Using the full-text content of academic articles to identify and evaluate algorithm entities in the domain of natural language processing

Highlights

  • We extract algorithms from articles and evaluate the impact of algorithms based on the number of papers and mention duration.
  • We analyze the algorithms with high impact in different years, and explore the evolution of influence over time.
  • The algorithms and sentences we extracted can be used as training data for automatic extraction of algorithms in the future.

In the era of big data, the advancement, improvement, and application of algorithms in academic research have played an important role in promoting the development of different disciplines. Academic papers in various disciplines, especially computer science, contain a large number of algorithms. Identifying the algorithms from the full-text content of papers can determine popular or classical algorithms in a specific field and help scholars gain a comprehensive understanding of the algorithms and even the field. To this end, this article takes the field of natural language processing (NLP) as an example and identifies algorithms from academic papers in the field. A dictionary of algorithms is constructed by manually annotating the contents of papers, and sentences containing algorithms in the dictionary are extracted through dictionary-based matching. The number of articles mentioning an algorithm is used as an indicator to analyze the influence of that algorithm. Our results reveal the algorithm with the highest influence in NLP papers and show that classification algorithms represent the largest proportion among the high-impact algorithms. In addition, the evolution of the influence of algorithms reflects the changes in research tasks and topics in the field, and the changes in the influence of different algorithms show different trends. As a preliminary exploration, this paper conducts an analysis of the impact of algorithms mentioned in the academic text, and the results can be used as training data for the automatic extraction of large-scale algorithms in the future. The methodology in this paper is domain-independent and can be applied to other domains.

1. Introduction

The speed of social development is accelerating, and new technologies are born every day. Societal developments provide people with new opportunities and conveniences. However, human beings still face new challenges and problems. At the end of 2019, a novel coronavirus (SARS-CoV-2) was detected. The virus spreads very quickly, causing substantial losses to the entire society. Issues such as how to find a cure for the virus, how to find the source of the virus, how to develop a vaccine, and how to distribute materials during the epidemic require experts and scholars to find new or more suitable methods in their research field.

Among different categories of methods, algorithms are bound to have an important role; algorithms are ubiquitous and offer precise methodologies to solve problems ( Carman, 2013 ). Informally, an algorithm is any well-defined computational procedure that takes a set of values as input and produces some value as output ( Cormen, Leiserson, Rivest, & Stein, 2009 ), which is needed in scientific research. Especially in the era of big data, data-driven research requires algorithms to extract, process, and analyze massive amounts of data. Therefore, algorithms have become research objects, as well as useful technologies, of scholars in different fields. In the "Venice Time Machine" project in the field of digital humanities, researchers used machine learning algorithms to reveal Venice's history in a dynamic digital form to reproduce the glorious style of the ancient city ( Abbott, 2017 ). In the field of computer science, scientists have used machine learning algorithms to combat the novel coronavirus, including the use of algorithms to detect infections, differentiate COVID-19 from the common flu and to predict the epidemic situation ( Dave, 2020 ).

Academic papers in many disciplines, especially in the computer science domain, propose, improve, and use various algorithms ( Tuarob & Tucker, 2015 ). However, not everyone is an algorithm expert. For many researchers, especially beginners in a field, gaining a thorough understanding of the available algorithms and finding one that suits their own research are urgent problems. Scholars usually find suitable algorithms through two methods. One is direct consultation with more experienced scholars, but this depends on the advisers’ knowledge and does not guarantee the comprehensiveness of the algorithm suggestions. The other is reading academic literature and finding algorithms in other people’s research, which exposes scholars to more algorithms. Academic papers are a perfect source of algorithms; however, information overload cannot be ignored. Research has pointed out that the number of academic publications generated worldwide has reached the millions and continues to increase at a rate of approximately 3% each year ( Bornmann & Mutz, 2015 ). Searching for algorithms solely by reading articles is therefore a time-consuming and labor-intensive challenge. If the algorithms mentioned in papers, namely any algorithm appearing in a paper, whether proposed, used, improved, described or simply mentioned by the author, can be identified and evaluated, scholars can save time and gain a solid foundation for sorting out the algorithms of specific disciplines or research topics.

To this end, this article aims to collect algorithms in research papers in a domain and further explore the influence of algorithms. Tuarob et al. (2020) defined a standard algorithm in academic papers as one that is well known by people in a field and is usually recognized by its name, including Dijkstra’s shortest-path algorithm, the Bellman-Ford algorithm, the Quicksort algorithm, etc. On this basis, we use our experience, authors’ descriptions and other external knowledge to annotate the named algorithms in articles. In addition, we posit that, when an algorithm appears in an article, it has an influence on the article. Therefore, we evaluate the influence of an algorithm based on the number of papers that mention the algorithm in the full-text content. Mention count has proven to be a suitable indicator to measure the influence of entities in academic papers ( Howison & Bullard, 2016 ; Ma & Zhang, 2017 ; Pan, Yan, Wang, & Hua, 2015 ). Therefore, we take the field of natural language processing as an example and explore three research questions:

What are the high-influence algorithms in natural language processing?

What are the differences in the influential algorithms in different years?

How does the influence of the algorithm change over time?

To be more specific, the first question investigates the influence of different algorithms in the NLP domain. For question 2, we combine influence with time to understand the changes in high-impact algorithms across different years, and we attempt to analyze the development of the NLP field from the perspective of changes in its algorithms. The third question refines the second: we explore the evolution of specific individual algorithms and describe the patterns of change in their influence.

The reason for choosing the field of natural language processing is that papers in the computer science field are more likely to propose and use algorithms than papers in other fields. Furthermore, computer science is a rapidly developing discipline, and there are various algorithms that emerge in the discipline, which ensures that we can collect enough algorithms to carry out our research. It should be noted that, in the traditional named-entity recognition task, named entities refer to nouns or noun phrases representing various entities ( Petasis, Cucchiarelli, & Velardi, 2000 ). Therefore, in this paper, the annotated algorithm refers to the noun or noun phrase representing the algorithm with a specific name, for example, the support vector machine , rather than the concept described by the author, for example, a novel classification algorithm.

2. Related works

The algorithm entity is a kind of knowledge entity. To be specific, the existing research classifies knowledge entities into various types, including research methods, theories, software, algorithms, datasets and so on. Scholars use the full text of academic papers to identify entities by manual annotation or machine learning methods and then analyze the influence of knowledge entities based on various indicators.

2.1. Evaluating knowledge entities based on frequency

At present, most papers analyze the influence of knowledge entities based on bibliometric indicators, which are usually the frequency of mentions, citations and uses of entities in academic papers ( Belter, 2014 ). Jarvelin and Vakkari (1990) pioneered the use of academic papers to collect research methods and establish a methodological framework. Blake (1994) turned his attention from journals to dissertations. By reading the contents of abstracts, he identified research methods often used in dissertations. Pan et al. (2015) adopted the method of bootstrapped learning to automatically extract software entities from academic articles and evaluated the software according to the frequency of citation and use. He, Lou and Li (2019) selected 14 science mapping tools and analyzed their influence by exploring the number of citations of articles that used these tools. On the basis of frequency, scholars have also proposed other indicators to analyze the influence of knowledge entities from different aspects. Chu and Ke (2017) annotated the research methods in academic papers of LIS (library and information science) and combined the frequency of mention with the time of publication to analyze the changes in the influence of research methods. Pan, Yan and Hua (2016) studied the influence of software in different disciplines, and the results show that software is more widely used in agriculture, health sciences, and biology than in mathematics, information sciences, and social sciences. Zhao, Yan and Li (2018) manually annotated the datasets in PLoS One papers and analyzed the collection, storage, availability and other features of datasets in various disciplines based on frequency.

In addition to bibliometric indicators, some altmetric concepts are also used to evaluate the influence of entities. These indicators can be the frequency of votes, downloads, and visits of entities. In 2006, the organizers of ICDM (The IEEE International Conference on Data Mining) used the votes of experts to evaluate the influence of algorithms ( Wu et al., 2008 ). Stack Overflow evaluates the influence of IT technology and databases based on the votes of practitioners in the IT domain 1 . TIOBE considers the number of programmer votes, the number of courses and the number of vendors to calculate the popularity and influence of different programming languages 2 . Thelwall and Kousha (2016) posited that the download volume of open-source software can be used as an indicator of software value. Based on the concept, Priem and Piwowar initiated the open-source project Depsy 3 , which is dedicated to analyzing the influence of various software in the open-source community. The indicators include the frequency of downloads, citations in academic papers and so on. Subsequently, Zhao and Wei (2017) used Depsy to obtain the number of downloads and citations of some software in the Python community and analyzed the academic influence. Generally, different types of frequencies have their own advantages. How to integrate them to analyze the influence of knowledge entities is worth exploring in the future.

2.2. Evaluating knowledge entities based on text content

In addition to using frequency to measure influence, some works have also utilized text content to deeply explore the role, function and relationship of knowledge entities. Li, Rollins and Yan (2017) , Li, Yan and Feng (2017) took 19,478 papers that mentioned WoS (web of science) as the research object and analyzed the abstract content through Stanford CoreNLP. According to the verbs and language patterns in the context related to WoS, the results showed that the most important reason researchers mention WoS is that it is used as a source of data. The analysis of the relationship includes both the relationship between entities and the relationship between entities and articles. Li and Yan (2018) proposed a software recognition algorithm based on the software name dictionary. By extracting sentences that mentioned the R software package in papers published by PLOS, they analyzed the co-mention network of the software package and noted that packages with similar disciplines and functions were more likely to be mentioned at the same time. Similarly, Zhang, Ma and Zhang (2019) used the full text of academic papers published by PLOS One to cluster 260 kinds of software mentioned in the article. The results also showed that software with similar functions would be clustered together. Yang, Huang, Wang and Rousseau (2018) analyzed the relationship between article and software, and the results indicated that articles published by a journal with higher quality tended to use newer software and that international articles used new software earlier than Chinese articles.

Furthermore, some scholars have used the content of academic papers to identify the pattern of citation and use of knowledge entities. Yoon, Chung, Lee and Kim (2019) investigated how the Health Information National Trends Survey (HINTS) data were cited in academic literature. The results indicated that more than half of the articles cited HINTS-related documents rather than the data itself. Costa, Meirelles and Chavez (2018) took statistical analysis tools as the example and researched the sustainability of statistical analysis tools. The results demonstrated that many tools had short life cycles. Similar to datasets, software has a diverse citation pattern in academic articles. Pan, Yan, Cui and Hua (2019) studied the software that was actually used in academic papers rather than just mentioned in the LIS (library and information science) discipline, and they pointed out that the dependence of papers on software was increasing gradually and that the citation pattern of software was various and irregular. The work of Li, Rollins et al. (2017) , Li, Yan et al. (2017) indicated that the reason for the informal citation of software is that the citation standards of software were diversified and that the authors have not followed the specifications.

2.3. Application of extracted knowledge entities

Tuarob, Bhatia, Mitra and Giles (2013) , Tuarob, Mitra and Giles (2013) carried out a series of applications based on knowledge entities. They first used a rule-based method to identify algorithms in academic papers, and built an algorithm search system ( Bhatia, Tuarob, Mitra, & Giles, 2011 ). At the same time, with the help of the full text, the co-citation network ( Tuarob, Mitra, & Giles, 2012 ), function ( Tuarob, Bhatia et al., 2013 ; Tuarob, Mitra et al., 2013 ) and efficiency ( Safder, Sarfraz, Hassan, Ali, & Tuarob, 2017 ) of algorithms were further investigated, which can be used to optimize the search system. With the development of technology, they combined rule-based methods and machine-learning methods, and by identifying the pseudocode and process descriptions in the full text of academic literature, algorithms mentioned in the article were extracted. After that, they built a novel algorithm search system, AlgorithmSeer ( Tuarob, Bhatia, Mitra, & Giles, 2016 ), which returns query results by calculating the similarity between the algorithm description and the search terms. The citation function of algorithms was explored to improve the performance of the system ( Tuarob et al., 2020 ). In addition to search systems, Zha, Chen, Li and Yan (2019) utilized a deep learning method to extract algorithms from the tables in academic papers, and then constructed an algorithm roadmap to describe the evolutionary relations between different algorithms. However, these studies have a few limitations. They target only restricted sets of algorithms, for example, algorithms proposed by the authors, algorithms described in detail, or algorithms mentioned in tables. Algorithms without a detailed description, or appearing in other parts of an article, were ignored.

In general, existing work on the extraction and evaluation of knowledge entities has received widespread attention. For assessing the influence of knowledge entities, frequency is still the main indicator. Beyond basic frequency-based impact assessment, some work has explored features such as the functions, relationships, uses and citation patterns of knowledge entities. Among the different knowledge entities, software gets the most attention because software has a clearer definition and is easier to identify. However, there are few research studies on algorithms. Even where studies have identified algorithm entities and built a retrieval system or roadmap, they focused only on the algorithms proposed in an article or described in detail, and did not evaluate the influence of algorithms. To this end, this article attempts to identify more algorithm entities from papers and to construct an algorithm dictionary by manual annotation. We intend to conduct a preliminary statistical analysis of the algorithms mentioned in academic papers in a specific field and to carry out more in-depth exploration based on these data in the future.

3. Methodology

As shown in Fig. 1 , using natural language processing (NLP) as a case, we identified and evaluated algorithms in the full-text content of academic articles in the NLP domain. For the full text of academic papers, we identified algorithms and compiled the dictionary manually, and then we extracted the algorithm sentences from articles through dictionary-matching and evaluated the influence of different algorithms. Different from the bootstrapping method used in other work, we used the method of manual annotation to identify the algorithm entities, which could ensure the accuracy of the recognition results.

Fig. 1

Framework of our work.

3.1. Data collection

We selected conference proceedings of the ACL (Annual Meeting of Association for Computational Linguistics) as the dataset. In the computer science field, outstanding research results are often published in conferences rather than journals, and therefore, conference papers are more suitable resources for exploring the algorithms in computer science. ACL is the top conference in the field of NLP, and it is believed that research methods and results in ACL conference papers are more representative. To this end, papers published in the ACL conference could be an excellent material for studying algorithms in NLP. We downloaded all of the ACL conference papers published between 1979 and 2015 from the ACL anthology reference corpus ( https://acl-arc.comp.nus.edu.sg/ ), and 4641 papers were available in both PDF and XML formats.

Fig. 2 shows the number of papers published each year in the ACL conference. It is clear that, before 2003, the number of papers accepted by the conference was less than 100 per year. The year 1998 was a special year, with two volumes of conference proceedings published. Since 2006, the number of ACL conference papers has increased significantly.

Fig. 2

The number of papers accepted by ACL conference each year.

3.2. Manual annotation of algorithm entities in academic papers

For the papers in PDF format, we randomly selected a sample of 100 papers from the entire dataset. Two annotators, a Ph.D. student and a master student, were invited to annotate the algorithm entities mentioned in the 100 papers. Both of the annotators were familiar with natural language processing. The labeling process included the following steps:

  • (i) The annotators read the title, abstract, introduction, method and result of the article and labeled algorithms mentioned in the content of these sections.
  • (ii) The annotators reviewed figures and tables in the article since the results of the used algorithms are usually presented in tables and figures.
  • (iii) The annotators used the word “algorithm” as a search term to perform automatic retrieval in the full text to find algorithms in other sections.
  • (iv) The annotators quickly browsed the full text to determine whether there were any missing algorithms.

In the process of tagging, the annotators used the following steps to determine whether a noun or noun phrase should be labeled as an algorithm:

  • (i) For a well-known algorithm, or if all of the authors of the ACL papers called the method an "algorithm", it was directly labeled as an "algorithm", for example, support vector machines.
  • (ii) For a noun that the author did not define in the paper but was suspected to be an algorithm, for example, the Gibbs sampling method, the annotators searched for it in external knowledge bases (Wikipedia, Google Scholar, other academic papers or monographs) and determined whether it was an algorithm according to the introduction.
  • (iii) For a noun for which the author did not provide a unified definition, further judgment was required. Taking the Hidden Markov Model as an example, some authors called it a "model," and some authors called it an "algorithm." After accessing other information and consulting experts, the annotators determined that in essence, it is still an "algorithm."

We combined the results independently labeled by the two annotators and then compiled an algorithm dictionary. After that, we used the method of dictionary-matching to match algorithms in the dictionary with the full-text content of the 100 articles. Thus, we could find algorithms that were ignored by the annotators in some articles but found in other articles. In the end, we knew which algorithms from the dictionary appeared in each article, and we regarded this result as the gold standard and compared it with the two annotators’ original annotations. Taking each article as a unit, we identified the algorithms that each annotator ignored in each article and calculated the missing rate. The missing rates of the two annotators were 13 % (Ph.D. student) and 14 % (master student), and both coders missed 11 % of the algorithms in the gold standard. To measure the interrater reliability (IRR) between the two labelers, we employed Cohen’s kappa coefficient ( Cohen, 1960 ) and achieved an IRR of 0.78, indicating that a single labeler could reliably annotate all of the papers. Therefore, the Ph.D. student annotated all of the remaining papers.
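The interrater reliability computation can be sketched with a minimal, self-contained implementation of Cohen's kappa; the per-item labels below are illustrative, not the paper's actual annotation data:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters' categorical labels on the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement: sum over categories of p_a(c) * p_b(c).
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n)
                   for c in set(labels_a) | set(labels_b))
    return (observed - expected) / (1 - expected)

# Hypothetical per-item judgments ("alg" = labeled as an algorithm entity).
rater_1 = ["alg", "alg", "not", "alg", "not", "alg", "not", "not"]
rater_2 = ["alg", "alg", "not", "not", "not", "alg", "not", "alg"]
print(cohens_kappa(rater_1, rater_2))  # -> 0.5
```

Kappa discounts the agreement two raters would reach by chance, which is why it is preferred over raw percent agreement for this kind of annotation study.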

The results were stored in tables. As shown in Table 1 , we recorded the ID (the unique identifier consisting of numbers and letters for each paper in the ACL anthology reference corpus), title of the paper and all of the algorithms appearing in the body text.

Example of algorithms extracted from articles.

3.3. Algorithm dictionary compilation

Since authors’ writing styles are different, algorithms may be mentioned with different names in each paper, including their full name, abbreviation and various aliases. For example, ‘support vector machine’ is also called ‘SVM’, ‘SVMs’, ‘support vector machines’, ‘support-vector machine’, or ‘support-vector machines’.

We summarized all of the names of an algorithm and compiled a dictionary of algorithms by removing duplicate words and manual classification. In the dictionary, each line represents an algorithm, and each line contains all of the names of the algorithms identified from ACL papers. Specifically, the dictionary includes 877 algorithms and 1840 different names; examples of algorithms in the dictionary are displayed in Table 2 .

Examples of different names of algorithms.
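The dictionary-matching step can be sketched as follows. The two entries shown are a hypothetical fragment, not the full 877-algorithm dictionary:

```python
import re

# Hypothetical fragment of the algorithm dictionary: each entry maps a
# canonical algorithm name to all of its observed variant names.
ALGORITHM_DICT = {
    "support vector machine": {
        "support vector machine", "support vector machines",
        "support-vector machine", "support-vector machines", "SVM", "SVMs",
    },
    "hidden Markov model": {"hidden Markov model", "HMM", "HMMs"},
}

def match_algorithms(text):
    """Return canonical names of all dictionary algorithms mentioned in text."""
    found = set()
    for canonical, aliases in ALGORITHM_DICT.items():
        for alias in aliases:
            # Whole-word match; all-uppercase abbreviations are matched
            # case-sensitively to avoid spurious hits, other names ignore case.
            flags = 0 if alias.isupper() else re.IGNORECASE
            if re.search(r"\b" + re.escape(alias) + r"\b", text, flags):
                found.add(canonical)
                break
    return found

print(match_algorithms("We trained an SVM and a hidden markov model on the corpus."))
```

Matching every alias back against the full text is what lets the gold standard recover mentions that an annotator overlooked in individual articles.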

3.4. Algorithm sentences extraction

The labeler may have omitted some articles containing algorithms in the dictionary, which would lead to inaccurate results. For example, the labeler found the support vector machine algorithm in 9 articles, but in fact 10 articles mentioned support vector machines; we needed to find the article that the labeler had overlooked. To find all articles mentioning algorithms compiled in the dictionary, we matched the algorithm names against the full text of the academic papers in XML format. The algorithm sentences were also extracted; that is, whenever a name of an algorithm appeared in a sentence, we saved the ID of the article, the algorithm name and the algorithm sentence together. Through comparison with our manual annotation results, we could find the missing articles.
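The sentence-level extraction just described can be sketched as below, using a naive sentence splitter in place of the XML processing; the article ID and text are hypothetical:

```python
import re

def extract_algorithm_sentences(article_id, text, aliases):
    """Save (article ID, algorithm name, sentence) for every sentence
    that mentions any of the algorithm's names."""
    records = []
    # Naive splitter on sentence-final punctuation; the study works on
    # the structured XML full text instead.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for alias in aliases:
            if re.search(r"\b" + re.escape(alias) + r"\b", sentence):
                records.append((article_id, alias, sentence))
                break
    return records

doc = ("We apply the beam-search algorithm to decoding. "
       "Results improve over the baseline. "
       "The beam-search algorithm is efficient in practice.")
for record in extract_algorithm_sentences("P99-1001", doc, ["beam-search"]):
    print(record)
```

Only the first and third sentences are saved, since the second sentence mentions no algorithm name.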

In addition, algorithms have abbreviated names, and the same abbreviation can represent different algorithms. For example, the BP algorithm can represent both the back-propagation algorithm and the belief propagation algorithm. A few abbreviations can also represent both algorithms and other entities: EM can represent the expectation maximization algorithm, or it may simply denote the m-th entity. Therefore, we disambiguated all of the algorithm sentences extracted by abbreviations. Specifically, if the full name and abbreviation of an algorithm appeared in an article simultaneously, we assumed that the abbreviation in the text represented that algorithm. For example, the following two sentences were extracted from the same paper ( Clark, 2002 ): "This paper discusses the supervised learning of morphology using stochastic transducers, trained using the Expectation Maximization (EM) algorithm" and "We have presented some algorithms for the supervised learning of morphology using the EM algorithm applied to non-deterministic finite-state transducers." Since the first sentence gives the full name of the EM algorithm, we assumed that the EM in the second sentence indicated expectation maximization. If the full name did not appear in the article, we made a further judgment based on the context of the algorithm sentence extracted by the abbreviation. In the final extraction result, the full name of every abbreviation was added. Examples of matched sentences are shown in Table 3 .

Example of sentence matched by the algorithm.
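The co-occurrence rule for resolving abbreviations can be sketched as follows; the candidate list is illustrative (only "expectation maximization" is a real expansion from the paper's own example, "energy minimization" is a made-up distractor):

```python
def disambiguate_abbreviation(article_text, abbreviation, candidate_full_names):
    """If exactly one candidate full name co-occurs with the abbreviation in
    the same article, resolve the abbreviation to that algorithm; otherwise
    manual judgment of the context is needed (None here)."""
    if abbreviation not in article_text:
        return None
    present = [name for name in candidate_full_names
               if name.lower() in article_text.lower()]
    return present[0] if len(present) == 1 else None

text = ("This paper discusses the supervised learning of morphology using "
        "stochastic transducers, trained using the Expectation Maximization (EM) "
        "algorithm. We have presented some algorithms using the EM algorithm.")
print(disambiguate_abbreviation(text, "EM",
                                ["expectation maximization", "energy minimization"]))
# Prints: expectation maximization
```

When zero or more than one candidate full name appears in the article, the rule abstains, mirroring the manual context-based judgment described above.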

3.5. Analyzing the impact of algorithms

  • (1) Indicator of influence

In this paper, the number of papers mentioning an algorithm was utilized as an indicator to analyze the influence of the algorithm. Scholars have used various bibliometric indicators to evaluate the influence of papers ( Cartes-Vellásquez & Manterola Delgado, 2014 ), authors ( Fu & Ho, 2013 ) or institutions ( Abramo, D'Angelo, & Costa, 2011 ), as well as knowledge entities. The numbers of mentions, citations, downloads, visits, and votes can be used as indicators to measure impact ( Ding et al., 2013 ; Urquhart & Dunn, 2013 ; Howison, Deelman, McLennan, Ferreira da Silva, & Herbsleb, 2015 ; Pan et al., 2019 ). For the influence of algorithms, the numbers of mentions ( Wang & Zhang, 2018 ) and votes ( Wu et al., 2008 ) were used in previous studies.

Compared with other indicators, the mention count is more suitable for measuring the influence of an algorithm in academic papers. Knowledge entities are often cited irregularly in academic papers ( Mooney & Hailey, 2011 ), so if we used the number of citations to measure the influence of algorithms, algorithms that are mentioned but not formally cited would be ignored; the mention count avoids this problem. Furthermore, the counting unit of mentions could be either a sentence or an article. Authors' writing styles differ: some authors repeatedly mention an algorithm in many sentences, while others mention it only a few times. We therefore chose the article as the counting unit to eliminate the inconsistency caused by different writing styles.

For the above reasons, in this article, the indicator of influence is the “mention count,” and the counting unit is the “article.”

  • (2) Influence of different algorithms

Because each algorithm has various names in different articles, we summarized the IDs of articles that mention the same algorithm based on the dictionary and then removed the duplicate data. We used the number of papers that mention an algorithm to evaluate its influence. However, the total number of articles published each year is inconsistent, and the year in which each algorithm first appeared in ACL conference papers is also different; both interfere with the measurement of an algorithm's influence. Therefore, on the basis of the number of papers mentioning algorithms, we also considered the total number of articles per year and the duration of influence. We first calculated the annual influence of the algorithm based on the publication time of papers; the sum of the annual influences was divided by the influence time of the algorithm, yielding the influence of algorithm j. Influence(j) is given by:

\[ \mathrm{Influence}(j) = \frac{1}{T_j} \sum_{i} \frac{N_{ij}}{N_i} \]

where i represents the year, ranging between 1979 and 2015, N_i is the total number of publications in year i, and N_{ij} is the number of publications mentioning algorithm j in year i. Therefore, N_{ij}/N_i is the annual influence of algorithm j in year i. T_j is the duration from the year when algorithm j first appeared in ACL conference papers to 2015.
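The influence indicator defined above can be computed directly from the yearly counts. This is a sketch: we assume T_j counts years inclusively from the first appearance through 2015, and the counts below are hypothetical:

```python
def influence(mentions_per_year, totals_per_year, last_year=2015):
    """Influence(j) = (1/T_j) * sum over years i of N_ij / N_i, where T_j is
    the (assumed inclusive) span from algorithm j's first ACL appearance
    through last_year."""
    first_year = min(y for y, n in mentions_per_year.items() if n > 0)
    t_j = last_year - first_year + 1
    # Sum the annual influences N_ij / N_i from the first appearance onward.
    annual_sum = sum(mentions_per_year.get(y, 0) / totals_per_year[y]
                     for y in totals_per_year if first_year <= y <= last_year)
    return annual_sum / t_j

# Hypothetical counts for an algorithm first mentioned in 2013.
mentions = {2013: 5, 2014: 10, 2015: 20}
totals = {2013: 100, 2014: 200, 2015: 200}
print(round(influence(mentions, totals), 4))  # 0.0667
```

Normalizing by the yearly publication total and by T_j is what lets an algorithm that appeared recently be compared fairly with one that has been mentioned for decades.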

  • (3) Different types of algorithms

We further explored the impact of different categories of algorithms in ACL conference papers.

In most cases, algorithms in NLP articles are utilized to solve problems. By classifying the algorithms according to their functions, we can learn what types of algorithms are mentioned in NLP papers and then calculate the average influence of each type. In this way, we can infer what algorithms are mainly used to do in the NLP domain and judge what common problems need to be solved. In addition, we can study whether the influences of different algorithms in the same category affect each other.

Regarding the categories of algorithms, we referred to the category framework in Wikipedia and various textbooks ( Christopher & Hinrich, 2000 ; Jennings & Wooldridge, 2012 ; Mitchell, 1997 ) and then proposed a preliminary classification framework. We invited experts in the field of NLP to optimize the framework. Finally, we divided the algorithms in the field of NLP into the following 14 types.

  • 1 Classification algorithm: a method that sorts data into labeled classes, or categories of information, on the basis of a training set of data containing observations whose category membership is known 4 , for example, support vector machine.
  • 2 Clustering algorithm: a method that groups a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups 5 , for example, K-means.
  • 3 Dimension reduction algorithm: a method that reduces the number of random variables under consideration by obtaining a set of principal variables 6 , for example, singular-value decomposition.
  • 4 Grammar: a method that is used to represent the set of structural rules governing the composition of clauses, phrases and words in a natural language 7 , for example, context-free grammar.
  • 5 Ensemble learning algorithm: a method that uses multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone 8 , for example, AdaBoost.
  • 6 Link analysis algorithm: a data-analysis technique used to evaluate relationships (connections) between nodes 9 , for example, PageRank.
  • 7 Metric algorithm: an algorithm that defines the distance between each pair of elements of a set; they are usually utilized to evaluate the quality 10 , importance, and similarity of texts, words, vectors and so on, for example, the BLEU algorithm.
  • 8 Neural networks: a computing method vaguely inspired by the biological neural networks that constitute animal brains, which learns to perform tasks by considering examples, generally without being programmed with task-specific rules 11 , for example, convolutional neural networks.
  • 9 Optimization algorithm: a method that uses an iterative approach to make the result as close as possible to the optimal solution to the problem when the machine learning problem has no optimal solution or it is difficult to obtain the optimal solution 12 , for example, expectation maximization.
  • 10 Probabilistic graphical model: a probabilistic model for which a graph expresses the conditional dependence structure between random variables 13 , for example, the hidden Markov model.
  • 11 Regression algorithm: a method for estimating the relationships between a dependent variable and one or more independent variables 14 , for example, logistic regression.
  • 12 Search algorithm: an algorithm that solves a search problem, namely, to retrieve information stored within some data structure or calculated in the search space of a problem domain, either with discrete or continuous values 15 , for example, the beam-search algorithm.
  • 13 Unique algorithm in the NLP domain: an algorithm used only for natural language processing tasks, for example, the CKY algorithm.
  • 14 Other: an algorithm that cannot be placed in the 13 categories mentioned above, for example, the Smith-Waterman algorithm.

Using this framework, we classify algorithms and explore the types of algorithms with high influence.

  • (4) Influence of algorithms in different years

According to the calculation results of each algorithm's annual influence in different years, we ranked the influence of the algorithm each year and discussed the differences among the algorithms with high influence.

  • (5) Evolution of the influence of different algorithms

Based on the influence of each algorithm each year, we took each algorithm as the research object and analyzed the changes in the annual influence of each algorithm over time. After that, algorithms were classified according to different trends in evolution.
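The paper classifies trends by inspecting the plotted curves; one way to operationalize that grouping is sketched below. The least-squares slope, the jump heuristic and the `sharp_ratio` threshold are our own illustrative assumptions, not the study's method:

```python
def trend_label(series, sharp_ratio=3.0):
    """Label an annual-influence series as 'sharp growth', 'stable growth'
    or 'stable decline' from its least-squares slope and largest jump."""
    n = len(series)
    mean_x, mean_y = (n - 1) / 2, sum(series) / n
    # Least-squares slope of influence against year index.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
             / sum((x - mean_x) ** 2 for x in range(n)))
    if slope < 0:
        return "stable decline"
    jumps = [abs(b - a) for a, b in zip(series, series[1:])]
    mean_jump = sum(jumps) / len(jumps)
    # Growth is "sharp" when one year-over-year jump dwarfs the typical change.
    if mean_jump > 0 and max(jumps) > sharp_ratio * mean_jump:
        return "sharp growth"
    return "stable growth"

print(trend_label([0.01, 0.01, 0.01, 0.01, 0.20]))  # sharp growth
print(trend_label([0.10, 0.12, 0.15, 0.18, 0.20]))  # stable growth
print(trend_label([0.90, 0.80, 0.70, 0.60, 0.50]))  # stable decline
```

Any such automatic rule is only a rough proxy for visual inspection, but it makes the three trend categories used below reproducible on new series.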

4. Results

As described in Section 3.5 , we obtained the influence of algorithms in the NLP domain, the influence of different types of algorithms, and the evolution of the algorithms. The results are presented in this section.

4.1. Influence of different algorithms in ACL papers

We answer RQ1 in this section. We collected 4641 papers, of which 4043 papers mentioned algorithms, accounting for 87 % of all papers. The result shows that algorithm entities are widely studied in the NLP field and that they play an important role.

  • (1) Top-10 algorithms with the highest influence

The influence of different algorithms is calculated by the formula given in Section 3.5 (1). Due to space limitations, this section shows only the top-10 algorithms with the highest influence. As displayed in Table 4 , most of these algorithms are classical and basic algorithms. The support vector machine (SVM) takes first place with an absolute advantage in quantity. As described in The Top Ten Algorithms in Data Mining ( Wu et al., 2008 ), the SVM algorithm has a solid theoretical foundation and is one of the most stable and accurate algorithms among all well-known algorithms. Next comes context-free grammar; this algorithm is powerful enough to express the syntax patterns of most programming languages, yet simple enough for scholars to construct an effective analysis algorithm that determines whether a given string is generated by a context-free grammar ( Kadlec, 2008 ). The BLEU algorithm, namely, the bilingual evaluation understudy algorithm, ranks third in the list. BLEU is one of the metrics claiming a high correlation with human judgments of quality and remains one of the most popular automated and inexpensive metrics ( Papineni, Roukos, Ward, & Zhu, 2002 ). Maximum entropy comes next, a classical algorithm for solving classification problems.

The top-10 most influential algorithms in ACL conference papers.

Among these 10 algorithms, the most special is word2vec. As an emerging word vector representation method, it combines the advantages of high accuracy and low computational cost ( Mikolov, Chen, Corrado, & Dean, 2013 ). It is not a classic algorithm with a long history, but it has also gained high influence in ACL papers, showing its superior performance. In general, support vector machine has the highest influence in the field of NLP. Most of the high-impact algorithms are classic algorithms, but the influence of emerging algorithms cannot be ignored either.

  • (2) The type of high-impact algorithms

Because the task of identifying categories of all algorithms is time-consuming, we obtain the top-100 algorithms according to the influence and manually classify them according to the classification framework introduced in Section 3.5 . A professor who is familiar with natural language processing reviewed and improved the classification results. Then we counted the number of algorithms in each category and calculated the average influence of each category. The results are shown in Table 5 .

The type of high-impact algorithms.

Among the top-100 algorithms, the average influence of classification algorithms ranks first, and the number of algorithms in this category ranks second, which means that the influence of classification algorithms is significantly higher than that of other categories. Classification is a basic task in the field of NLP and can be regarded as a subtask in different research topics; for instance, named entities are recognized with the help of classification algorithms and sequence labeling models. There are also standalone classification tasks in NLP, including citation intent classification, text classification, emotion classification, etc. In contrast, although optimization algorithms make up the largest proportion of the top-100 list, their average influence ranks only fourth. We speculate that this is because optimization is often just one part or step of an NLP task, making the result as close to the optimal solution as possible. Apart from those, probabilistic graphical models are "small but excellent": a small number of algorithms achieve high average influence. However, the proportions of link analysis algorithms and regression algorithms in the list of high-impact algorithms are relatively small.

4.2. Top-10 algorithms in different ages

We answer RQ2 in this section. Considering the space limitations, we provide a list of the top-10 algorithms with the highest influence each year. As displayed in Appendix Table A1 , the differences between popular algorithms in different years are easy to see. Generally, the high-impact algorithms of different generations reveal the following trend: the early high-impact algorithms were syntactic analysis algorithms, which were later supplanted by traditional machine learning algorithms; moreover, deep learning algorithms had the highest influence in the last two years. A dynamic demonstration of algorithm influence in NLP can be accessed at https://chengzhizhang.github.io/research/algorithm_entity/algorithm_influence.html .

Based on Appendix Table A1 , we counted the number of algorithms in the top-10 list grouped by three types to explore the evolution of high-impact algorithms in different eras. As shown in Fig. 3 , the changes in the numbers of high-impact syntactic analysis algorithms and machine learning algorithms demonstrate completely opposite trends. Specifically, the first period of dramatic change appeared between 1992 and 1994, during which the number of machine learning algorithms increased significantly while that of syntactic analysis algorithms did the opposite. In 1997, the number of high-impact machine learning algorithms surpassed syntactic analysis algorithms for the first time. The second period of dramatic change was from 2001 to 2002, during which the gap between the two grew larger. Since 2013, the proportion of machine learning algorithms in the top-10 has begun to decline; on the contrary, the share of deep learning algorithms in the high-impact list increased. According to the results in Fig. 3 , we identify three major epochs in the history of algorithms in NLP: the syntactic analysis period (1979–1996), the traditional machine learning period (1997–2013), and the deep learning period (2014 to present).

Fig. 3

The number of high-impact algorithms of each type in the top-10 list each year.

Based on Appendix Table A1 , we selected a representative algorithm from each period: context-free grammar, support vector machine and neural networks. Fig. 4 shows the changes in their rankings in the top ten each year. According to the rankings of these algorithms, we analyze the characteristics of the algorithms at each stage in detail below.

Fig. 4

Ranking of representative algorithms in the top-10 each year.

In the period of syntactic analysis, more than half of the top-10 algorithms were grammars, and the influence of context-free grammar (CFG) was particularly outstanding. According to Appendix Table A1 and Fig. 4 , during the 18 years from 1979 to 1996, CFG appeared annually and won first place for nine years. Most of these influential algorithms are used for, or related to, syntactic analysis. Besides popular syntax analysis algorithms such as the Cocke-Younger-Kasami algorithm and the Earley algorithm, the Hobbs algorithm, a coreference resolution algorithm based on syntactic analysis, also appears, which indicates that scholars were focusing on linguistic research during this period.

In the period of machine learning, the important role of machine learning algorithms in ACL papers was revealed. Although the hidden Markov model and the decision tree had appeared in the top-10 list before, their rankings were relatively low. It was not until 1997 that the decision tree occupied first place. Between 1997 and 2001, syntactic analysis algorithms and machine learning algorithms took up more than half of the positions in turn, and CFG still often appeared in first place. After 2001, an increasing number of machine learning algorithms entered the top-10 list, while the proportion of parsing algorithms began to decrease sharply. In 2006, support vector machines gained the highest influence and remained first until 2014. At this stage, large-scale text processing had become the main target of natural language processing, and scholars increasingly adopted automatic machine learning algorithms to acquire language knowledge.

In the period of deep learning, a growing number of deep learning algorithms appeared in the top-10 list and occupied important positions, while the number of traditional machine learning algorithms in the high-impact list decreased. In 2015, neural networks took first place, and word2vec and Skip-gram also appeared in the top-10 list. Traditional machine learning algorithms rely on large-scale manual corpus tagging and feature engineering, and constructing effective labeled corpora and classification features remains time-consuming and labor-intensive. Compared with traditional algorithms, deep learning algorithms reduce manual effort: on the one hand, they solve the problem of data sparseness caused by high-dimensional vector spaces; on the other hand, word vectors contain more semantic information than manually selected features. Therefore, the influence of deep learning algorithms will gradually exceed that of traditional machine learning algorithms, and this trend is expected to become more obvious in the future.

4.3. The evolution of influence of different algorithms

We answer RQ3 in this section. For each algorithm, we have drawn trend graphs for the evolution of influence over time, and algorithms with similar trends of evolution were analyzed together. Three types of change trends are shown in Fig. 5 , Fig. 6 , Fig. 7 .

  • (1) Algorithms with rapidly growing influence

Fig. 5

Algorithms with rapidly growing influence.

Fig. 6

Algorithms with steadily growing influence.

Fig. 7

Algorithms with steadily declining influence.

The first type is called sharp growth. Nine algorithms that conform to this trend were selected. As shown in Fig. 5 , the influence of these algorithms increased rapidly in a certain year.

These algorithms can be subdivided into two subcategories. One subtype comprises algorithms whose influence was not high in the early years but began to grow rapidly after a certain year. In Fig. 5 , algorithms in line with this trend include back-propagation, Brown clustering, neural networks, etc. They are not newly proposed algorithms and were mentioned in early years. Presumably, at a certain time, scholars found that these algorithms could be utilized to solve new tasks, or the technical restrictions that had previously limited their development were broken.

Taking the neural network model as an example, in our results, the first peak of influence appeared in 1984; influence then entered a stable period after the 1990s but began to grow rapidly after 2011. The neural network was born in the 1940s and gained great popularity in the 1980s. However, after the 1990s, its development was less outstanding. On the one hand, the rise of statistical learning methods suppressed the development of neural networks. On the other hand, as the number of neural network layers increased, the difficulty of training grew geometrically, and the lack of computing resources once again limited the development of the model. In 2006, Hinton and Salakhutdinov (2006) solved the problem of how to set initial values in neural network learning and proposed a method for quickly training deep neural networks. After that, the neural network model entered a prosperous period after 2010. According to our results, the influence of the neural network model also entered a growth period after 2010 and began to grow strikingly in 2013. The main reason is that the training methods of neural network models were improved; at the same time, with the development of science and technology, scholars obtained powerful computing resources to train the models.

Another subtype comprises algorithms that no article mentioned in the early years, but whose influence increased sharply after their first mention. These algorithms include word2vec, AdaGrad and Skip-gram; all are newly proposed methods. Taking the AdaGrad algorithm as an example: one of the most commonly used optimization algorithms in deep learning, it was put forward in 2010. It adjusts the learning rate of each dimension and avoids the problem that a unified learning rate cannot adapt to all dimensions ( Duchi, Hazan, & Singer, 2011 ). In general, these algorithms are mostly adapted to deep learning tasks, so the reason for the rapid growth of their influence is easy to see: compared to traditional statistical machine learning algorithms, deep learning algorithms have achieved better performance in natural language processing in recent years.

  • (2) Algorithms with steadily growing influence

The second type is called stable growth. As displayed in Fig. 6 , algorithms of this type have been mentioned for a long time; from the first year in which an algorithm was mentioned, its influence fluctuated over time, but the overall evolution showed an upward tendency. All of these algorithms can be called classic algorithms, and the results in Fig. 6 demonstrate that a classical algorithm can pass the test of time and prove its value. Regardless of changes over time and whether new algorithms are developed, the influence of classic algorithms maintains stable growth. They usually have features that cannot be replaced by new methods, perhaps because they are simpler to use or have lower cost, and undoubtedly they are more likely to be compared with new methods as a baseline. In general, they give rise to more far-reaching effects.

Among the nine algorithms, the influence of support vector machines shows the highest growth, which once again confirms our results in Section 4.1 : the support vector machine is indeed a very influential algorithm whose influence continues to increase. In general, the nine algorithms first appeared in the 1980s, and their influence began to show a clear growing tendency around 2000. As the processing speed and storage capacity of computers increased significantly from the mid-1990s, statistical machine learning algorithms developed rapidly from the 1990s onward. Moreover, during the 1990s, owing to the commercialization of the Internet and the development of network technology, the need for information retrieval and information extraction based on natural language became more prominent, and the influence of the algorithms used for these two tasks, for example beam search, naturally rose.

  • (3) Algorithms with steadily declining influence

The third type is called stable decline. The influence of an algorithm in this type shows a downward trend over time. For algorithms in this type, it is speculated that, with the development of new algorithms, the performance or application of these algorithms no longer presents advantages, and scholars have more new choices for their own research.

As shown in Fig. 7 , there are two types of algorithms showing a declining trend of influence. The first type is grammars, whose influence before 1990 was significantly higher than after 1990 and became very low after 2000; in contrast, according to our previous analysis, the influence of machine learning algorithms was increasing during this period. Among these algorithms, representative examples include the augmented transition network (transition network grammar) and context-free grammar. The influence of the augmented transition network (ATN) has been declining since the first year (1979). Although the ATN ( Woods, 1970 ) has been widely used in research on human-computer conversation and machine translation, the algorithm relies too heavily on grammar, which makes it unable to deal with sentences that violate grammatical rules; when processing flexible objects, it cannot meet the need. Context-free grammar represents an algorithm whose influence increased first and then decreased: it can easily derive the grammatical structure of a sentence, but the generated structure may be ambiguous.

The second type is traditional machine learning algorithms. Before 2002, their influence showed an upward trend; between 2002 and 2005, they entered a relatively flat stage of development and then began to decline over time. Based on this tendency, it is not difficult to judge that the increase in their influence is related to the popularity of machine learning algorithms in the NLP field, while the decline of influence is due to the deficiencies of the individual algorithms. For example, the decision tree is simple to understand and to interpret, but it can be unstable because small variations in the data might result in a completely different tree 16 , and when there are more categories, errors may increase correspondingly. Naïve Bayes can handle multi-classification tasks with high stability, but it requires the independence of data set attributes; otherwise, its classification performance suffers.

In general, the influence of the above two types of algorithms decreases over time, but they show a clear cut-off point. Before 1996, grammars were more influential than machine learning algorithms; after 1996, the situation was just the opposite, which is also in line with the conclusion we reached in Section 4.2 . Thus, from either perspective, the influence of syntactic analysis algorithms is gradually being replaced by that of machine learning algorithms.

According to the previous results, although the comprehensive influence of grammars is relatively high, their overall influence shows a downward trend. Currently, the construction of various corpora indicates that people pay more attention to the processing of large-scale real texts, which are difficult to process using traditional rule-based parsing technology. Taking information extraction as an example, when the amount of end-to-end data is sufficient to extract information directly, scholars can achieve the goal directly using machine learning algorithms and no longer need syntactic analysis. Therefore, the influence of machine learning algorithms increased from 1990 to 2000, while the influence of grammars in the field of NLP decreased year by year.

  • (4) Time span from appearance to height

Concerning the 28 algorithms discussed above, we further analyzed the time from their emergence to their highest influence, which we call the rising span. As displayed in Table 6 , the rising spans of algorithms show different patterns. Most algorithms with growing influence reached their highest influence in 2015. The rising span of algorithms with rapidly growing influence is mostly shorter than that of algorithms with steadily growing influence: the former is often less than five years, while the latter is almost always more than 15 years, the longest reaching 31 years. For algorithms with steadily decreasing influence, the rising span is around 10 years. Comparing the two kinds of algorithms with stable trends, the rising span of algorithms with declining influence was shorter, although many of them appeared in ACL earlier, which means that, in the long run, their development trend can be predicted within a shorter time.

Rising span of different algorithms.
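The rising span can be computed directly from an algorithm's annual-influence series; the series below is hypothetical:

```python
def rising_span(annual_influence):
    """Years from an algorithm's first appearance to its peak influence
    (the 'rising span'); annual_influence maps year -> influence."""
    first = min(y for y, v in annual_influence.items() if v > 0)
    peak = max(annual_influence, key=annual_influence.get)
    return peak - first

# Hypothetical series: first mentioned in 1998, peak influence in 2015.
series = {1998: 0.01, 2005: 0.03, 2010: 0.06, 2015: 0.09}
print(rising_span(series))  # 17
```

An algorithm whose influence is still growing at the end of the observation window, as for most of the steadily growing algorithms above, will have its rising span end at 2015 by construction.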

Additionally, we find that different algorithms with the same function show a progressive trend in influence changes. Taking classification algorithms as an example, the classification algorithms in Table 6 include support vector machines, the decision tree and Naïve Bayes. The decision tree appeared and reached its peak influence earliest; however, its influence began to decline significantly after 2002, while the influence of Naïve Bayes maintained good momentum and surpassed the decision tree in 2005. In contrast, the support vector machine (SVM) appeared when the influence of those two algorithms was close to its peak and entered its popular period in 2003, the year when the influence of the former two algorithms began to decline. Similarly, probabilistic graphical models show the same trend. When the influence of the hidden Markov model (HMM) showed a downward trend, conditional random fields (CRF) had just appeared in ACL papers and began to enter a stable growth period. As classical probabilistic graphical models, both play an important role in sequence annotation tasks. However, compared with HMM, CRF provides more reasonable probability normalization results and globally optimal solutions, so it is not difficult to explain why its influence shows more obvious growth.

5. Discussion

5.1. Algorithm evolution and domain evolution

Cambria and White (2014) reviewed the development of the NLP field. They pointed out that most early NLP research focused on syntax analysis and that statistical NLP has been the mainstream research direction since the late 1990s. According to our research, the development of algorithms shows that research in the field of NLP has indeed followed this trend. In the early ACL papers (before 1996), the algorithms that accounted for greater ratios were various grammars. On the one hand, in the field of computational linguistics, there may have been more linguistic research in the early stage, and grammar is indeed a necessary method for it. On the other hand, some scholars believe that syntactic processing is necessary in many NLP tasks; accordingly, the results of this article reveal that some grammars, e.g., context-free grammar, still play an important role in the top-10 list after 2000. The so-called syntax-driven characteristic refers to researchers' strategy of first solving the grammar, which makes machine learning techniques more directly applicable. After grammar was widely mentioned, statistical machine learning algorithms began to appear in the top-10 list in the late 1990s and occupied a dominant position after 2005. In recent years, deep learning algorithms (e.g., neural networks) have gained higher impact. It should be noted that because we were only able to obtain data up to 2016, the tendency of deep learning algorithms to dominate would become more apparent if papers published after 2015 could be obtained. Comparing our preliminary results with experts' reviews, we find that changes in the influence of algorithms also reflect the development of the research field.
NLP has witnessed the emergence of several subfields, from the early grammar-based approaches in the 1950s–1970s, to the statistical revolution in the 1990s, to the recent deep learning algorithms ( Jurgens, Kumar, Hoover, McFarland, & Jurafsky, 2018 ).

5.2. Reasons for changes in the influence of algorithms

We analyzed the different modes of change in the influence of algorithms in Section 4.3. It is obvious that the influence of some algorithms increases with time, even rapidly over a short period, while the influence of others gradually decreases or even disappears. We think the reasons for these changes can be summarized in the following three points.

  • (1) The performance of the algorithm itself

Algorithms with high influence and an overall upward trend usually have excellent characteristics, for example, a solid theoretical basis, a wide range of application, high stability, and low resource consumption. The support vector machine is representative of such algorithms. Algorithms with declining influence, by contrast, are often easily replaced by new algorithms because of their poorer performance.

  • (2) Development of technology

In the early days, due to the immaturity of technologies, some algorithms encountered a bottleneck period during their development, limiting their early influence. However, with the development of science and technology, the computing and storage capabilities of computers have greatly improved, an increasing number of data resources have become available, and the technical and resource problems that previously limited the performance of these algorithms have been solved. Therefore, some algorithms are widely used again, and their influence has increased significantly.

  • (3) Changes in user demands

After 1990, scholars in the field of NLP needed algorithms that could automatically process larger-scale data. The development of the Internet also greatly increased the demand for information mining and information retrieval algorithms. Traditional rule-based algorithms could no longer meet these needs. Thus, in our results, the influence of grammars is gradually overtaken by that of machine learning algorithms. It is at this stage that the statistical revolution took place ( Jurafsky, 2015 ).

5.3. The top-ten data mining algorithms in the NLP domain

Wang and Zhang (2018) used the number of papers mentioning each algorithm to conduct a preliminary inquiry into the influence of the top ten data mining algorithms in the NLP field. The results are shown in Table 7. As classic data mining methods, the 10 algorithms are all famous, but their influence in the field of NLP differs. When we compare other algorithms with these ten, the advantages of data mining algorithms in the NLP field are not obvious: although the SVM and EM algorithms still appear in the top-10 list, the remaining algorithms in that list are no longer data mining algorithms. In our results, the 10 most influential algorithms in the ACL papers include grammars and statistical learning algorithms in addition to data mining algorithms. It can be inferred that classic data mining algorithms have their advantages, but not all of them achieve high influence across fields; the influence of an algorithm in a particular domain inevitably presents its own characteristics.

Table 7. The influence of the top-10 data mining algorithms in the NLP domain.

5.4. Differences between our method and other methods

In existing work on measuring algorithm influence, the representative methods are expert voting and mention counts. For expert voting, in September 2006, ICDM hosted a poll of classic algorithms in data mining: the sponsor invited experts to recommend candidate algorithms and then evaluated their influence through expert voting. However, this approach has some limitations. First, selecting and evaluating algorithms this way relies on the experts' personal experience and ideas and lacks detailed quantitative data to support the result. Second, organizing expert voting is time-consuming, and such events cannot be held frequently. Third, the results did not examine the influence of algorithms in a specific field and could not provide fundamental data for subsequent applications. Regarding mention counts, we introduced Wang's work in Section 5.3. Although their work also uses mention counts to evaluate the influence of algorithms, they only explored the influence of ten algorithms. In addition, they used the raw number of papers to evaluate influence within a period, without considering how differences in the total number of articles published each year affect the number of articles that mention an algorithm. Different from the above research, this article first uses manual annotation to collect algorithm entities from academic papers; compared with expert recommendation, we can obtain more objects. Subsequently, we evaluate the influence of the algorithms using the number of articles while accounting for the total number of articles each year and the duration of influence, which is more convenient than inviting experts and ensures the objectivity of the results.
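The normalization just described, dividing the yearly count of papers mentioning an algorithm by the total number of papers published that year, can be sketched as follows (a hedged illustration with invented data, not the authors' code):

```python
def yearly_influence(mention_counts, totals):
    """Fraction of each year's papers that mention the algorithm.

    mention_counts, totals: dicts mapping year -> number of papers.
    """
    return {year: mention_counts[year] / totals[year]
            for year in mention_counts if totals.get(year)}

mentions = {2000: 5, 2001: 12}      # papers mentioning the algorithm (toy data)
published = {2000: 100, 2001: 150}  # all papers published that year (toy data)

print(yearly_influence(mentions, published))  # {2000: 0.05, 2001: 0.08}
```

Normalizing by the yearly total avoids crediting an algorithm for the general growth of the literature, which is the flaw of raw mention counts noted above.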

6. Conclusion and future works

To explore the algorithms in the academic papers of a domain, this paper takes the NLP domain as an example and identifies algorithms from the full-text content of papers by manual annotation. The influence of algorithms is analyzed based on the number of articles. It should be noted that although this paper focuses on algorithms in the field of NLP, the methodology can be applied to identify algorithms in other fields or disciplines. Our results show that among the algorithms in the NLP domain, SVM has the highest influence due to its stability and accuracy. For the different types of algorithms, the influence of classification algorithms is significantly higher than that of the other categories. The change in algorithm influence over time also reflects the development of the research field: in the early stage, the popular algorithms were syntax analysis algorithms used in linguistic research, which were gradually replaced by statistical learning algorithms, and in recent years deep learning algorithms began to occupy the dominant position. Across the evolution of algorithm influence, there are three obvious trends, namely, the steady growth of classical algorithms, the rapid growth of new algorithms, and the steady decline of other algorithms.

The contribution of this article is threefold. First, this paper is the first to identify the broader set of algorithms mentioned in papers in a specific field, not only those presented with pseudocode or detailed descriptions ( Tuarob et al., 2016 ). Because many algorithms mentioned in papers are collected, common algorithms are not omitted. Compared with traditional bibliographic information and citation content, the full-text content surfaces algorithms that are mentioned but not cited in academic papers. Second, we discuss the influence of algorithms from various perspectives: in addition to the commonly mentioned algorithms, we analyze the evolution of algorithm influence. Third, this work provides more objective results regarding the influence of algorithms. It collects more authors' opinions by identifying algorithms directly from their papers, and it evaluates influence based on the number of papers and the duration of influence simultaneously.

As a preliminary study, this paper has some limitations. First, we annotate the algorithms in articles manually; although manual tagging can identify most of the algorithms in an article, we cannot guarantee that all of them are identified. We use dictionary matching to ensure that every article mentioning an algorithm in the dictionary can be found, but if an algorithm is not included in the dictionary at the outset, we cannot retrieve the relevant sentences. However, we do not believe the missing algorithms significantly affect our current results, because algorithms with high influence, that is, algorithms often mentioned in papers, would not be overlooked by the annotators. Second, the only indicator of influence in this paper is the number of articles mentioning an algorithm; other syntactic features, such as the chapter in which the mention occurs, and semantic features are not considered. Finally, we only collect the full text of conference papers from the ACL meeting to explore algorithms concerning NLP. There are other conference and journal papers in the ACL Anthology ( http://www.aclweb.org/anthology/ ), but they do not provide full-text content in XML format. The ACL reference corpus provides an XML dataset covering other conference papers, but only for articles published over a few years. As a result, although we could manually extract algorithms from more articles in PDF format, we could not search for them again in XML full text to ensure that we collect all of the articles mentioning each algorithm (the work in Section 3.4 ).
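The dictionary-matching step described above can be illustrated with a minimal sketch (the dictionary entries and sentences are invented; the authors' actual pipeline is not reproduced here):

```python
def match_sentences(sentences, dictionary):
    """Map each algorithm name to the sentences mentioning it (case-insensitive substring match)."""
    hits = {name: [] for name in dictionary}
    for sentence in sentences:
        lowered = sentence.lower()
        for name in dictionary:
            if name in lowered:
                hits[name].append(sentence)
    return hits

algorithm_dict = {"support vector machine", "conditional random fields", "decision tree"}
sentences = [
    "We train a Support Vector Machine classifier on the treebank.",
    "Conditional random fields outperform HMMs on this task.",
]

hits = match_sentences(sentences, algorithm_dict)
print(hits["support vector machine"])  # the first sentence only
```

As the limitation above notes, this finds every sentence for names already in the dictionary but silently misses any algorithm absent from it.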

In the future, we will collect more conference and journal papers in the field of NLP so that the dataset covers more NLP research, and we will attempt to transform them into structured data that machines can process. Subsequently, we intend to use the results of this article as training data: the algorithms and algorithm sentences collected in this work will be used to train machine learning models, and an optimal model will be selected to extract algorithm entities automatically. Furthermore, we will apply the full-text content to explore other features of the algorithms, such as their location, function and relationships, and use a variety of features to conduct a comprehensive evaluation of the extracted algorithms. Finally, we intend to explore the specific tasks that articles solve with each algorithm, so that we can not only understand the reasons for changes in an algorithm's influence from the perspective of the task but also recommend classic and emerging algorithms according to the task. The results of this influence analysis can thus provide recommendations for scholars.

Author contributions

Yuzhuo Wang: Conceived and designed the analysis, Collected the data, Contributed data or analysis tools, Wrote the paper.

Chengzhi Zhang: Conceived and designed the analysis, Collected the data, Wrote the paper.

Acknowledgements

This study is supported by the National Natural Science Foundation of China (Grant No. 72074113), Science Fund for Creative Research Group of the National Natural Science Foundation of China (Grant No. 71921002) and Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant No. KYCX19_0347).

1 https://insights.stackoverflow.com/survey/2020/?utm_source=social-share&utm_medium=social&utm_campaign=dev-survey-2020

2 https://blog.okfn.org/category/open-data-index/

3 http://depsy.org/

4 https://deepai.org/machine-learning-glossary-and-terms/classifier

5 http://en.volupedia.org/wiki/Cluster_analysis#Algorithms

6 http://en.volupedia.org/wiki/Dimension_reduction

7 http://en.volupedia.org/wiki/Grammar

8 http://en.volupedia.org/wiki/Ensemble_learning

9 http://en.volupedia.org/wiki/Link_analysis

10 http://en.volupedia.org/wiki/Metric_(mathematics)

11 http://en.volupedia.org/wiki/Artificial_neural_network

12 http://en.volupedia.org/wiki/Mathematical_optimization#Optimization_algorithms

13 http://en.volupedia.org/wiki/Graphical_model

14 http://en.volupedia.org/wiki/Regression_analysis

15 http://en.volupedia.org/wiki/Search_algorithm

16 https://dirtysalt.github.io/sklearn.html

Appendix B Supplementary material related to this article can be found, in the online version, at doi: https://doi.org/10.1016/j.joi.2020.101091.

Appendix A. 

Top-10 algorithms with higher influence in each year.

Note: “*” means that there are fewer than 10 algorithms in that year.

References

  • Abbott A. The “time machine” reconstructing ancient Venice’s social networks. Nature. 2017; 546 (7658):341–344. [ PubMed ] [ Google Scholar ]
  • Abramo G., D’Angelo C.A., Costa F.D. National research assessment exercises: A comparison of peer review and bibliometrics rankings. Scientometrics. 2011; 89 (3):929–941. [ Google Scholar ]
  • Belter C.W. Measuring the value of research data: A citation analysis of oceanographic data sets. PloS One. 2014; 9 (3) [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Bhatia S., Tuarob S., Mitra P., Giles C.L. An algorithm search engine for software developers. Proceedings of the 3rd International Workshop on Search-Driven Development: Users, Infrastructure, Tools, and Evaluation; New York, USA; 2011. pp. 13–16. [ Google Scholar ]
  • Blake V.L.P. Since Shaughnessy: Research methods in library and information science dissertations, 1975–1989. Collection Management. 1994; 19 (1–2):1–42. [ Google Scholar ]
  • Bornmann L., Mutz R. Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references. Journal of the Association for Information Science and Technology. 2015; 66 (11):2215–2222. [ Google Scholar ]
  • Cambria E., White B. Jumping NLP curves: A review of natural language processing research. Computational Intelligence Magazine IEEE. 2014; 9 (2):48–57. [ Google Scholar ]
  • Carman S.H. The Pennsylvania State University; 2013. Algseer: An architecture for extraction, indexing and search of algorithms in scientific literature. [ Google Scholar ]
  • Cartes-Velásquez R., Manterola Delgado C. Bibliometric analysis of articles published in ISI dental journals, 2007–2011. Scientometrics. 2014; 98 (3):2223–2233. [ Google Scholar ]
  • Christopher M., Hinrich S. MIT Press; 2000. Foundations of statistical natural language processing. [ Google Scholar ]
  • Chu H., Ke Q. Research methods: What’s in the name? Library & Information Science Research. 2017; 39 :284–294. [ Google Scholar ]
  • Clark A. Memory-based learning of morphology with stochastic transducers. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics; Philadelphia, PA, USA; 2002. pp. 513–520. [ Google Scholar ]
  • Cohen J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement. 1960; 20 (1):37–46. [ Google Scholar ]
  • Cormen T.H., Leiserson C.E., Rivest R.L., Stein C. 3rd ed. The MIT Press; 2009. Introduction to algorithms. [ Google Scholar ]
  • Costa J., Meirelles P., Chavez C. On the sustainability of academic software: The case of static analysis tools. Proceedings of the Xxxii Brazilian Symposium on Software Engineering (BSSE); Sao Carlos, Brazil; 2018. pp. 202–207. [ Google Scholar ]
  • Dave G. 2020. Computer scientists are building algorithms to tackle COVID-19. https://onezero.medium.com/computer-scientists-are-building-algorithms-to-tackle-covid-19-f4ec40acdba0 [ Google Scholar ]
  • Ding Y., Song M., Han J., Yu Q., Yan E., Lin L. Entitymetrics: Measuring the impact of entities. PloS One. 2013; 8 (8) [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Duchi J., Hazan E., Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research. 2011; 12 (7):257–269. [ Google Scholar ]
  • Fu H.Z., Ho Y.S. Comparison of independent research of China’s top universities using bibliometric indicators. Scientometrics. 2013; 96 (1):259–276. [ Google Scholar ]
  • He J., Lou W., Li K. How were science mapping tools applied? The application of science mapping tools in LIS and non-LIS domains. Proceedings of the Association for Information Science and Technology (ASIS&T 2019). 2019; 56 (1):404–408. [ Google Scholar ]
  • Hinton G., Salakhutdinov R. Reducing the dimensionality of data with neural networks. Science. 2006; 313 (5786):504–507. [ PubMed ] [ Google Scholar ]
  • Howison J., Bullard J. Software in the scientific literature: Problems with seeing, finding, and using software mentioned in the biology literature. Journal of the Association for Information Science and Technology. 2016; 67 (9):2137–2155. [ Google Scholar ]
  • Howison J., Deelman E., McLennan M.J., Ferreira da Silva R., Herbsleb J.D. Understanding the scientific software ecosystem and its impact: current and future measures. Research Evaluation. 2015; 24 (4):454–470. [ Google Scholar ]
  • Jarvelin K., Vakkari P. Content analysis of research articles in library and information science. Library & Information Science Research. 1990; 12 (4):395–421. [ Google Scholar ]
  • Jennings N.R., Wooldridge M.J. The MIT Press; 2012. Foundations of machine learning. [ Google Scholar ]
  • Jurafsky D. 2015. The language of food. https://www.slideshare.net/Idibon1/dan-jurafsky-the-language-of-food [ Google Scholar ]
  • Jurgens D., Kumar S., Hoover R., McFarland D., Jurafsky D. Measuring the evolution of a scientific field through citation frames. Transactions of the Association for Computational Linguistics. 2018; 6 :391–406. [ Google Scholar ]
  • Kadlec V. Masaryk University; 2008. Syntactic analysis of natural languages based on context free grammar backbone. [PhD Thesis Specification] [ Google Scholar ]
  • Li K., Yan E. Co-mention network of R packages: Scientific impact and clustering structure. Journal of Informetrics. 2018; 12 (1):87–100. [ Google Scholar ]
  • Li K., Rollins J., Yan E. Web of science use in published research and review papers 1997–2017: A selective, dynamic, cross-domain, content-based analysis. Scientometrics. 2017; 115 (1):1–20. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Li K., Yan E., Feng Y. How is R cited in research outputs? Structure, impacts, and citation standard. Journal of Informetrics. 2017; 11 (4):989–1002. [ Google Scholar ]
  • Ma S., Zhang C. Using full-text to evaluate impact of different software groups information. Proceedings of the 16th International Conference on Scientometrics and Informetrics; ISSI 2017, Wuhan, China; 2017. pp. 1666–1667. [ Google Scholar ]
  • Mikolov T., Chen K., Corrado G., Dean J. 2013. Efficient estimation of word representations in vector space. https://arxiv.org/abs/1301.3781 [ Google Scholar ]
  • Mitchell T.M. McGraw-Hill; 1997. Machine learning. [ Google Scholar ]
  • Mooney H. Citing data sources in the social sciences: Do authors do it? Learned Publishing. 2011; 24 (2):99–108. [ Google Scholar ]
  • Pan X., Yan E., Cui M., Hua W. How important is software to library and information science research? A content analysis of full-text publications. Journal of Informetrics. 2019; 13 (1):397–406. [ Google Scholar ]
  • Pan X., Yan E., Hua W. Disciplinary differences of software use and impact in scientific literature. Scientometrics. 2016; 109 (3):1593–1610. [ Google Scholar ]
  • Pan X., Yan E., Wang Q., Hua W. Assessing the impact of software on science: A bootstrapped learning of software entities in full-text papers. Journal of Informetrics. 2015; 9 (4):860–871. [ Google Scholar ]
  • Papineni K., Roukos S., Ward T., Zhu W.J. BLEU: A method for automatic evaluation of machine translation. Proceeding of 40th Annual Meeting of the Association for Computational Linguistics (ACL-2002); Grenoble, France; 2002. pp. 311–318. [ Google Scholar ]
  • Petasis G., Cucchiarelli A., Velardi P. Automatic adaptation of proper noun dictionaries through cooperation of machine learning and probabilistic methods. Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; New York: ACM Press; 2000. pp. 128–135. [ Google Scholar ]
  • Safder I., Sarfraz J., Hassan S.-U., Ali M., Tuarob S. Detecting target text related to algorithmic efficiency in scholarly big data using recurrent convolutional neural network model. Proceedings of the 18th International Conference on Asian Digital Libraries (ICADL); Bangkok, Thailand; 2017. pp. 30–40. [ Google Scholar ]
  • Thelwall M., Kousha K. Academic software downloads from google code: Useful usage indicators? Information Research. 2016; 21 (1):n1. [ Google Scholar ]
  • Tuarob S., Tucker C. Quantifying product favorability and extracting notable product features using large scale social media data. Journal of Computing and Information Science in Engineering. 2015; 15 (3):1–13. [ Google Scholar ]
  • Tuarob S., Bhatia S., Mitra P., Giles C.L. Algorithmseer: A system for extracting and searching for algorithms in scholarly big data. IEEE Transactions on Big Data. 2016; 2 (1):3–17. [ Google Scholar ]
  • Tuarob S., Kang S.W., Wettayakorn P., Pornprasit C., Sachati T., Hassan S.-U. Automatic classification of algorithm citation functions in scientific literature. IEEE Transactions on Knowledge and Data Engineering. 2020:1–16. Early Access. [ Google Scholar ]
  • Tuarob S., Mitra P., Giles C.L. Improving algorithm search using the algorithm co-citation network. Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries; Washington, DC USA; 2012. pp. 277–280. [ Google Scholar ]
  • Tuarob S., Bhatia S., Mitra P., Giles C.L. Automatic detection of pseudocodes in scholarly documents using machine learning. Proceedings of 12th International Conference on Document Analysis and Recognition; Washington, DC, USA; 2013. pp. 738–742. [ Google Scholar ]
  • Tuarob S., Mitra P., Giles C.L. A classification scheme for algorithm citation function in scholarly works. Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries; Indiana, USA; 2013. pp. 367–368. [ Google Scholar ]
  • Urquhart C., Dunn S. A bibliometric approach demonstrates the impact of a social care data set on research and policy. Health Information and Libraries Journal. 2013; 30 (4):294–302. [ PubMed ] [ Google Scholar ]
  • Wang Y., Zhang C. Using full-text of research articles to analyze academic impact of algorithms. Proceedings of the 13th International Conference on Information; Sheffield, UK; 2018. pp. 395–401. [ Google Scholar ]
  • Woods W.A. Transition network grammars for natural language analysis. Communications of the ACM. 1970; 13 (10):591–606. [ Google Scholar ]
  • Wu X., Kumar V., Quinlan J.R., Ghosh J., Yang Q., Motoda H. Top 10 algorithms in data mining. Knowledge and Information Systems. 2008; 14 (1):1–37. [ Google Scholar ]
  • Yang B., Huang S., Wang X., Rousseau R. How important is scientific software in bioinformatics research? A comparative study between international and Chinese research communities. Journal of the Association for Information Science and Technology. 2018; 69 (9):1122–1133. [ Google Scholar ]
  • Yoon J., Chung E., Lee J.Y., Kim J. How research data is cited in scholarly literature: A case study of HINTS. Learned Publishing. 2019; 32 :199–206. [ Google Scholar ]
  • Zha H., Chen W., Li K., Yan X. Mining algorithm roadmap in scientific publications. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; Anchorage, Alaska USA; 2019. pp. 1083–1092. [ Google Scholar ]
  • Zhang H., Ma S., Zhang C. Using full-text of academic articles to find software clusters. Proceedings of the 17th International Conference on Scientometrics and Informetrics (ISSI 2019); Rome, Italy; 2019. pp. 2776–2777. [ Google Scholar ]
  • Zhao R., Wei M. Impact evaluation of open source software: An Altmetrics perspective. Scientometrics. 2017; 110 (2):1–17. [ Google Scholar ]
  • Zhao M., Yan E., Li K. Data set mentions and citations: A content analysis of full-text publications. Journal of the Association for Information Science and Technology. 2018; 69 (1):32–46. [ Google Scholar ]




CCF International Conference on Natural Language Processing and Chinese Computing

NLPCC 2022: Natural Language Processing and Chinese Computing, pp. 669–681

Automatic Academic Paper Rating Based on Modularized Hierarchical Attention Network

  • Kai Kang   ORCID: orcid.org/0000-0002-8156-7863 11 ,
  • Huaping Zhang   ORCID: orcid.org/0000-0002-0137-4069 11 ,
  • Yugang Li   ORCID: orcid.org/0000-0002-0442-7146 11 ,
  • Xi Luo   ORCID: orcid.org/0000-0001-6182-1425 12 &
  • Silamu Wushour   ORCID: orcid.org/0000-0003-4592-7806 13  
  • Conference paper
  • First Online: 24 September 2022


Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 13551))

Automatic academic paper rating (AAPR) remains a difficult but useful task to automatically predict whether to accept or reject a paper. Having found more task-specific structure features of academic papers, we present a modularized hierarchical attention network (MHAN) to predict paper quality. MHAN uses a three-level hierarchical attention network to shorten the sequence for each level. In the network, the modularized parameter distinguishes the semantics of functional chapters. And a label-smoothing mechanism is used as a loss function to avoid inappropriate labeling. Compared with MHCNN and plain HAN on an AAPR dataset, MHAN achieves a state-of-the-art accuracy of 65.33%. Ablation experiments show that the proposed methods are effective.
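The label-smoothing loss mentioned in the abstract can be sketched generically as follows (an illustration with an assumed smoothing factor of 0.1 and binary accept/reject labels; MHAN's exact configuration may differ):

```python
import math

def smooth_labels(one_hot, epsilon=0.1):
    """Soften a one-hot target: each of the k classes receives epsilon / k probability mass."""
    k = len(one_hot)
    return [(1 - epsilon) * p + epsilon / k for p in one_hot]

def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

target = smooth_labels([1.0, 0.0])        # ≈ [0.95, 0.05] instead of a hard [1, 0]
loss = cross_entropy(target, [0.8, 0.2])  # smoothed loss for an "accept" prediction
print(round(loss, 4))
```

Softening the hard 0/1 targets keeps the model from being penalized toward extreme confidence, which is the "inappropriate labeling" issue the abstract alludes to.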

  • Automatic academic paper rating
  • Modularized
  • Hierarchical

Models can be found at https://huggingface.co/prajjwal1/bert-medium.

Kang, D., et al.: A dataset of peer reviews (PeerRead): collection, insights and NLP applications. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 1647–1661 (2018)


Yang, P., Sun, X., Li, W., Ma, S.: Automatic academic paper rating based on modularized hierarchical convolutional neural network. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 496–502 (2018)

Qiao, F., Xu, L., Han, X.: Modularized and attention-based recurrent convolutional neural network for automatic academic paper aspect scoring. In: Meng, X., Li, R., Wang, K., Niu, B., Wang, X., Zhao, G. (eds.) WISA 2018. LNCS, vol. 11242, pp. 68–76. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-02934-0_7


Leng, Y., Yu, L., Xiong, J.: DeepReviewer: collaborative grammar and innovation neural network for automatic paper review. In: 2019 International Conference on Multimodal Interaction, pp. 395–403 (2019)

Skorikov, M., Momen, S.: Machine learning approach to predicting the acceptance of academic papers. In: 2020 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT). IEEE (2020)

Vincent-Lamarre, P., Larivière, V.: Textual analysis of artificial intelligence manuscripts reveals features associated with peer review outcome. Quant. Sci. Stud. 2 (2), 662–677 (2021)


Shen, A., Salehi, B., Baldwin, T., Qi, J.: A joint model for multimodal document quality assessment. In: 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 107–110. IEEE (2019)

Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9 (8), 1735–1780 (1997)

Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation (2014)

He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, pp. 4171–4186 (2019)

Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E.: Hierarchical attention networks for document classification. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1480–1489 (2016)

Beltagy, I., Peters, M.E., Cohan, A.: Longformer: the long-document transformer. arXiv preprint arXiv:2004.05150 (2020)

Zaheer, M., et al.: Big bird: transformers for longer sequences. Adv. Neural Inf. Process. Syst. 33 , 17283–17297 (2020)

Langford, J., Guzdial, M.: The arbitrariness of reviews, and advice for school administrators. Commun. ACM 58 (4), 12–13 (2015)

Lin, J., Song, J., Zhou, Z., Shi, X.: Automated scholarly paper review: possibility and challenges. arXiv preprint arXiv:2111.07533 (2021)

Shibayama, S., Yin, D., Matsumoto, K.: Measuring novelty in science with word embedding. PLoS ONE 16 (7), e0254034 (2021)

Daudaravicius, V.: Automated evaluation of scientific writing: AESW shared task proposal. In: Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 56–63 (2015)

Springstein, M., Müller-Budack, E., Ewerth, R.: QuTI! quantifying text-image consistency in multimodal documents. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2575–2579 (2021)

Hou, Y., Jochim, C., Gleize, M., Bonin, F., Ganguly, D.: TDMSci: a specialized corpus for scientific literature entity tagging of tasks datasets and metrics. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pp. 707–714 (2021)

Gupta, Y., et al.: The effect of pretraining on extractive summarization for scientific documents. In: Proceedings of the Second Workshop on Scholarly Document Processing, pp. 73–82 (2021)

Wang, Q., Zeng, Q., Huang, L., Knight, K., Ji, H., Rajani, N.F.: ReviewRobot: explainable paper review generation based on knowledge synthesis. In: Proceedings of the 13th International Conference on Natural Language Generation, pp. 384–397 (2020)

Yuan, W., Liu, P., Neubig, G.: Can we automate scientific reviewing? arXiv preprint arXiv:2102.00176 (2021)

de Buy Wenniger, G.M., van Dongen, T., Aedmaa, E., Kruitbosch, H.T., Valentijn, E.A., Schomaker, L.: Structure-tags improve text classification for scholarly document quality prediction. In: Proceedings of the First Workshop on Scholarly Document Processing, pp. 158–167 (2020)

Huang, J.B.: Deep paper gestalt. arXiv preprint arXiv:1812.08775 (2018)

Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751. Association for Computational Linguistics, Doha, Qatar (2014)

Download references

Acknowledgments

This work is partly supported by the Beijing Natural Science Foundation (No. 4212026) and the Fundamental Strengthening Program Technology Field Fund (No. 2021-JCJQ-JJ-0059).

Author information

Authors and Affiliations

Beijing Institute of Technology, Beijing, 100081, China

Kai Kang, Huaping Zhang & Yugang Li

Beijing Union University, Beijing, 100101, China

Xinjiang University, Xinjiang, 830046, China

Silamu Wushour


Corresponding author

Correspondence to Huaping Zhang.

Editor information

Editors and Affiliations

Singapore University of Technology and Design, Singapore, Singapore

Nanjing University, Nanjing, China

Shujian Huang

Soochow University, Suzhou, China

Soochow University, Suzhou, China

Xiabing Zhou


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kang, K., Zhang, H., Li, Y., Luo, X., Wushour, S. (2022). Automatic Academic Paper Rating Based on Modularized Hierarchical Attention Network. In: Lu, W., Huang, S., Hong, Y., Zhou, X. (eds) Natural Language Processing and Chinese Computing. NLPCC 2022. Lecture Notes in Computer Science, vol 13551. Springer, Cham. https://doi.org/10.1007/978-3-031-17120-8_52

Download citation

DOI: https://doi.org/10.1007/978-3-031-17120-8_52

Published: 24 September 2022

Publisher Name: Springer, Cham

Print ISBN: 978-3-031-17119-2

Online ISBN: 978-3-031-17120-8

eBook Packages: Computer Science, Computer Science (R0)



Societies and partnerships

the China Computer Federation (CCF)

Paper Pals

April 10, 2024 — 17:30

Come read a paper with ACM and learn about CS research here at UMN! Never read an academic paper before? This is a great place to start!

We will be reading Multi-Touch Querying on Data Physicalizations in Immersive AR!

What do I need to do beforehand?

Nothing, just an inquisitive spirit!

Where is this?

Akerman Hall 327

A Solar Eclipse Means Big Science

By Katrina Miller April 1, 2024



On April 8, cameras all over North America will make a “megamovie” of the sun’s corona, like this one from the 2017 eclipse. The time lapse will help scientists track the behavior of jets and plumes on the sun’s surface.

There’s more science happening along the path of totality.

An app named SunSketcher will help the public take pictures of the eclipse with their phones.

Scientists will use these images to study deviations in the shape of the solar surface, which will help them understand the sun’s churning behavior below.

The sun right now is approaching peak activity. More than 40 telescope stations along the eclipse’s path will record totality.

By comparing these videos to what was captured in 2017 — when the sun was at a lull — researchers can learn how the sun’s magnetism drives the solar wind, or particles that stream through the solar system.

Students will launch giant balloons equipped with cameras and sensors along the eclipse’s path.

Their measurements may improve weather forecasting, and also produce a bird’s-eye view of the moon’s shadow moving across the Earth.

Ham radio operators will send signals to each other across the path of totality to study how the density of electrons in Earth’s upper atmosphere changes.

This can help quantify how space weather produced by the sun disrupts radar communication systems.

(Animation by Dr. Joseph Huba, Syntek Technologies; HamSCI Project, Dr. Nathaniel Frissell, the University of Scranton, NSF and NASA.)

NASA is also studying Earth’s atmosphere, but far from the path of totality.

In Virginia, the agency will launch rockets during the eclipse to measure how local drops in sunlight cause ripple effects hundreds of miles away. The data will clarify how eclipses and other solar events affect satellite communications, including GPS.

Biologists in San Antonio plan to stash recording devices in beehives to study how bees orient themselves using sunlight, and how the insects respond to the sudden atmospheric changes during a total eclipse.

Two researchers in southern Illinois will analyze social media posts to understand tourism patterns in remote towns, including when visitors arrive, where they come from and what they do during their visits.

Results can help bolster infrastructure to support large events in rural areas.


The sun flares at the edge of the moon during a total eclipse.



COMMENTS

  1. Connected Papers

    Get a visual overview of a new academic field. Enter a typical paper and we'll build you a graph of similar papers in the field. Explore and build more graphs for interesting papers that you find - soon you'll have a real, visual understanding of the trends, popular works and dynamics of the field you're interested in.

  2. 15 Best Academic Networking and Collaboration Platforms 2024

    Best Academic Networking and Collaboration Platforms. #1. Academia.edu - Best for Sharing Research Papers. #2. Google Scholar - Best for Broad Literature Search and Citation Tracking. #3. LinkedIn - Best for Professional Networking Across All Fields. #4. Mendeley - Best for Reference Management and Discovery of New Research.

  3. Home :: SSRN

    Social Sciences. SOCIAL SCIENCES are those disciplines that study (a) institutions and functioning of human society and the interpersonal relationships of individuals as members of society; (b) a particular phase or aspect of human society.

  4. Google Scholar

    Google Scholar provides a simple way to broadly search for scholarly literature. Search across a wide variety of disciplines and sources: articles, theses, books, abstracts and court opinions.

  5. Academia.edu

    Share your work, track your impact, and grow your audience. Get notified when other academics mention you or cite your papers. Track your impact with in-depth analytics and network with members of your field. Mentions and Citations Tracking. Advanced Analytics.

  6. CORE

    The world's largest collection of open access research papers. ... CORE has significantly assisted the academic institutions participating in our global network with their key mission, which is their scientific content exposure. In addition, CORE has helped our content administrators to showcase the real benefits of repositories via its added ...

  7. ResearchGate

    Access 160+ million publications and connect with 25+ million researchers. Join for free and gain visibility by uploading your research.

  8. Networking and Collaborating in Academia: Increasing Your ...

    Networking is an essential element of an academic career, though it can spark a multitude of reactions ranging from delight to distress. For early career researchers, having a strong network of collaborators can be invaluable in terms of supporting you through the difficult times of academia, as well as helping you to enjoy the good moments.

  9. AMiner: Search and Mining of Academic Social Networks

    Abstract. AMiner is a novel online academic search and mining system, and it aims to provide a systematic modeling approach to help researchers and scientists gain a deeper understanding of the large and heterogeneous networks formed by authors, papers, conferences, journals and organizations. The system is subsequently able to extract researchers' profiles automatically from the Web and ...

  10. Academic Networks

    Each academic has a professional network. Some build a deep network of people mostly in the same field, which can unwittingly limit potential interdisciplinary opportunities. Some build widely diverse networks using carefully selected individuals who can optimize their chances for interdisciplinary research. Others have made it a numbers game ...

  11. JSTOR Home

    Harness the power of visual materials—explore more than 3 million images now on JSTOR. Enhance your scholarly research with underground newspapers, magazines, and journals. Explore collections in the arts, sciences, and literature from the world's leading museums, archives, and scholars. JSTOR is a digital library of academic journals ...

  12. Is there a tool to visualize the academic citation network around a

    I have found Connected Papers very useful, particularly for machine learning. You type in your paper and it shows you a network of related papers where node size indicates number of citations and color indicates recency. Here's an example for the paper "Attention is All you Need."

  13. Academic networks and career trajectory: 'There's no career in academia

    This study surveyed more than 100 working academics and found that most participated in some form of academic networking. This article's significance comes from exploring the lived experiences that have been identified by academics engaging in active network building.

  14. Computer Networks

    The International Journal of Computer and Telecommunications Networking. Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors.

  15. CitNetExplorer

    CitNetExplorer is a software tool for visualizing and analyzing citation networks of scientific publications. The tool allows citation networks to be imported directly from the Web of Science database. Citation networks can be explored interactively, for instance by drilling down into a network and by identifying clusters of closely related ...

  16. Predicting High Impact Academic Papers Using Citation Network Features

    Predicting future high impact academic papers is of benefit to a range of stakeholders, including governments, universities, academics, and investors. Being able to predict 'the next big thing' allows the allocation of resources to fields where these rapid developments are occurring. This paper develops a new method for predicting a paper ...

  17. Academic Paper Recommendation Method Combining Heterogeneous Network

    At the same time, academic social networks usually contain rich and interrelated academic information features, which can be used as a heterogeneous network containing multiple entity types and relationship types, from which link relationships of scholars and papers can be more conveniently obtained. In this paper, we propose an academic paper ...

  18. Using the full-text content of academic articles to identify and

    Academic papers in many disciplines, especially in the computer science domain, propose, improve, and use various algorithms (Tuarob & Tucker, 2015). However, not everyone is an algorithm expert. ... Taking the neural network model as an example, in our results, the first peak of influence appeared in 1984, which then entered the stable period ...

  19. Academic Paper Recommendation Method Combining Heterogeneous Network

    The method HNTA for academic paper recommendation is based on the combination of a heterogeneous network and temporal attributes; it not only comprehensively utilizes both the relationships of scholars and the content information of papers, but also considers the impact of the temporal weight of scholars' research interests. In the case of information overload of academic papers, the demand ...

  20. PDF The Structure of an Academic Paper

    Not all academic papers include a roadmap, but many do. Usually following the thesis, a roadmap is a narrative table of contents that summarizes the flow of the rest of the paper. Below, see an example roadmap in which Cuevas (2019) succinctly outlines her argument. You may also see roadmaps that list

  21. Heterogeneous Information Network enhanced Academic Paper

    Abstract: Academic paper recommender (APR) systems that assist researchers in solving the information overload problem have attracted lots of attention. Recently, many works have been done to improve APR with heterogeneous information network (HIN). However, these works plainly depend on graph embedding to generate recommendations and achieve unsatisfactory performance due to the neglect of ...

  22. Automatic Academic Paper Rating Based on Modularized ...

    Automatic academic paper rating (AAPR) remains a difficult but useful task to automatically predict whether to accept or reject a paper. Having found more task-specific structure features of academic papers, we present a modularized hierarchical attention network (MHAN) to predict paper quality. MHAN uses a three-level hierarchical attention ...

  23. Paper Pals

    Paper Pals April 10, 2024 — 17:30 (Posted on 2024-04-10) Come read a paper with ACM and learn about CS research here at UMN!

  24. April 8 Total Solar Eclipse Means Big Science

    A Solar Eclipse Means Big Science. On April 8, cameras all over North America will make a "megamovie" of the sun's corona, like this one from the 2017 eclipse. The time lapse will help ...
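Several entries above (Connected Papers, CitNetExplorer) build graphs of related papers. One common measure behind such graphs is bibliographic coupling: two papers are linked when their reference lists overlap. The sketch below is an illustrative toy with made-up data, not any tool's actual algorithm:

```python
from itertools import combinations

# Hypothetical toy data: paper id -> set of ids that paper cites.
refs = {
    "A": {"r1", "r2", "r3"},
    "B": {"r2", "r3", "r4"},
    "C": {"r5"},
}

def coupling_strength(p, q):
    """Bibliographic coupling as Jaccard overlap of two reference lists."""
    union = refs[p] | refs[q]
    return len(refs[p] & refs[q]) / len(union) if union else 0.0

# Link every pair of papers that shares at least one reference.
edges = {}
for p, q in combinations(sorted(refs), 2):
    w = coupling_strength(p, q)
    if w > 0:
        edges[(p, q)] = w

print(edges)  # {('A', 'B'): 0.5}
```

Tools like Connected Papers reportedly combine signals of this kind (coupling, co-citation) before laying the graph out visually; the weighting above is only one simple choice.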
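Entry 16 above predicts high-impact papers from citation-network features. A standard feature of that kind is PageRank over the citation graph; the minimal power-iteration sketch below uses a made-up toy graph and is not the paper's actual feature set:

```python
# Toy citation graph: an edge u -> v means paper u cites paper v,
# so rank flows from citing papers to cited ones.
citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}

def pagerank(graph, damping=0.85, iters=50):
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for u, outs in graph.items():
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Dangling node: distribute its rank uniformly.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

scores = pagerank(citations)
# "C", cited by both A and B, outranks either of its citers.
```

In practice such a score would be one column in a feature table alongside in-degree, recency, venue, and author features.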
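Entry 22's modularized hierarchical attention network builds sentence vectors from word vectors, and a document vector from sentence vectors, applying attention pooling at each level. The dependency-free sketch below shows one such pooling step; the vectors and the context query are made-up stand-ins for what the real model learns:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(vectors, query):
    """Score each vector against a query, softmax the scores,
    and return the weighted sum plus the attention weights."""
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * vec[i] for w, vec in zip(weights, vectors))
              for i in range(dim)]
    return pooled, weights

# Hypothetical sentence vectors for one section of a paper.
sentences = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = [1.0, 1.0]  # stand-in for the trained context (query) vector
section_vec, weights = attention_pool(sentences, context)
# The third sentence scores highest against the context,
# so it receives the largest attention weight.
```

Stacking this operation (words -> sentences -> document) and feeding the final vector to a classifier gives the general shape of an accept/reject predictor, though the published model adds learned projections and module-specific structure.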