VIVO Talks!
Development of an automatic text-based (inter)disciplinary classification of projects and publications using NLP
Overview
Analyzing and classifying research projects and outputs based on texts available on websites and in collections of archived research papers can play an important role in structuring and presenting research information. Natural language processing (NLP), a branch of AI, has the potential to significantly reduce the time and effort required to meet the growing need for documenting research information. We have been working on classifying research papers and projects within the framework of the BUA-VIVO-Project.
Florian Kotschka (DevOps) and Rolf Guescini (IT-Development) of the VIVO Team at the Computer and Media Service (CMS) presented and discussed the achieved results with the community on October 10th, 2023, from 10 to 11 am CEST.
The “VIVO Talks!” event series is hosted by the “VIVO Research Information Platform (German: FIP)” CMS-Project Team within the framework of the Berlin University Alliance. The presentation and the video of this talk are available online.
👉 Presentation and Video
About this Talk
Research information is metadata about projects, datasets, outputs, or other research entities. It can be used to categorize these entities, facilitate the search process, improve the quality of search results, and suggest additional relevant results. Machine learning technologies make it possible to analyze text on web pages or in documents and to automatically extract, structure, and classify the research information. Natural Language Processing (NLP), a machine learning technology that provides algorithms for processing and analyzing large amounts of language data, is used for this purpose. This enables a computer to understand content and context in a way similar to a human.
As part of the CMS project "FIP with VIVO", an approach is being developed to automatically extract and categorize research information from web pages or from research outputs. For this purpose, texts are automatically scanned and analyzed using text extraction to identify entities such as names and organizations, specific information such as research foci and areas, or predefined keywords. This specific task of using AI for text classification is carried out with the support of the Kairntech Team as an external service provider.
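To make the idea of matching predefined keywords more concrete, the following is a minimal sketch in Python. It is not the project's or Kairntech's implementation, which relies on trained NLP models; it only illustrates how a text could be assigned to one or more research areas by counting keyword matches. The area names, the keyword lists, and the functions `classify` and `is_interdisciplinary` are hypothetical and chosen purely for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword lists per research area (assumption, not project data).
AREA_KEYWORDS = {
    "Computer Science": {"algorithm", "software", "machine learning"},
    "Biology": {"protein", "cell", "genome"},
    "Linguistics": {"syntax", "corpus", "language"},
}


def classify(text, min_hits=1):
    """Return research areas whose keywords occur in the text, best match first."""
    lowered = text.lower()
    scores = Counter()
    for area, keywords in AREA_KEYWORDS.items():
        for keyword in keywords:
            # Count whole-word (or whole-phrase) occurrences only.
            pattern = r"\b" + re.escape(keyword) + r"\b"
            hits = len(re.findall(pattern, lowered))
            if hits:
                scores[area] += hits
    return [area for area, count in scores.most_common() if count >= min_hits]


def is_interdisciplinary(text):
    """Treat a text as interdisciplinary if it matches more than one area."""
    return len(classify(text)) > 1


sample = "We use machine learning to predict protein folding inside the cell."
print(classify(sample))
print(is_interdisciplinary(sample))
```

A simple keyword count like this ignores inflected forms, synonyms, and context, which is one reason a model-based NLP approach is attractive for this task.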
In this “VIVO Talks!” session, Florian Kotschka and Rolf Guescini explain how the approach works. The presentation from this session is available online.
About “VIVO Talks!”
In the process of developing a VIVO-based research information platform for the Berlin University Alliance (BUA), we value regular discussions with our stakeholders across the four BUA organizations and beyond. In the online event series “VIVO Talks!”, we bring together all interested parties to address key questions around the development of the platform. After an insightful presentation on a specific topic, we turn to the audience for further discussion in the fields of research information, research infrastructure, research management, and more.
About VIVO Research Information Platform
At CMS, a four-member project team is currently developing a federated platform for the Berlin University Alliance to present information about researchers, their research, and their activities within the alliance. Built on the open source software VIVO, the platform uses semantic web technologies to present people and their work in a structured and searchable way. In the current development phase (until the end of 2023), research information from cross-institutional collaborations within BUA, called Clusters of Excellence, is being integrated into a demonstrator to illustrate the potential of the platform, especially for the presentation of interdisciplinary research.
This project is funded by the Federal Ministry of Education and Research (BMBF) and the state of Berlin within the framework of the Excellence Strategy of the Federal Government and the Länder.
Contact
Fadwa Alshawaf, Project Lead VIVO Platform
Email: fadwa.alshawaf@hu-berlin.de