Information Extraction

Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers

Our paper “Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers” was recently accepted and will be presented at the Joint Conference on Digital Libraries 2018. Abstract: Bibliographic reference parsing refers to extracting machine-readable metadata, such as the names of the authors, the title, or the journal name, from bibliographic reference strings. Many approaches to this problem have been proposed so far, including regular expressions, knowledge bases, and supervised machine learning. Many open-source reference parsers based on various algorithms are also available. In this paper, we apply, evaluate, and compare ten reference parsing tools in a specific business use case. The tools are Anystyle-Parser, Biblio, CERMINE, Citation, Citation-Parser, GROBID, ParsCit, PDFSSA4MET, Reference Tagger …
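
To make the task described in the abstract concrete, here is a minimal sketch of the rule-based (regular expression) approach. The pattern, the field names, and the sample reference string are assumptions made up for illustration; they are not taken from the paper or from any of the evaluated tools, and the pattern covers only one simple "Authors (Year). Title. Journal." layout.

```python
import re

# Hypothetical pattern for a single, simple reference layout:
#   Authors (Year). Title. Journal.
REFERENCE_PATTERN = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>[^.]+)\.\s+"
    r"(?P<journal>[^.]+)\.$"
)


def parse_reference(reference):
    """Return a dict of metadata fields, or None if the string does not match."""
    match = REFERENCE_PATTERN.match(reference.strip())
    return match.groupdict() if match else None


# Invented sample reference, used only to exercise the pattern.
example = "Smith, J. and Doe, A. (2015). A study of parsing. Journal of Examples."
parsed = parse_reference(example)
print(parsed)
```

A reference in any other layout simply returns None, which hints at why a single hand-written rule rarely generalizes across citation styles.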

By Dominika Tkaczyk
Information Extraction

Who Did What? Identifying Author Contributions in Biomedical Publications using Naïve Bayes

Our paper “Who Did What? Identifying Author Contributions in Biomedical Publications using Naïve Bayes” was recently accepted and will be presented at the Joint Conference on Digital Libraries 2018. Abstract: Creating scientific publications is a complex process. It is composed of a number of different activities, such as designing the experiments, analyzing the data, and writing the manuscript. Information about the contributions of individual authors of a paper is important for assessing authors’ scientific achievements. Some biomedical publications contain a short section written in natural language, which describes the roles each author played in the process of preparing the article. In this paper, we present a study of authors’ roles commonly appearing in these sections, and propose an algorithm for automatic …

By Dominika Tkaczyk
Docear

Howto: Import references from webpages (e.g. PubMed, IEEE, ACM, …)

Unlike several other reference managers, Docear lacks a feature to import references directly from the Web. For instance, if you visit the detail page of a research article on a publisher's website, you might wish to import the bibliographic data of that article directly into Docear. Many publishers offer export options for reference managers such as EndNote, RefWorks, or Zotero. So, how do you do it with Docear? Fortunately, Docear uses the BibTeX format to store references. BibTeX is a de facto standard for references that is supported by almost every publisher and reference manager. So, read on to learn how to import bibliographic data from web pages in two steps!
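
For readers unfamiliar with BibTeX, the following sketch shows what a record exported from a publisher's page might look like and how its fields can be read. The entry itself, its citation key, and the helper `read_bibtex_entry` are hypothetical and only illustrate the format; this is not how Docear processes BibTeX, and the helper handles only flat entries whose values are wrapped in a single pair of braces.

```python
import re

# Hypothetical BibTeX record, similar to what a publisher's export might provide.
BIBTEX_ENTRY = """@article{doe2015example,
  author = {Doe, Jane and Smith, John},
  title = {An Example Article},
  journal = {Journal of Examples},
  year = {2015}
}"""


def read_bibtex_entry(text):
    """Read the entry type, citation key, and brace-delimited fields of one entry."""
    header = re.match(r"\s*@(\w+)\s*\{\s*([^,\s]+)\s*,", text)
    if header is None:
        return None
    # Matches 'name = {value}' pairs; nested braces are not supported.
    fields = dict(re.findall(r"(\w+)\s*=\s*\{([^{}]*)\}", text))
    return {"type": header.group(1).lower(), "key": header.group(2), **fields}


entry = read_bibtex_entry(BIBTEX_ENTRY)
print(entry["author"], entry["year"])
```

For real-world files with nested braces, quoted values, or string macros, a dedicated BibTeX parser is a safer choice than a regular expression.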

By Joeran Beel
Information Extraction

Metadata retrieval and recommendations deactivated due to heavy server load

We are experiencing a very high server load for several reasons (many people are using our services, we are running some extensive research analyses, etc.). We have therefore decided to deactivate metadata retrieval and recommendations for a while, hopefully only a few days. We will let you know as soon as the services are available again. UPDATE (April 15th): The service is online again!

By Joeran Beel