Recommender Systems

Mr. DLib’s Living Lab for Scholarly Recommendations (preprint)

We published a manuscript on arXiv about the first living lab for scholarly recommender systems. The lab allows recommender-system researchers to conduct online evaluations of their novel algorithms for scholarly recommendations, e.g., research papers, citations, conferences, and research grants. Recommendations are delivered through the living lab's API in platforms such as reference-management software and digital libraries. The living lab is built on top of the recommendations-as-a-service provider Mr. DLib. Current partners are the reference-management software JabRef and the CORE research team. We present the architecture of Mr. DLib's living lab as well as usage statistics for the first ten months of operation. During this time, 970,517 recommendations were delivered, with a mean click-through rate of 0.22%. Read more…

By Joeran Beel
Recommender Systems

RARD II: The 2nd Related-Article Recommendation Dataset (preprint)

We released a new version of RARD, RARD II, and describe the new release in a preprint published on arXiv. The dataset is available at http://data.mr-dlib.org, and the manuscript is available on arXiv and here on our blog. The main contribution of this paper is to introduce and describe a new recommender-systems dataset (RARD II). It is based on data from a recommender system in the digital-library and reference-management-software domain. As such, it complements datasets from other domains such as books, movies, and music. The RARD II dataset encompasses 89m recommendations, covering an item space of 24m unique items. RARD II provides a range of rich recommendation data beyond conventional ratings. For example, in addition to the usual rating Read more…

By Joeran Beel
Information Extraction

Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers

Our paper “Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers” was recently accepted and will be presented at the Joint Conference on Digital Libraries (JCDL) 2018. Abstract: Bibliographic reference parsing refers to extracting machine-readable metadata, such as the names of the authors, the title, or the journal name, from bibliographic reference strings. Many approaches to this problem have been proposed so far, including regular expressions, knowledge bases, and supervised machine learning. Many open-source reference parsers based on various algorithms are also available. In this paper, we apply, evaluate, and compare ten reference-parsing tools in a specific business use case. The tools are Anystyle-Parser, Biblio, CERMINE, Citation, Citation-Parser, GROBID, ParsCit, PDFSSA4MET, Reference Tagger Read more…
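The abstract above names regular expressions as one family of parsing approaches. As a toy illustration only (not one of the evaluated tools, which are far more robust), a regex with named groups can parse one rigid reference pattern; the pattern and example string below are invented:

```python
import re

# A minimal regex for references of the form "Authors. Title. Venue, Year."
# Real parsers such as GROBID or CERMINE handle far messier input, often
# with supervised machine learning rather than hand-written patterns.
REF_PATTERN = re.compile(
    r"^(?P<authors>[^.]+)\.\s+"   # authors, up to the first period
    r"(?P<title>[^.]+)\.\s+"      # title, up to the next period
    r"(?P<venue>[^,]+),\s+"       # venue, up to the comma
    r"(?P<year>\d{4})\.?$"        # four-digit year, optional final period
)

def parse_reference(ref):
    """Return a dict of metadata fields, or None if the pattern fails."""
    match = REF_PATTERN.match(ref.strip())
    return match.groupdict() if match else None

parsed = parse_reference(
    "Salton and McGill. Introduction to Modern Information Retrieval. "
    "McGraw-Hill, 1983."
)
# parsed["title"] == "Introduction to Modern Information Retrieval"
```

The brittleness of this sketch (it fails on initials containing periods, for instance) is exactly why the paper compares ten dedicated tools instead.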

By Dominika Tkaczyk
Information Extraction

Who Did What? Identifying Author Contributions in Biomedical Publications using Naïve Bayes

Our paper “Who Did What? Identifying Author Contributions in Biomedical Publications using Naïve Bayes” was recently accepted and will be presented at the Joint Conference on Digital Libraries (JCDL) 2018. Abstract: Creating scientific publications is a complex process. It is composed of a number of different activities, such as designing the experiments, analyzing the data, and writing the manuscript. Information about the contributions of a paper's individual authors is important for assessing authors' scientific achievements. Some biomedical publications contain a short section, written in natural language, that describes the role each author played in preparing the article. In this paper, we present a study of the authors' roles commonly appearing in these sections, and propose an algorithm for automatic Read more…
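A minimal sketch of the Naïve Bayes idea the title refers to (not the paper's actual model, features, or data): classifying contribution phrases into role labels, with invented training phrases and labels:

```python
import math
from collections import Counter, defaultdict

# Invented toy data: contribution phrases paired with role labels.
TRAIN = [
    ("designed the experiments", "design"),
    ("conceived and designed the study", "design"),
    ("analyzed the data", "analysis"),
    ("performed the statistical analysis", "analysis"),
    ("wrote the manuscript", "writing"),
    ("drafted the paper", "writing"),
]

def train(examples):
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of examples
    vocab = set()
    for text, label in examples:
        words = text.split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def predict(text, word_counts, label_counts, vocab):
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # log prior + sum of log likelihoods with add-one smoothing
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
role = predict("analyzed the experimental data", *model)  # "analysis"
```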

By Dominika Tkaczyk
Mr. DLib

RARD: The Related-Article Recommendation Dataset

We are proud to announce the release of ‘RARD’, the related-article recommendation dataset from the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains information about 57.4 million recommendations that were displayed to the users of Sowiport. The information includes details on which recommendation approaches were used (e.g. content-based filtering, stereotype, most popular), which types of features were used in content-based filtering (simple terms vs. keyphrases), where the features were extracted from (title or abstract), and when recommendations were delivered and clicked. In addition, the dataset contains an implicit item-item rating matrix that was created from the recommendation click logs. RARD enables researchers to train machine-learning algorithms for research-paper recommendations, perform offline evaluations, and Read more…
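An implicit item-item rating matrix of the kind mentioned above can be sketched roughly as follows; the log format and document IDs are invented for illustration and do not reflect RARD's actual schema:

```python
from collections import defaultdict

# Invented click log: (source document for which recommendations were shown,
# recommended document that the user clicked).
clicks = [
    ("doc_a", "doc_b"),
    ("doc_a", "doc_c"),
    ("doc_a", "doc_b"),
    ("doc_d", "doc_b"),
]

# Sparse item-item matrix: how often item j was clicked when it was
# recommended for item i. Clicks serve as implicit "ratings".
ratings = defaultdict(int)
for source, clicked in clicks:
    ratings[(source, clicked)] += 1
# ratings[("doc_a", "doc_b")] == 2
```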

By Joeran Beel
Mr. DLib

Several new publications: Mr. DLib, Lessons Learned, Choice Overload, Bibliometrics (Mendeley Readership Statistics), Apache Lucene, CC-IDF, TF-IDuF

In the past few weeks, we published (or received acceptance notices for) a number of papers related to Mr. DLib, research-paper recommender systems, and recommendations-as-a-service. Many of them were written during our time at the NII or in collaboration with the NII. Here is the list of publications:

Beel, Joeran, Bela Gipp, and Akiko Aizawa. “Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia.” In Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2017.

Beel, Joeran. “Real-World Recommender Systems for Academia: The Gain and Pain in Developing, Operating, and Researching them.” In 5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR) at the 39th European Conference on Information Retrieval (ECIR), 2017. [short version, official], [long version, arxiv]

Beierle, Felix, Akiko Aizawa, and Joeran Beel. Read more…

By Joeran Beel
Machine Learning

Some numbers about Mr. DLib’s Recommendations-as-a-Service (RaaS)

Six months ago, we launched Mr. DLib's recommendations-as-a-service for academia. Time to look back and provide some numbers: Since September 2016, Mr. DLib's recommender system has delivered 60,836,800 recommendations to our partner Sowiport, and Sowiport's users have clicked 91,545 of them. This equals an overall click-through rate (CTR) of 0.15%. The figure shows the number of delivered recommendations and the CTR by month (2016-09-08 to 2017-02-11). The CTR is rather low, and there is notable variance among the months (e.g. 0.21% in September vs. 0.10% in December). The variance may be caused by the different algorithms we are experimenting with. In addition, recommendations are also delivered when web spiders such as the Google Bot crawl our partner website Sowiport.de. In contrast, clicks are Read more…
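The quoted overall CTR is simply clicks divided by delivered recommendations, expressed as a percentage:

```python
# Figures from the post above.
delivered = 60_836_800
clicked = 91_545

ctr = clicked / delivered * 100  # click-through rate in percent
print(f"{ctr:.2f}%")  # 0.15%
```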

By Joeran Beel
Publications

Paper accepted at ISI conference in Berlin: “Stereotype and Most-Popular Recommendations in the Digital Library Sowiport”

Our paper titled “Stereotype and Most-Popular Recommendations in the Digital Library Sowiport” was accepted for publication at the 15th International Symposium on Information Science (ISI) in Berlin. Abstract: Stereotype and most-popular recommendations are widely neglected in the research-paper recommender-system and digital-library communities. In other domains, such as movie recommendations and hotel search, however, these recommendation approaches have proven their effectiveness. We were interested in finding out how stereotype and most-popular recommendations would perform in the scenario of a digital library. Therefore, we implemented the two approaches in the recommender system of GESIS' digital library Sowiport, in cooperation with the recommendations-as-a-service provider Mr. DLib. We measured the effectiveness of most-popular and stereotype recommendations by click-through rate (CTR), based on 28 million delivered Read more…
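As a rough illustration of the most-popular approach (not Sowiport's actual implementation; the item IDs are invented), items can simply be ranked by how often they have been clicked or viewed, and the top of that ranking is recommended to everyone:

```python
from collections import Counter

# Invented interaction log: each entry is one click/view of a paper.
views = ["p1", "p2", "p1", "p3", "p1", "p2"]

# Recommend the k globally most frequent items, regardless of the user.
k = 2
most_popular = [item for item, _ in Counter(views).most_common(k)]
# most_popular == ["p1", "p2"]
```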

By Joeran Beel
Machine Learning

Two of our papers about citation and term-weighting schemes got accepted at iConference 2017

Two of our papers about weighting citations and terms in the context of user modeling and recommender systems were accepted at the iConference 2017. Here are the abstracts and links to the pre-print versions: Evaluating the CC-IDF citation-weighting scheme: How effectively can ‘Inverse Document Frequency’ (IDF) be applied to references? In the domain of academic search engines and research-paper recommender systems, CC-IDF is a common citation-weighting scheme used to calculate the semantic relatedness between documents. CC-IDF adopts the principles of the popular term-weighting scheme TF-IDF and assumes that if a rare academic citation is shared by two documents, this occurrence should receive a higher weight than if the citation is shared among a large number of documents. Although CC-IDF Read more…
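The CC-IDF intuition described above can be sketched with an IDF-style log formulation (an assumed formulation for illustration; the exact formula may differ in the paper): a citation that appears in few documents receives a higher weight than one that appears in many.

```python
import math

def cc_idf(docs_citing, total_docs):
    """IDF-style weight for a citation: rarer citations score higher."""
    return math.log(total_docs / docs_citing)

# In a corpus of 1,000 documents, a citation shared by only 2 documents
# outweighs one shared by 500.
rare = cc_idf(2, 1000)     # log(500) ≈ 6.21
common = cc_idf(500, 1000) # log(2)   ≈ 0.69
```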

By Joeran Beel