Machine Learning

An Empirical Comparison of Syllabuses for Curriculum Learning (Pre-Print)

We have published a pre-print (now available on arXiv) which outlines our work comparing different syllabuses for curriculum learning. Neural networks are typically trained by repeatedly selecting examples at random from a dataset and taking steps of stochastic gradient descent. Curriculum learning is an alternative approach to training neural networks, inspired by human learning, in which training examples are presented according to a syllabus, typically of increasing “difficulty”. Curriculum learning has shown some impressive empirical results, but little is known about the relative merits of different syllabuses. In this work, we provide an empirical comparison of a number of syllabuses found in the literature. Abstract: Syllabuses for curriculum learning have been developed on an ad-hoc, per-task basis, and little is Read more…
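
To make the core idea concrete, here is a minimal sketch of one possible syllabus, assuming a user-supplied per-example difficulty score. It illustrates the general technique (a simple linear schedule over an easiest-first ordering), not the specific syllabuses compared in the pre-print:

```python
import random

def curriculum_batches(examples, difficulty, n_steps, batch_size=32):
    """Yield minibatches under a simple linear syllabus: at step t, sample
    only from the easiest fraction of the data, growing that fraction
    from 10% to 100% over the course of training."""
    ordered = sorted(examples, key=difficulty)  # easiest first
    for t in range(n_steps):
        frac = 0.1 + 0.9 * t / max(1, n_steps - 1)  # pool size at step t
        pool = ordered[: max(batch_size, int(frac * len(ordered)))]
        yield random.sample(pool, min(batch_size, len(pool)))

# Toy usage: the "difficulty" of an integer is simply its magnitude.
data = list(range(1000))
for batch in curriculum_batches(data, difficulty=abs, n_steps=5, batch_size=4):
    print(batch)
```

A linear schedule is only one choice; the difficulty function and the rate at which the pool grows are both free parameters of the syllabus.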

By Mark Collier
Machine Learning

3rd Call for EU Marie Curie EDGE Fellowships (e.g. in Machine-Learning or Recommender-Systems Research)

The 3rd call for EU Marie Curie EDGE fellowships has been published. We successfully supported the application of one EDGE fellow a year ago, and we would be happy to support talented candidates this year, too. So, if you are interested in an EDGE fellowship in the field of machine learning, recommender systems, or another of our research areas, please contact us. Let us know some details about yourself and your project idea. EDGE is a Marie Skłodowska-Curie COFUND Action, led by Trinity College Dublin on behalf of a group of academic institutions from across Ireland. EDGE will offer 71 prestigious Fellowships for experienced researchers (post-doctoral or equivalent) relocating to Ireland. EDGE is also a training and development programme for scientific excellence, offering a Read more…

By Joeran Beel
Recommender Systems

Mr. DLib’s Living Lab for Scholarly Recommendations (preprint)

We published a manuscript on arXiv about the first living lab for scholarly recommender systems. This lab allows recommender-system researchers to conduct online evaluations of their novel algorithms for scholarly recommendations, e.g., research papers, citations, conferences, and research grants. Recommendations are delivered through the living lab’s API in platforms such as reference management software and digital libraries. The living lab is built on top of the recommender-system-as-a-service Mr. DLib. Current partners are the reference management software JabRef and the CORE research team. We present the architecture of Mr. DLib’s living lab as well as usage statistics on the first ten months of operating it. During this time, 970,517 recommendations were delivered with a mean click-through rate of 0.22%. Read more…
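
For intuition, this is roughly what requesting recommendations from a living-lab-style HTTP API could look like from a client platform. The endpoint path, parameters, and response shape below are illustrative assumptions, not the documented Mr. DLib interface:

```python
import requests

# Hypothetical living-lab API client; BASE_URL, the endpoint path, and the
# "limit" parameter are invented for illustration -- consult the actual
# Mr. DLib API documentation for the real interface.
BASE_URL = "https://api.example.org/v1"

def fetch_related(document_id, limit=5):
    """Ask the living lab for related-article recommendations for one document."""
    resp = requests.get(
        f"{BASE_URL}/documents/{document_id}/related_documents",
        params={"limit": limit},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()  # assumed: a list of recommended items with titles and URLs

if __name__ == "__main__":
    print(fetch_related("example-doc-123"))
```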

By Joeran Beel
Recommender Systems

RARD II: The 2nd Related-Article Recommendation Dataset (preprint)

We released a new version of RARD, i.e., RARD II, and describe the new release in a preprint published on arXiv. The dataset is available at http://data.mr-dlib.org, and the new manuscript is available on arXiv and here in our blog. The main contribution of this paper is to introduce and describe a new recommender-systems dataset (RARD II). It is based on data from a recommender system in the digital-library and reference-management-software domain. As such, it complements datasets from other domains such as books, movies, and music. The RARD II dataset encompasses 89m recommendations, covering an item space of 24m unique items. RARD II provides a range of rich recommendation data, beyond conventional ratings. For example, in addition to the usual rating Read more…
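
As a sketch of how one might start exploring a dataset like this with pandas: note that the file name and column names below are assumptions made for illustration, not the documented RARD II schema; check the dataset's documentation at http://data.mr-dlib.org first.

```python
import pandas as pd

# Hypothetical file and column names -- adjust to the actual RARD II files.
recs = pd.read_csv(
    "rard2_recommendations.csv",
    usecols=["recommendation_id", "source_item", "recommended_item", "clicked"],
)

n_recs = len(recs)
n_items = pd.concat([recs["source_item"], recs["recommended_item"]]).nunique()
ctr = recs["clicked"].mean()  # fraction of delivered recommendations that were clicked

print(f"{n_recs} recommendations over {n_items} unique items, CTR = {ctr:.2%}")
```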

By Joeran Beel
Information Extraction

Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers

Our paper “Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers” was recently accepted and will be presented at the Joint Conference on Digital Libraries 2018. Abstract: Bibliographic reference parsing refers to extracting machine-readable metadata, such as the names of the authors, the title, or the journal name, from bibliographic reference strings. Many approaches to this problem have been proposed so far, including regular expressions, knowledge bases, and supervised machine learning. Many open-source reference parsers based on various algorithms are also available. In this paper, we apply, evaluate, and compare ten reference parsing tools in a specific business use case. The tools are Anystyle-Parser, Biblio, CERMINE, Citation, Citation-Parser, GROBID, ParsCit, PDFSSA4MET, Reference Tagger Read more…
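
To illustrate the simplest of the approach families mentioned in the abstract (regular expressions), here is a toy regex-based parser. It is deliberately naive and assumes references shaped like “Authors. Title. Journal, Year.”; the evaluated tools such as GROBID or CERMINE use far more robust, often machine-learning-based, methods:

```python
import re

# Toy pattern for references of the form "Authors. Title. Journal, Year."
PATTERN = re.compile(
    r"^(?P<authors>[^.]+)\.\s+(?P<title>[^.]+)\.\s+(?P<journal>[^,]+),\s+(?P<year>\d{4})\.$"
)

def parse_reference(ref):
    """Return a dict of metadata fields, or None if the string doesn't match."""
    match = PATTERN.match(ref.strip())
    return match.groupdict() if match else None

print(parse_reference(
    "Smith J and Doe A. A Study of Reference Parsing. Journal of Examples, 2018."
))
# {'authors': 'Smith J and Doe A', 'title': 'A Study of Reference Parsing',
#  'journal': 'Journal of Examples', 'year': '2018'}
```

The brittleness of such patterns on real-world reference strings is exactly why the machine-learning-based parsers compared in the paper exist.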

By Dominika Tkaczyk
Information Extraction

Who Did What? Identifying Author Contributions in Biomedical Publications using Naïve Bayes

Our paper “Who Did What? Identifying Author Contributions in Biomedical Publications using Naïve Bayes” was recently accepted and will be presented at the Joint Conference on Digital Libraries 2018. Abstract: Creating scientific publications is a complex process. It is composed of a number of different activities, such as designing the experiments, analyzing the data, and writing the manuscript. Information about the contributions of individual authors of a paper is important for assessing authors’ scientific achievements. Some biomedical publications contain a short section written in natural language, which describes the roles each author played in the process of preparing the article. In this paper, we present a study of authors’ roles commonly appearing in these sections, and propose an algorithm for automatic Read more…
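
A minimal sketch of the general technique named in the title (Naïve Bayes over contribution sentences) is shown below. The sentences, role labels, and bag-of-words features are invented toy examples, not the paper's actual data or feature set:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data: contribution sentences paired with assumed role labels.
sentences = [
    "J.S. designed the experiments",
    "A.D. analyzed the data",
    "J.S. wrote the manuscript",
    "A.D. performed the statistical analysis",
]
roles = ["experiments", "analysis", "writing", "analysis"]

# Bag-of-words features fed into a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(sentences, roles)

print(model.predict(["M.C. drafted and revised the manuscript"]))  # e.g. ['writing']
```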

By Dominika Tkaczyk
Mr. DLib

RARD: The Related-Article Recommendation Dataset

We are proud to announce the release of ‘RARD’, the related-article recommendation dataset from the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains information about 57.4 million recommendations that were displayed to the users of Sowiport. The information includes details on which recommendation approaches were used (e.g., content-based filtering, stereotype, most popular), which types of features were used in content-based filtering (simple terms vs. keyphrases), where the features were extracted from (title or abstract), and the times when recommendations were delivered and clicked. In addition, the dataset contains an implicit item-item rating matrix that was created based on the recommendation click logs. RARD enables researchers to train machine-learning algorithms for research-paper recommendations, perform offline evaluations, and Read more…
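
In the spirit of the implicit item-item rating matrix described above, here is a minimal sketch of deriving implicit ratings from click logs. The log format (source item, recommended item, clicked flag) and the counting scheme are assumptions for illustration, not RARD's actual construction:

```python
from collections import defaultdict

# Hypothetical click log: (source_item, recommended_item, clicked).
click_log = [
    ("paper_A", "paper_B", True),
    ("paper_A", "paper_C", False),  # shown but not clicked
    ("paper_A", "paper_B", True),
    ("paper_D", "paper_B", True),
]

ratings = defaultdict(int)
for source, recommended, clicked in click_log:
    if clicked:
        ratings[(source, recommended)] += 1  # each click strengthens the implicit rating

for (src, rec), score in sorted(ratings.items()):
    print(f"{src} -> {rec}: implicit rating {score}")
```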

By Joeran Beel
Mr. DLib

Several new publications: Mr. DLib, Lessons Learned, Choice Overload, Bibliometrics (Mendeley Readership Statistics), Apache Lucene, CC-IDF, TF-IDuF

In the past few weeks, we published (or received acceptance notices for) a number of papers related to Mr. DLib, research-paper recommender systems, and recommendations-as-a-service. Many of them were written during our time at the NII or in collaboration with the NII. Here is the list of publications: Beel, Joeran, Bela Gipp, and Akiko Aizawa. “Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia.” In Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2017. Beel, Joeran. “Real-World Recommender Systems for Academia: The Gain and Pain in Developing, Operating, and Researching them.” In 5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR) at the 39th European Conference on Information Retrieval (ECIR), 2017. [short version, official], [long version, arXiv] Beierle, Felix, Akiko Aizawa, and Joeran Beel. Read more…

By Joeran Beel
Machine Learning

Some numbers about Mr. DLib’s Recommendations-as-a-Service (RaaS)

Six months ago, we launched Mr. DLib’s recommendations-as-a-service for Academia. Time to look back and provide some numbers: since September 2016, Mr. DLib’s recommender system has delivered 60,836,800 recommendations to our partner Sowiport, and Sowiport’s users have clicked 91,545 of these recommendations. This equals an overall click-through rate (CTR) of 0.15%. The figure shows the number of delivered recommendations and the CTR by month (2016-09-08 to 2017-02-11). The CTR is rather low, and there is notable variance among the months (e.g., 0.21% in September and 0.10% in December). The variance may be caused by the different algorithms we are experimenting with. In addition, recommendations are also delivered when web spiders such as Google Bot are crawling our partner website Sowiport.de. In contrast, clicks are Read more…
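
The quoted overall CTR follows directly from the two counts above; as a quick check:

```python
# Overall click-through rate = clicks / delivered recommendations.
delivered = 60_836_800
clicked = 91_545

ctr = clicked / delivered
print(f"CTR = {ctr:.2%}")  # CTR = 0.15%
```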

By Joeran Beel