Machine Learning

An Empirical Comparison of Syllabuses for Curriculum Learning (Pre-Print)

We have published a pre-print (now available on arXiv) which outlines our work comparing different syllabuses for curriculum learning. Neural networks are typically trained by repeatedly selecting examples from a dataset at random and taking steps of stochastic gradient descent. Curriculum learning is an alternative approach to training neural networks, inspired by human learning, in which training examples are presented according to a syllabus, typically of increasing “difficulty”. Curriculum learning has shown some impressive empirical results, but little is known about the relative merits of different syllabuses. In this work we provide an empirical comparison of a number of syllabuses found in the literature. Abstract: Syllabuses for curriculum learning have been developed on an ad-hoc, per-task basis and little is Read more…
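To make the contrast concrete, here is a minimal sketch of the difference between standard random sampling and a simple curriculum syllabus. The `difficulty` scoring function is a hypothetical stand-in (the pre-print compares several syllabuses; this illustrates only the basic easy-to-hard idea):

```python
# Sketch: uniform random sampling vs. a naive easy-to-hard curriculum.
# The per-example "difficulty" function is illustrative, not from the paper.

import random

def random_batches(dataset, batch_size):
    """Standard training: present examples in uniformly random order."""
    data = list(dataset)
    random.shuffle(data)
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

def curriculum_batches(dataset, difficulty, batch_size):
    """Curriculum syllabus: present examples from easiest to hardest."""
    ordered = sorted(dataset, key=difficulty)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

# Example: treat sequence length as a proxy for difficulty.
examples = ["a", "abc", "ab", "abcd"]
batches = curriculum_batches(examples, difficulty=len, batch_size=2)
```

A real syllabus would typically also anneal back toward uniform sampling as training progresses, which is one of the design choices such comparisons examine.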

By Mark Collier, ago
Mr. DLib

RARD: The Related-Article Recommendation Dataset

We are proud to announce the release of ‘RARD’, the related-article recommendation dataset from the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains information about 57.4 million recommendations that were displayed to the users of Sowiport. Information includes details on which recommendation approaches were used (e.g. content-based filtering, stereotype, most popular), what types of features were used in content-based filtering (simple terms vs. keyphrases), where the features were extracted from (title or abstract), and the time when recommendations were delivered and clicked. In addition, the dataset contains an implicit item-item rating matrix that was created based on the recommendation click logs. RARD enables researchers to train machine learning algorithms for research-paper recommendations, perform offline evaluations, and Read more…
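As a rough illustration of how an implicit item-item rating matrix can be derived from recommendation click logs: when a user viewing a source document clicks a recommended document, that document pair receives an implicit rating (here, a plain click count). The field names and log shape below are illustrative assumptions, not the actual RARD schema:

```python
# Sketch: build an implicit item-item "rating" matrix from click logs.
# Each log entry is a (source_doc, clicked_doc) pair; the rating is the
# number of times that recommendation was clicked. Schema is hypothetical.

from collections import defaultdict

def item_item_matrix(click_log):
    """click_log: iterable of (source_doc, clicked_doc) pairs."""
    matrix = defaultdict(int)
    for source, clicked in click_log:
        matrix[(source, clicked)] += 1
    return dict(matrix)

log = [("d1", "d7"), ("d1", "d7"), ("d2", "d7")]
ratings = item_item_matrix(log)
```

Such a matrix can then feed standard collaborative-filtering or offline-evaluation pipelines.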

By Joeran Beel, ago
Mr. DLib

Several new publications: Mr. DLib, Lessons Learned, Choice Overload, Bibliometrics (Mendeley Readership Statistics), Apache Lucene, CC-IDF, TF-IDuF

In the past few weeks, we published (or received acceptance notices for) a number of papers related to Mr. DLib, research-paper recommender systems, and recommendations-as-a-service. Many of them were written during our time at the NII or in collaboration with the NII. Here is the list of publications: Beel, Joeran, Bela Gipp, and Akiko Aizawa. “Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia.” In Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2017. Beel, Joeran. “Real-World Recommender Systems for Academia: The Gain and Pain in Developing, Operating, and Researching them.” In 5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR) at the 39th European Conference on Information Retrieval (ECIR), 2017. [short version, official], [long version, arxiv] Beierle, Felix, Akiko Aizawa, and Joeran Beel. Read more…

By Joeran Beel, ago
Publications

Paper accepted at ISI conference in Berlin: “Stereotype and Most-Popular Recommendations in the Digital Library Sowiport”

Our paper titled “Stereotype and Most-Popular Recommendations in the Digital Library Sowiport” has been accepted for publication at the 15th International Symposium on Information Science (ISI) in Berlin. Abstract: Stereotype and most-popular recommendations are widely neglected in the research-paper recommender-system and digital-library community. In other domains such as movie recommendations and hotel search, however, these recommendation approaches have proven their effectiveness. We were interested in finding out how stereotype and most-popular recommendations would perform in the scenario of a digital library. Therefore, we implemented the two approaches in the recommender system of GESIS’ digital library Sowiport, in cooperation with the recommendations-as-a-service provider Mr. DLib. We measured the effectiveness of most-popular and stereotype recommendations with click-through rate (CTR) based on 28 million delivered Read more…
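For readers unfamiliar with the metric: click-through rate is simply the fraction of delivered recommendations that were clicked. A minimal illustration (the numbers below are made up, not results from the study):

```python
# Sketch of the CTR metric: clicked recommendations / delivered recommendations.
# Example figures are illustrative only.

def ctr(clicks, delivered):
    """Click-through rate; returns 0.0 when nothing was delivered."""
    return clicks / delivered if delivered else 0.0

rate = ctr(clicks=1_000, delivered=100_000)  # a 1% CTR
```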

By Joeran Beel, ago
Machine Learning

Two of our papers about citation and term-weighting schemes were accepted at iConference 2017

Two of our papers about weighting citations and terms in the context of user modeling and recommender systems were accepted at iConference 2017. Here are the abstracts, and links to the pre-print versions: Evaluating the CC-IDF citation-weighting scheme: How effectively can ‘Inverse Document Frequency’ (IDF) be applied to references? In the domain of academic search engines and research-paper recommender systems, CC-IDF is a common citation-weighting scheme that is used to calculate semantic relatedness between documents. CC-IDF adopts the principles of the popular term-weighting scheme TF-IDF and assumes that if a rare academic citation is shared by two documents then this occurrence should receive a higher weight than if the citation is shared among a large number of documents. Although CC-IDF Read more…
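The CC-IDF idea described above can be sketched by applying the standard IDF formula, log(N / df), to citations rather than terms: a reference shared by few documents in the corpus gets a higher weight than one cited by many. This is a simplified illustration assuming that formula; the exact variant evaluated in the paper may differ:

```python
# Sketch of CC-IDF: IDF-style weighting applied to citations.
# Assumes the textbook IDF formula log(N / df); the paper's exact
# formulation may differ.

import math

def cc_idf(citation, citation_index, num_docs):
    """citation_index maps a cited work to the set of documents citing it."""
    df = len(citation_index.get(citation, ()))  # documents sharing this citation
    if df == 0:
        return 0.0
    return math.log(num_docs / df)

index = {"rare_paper": {"d1"}, "popular_paper": {"d1", "d2", "d3", "d4"}}
w_rare = cc_idf("rare_paper", index, num_docs=4)        # cited by 1 of 4 docs
w_popular = cc_idf("popular_paper", index, num_docs=4)  # cited by all 4 docs
```

As with TF-IDF, a citation shared by every document carries no discriminative weight, while a rarely shared citation carries the most.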

By Joeran Beel, ago
Academic Search Engine Optimization (ASEO)

Do you trust Google Scholar?

Are you using Google Scholar? For finding scientific literature? For obtaining citation counts and publication lists of researchers? Have you ever thought about how trustworthy the information you get on Google Scholar is? My colleague and I performed several tests with Google Scholar and found that it is really easy to fool. You can easily increase the citation counts of articles and thereby improve their rankings. You can easily add invisible keywords to articles and make an article appear relevant for searches it actually isn’t relevant to. You can also create completely nonsensical articles with the paper generator SciGen and make Google Scholar index them. And you can place any kind of advertisement in manipulated articles and make users Read more…

By Joeran Beel, ago
Academic Search Engine Optimization (ASEO)

New Paper: On the Robustness of Google Scholar against Spam

I am currently in Toronto presenting our new paper titled "On the Robustness of Google Scholar against Spam" at Hypertext 2010. The paper is about some experiments we did on Google Scholar to find out how reliable their citation data etc. is. The paper will soon be downloadable on our publication page, but for now I will post a pre-print version of the paper here in the blog:

Abstract

In this research-in-progress paper we present the current results of several experiments in which we analyzed whether spamming Google Scholar is possible. Our results show that it is possible: we ‘improved’ the ranking of articles by manipulating their citation counts, and we made articles appear in searches for keywords the articles did not originally contain by placing invisible text in modified versions of the articles.

1.    Introduction

Researchers should have an interest in having their articles indexed by Google Scholar and other academic search engines such as CiteSeer(X). The inclusion of their articles in the index improves the articles’ visibility to the academic community. In addition, authors should be concerned not only with whether their articles are indexed, but also with where they are displayed in the result list. As with all ranked search results, articles displayed in top positions are more likely to be read.

In recent studies we researched the ranking algorithm of Google Scholar [1-3] and gave advice to researchers on how to optimize their scholarly literature for Google Scholar [4]. However, there are reservations in the academic community against what we called “Academic Search Engine Optimization” [4]. There is the concern that some researchers might use the knowledge about ranking algorithms to ‘over-optimize’ their papers in order to push their articles’ rankings in non-legitimate ways.

We conducted some experiments to find out how robust Google Scholar is against spamming. Not all of the experiments are completed yet, but those that are show interesting results, which are presented in this paper. (more…)

By Joeran Beel, ago
Academic Search Engine Optimization (ASEO)

Academic Search Engine Optimization: What others think about it

In January we published our article about Academic Search Engine Optimization (ASEO). As expected, feedback varied strongly. Here are some of the opinions on ASEO:

Search engine optimization (SEO) has a golden age in this internet era, but to use it in academic research, it sounds quite strange for me. After reading this publication (pdf) focusing on this issue, my opinion changed.
[...] on first impressions it sounds like the stupidest idea I’ve ever heard.
ASEO sounds good to me. I think it’s a good idea.
Good Article..
As you have probably guessed from the above criticisms, I thought that the article was a piece of crap.
In my opinion, being interested in how (academic) search engines function and how scientific papers are indexed and, of course, responding to these… well… circumstances of the scientific citing business is just natural.
Check out the following blogs to read more about it (some in German and Dutch) (more…)

By Joeran Beel, ago
Google Scholar

Academic Search Engine Optimization – make your articles better findable

The Journal of Scholarly Publishing has just published our article Academic Search Engine Optimization (ASEO): Optimizing Scholarly Literature for Google Scholar and Co. The article introduces and discusses the concept of what we call “academic search engine optimization” (ASEO), which we define as: “Academic search engine optimization is the creation, publication, and modification of scholarly literature in a way that makes it easier for academic search engines to both crawl it and index it”. Based on three recently conducted studies, we provide guidelines on how to optimize scholarly literature for academic search engines in general and for Google Scholar in particular. In addition, we briefly discuss the risk of researchers illegitimately ‘over-optimizing’ their articles. Probably not everyone will agree with the article. Read more…

By Joeran Beel, ago