Publications and Further Research Outputs
Joeran Beel. Please visit my Google Scholar profile for a complete list of publications: https://scholar.google.de/citations?user=jyXACVcAAAAJ&hl=en
Description: Recommender Systems, User Modelling, Information Retrieval, Machine Learning, Artificial Intelligence, Information Extraction, Natural Language Processing, Text Mining, Citation Analysis, Bibliometrics, Altmetrics, Scientometrics, Plagiarism Detection, Blockchain, Digital Libraries, Digital Humanities, Finance (FinTech), Legal, Tourism, Medical
- DISCANT: Domain-Independent Semantic Annotation of the Text
- Surrounded by a huge and exponentially growing volume of information of various kinds, every day we face the challenge of keeping track of the latest news in a domain of interest or of finding good-quality answers to specific questions. Modern information systems and search engines address the problem of information overload; however, their capabilities are strongly limited because information is exchanged and stored in unstructured textual formats. Textual documents often contain many concrete facts and much quantifiable information, but communicating these facts through human language renders them inaccessible to machines. This "semantic bottleneck" substantially slows down communication and the propagation of knowledge. To equip machines with the ability to process text effectively, DISCANT ("Domain-Independent SemantiC ANnotation of the Text") aims to create a comprehensive framework for the semantic annotation of textual documents from arbitrary domains, such as scientific papers, legal documents, customer reviews, or clinical trial reports. We will develop approaches, methods, and tools for two classes of solutions: an environment for discovering entity and relation types in a given domain, and a system for the automated semantic annotation of text. The project proposes a novel approach based on a combination of unsupervised natural language processing and machine learning techniques. DISCANT will advance machine understanding of text, contributing to the release of important knowledge buried in textual documents, the creation of machine-readable knowledge repositories, and more effective solutions for semantic search and personalized recommendations. As a consequence, DISCANT will equip consumers of textual documents with better tools for overcoming the data deluge and information overload, enabling them to make better-quality, data-driven decisions.
- Funding Agency: EU / SFI
- Mr. DLib
- Mr. DLib (http://mr-dlib.org) is a non-profit, open-source project that provides recommendations-as-a-service for research articles, calls for papers, and academic news. Mr. DLib was originally developed as a Machine-readable Digital Library at the University of California, Berkeley, and is now run by researchers from, among others, Trinity College Dublin (Ireland) and the University of Konstanz (Germany). Mr. DLib offers three services:
1. Recommendations-as-a-service (RaaS) for operators of academic products. Operators of academic products such as digital libraries or reference management tools can easily integrate a recommender system into their own products with Mr. DLib, without needing any knowledge about recommender systems. Integrating Mr. DLib's recommendations-as-a-service takes between a few hours and a few days, compared to several months of work for implementing one's own recommender system. Operators can choose to recommend only their own content (e.g. research articles) to their users, or content from Mr. DLib's content providers.
2. Academic outreach for providers of academic content. Mr. DLib helps content providers such as universities, publishers, conference organizers, and open access repositories to reach students and researchers and win them as new visitors, readers, users, or customers. For instance, publishers may gain new readers for their publications, universities may attract new students for their courses, and conference organizers may attract new submissions. Mr. DLib does so by recommending the providers' content (e.g. calls for papers, course descriptions, or research articles) to the users of Mr. DLib's RaaS partners.
3. A real-world research environment for students and researchers. Mr. DLib shares its data, i.e. we allow external researchers and students to conduct their research with Mr. DLib, as long as the privacy of our partners and users is ensured.
In the short term, we publish datasets that contain the documents indexed by Mr. DLib and information about the delivered recommendations. Our long-term goal is to establish a "living lab" that allows external researchers to evaluate their recommendation algorithms in real time with Mr. DLib and its partners. Mr. DLib is an ideal environment for research on recommender systems and digital libraries, as well as on machine learning, citation analysis, natural language processing, and several related disciplines. It is therefore a good fit if you are interested in conducting research that has a real impact on how other researchers work.
- Docear (http://docear.org) is a unique solution for academic literature management, i.e. it helps you organize, create, and discover academic literature. Among other things, Docear offers:
1. A single-section user interface that allows the most comprehensive organization of your literature. With Docear, you can sort documents into categories; you can sort annotations (comments, bookmarks, and highlighted text from PDFs) into categories; you can sort annotations within PDFs; and you can view multiple annotations of multiple documents, in multiple categories, at once.
2. A 'literature suite concept' that combines several tools in a single application (PDF management, reference management, mind mapping, etc.). This allows you to draft your own papers, assignments, theses, etc. directly in Docear and to copy annotations and references from your collection directly into your draft.
3. A recommender system that helps you discover new literature: Docear recommends papers that are free, available in full text, instantly downloadable, and tailored to your information needs.
- CitePlag (http://citeplag.org/) is a prototype of a hybrid plagiarism prevention and detection system developed cooperatively by the Information Science Group at the University of Konstanz and Prof. Joeran Beel at Trinity College Dublin. CitePlag implements the Citation-based Plagiarism Detection (CbPD) approach, which was initially introduced in the doctoral thesis of Bela Gipp. The current prototype was developed in cooperation with students from HTW Berlin. Unlike existing approaches to plagiarism detection, the CitePlag prototype does not consider textual similarity alone but uses citation patterns within academic documents as a unique, language-independent fingerprint to identify semantic similarity. This feature enables, for the first time, the automated detection of strongly disguised forms of plagiarism, including paraphrases, translated plagiarism, and even idea plagiarism. The suitability of the CbPD approach for detecting disguised plagiarism was first demonstrated on the plagiarized thesis of former German defense minister Karl-Theodor zu Guttenberg. While conventional detection approaches could not identify a single instance of translated plagiarism in the thesis, the CbPD approach detected 13 of the 16 translated plagiarisms. The effectiveness of the method was further demonstrated when it was applied to the works of multiple authors and various plagiarism styles in the VroniPlag collection. Evaluations of real-world plagiarism showed that plagiarists commonly disguise academic misconduct by paraphrasing copied text, but often do not substitute or rearrange the citations copied from the source document. Most recently, the practicability of the CbPD approach was demonstrated by analyzing 185,000 publications in PubMed Central, a comprehensive full-text database for the biosciences.
The CbPD algorithms allowed the identification of several plagiarism cases that were not machine-detectable using today's prevalent methods. As a result, several publications were retracted, including a fraudulent medical study. While the CbPD approach offers unique benefits, it should be seen as a supplement to, not a replacement for, existing software-based plagiarism detection methods, since text-based and citation-based approaches complement each other. The CitePlag prototype visualizes the concepts and algorithms developed by the Information Science Group around the idea of Citation-based Plagiarism Detection (CbPD).
- OriginStamp (https://originstamp.org) is a non-commercial trusted timestamping service that can be used free of charge and anonymously. Trusted timestamping enables you to prove that you were the originator of a piece of information (e.g. a text or any other media file) at a certain time. Trusted timestamping isn't new: even before OriginStamp, the hash of a piece of information could be published in a newspaper, whose printed date then served as public proof that the information existed at that time. However, OriginStamp.org allows you to timestamp information anonymously, in a decentralized and tamper-proof way, within seconds. It only takes a few clicks and is completely free.
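To make the CitePlag idea of citation patterns as a language-independent fingerprint more concrete, the following Python sketch compares two documents represented as ordered lists of the sources they cite. It computes two simple measures: bibliographic coupling (the number of shared sources, ignoring order) and the longest common subsequence of citations, i.e. the longest series of citations that appear in the same order in both documents, possibly with other citations in between. The choice of measures, the function names, and the reference IDs are assumptions made for illustration; this is not CitePlag's implementation, and a real system would first have to extract and disambiguate references from the full text.

```python
from typing import List, Sequence


def bibliographic_coupling(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of distinct sources cited by both documents (citation order is ignored)."""
    return len(set(a) & set(b))


def longest_common_citation_sequence(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Longest sequence of citations occurring in the same order in both documents.

    Standard longest-common-subsequence dynamic programming, O(len(a) * len(b)).
    Other citations may appear in between, e.g. where a plagiarist inserted text.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one optimal sequence.
    result: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


if __name__ == "__main__":
    # Hypothetical reference IDs, listed in the order they are cited in each document.
    source = ["Smith2001", "Lee1999", "Kim2005", "Lee1999", "Chen2010"]
    suspicious = ["Lee1999", "Kim2005", "Chen2010", "Park2012"]
    print(bibliographic_coupling(source, suspicious))            # 3
    print(longest_common_citation_sequence(source, suspicious))  # ['Lee1999', 'Kim2005', 'Chen2010']
```

Because the comparison looks only at citation identifiers, paraphrasing or translating the surrounding text leaves the result unchanged, which is the property the CbPD description relies on.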
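The hashing step behind trusted timestamping, as described for OriginStamp, can be sketched in a few lines of Python. Only a SHA-256 fingerprint of the content is recorded, which is why the content itself can stay private; anyone holding the original file can later recompute the hash and compare. This is a minimal local sketch under stated assumptions: the function names and the record format are hypothetical, it does not use OriginStamp's API, and it omits the step that actually makes a timestamp trustworthy, namely publishing the hash somewhere tamper-evident (a newspaper or, for a decentralized service, a public blockchain).

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of the content as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


def make_record(data: bytes) -> dict:
    """Create a hypothetical timestamp record holding only the hash and the current UTC time.

    A locally generated time proves nothing on its own; in trusted timestamping,
    the proof comes from publishing this hash in a tamper-evident place.
    """
    return {
        "sha256": fingerprint(data),
        "time": datetime.now(timezone.utc).isoformat(),
    }


def verify(data: bytes, record: dict) -> bool:
    """Check whether the given content matches the hash stored in the record."""
    return fingerprint(data) == record["sha256"]


if __name__ == "__main__":
    draft = b"My manuscript draft"
    record = make_record(draft)
    print(record["sha256"])
    print(verify(draft, record))                     # True
    print(verify(b"My manuscript draft!", record))   # False: any change alters the hash
```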
Awards and Honours
DAAD Postdoctoral Fellowship (FIT Weltweit)
5th prize in the B-P-W business plan contest
1st prize in the ego.BUSINESS business plan contest
Best graduate of the Computer Science Department 2007/08
2nd prize in the B-P-W business plan contest
2nd prize at "Jugend forscht", Germany's most reputable research competition for young people (national round)
Award for the outstanding development of microelectronic equipment from the Association of German Electrical Engineers