Docear 1.2 Beta is now available and has two major improvements: A new add-on to import any kind of highlighted text from PDFs. This new add-on is a true milestone in the development of Docear. Until now, you could only import highlighted text from PDF editors that copied the highlighted text Read more…
Thanks to all the generous donors, our student Christoph could work on an improved PDF metadata retrieval for Docear, and today it’s time to present the first preview. The new Docear 1.1 (preview) is able to extract the title of a PDF and fetch appropriate metadata from Google Scholar. Whenever you select a PDF in your mind-map and choose “Create or Update reference”, the following new dialog appears.
The dialog shows the file name of your PDF file and the extracted title. In the background, the extracted title is sent to Google Scholar, and metadata for the first three search results is shown in the dialog. If the title was extracted incorrectly, you can correct it manually. You may also choose to use the PDF’s file name for the search. For instance, if you have already named your PDF after its title, select the radio button with the file name, and the file name is sent as the search query to Google Scholar (you may also manually correct the file name before it’s sent). Of course, all other options you already know are still available, such as creating a blank entry or importing the XMP data of PDFs. Btw., Docear remembers your choice, i.e. when you choose to create a blank entry, that option will be pre-selected the next time you open the dialog. It might happen that your IP is blocked by Google Scholar when you use the service too frequently. If this happens, a captcha should appear, and after solving it, you should be able to proceed. We have not yet tested this thoroughly. Please let us know your experiences.
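To make the choice between the extracted title and the file name a bit more concrete, here is a minimal sketch in Python. It is not Docear's actual code (Docear is written in Java), and the function names `query_from_file_name` and `build_query` are hypothetical. The sketch only shows how a search query could be picked; it does not contact Google Scholar.

```python
import re
from pathlib import Path
from typing import Optional


def query_from_file_name(pdf_path: str) -> str:
    """Turn a PDF file name into a plain-text search query (hypothetical helper)."""
    stem = Path(pdf_path).stem  # file name without directory and ".pdf"
    # Assumption: underscores, hyphens and dots are used as word separators.
    words = re.sub(r"[_\-.]+", " ", stem)
    return re.sub(r"\s+", " ", words).strip()


def build_query(pdf_path: str,
                extracted_title: str,
                use_file_name: bool = False,
                manual_correction: Optional[str] = None) -> str:
    """Pick the search query the way the dialog options are described above."""
    if manual_correction:          # the user corrected the text by hand
        return manual_correction.strip()
    if use_file_name:              # radio button "file name" selected
        return query_from_file_name(pdf_path)
    return extracted_title.strip() # default: the automatically extracted title


if __name__ == "__main__":
    path = "papers/Research_Paper-Recommender_Systems.pdf"
    print(build_query(path, "Resaerch Paper Recomender Systems", use_file_name=True))
```

In this sketch a manual correction always wins, then the file name if that option is selected, and otherwise the extracted title.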
The precision of our metadata tool depends on two factors: A) the precision of the title extraction and B) the coverage of Google Scholar. According to a recent experiment, the accuracy of our tool's title extraction is around 70%. However, the final result very much depends on the format of your research articles. In my research field (i.e. recommender systems), I would say that our tool extracts the title correctly for about 90% of the articles in my personal library. In addition, almost all articles that are relevant for my research are indexed by Google Scholar (I would estimate more than 90%). This means that for around 80% of my PDFs (90% of 90%), the correct metadata is retrieved fully automatically. If I provide the title manually, the metadata may be retrieved for even more than 90%. Please let us know your experience (and your research field).
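The estimate above is simply the product of the two factors. Here is a small Python sketch of that arithmetic; the function name `end_to_end_rate` is made up for illustration, and it assumes that title extraction and Google Scholar coverage are independent of each other.

```python
def end_to_end_rate(title_accuracy: float, scholar_coverage: float) -> float:
    """Share of PDFs for which correct metadata is retrieved automatically.

    Assumption: a PDF succeeds only if its title is extracted correctly AND
    the article is indexed by Google Scholar, and the two are independent.
    """
    return title_accuracy * scholar_coverage


if __name__ == "__main__":
    # Figures from the post: about 90% title accuracy, about 90% coverage.
    print(f"{end_to_end_rate(0.90, 0.90):.0%}")  # prints 81%
    # Title typed in manually: only the coverage limits the result.
    print(f"{end_to_end_rate(1.00, 0.90):.0%}")  # prints 90%
```

With the 70% title accuracy from the experiment and the same 90% coverage, the same calculation would give roughly 63%, which is why the result depends so much on the format of your articles.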
Done! We’ve got all the money we need, thank you very much!!!!!!!! Read on here…
One of Docear’s biggest disadvantages, compared to other reference managers, is the rather poor PDF metadata extraction capability. As such, it is no surprise that the second most popular feature request is to add decent PDF metadata extraction and file renaming to Docear. However, adding such a function is a lot of work and we currently do not really have the manpower for this. Fortunately, one of our best students – i.e. Christoph, who already did a lot of work for us – wants a paid job for his semester breaks. If we could pay him 1,800 Euros, he would love to implement the PDF metadata extraction method in his semester breaks, and we have no doubts that he is capable of doing it. The problem is, we don’t have the funds to pay him.
Therefore, we would like to start a call for donations: If you want decent PDF metadata extraction in Docear, please donate before February 28, 2014. We need 1,800 Euros to pay Christoph for four weeks of almost full-time work, starting at the end of February.
It has been a while since we started crawling the Web for academic PDFs to index them and use them for Docear’s research paper recommender system. Meanwhile, we have collected quite a few PDFs. Unfortunately, in the foreseeable future our servers’ disks will be full, and the load on our servers is already too high (that’s why you sometimes won’t get recommendations in Docear: our servers are simply too busy).
Since our budget is tight and we don’t want to spend too much time on server administration either, we are asking for your help: Do you have a server that you could spare? What we need is the following:
Maybe the most disturbing thing about Docear is the lack of a proper PDF reader that runs on all operating systems and creates comments, bookmarks and highlighted text that can be imported by Docear. Personally, I use Foxit Reader and create bookmarks to remember important statements, but it can’t highlight text properly. PDF XChange Viewer could be a great alternative if it had persistent object numbers, but it doesn’t (read here for more details).
Due to the lack of a truly proper Java PDF viewer, we are considering developing our own PDF viewer. There are plenty of Java PDF libraries out there. However, I had a look at them and none of them seems really suitable. Aspose PDF, iText, jPod Renderer, PDF Tron, Big Faceless Java PDF, CABAReT Stage, jPDFBookmarks, JPedal, PDFBox, ICE Pdf, ReMarksPDF, and Qoppa’s jPDFViewer all have some shortcomings. Either they have many features but are commercial (e.g. Big Faceless Java PDF), or they are open source but do not offer the required features or have serious bugs (e.g. PDFBox).
So, my question: Do you know of any other Java PDF libraries or, even better, a fully functional Java PDF viewer? Our requirements are: