Call for Donation: (Automatic) PDF Metadata Extraction and Renaming

Published by Joeran Beel


Done! We’ve got all the money we need, thank you very much!!!!!!!! Read on here…


 
One of Docear’s biggest disadvantages compared to other reference managers is its rather poor PDF metadata extraction. It is therefore no surprise that the second most popular feature request is to add decent PDF metadata extraction and file renaming to Docear. However, adding such a function is a lot of work, and we currently do not have the manpower for it. Fortunately, Christoph, one of our best students, who has already done a lot of work for us, wants a paid job for his semester break. If we could pay him 1,800 Euros, he would love to implement the PDF metadata extraction during the break, and we have no doubt that he is capable of doing it. The problem is that we don’t have the funds to pay him.

Therefore, we would like to start a call for donations: if you want decent PDF metadata extraction in Docear, please donate before February 28, 2014. We need 1,800 Euros to pay Christoph for four weeks of almost full-time work, starting at the end of February.

 

 


During the four weeks, Christoph would (begin to) implement the following work packages (see also GitHub for more details, and to follow the development):

1. Improved PDF metadata extraction dialog

Right now, retrieving metadata in Docear is quite annoying. You need to select a PDF, select the entry in the menu and go through several dialogs. We want a single dialog in which all options are combined and that could look like this:

[Screenshot: mock-up of the combined metadata dialog]

This means that when you want to create a new reference for a PDF, a dialog opens in which you can choose to a) create a blank entry, b) retrieve metadata, or c) create a new entry based on the PDF’s XMP metadata. For b), the PDF’s title is immediately extracted and shown in the dialog. Via the lookup button, additional metadata can be retrieved and selected from the list.

2. Request metadata from Google Scholar

Docear’s digital library is rather small and not always available. Therefore, we would love to additionally request metadata directly from Google Scholar. Docear could send a title, extracted from the PDF, as a search query to Google Scholar and show the search results, so that you can import the matching one into Docear as BibTeX. We would also need an option for users to enter a captcha when Google Scholar blocks their IP.

3. Auto Retrieve metadata for PDF files

We would also like to auto-retrieve metadata for all your PDF files in the background. We know that Google Scholar only allows a few dozen requests per day, but we could implement a limit, e.g. requesting metadata via Google Scholar for only 50 PDF files per day.
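To illustrate the daily limit, here is a minimal sketch in Python (Docear itself is not written in Python, and this is not the planned implementation). The names `DailyQuota` and `select_for_lookup` are hypothetical, and the default of 50 is simply the example number from the paragraph above. The sketch only decides which PDFs may be looked up on a given day; the actual Google Scholar request is left out.

```python
from datetime import date


class DailyQuota:
    """Allow at most `limit` lookups per calendar day (hypothetical helper)."""

    def __init__(self, limit=50):
        self.limit = limit
        self.day = None   # the day the counter currently belongs to
        self.used = 0     # lookups already spent on that day

    def try_acquire(self, today=None):
        """Spend one lookup if today's quota is not yet used up."""
        today = today or date.today()
        if today != self.day:
            # A new day has started, so the counter starts from zero again.
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


def select_for_lookup(pdf_paths, quota, today=None):
    """Return the PDFs that may be looked up today; the rest wait for another day."""
    selected = []
    for path in pdf_paths:
        if not quota.try_acquire(today):
            break
        selected.append(path)
    return selected
```

A background job could call `select_for_lookup` once per run and keep the remaining files queued for the following days.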

4. Auto Rename PDF files based on BibTeX metadata

A function that renames all your PDF files according to your metadata. You could specify the pattern by which PDF files are renamed (e.g. [Author]_[Year].pdf).
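A pattern such as [Author]_[Year].pdf could be filled in roughly as follows. This is only a Python sketch under assumptions: the function name `build_filename`, the dict-based entry, the "unknown" fallback and the rule for replacing unsafe characters are all made up for illustration and say nothing about how Docear will actually do it.

```python
import re


def build_filename(pattern, entry):
    """Fill [Field] placeholders in `pattern` from a BibTeX-like dict (hypothetical helper)."""

    def fill(match):
        # Field names are looked up case-insensitively; missing fields become "unknown".
        value = str(entry.get(match.group(1).lower(), "unknown"))
        # Replace runs of whitespace and characters that are unsafe in file names.
        return re.sub(r'[\\/:*?"<>|\s]+', "_", value).strip("_")

    return re.sub(r"\[(\w+)\]", fill, pattern)


entry = {"author": "Beel", "year": 2014, "title": "Docear: An Overview"}
print(build_filename("[Author]_[Year].pdf", entry))  # Beel_2014.pdf
print(build_filename("[Year]_[Title].pdf", entry))   # 2014_Docear_An_Overview.pdf
```

The sketch only computes the new name. Actually renaming a file would also require updating every link that points to it.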

5. Sort PDF files based on BibTeX metadata

A function that sorts your PDF files into folders based on the metadata, e.g. \year\author\filename.pdf, in both the physical folder structure and the mind map structure.
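The folder layout above can be sketched in a few lines of Python. Again, this is an illustration under assumptions and not Docear code: `target_path` and `plan_moves` are hypothetical names, entries are plain dicts, and the sketch only computes where each file would go. It does not move anything and does not touch the mind maps.

```python
from pathlib import PurePosixPath


def target_path(root, entry, filename):
    """Build root/year/author/filename from a BibTeX-like dict (hypothetical helper)."""
    year = str(entry.get("year", "unknown_year"))
    author = str(entry.get("author", "unknown_author"))
    return PurePosixPath(root, year, author, filename)


def plan_moves(root, files):
    """Map each current path to its sorted location, without moving anything."""
    moves = {}
    for current, entry in files:
        moves[current] = target_path(root, entry, PurePosixPath(current).name)
    return moves


files = [("inbox/Beel_2014.pdf", {"author": "Beel", "year": 2014})]
for old, new in plan_moves("library", files).items():
    print(old, "->", new)  # inbox/Beel_2014.pdf -> library/2014/Beel/Beel_2014.pdf
```

Computing the plan first and moving files afterwards would make it possible to show the changes to the user, and to update all references to each file, before anything is touched on disk.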

Implementing all five items is a lot of work, and we cannot promise that Christoph will be able to do all of them in four weeks. However, if the required 1,800 Euros are donated, we can promise to deliver 1. and 2., most likely also 3., and maybe even 4. within two months (if Christoph does not finish the job entirely, we will help out).

If you want these features, please donate! In the unlikely case that Christoph does not manage to implement at least 1. and 2. by the end of April, you will get your money back. Similarly, if we receive less than 1,800 Euros in donations and Christoph does not start the job, you will get your money back if you want. Of course, we will ensure that everything Christoph develops is maintained by us in the long run. And it is probably needless to say that all of Christoph’s work will be open source, so others can use it in their own projects if they like.

If you live in the European Union, you may also use a bank transfer instead of PayPal. In this case, please transfer the money to Bank: Postbank Frankfurt, account owner: Joeran Beel, IBAN: DE51500100600853552606, BIC: PBNKDEFF.

For questions or suggestions, please use the comment function!

 

 


12 Comments

roman · 11th February 2014 at 19:43

Please do not remove the document link from the dialogue. There is often the need to open the document to extract the title if it is not correct.

    Joeran [Docear] · 11th February 2014 at 19:49

    we won’t. the screenshot was just a simple illustration to give an overview of the main changes

Saul · 21st February 2014 at 11:06

Good luck raising the cash for this! At the moment I use Mendeley to do my metadata extraction / bibtex management / document renaming but I would *much* prefer to manage the whole thing with Docear and simplify my workflow considerably.

francesco · 21st February 2014 at 14:15

I’ve just donated… Please, keep up with the hard work… I need your mind maps. I was a mess with writing reports and scientific papers. Now I’ve everything in the right place! Moreover your software allows to save tons of paper and a lot of trees.

THANK YOU SO MUCH!

Kermit · 25th February 2014 at 04:19

I tried to donate but the PayPal web-site returned with:

Docear
Return to Docear
Error Message
The link you have used to enter the PayPal system contains an incorrectly formatted item amount.

Return to Docear
At this time, we are unable to process your request. Please return to Docear and try another option.
===============================

Please fix it.

Good luck!

    Joeran [Docear] · 25th February 2014 at 07:15

    it works for me. which button did you try? the one at the bottom of the page or at the top of the page? could it be that you entered a comma or point such as 3,000 or 30.50? this is not accepted by paypal, you can only enter digits.

Ricardo Reis · 27th February 2014 at 07:57

For half the funding, so far, can’t you aim at just 1. and/or 2.?

best,

rreis

Ricardo Reis · 27th February 2014 at 07:58

forget it, I was seeing an elder version of this page. Cache defeat…

Saul · 26th September 2014 at 14:32

Congratulations on implementing this. It’s a great achievement to have raised the cash then done the job on time and to a very high standard. Having donated, I’m really happy to now be using this feature, and have finally been able to dump Mendeley. I’ll be even more excited when the PDF document renaming/moving function is enabled as that’s a really important part of my workflow. At the moment I’m switching between Docear and standalone JabRef 2.7 + renameFile plugin, first importing with Docear, then renaming in JabRef ‘accepting changes’ between them. It’s a little clunky, but it works.

Stephan · 15th March 2016 at 13:56

Is there any progress planned on these key features?
(I just searched through all reference managers and found docear to be highly preferable compared to mendeley and zotero, as I like to work on and with the pdf-files I already have)

I miss especially the renaming feature, which should be impressively easy to implement. JabRef had such plugin, however after plugins got disabled, it does not seem to be merged yet.
Have I missed something?

Does someone know how to replace the renaming-feature (essentially you only need the bib-files), as long as it is not implemented?

    Joeran [Docear] · 31st March 2016 at 10:34

    there is no progress yet. however, renaming is not as easy as e.g. in jabref, because in Docear you need to rename the files and links in both bibtex files and all the mind-maps (of course, it’s also not that difficult to implement)
