Docear Beta 7 with PDF Metadata Extraction

Published by Joeran Beel on

Beta 7 is out and has one major new feature: (semi-)automatic extraction of bibliographic metadata from PDF files. When you create a new reference, you no longer have to type everything manually: bibliographic information such as title, author, year, and journal is provided to you automatically. Here is how it works:
Right-click a node with a PDF and select the menu entry shown in the picture
Select Docear’s digital library to retrieve data from
Provide the document’s title if it was not extracted correctly from the PDF (the title should be extracted correctly in about 80% of cases). A click on “Yes” sends the title to Docear’s digital library, which returns all metadata for the documents with this title
We have looooooots of data in our database, so the chance of getting the correct metadata is really high
Done 🙂
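The lookup in the steps above can be pictured with a small Python sketch. It is only an illustration, not Docear's actual code: the in-memory `LIBRARY` dictionary, the sample records, and the `normalize_title` and `lookup` helpers are all made up for this example, and the real digital library is a remote service whose matching rules are not described here.

```python
import re

# Hypothetical stand-in for the digital library: normalized title -> records.
# The records are invented sample data, not real entries.
LIBRARY = {
    "an example paper title": [
        {"title": "An Example Paper Title", "author": "A. Author", "year": 2011},
        {"title": "An Example Paper Title", "author": "A. Author", "year": 2012},
    ],
}


def normalize_title(title):
    """Lowercase the title, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(cleaned.split())


def lookup(extracted_title, corrected_title=None):
    """Return every record whose title matches.

    If the user corrected the extracted title, the corrected version is used.
    """
    title = corrected_title or extracted_title
    return LIBRARY.get(normalize_title(title), [])
```

Because several documents can share one title, the sketch returns a list of candidates rather than a single record, and an unknown title simply yields an empty list.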
Maybe you remember that we already had a similar function that retrieved data from Mr. DLib. However, our new function is much better. First of all, not the entire PDF is sent but only the title of the PDF and a hash value. That means that instead of transferring maybe 1 MB or more, only a few KB are transferred, which speeds up the entire process dramatically. Second, Mr. DLib had a rather small database, whereas Docear’s digital library is filled with metadata from various sources, so chances are really high that we have the correct metadata available. There is one downside, though: currently, every user can make only 15 requests a day to Docear’s digital library. But we are confident that we can raise this limit very soon. In addition, the function is only available to registered users.
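To make the size difference concrete, here is a rough Python sketch of a request that carries only a title and a hash, plus a simple counter for the 15-requests-a-day limit. Everything in it is an assumption for illustration: the post does not say which hash algorithm Docear uses (SHA-256 is chosen here), what the request format looks like (JSON is chosen here), or how the limit is enforced, and `build_request` and `DailyQuota` are hypothetical names.

```python
import hashlib
import json
from datetime import date

DAILY_LIMIT = 15  # requests per user per day, as stated in the post


def build_request(title, pdf_bytes):
    """Build a small payload holding only the title and a hash of the PDF.

    Assumption: SHA-256 and JSON are illustrative choices, not Docear's format.
    """
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    return json.dumps({"title": title, "hash": digest}).encode("utf-8")


class DailyQuota:
    """Hypothetical per-user counter that resets when the date changes."""

    def __init__(self, limit=DAILY_LIMIT):
        self.limit = limit
        self.day = None
        self.used = 0

    def try_acquire(self, today=None):
        """Return True and count the request if today's quota is not used up."""
        today = today or date.today()
        if today != self.day:
            # First request of a new day: start counting from zero again.
            self.day, self.used = today, 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

The payload size depends only on the title length and the fixed-length hash, not on the size of the PDF, which is why a 1 MB file still results in a request far below a kilobyte in this sketch.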
The change-log in detail:
New features include:
#621 PDF Metadata Extraction
#627 Action to automatically export Windows registry and send to Docear for bug fixing
Feature enhancements include:
#661 Information added to dialog how to resolve duplicated entries in BibTeX files created by Mendeley
#651 Use one method to open all library maps
#625 Double check PDF-XCV compatibility settings
#656 Change default preferences of JabRef
#612 Updated to the latest source code of Freeplane
#565/#587 Recommendations improved
Bug fixes include:
#613 Installation path in about dialog was wrong
#674 Typo in welcome map
#650 Recommended documents stored on an FTP server could not be downloaded
#637 SciPlore MindMapping files were converted without permission when opened in the background
#657 Null Pointer Exception occurred after starting Docear with completely new settings
#668 Hyperlinks to open folders did not always work
Other changes:
#667 Smart PDF viewer selection for Skim and Preview on MacOS removed
