Docear’s PDF Inspector 1.01 released (titles now extracted across multiple lines)
Docear’s PDF Inspector had a small bug: it extracted titles only from the first two lines. If a PDF’s title spanned three or more lines, only part of the title was extracted. This bug is fixed in the current version, 1.01.
For those who don’t know Docear’s PDF Inspector: it is a Java library that extracts a PDF’s title not from the file’s metadata but from its full text. More precisely, Docear’s PDF Inspector extracts the full text of the PDF’s first page and looks for the text with the largest font size in the upper third of that page. This text is returned as the title. Of course, this does not always deliver the correct title (e.g., sometimes the journal name is set in a larger font than the article’s title), but in about 70% of cases you will get the correct title.
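To make the heuristic concrete, here is a minimal sketch of "largest text in the upper third of the first page". It also joins consecutive lines when a title spans several of them, which is the behavior version 1.01 adds. This is not the library’s actual code or API. The `TextLine` record, the `extractTitle` method and the 0.01 pt font-size tolerance are hypothetical. The sketch also assumes that some PDF text extractor has already turned the first page into lines, given in top-to-bottom order, with each line’s vertical position measured in points from the top of the page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a "largest text in the upper third" title heuristic.
// It does NOT use Docear's PDF Inspector API; TextLine is an assumed input type
// that a PDF text extractor would have to produce for the first page.
public class TitleHeuristic {

    // One line of text from the first page: its content, font size (pt),
    // and vertical position measured from the top of the page (pt).
    public record TextLine(String text, double fontSize, double yFromTop) {}

    // Returns the first run of consecutive lines in the upper third of the page
    // that share the largest font size, joined with spaces, or null if the
    // upper third contains no text. Lines must be in top-to-bottom order.
    public static String extractTitle(List<TextLine> lines, double pageHeight) {
        double limit = pageHeight / 3.0;

        // Pass 1: find the largest font size in the upper third.
        double maxSize = 0;
        for (TextLine line : lines) {
            if (line.yFromTop() <= limit && line.fontSize() > maxSize) {
                maxSize = line.fontSize();
            }
        }
        if (maxSize == 0) {
            return null;
        }

        // Pass 2: collect adjacent lines with that size, so a title spanning
        // any number of lines is joined instead of being cut off after two.
        List<String> parts = new ArrayList<>();
        for (TextLine line : lines) {
            boolean isTitleLine = line.yFromTop() <= limit
                    && Math.abs(line.fontSize() - maxSize) < 0.01;
            if (isTitleLine) {
                parts.add(line.text().trim());
            } else if (!parts.isEmpty()) {
                break; // the run of largest-font lines has ended
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // Made-up first page of a US Letter PDF (792 pt high).
        List<TextLine> page = List.of(
                new TextLine("Journal of Examples, Vol. 1", 9, 40),
                new TextLine("A Rather Long Article Title That", 18, 90),
                new TextLine("Spans Three Lines on the", 18, 112),
                new TextLine("First Page", 18, 134),
                new TextLine("Jane Doe", 11, 170),
                new TextLine("Abstract. ...", 10, 300));
        System.out.println(extractTitle(page, 792));
    }
}
```

Running `main` prints `A Rather Long Article Title That Spans Three Lines on the First Page`. The sketch shares the limitation mentioned above: if the journal name in the header were set in a larger font than the article title, it would be returned instead.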
Download Docear’s PDF Inspector 1.01