I require a Java command-line program that automatically extracts information (content/sentences) between bookmarks in a PDF.
The program should use either Apache PDFBox or Apache Tika.
The program should do the following:
a) java -jar extractContent PDFName Bookmark
Extract the content between bookmarks and print it to the screen. A bookmark name is given on the command line (e.g. Background), and the program should extract the text between that bookmark and the next one. Note: bookmarks may be nested several levels deep, so the text must be extracted between bookmarks on the same level.
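The same-level rule can be sketched as follows. This is only an illustration, not a PDFBox or Tika API: it assumes the PDF outline has already been flattened into a list of entries in document order, each with a title and a nesting depth. OutlineEntry and SectionBounds are hypothetical names, and the sample outline (Study population and Data nested under Methods) is an assumption based on the test cases. A section ends at the next entry whose depth is the same as or shallower than the starting bookmark.

```java
import java.util.List;

// Hypothetical flattened outline entry: title plus nesting depth (0 = top level).
final class OutlineEntry {
    final String title;
    final int level;

    OutlineEntry(String title, int level) {
        this.title = title;
        this.level = level;
    }
}

final class SectionBounds {
    // Index of the first entry whose title matches (case-insensitive), or -1 if none.
    static int findStart(List<OutlineEntry> outline, String bookmark) {
        for (int i = 0; i < outline.size(); i++) {
            if (outline.get(i).title.equalsIgnoreCase(bookmark)) {
                return i;
            }
        }
        return -1;
    }

    // Index of the entry that closes the section starting at `start`:
    // the next entry at the same or a shallower level.
    // Returns -1 when the section runs to the end of the document.
    static int findEnd(List<OutlineEntry> outline, int start) {
        int level = outline.get(start).level;
        for (int i = start + 1; i < outline.size(); i++) {
            if (outline.get(i).level <= level) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed outline shape, for illustration only.
        List<OutlineEntry> outline = List.of(
            new OutlineEntry("Background", 0),
            new OutlineEntry("Methods", 0),
            new OutlineEntry("Study population", 1),
            new OutlineEntry("Data", 1),
            new OutlineEntry("Results", 0));
        int start = findStart(outline, "Methods");
        int end = findEnd(outline, start);
        System.out.println(outline.get(start).title + " -> " + outline.get(end).title);
    }
}
```

Nested bookmarks (deeper levels) are skipped when looking for the end, so a request for Methods would include the text of its sub-sections.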
b) java -jar extractContent PDFName Bookmark Keyword
If a keyword is provided, then the program should extract the paragraph (located in that Bookmark section) that contains that keyword.
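A minimal sketch of the keyword step, assuming the section text has already been extracted as a single string and that paragraphs are separated by blank lines. That separator is an assumption: text extracted from a PDF may need a different paragraph heuristic. KeywordFilter is a hypothetical helper name, and matching is case-insensitive by choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class KeywordFilter {
    // Returns every paragraph of sectionText that contains keyword (case-insensitive).
    // Assumes paragraphs are separated by one or more blank lines.
    static List<String> paragraphsContaining(String sectionText, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        // Split on a line break, optional whitespace, then another line break.
        for (String paragraph : sectionText.split("\\R\\s*\\R")) {
            String trimmed = paragraph.trim();
            if (!trimmed.isEmpty() && trimmed.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(trimmed);
            }
        }
        return matches;
    }
}
```

Returning a list rather than a single string covers the case where more than one paragraph in the section contains the keyword.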
The type of PDF that I am interested in can be found at: [url removed, login to view]
Deliverables include the following:
1. Source code with documentation
2. Jar file
Test Cases using the PDF at [url removed, login to view]:
a) java -jar extractContent PDFName Methods (extracts data between bookmark Methods and Results, i.e. same level)
b) java -jar extractContent PDFName "Study Population" (extracts data between bookmark Study population and Data, i.e. same level)
c) java -jar extractContent PDFName Methods SPSS (extracts the paragraph in the Methods section that contains the keyword SPSS. Note: there may be more than one paragraph that contains the keyword).
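The two invocation forms in the test cases could be handled by a small argument parser along these lines. This is a sketch only: CliArgs is a hypothetical class name, and the usage message is an assumption. A quoted bookmark such as "Study Population" reaches the program as a single argument, so no extra handling is needed for spaces.

```java
final class CliArgs {
    final String pdfPath;
    final String bookmark;
    final String keyword; // null when no keyword was given

    private CliArgs(String pdfPath, String bookmark, String keyword) {
        this.pdfPath = pdfPath;
        this.bookmark = bookmark;
        this.keyword = keyword;
    }

    // Accepts "PDFName Bookmark" or "PDFName Bookmark Keyword".
    static CliArgs parse(String[] args) {
        if (args.length < 2 || args.length > 3) {
            throw new IllegalArgumentException(
                "Usage: java -jar extractContent PDFName Bookmark [Keyword]");
        }
        String keyword = args.length == 3 ? args[2] : null;
        return new CliArgs(args[0], args[1], keyword);
    }
}
```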
7 freelancers are bidding an average of $87 for this job
With respect to this project, I would like to present myself as a candidate for your consideration. I have more than 8 years of IT experience. I have successfully completed a few projects which involved Programming, scrip...
Hi Mr/Miss, I am experienced in Apache Tika. I can do this task. I have in-depth knowledge of Java. Thanks, Sabareesan