Software to compare PDF files
- Status: Closed
- Prize: $600
- Entries Received: 6
- Winner: carlquist
This contest is to build software that compares multiple PDF files based on the similarity of their bounding boxes. This is not an easy contest and will require an understanding of PDF libraries.
There are many PDF libraries available and it is not important which one is used.
The user uploads multiple (potentially many) PDF files.
The software converts the PDFs to PNGs with the bounding boxes drawn as squares.
The PNGs are displayed with the bounding boxes shown, and the user selects which bounding boxes are of interest. Multiple bounding boxes can be selected.
The software then searches ALL the original PDFs to find which files have the same bounding boxes.
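The search step above can be sketched roughly as follows. This is only an illustration under stated assumptions: the name `find_matching_files`, the `index` dictionary (file name mapped to the boxes extracted from that file) and the `boxes_match` callback are all hypothetical, and a file is assumed to qualify when every selected box has a counterpart in it.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

B = TypeVar("B")  # any bounding-box representation (hypothetical)


def find_matching_files(
    selected: Sequence[B],
    index: Dict[str, List[B]],
    boxes_match: Callable[[B, B], bool],
) -> List[str]:
    """Return the names of files that contain a match for every selected box.

    `index` maps a file name to the boxes extracted from that file.
    `boxes_match` decides whether two boxes count as the same box.
    """
    if not selected:
        return []  # nothing selected, so nothing to search for
    hits = []
    for filename, boxes in index.items():
        # Assumption: a file qualifies only if each selected box is found in it.
        if all(any(boxes_match(s, b) for b in boxes) for s in selected):
            hits.append(filename)
    return sorted(hits)
```

Passing the comparison in as a callback keeps the search loop independent of how a match is defined, so a coordinate check, an image check, or both can be plugged in later.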
Matches must be based on either:
1. Approximate coordinates of the bounding boxes and the respective page number, leaving room for a 3% error in the placement of the bounding boxes.
2. An image match of the bounding box area. This means that for each match from (1), a further step must convert that bounding box to a PNG file and do an image comparison; if the images are almost identical, it is returned as a match.
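The two criteria above could be combined along these lines. This is a minimal sketch, not a definitive implementation. It assumes the 3% tolerance is measured against the page width and height, that box crops arrive as equally sized grayscale pixel grids (values 0 to 255), and that "almost identical" means a similarity of at least 0.95. The names `Box`, `coords_match`, `image_similarity` and `is_match`, and the 0.95 threshold, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    page: int   # page number the box sits on
    x0: float   # left edge
    y0: float   # top edge
    x1: float   # right edge
    y1: float   # bottom edge


def coords_match(a: Box, b: Box, page_w: float, page_h: float,
                 tol: float = 0.03) -> bool:
    """Criterion 1: same page, each edge within tol of the page size."""
    if a.page != b.page:
        return False
    dx, dy = tol * page_w, tol * page_h
    return (abs(a.x0 - b.x0) <= dx and abs(a.x1 - b.x1) <= dx
            and abs(a.y0 - b.y0) <= dy and abs(a.y1 - b.y1) <= dy)


Pixels = Sequence[Sequence[int]]  # rows of grayscale values, 0..255


def image_similarity(p: Pixels, q: Pixels) -> float:
    """Criterion 2: 1.0 for identical crops, 0.0 for opposite ones."""
    if len(p) != len(q) or any(len(r) != len(s) for r, s in zip(p, q)):
        raise ValueError("crops must be resized to the same dimensions first")
    count = sum(len(r) for r in p)
    if count == 0:
        return 1.0
    total = sum(abs(u - v) for r, s in zip(p, q) for u, v in zip(r, s))
    return 1.0 - total / (255.0 * count)


def is_match(a: Box, b: Box, crop_a: Pixels, crop_b: Pixels,
             page_w: float, page_h: float,
             min_similarity: float = 0.95) -> bool:
    """Cheap coordinate check first, then the pixel comparison."""
    return (coords_match(a, b, page_w, page_h)
            and image_similarity(crop_a, crop_b) >= min_similarity)
```

Running the coordinate check first means the more expensive image comparison is only done for boxes that are already plausible candidates. Mean absolute pixel difference is just one simple measure; a perceptual hash or structural similarity score could be swapped in.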
The end result is that the software shows a list of download links containing the PNGs/PDFs of the files with ONLY the same bounding boxes.
The winner will be asked to add a module to:
- Enable the placement of another PNG image over any PDF image and rewrite the PDF image. Many GitHub libraries can do this.
- Put the bounding box through Tesseract and do an OCR text search in addition to the simple bounding box coordinate comparison. This would produce another criterion to match on.
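For the OCR criterion, the text comparison itself might look something like the sketch below. It assumes the text for each bounding box has already been extracted (for example by Tesseract) and only shows a fuzzy comparison of two strings. The function names and the 0.9 threshold are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase, so OCR layout noise matters less."""
    return re.sub(r"\s+", " ", text).strip().lower()


def text_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two OCR results."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def ocr_text_matches(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two boxes as a text match when their OCR output is close enough."""
    return text_similarity(a, b) >= threshold
```

A fuzzy ratio is used rather than exact equality because OCR output often differs by a character or two between scans of the same content, such as `1` being read as `I`.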
So the winner can earn a total of $800+ from this contest through the add-on module.
Serious entries only, please. I have zero patience, so only enter once it is fully working! I suggest you first message me your proposed methodology, and I can then confirm whether your ideas will succeed.
I recommend using [login to view URL] to save time.
Another idea would be to convert the bounding boxes to SVG format and use an existing SVG comparison library.