Software to compare PDF files

  • Status: Closed
  • Prize: $600
  • Entries Received: 6
  • Winner: carlquist

Contest Brief

The goal of this contest is to compare multiple PDF files based on the similarity of their bounding boxes. This is not an easy contest and requires a working understanding of PDF libraries.
Many PDF libraries are available, and it does not matter which one is used.

Features required:
Upload multiple PDF files (potentially many).
Convert the PDFs to PNGs with the bounding boxes drawn as rectangles.
Show the PNGs with the bounding boxes; the user selects which bounding boxes are of interest and can select more than one.
The software then searches ALL the original PDFs to find which files contain the same bounding boxes.

Matches must be based on either:
1. Approximate coordinates of the bounding boxes and the respective page number, allowing a 3% error in the placement of the bounding boxes.
OR
2. Image matching on the area of the bounding box. This means that for each match from (1), a further step must convert that bounding box to a PNG file and run an image comparison; if the images are almost identical, it is returned as a match.
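
The two criteria above can be sketched as follows. This is only an illustration under stated assumptions, not the method used by any entry: it assumes the 3% tolerance is measured against the page width and height, and that each bounding-box crop has already been rasterised into an equal-sized grid of grayscale values (0 to 255). The names `Box`, `boxes_match`, `images_almost_identical`, and the `max_mean_diff` threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A bounding box on a given page, in page units (e.g. PDF points)."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


def boxes_match(a, b, page_width, page_height, tolerance=0.03):
    """Criterion 1: same page, and every edge within `tolerance` of the page size.

    Assumption: the 3% error is relative to the page dimensions.
    """
    if a.page != b.page:
        return False
    dx = tolerance * page_width
    dy = tolerance * page_height
    return (
        abs(a.x0 - b.x0) <= dx and abs(a.x1 - b.x1) <= dx
        and abs(a.y0 - b.y0) <= dy and abs(a.y1 - b.y1) <= dy
    )


def images_almost_identical(img_a, img_b, max_mean_diff=8.0):
    """Criterion 2: mean absolute pixel difference below a threshold.

    `img_a` and `img_b` are lists of rows of grayscale values (0-255).
    The threshold is an arbitrary illustrative value.
    """
    if len(img_a) != len(img_b):
        return False
    total = 0
    count = 0
    for row_a, row_b in zip(img_a, img_b):
        if len(row_a) != len(row_b):
            return False
        for p, q in zip(row_a, row_b):
            total += abs(p - q)
            count += 1
    return count > 0 and total / count <= max_mean_diff
```

In this sketch the cheap coordinate check runs first, and the pixel comparison is only applied to candidates that pass it, which mirrors the two-step flow described in criterion 2.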

The end result is that the software shows a list of download links for the PNGs/PDFs of ONLY those files that have the same bounding boxes.

The winner will be asked to add a module to:
- Enable the placement of another PNG image over any PDF image and rewrite the PDF image. Many GitHub libraries can do this.

- Put the bounding box through Tesseract and do an OCR text search in addition to the simple bounding box coordinate comparison. This would provide another criterion to match on.

So the winner can earn a total of $800+ from this contest through the add-on module.

Good Luck.

Serious entries only, please. I have zero patience, so only submit once it is fully working! I suggest you first message me your proposed methodology so that I can confirm your approach will succeed.

Be quick!




I recommend using [login to view URL] to save time.

Another idea would be to convert the bounding boxes to SVG format and use an existing SVG comparison library.
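
As a rough illustration of the SVG idea, the sketch below turns a list of bounding boxes into an SVG document with one `<rect>` per box. It assumes the boxes are given as `(x0, y0, x1, y1)` tuples in default PDF user space (origin at the bottom-left, y increasing upwards), so the y coordinate is flipped for SVG, whose origin is at the top-left. The function name `boxes_to_svg` and the `box0`, `box1`, ... ids are hypothetical; no particular SVG comparison library is implied.

```python
def boxes_to_svg(boxes, page_width, page_height):
    """Render bounding boxes as SVG rectangles.

    `boxes` is an iterable of (x0, y0, x1, y1) tuples, assumed to be in
    PDF user space (bottom-left origin), with y1 as the top edge.
    """
    rects = []
    for index, (x0, y0, x1, y1) in enumerate(boxes):
        svg_y = page_height - y1  # flip: SVG measures y from the top
        rects.append(
            f'  <rect id="box{index}" x="{x0:g}" y="{svg_y:g}" '
            f'width="{x1 - x0:g}" height="{y1 - y0:g}" '
            'fill="none" stroke="red"/>'
        )
    header = (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {page_width:g} {page_height:g}">'
    )
    return "\n".join([header, *rects, "</svg>"])
```

For example, `boxes_to_svg([(100, 700, 300, 750)], 612, 792)` yields a single rectangle whose top edge sits 42 units below the top of the page.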

Public Clarification Board

  • Asianexperts
    • 9 months ago

    Hehehe, everyone thought they would get this prize and ended up disappointed.

  • sunnyguptahotels
    Contest Holder
    • 10 months ago

    Please do not enter this contest! One contestant is extremely close to winning.

    1. danielvz96
      • 10 months ago

      :( How close? I've already implemented a bounding box finder (it can find anything from the smallest detail to whole paragraphs) and the bulk compare function, and I was working on the frontend when I saw this.

  • teachartdevteam
    • 10 months ago

    Hey there! I have a slightly different idea and would be happy to discuss it with you. What do you think: would it make sense for the user to draw the bounding boxes? Rendering a box around every object in the PDF might not be 100% useful. I have seen tons of badly structured PDFs that contain overlapping objects, which would result in overlapping bounding boxes. With the current approach, a recursive lookup must be implemented, and each object must be extracted from the PDF and parsed. Each object must be parsed with a different internal parser (iTextSharp and PDFsharp work that way) just to get details such as size and position.

    1. sunnyguptahotels
      Contest Holder
      • 10 months ago

      I see what you are saying. So which library do you propose to use for image comparison? And how would you extract the corresponding area from the other PDFs? Or would it need to compare the selected PNG area against the full-page PNGs of every PDF?

    2. sunnyguptahotels
      Contest Holder
      • 10 months ago

      Speed is a big consideration. To do what you are describing, it may be necessary to overlay the page with a 12x16 grid and then find all the grid boxes that the hand-drawn bounding box touches, so that the comparison can be done more efficiently. But that seems to add more complexity to the exercise. Adobe Acrobat Reader seems to get the bounding boxes right without much overlap.

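
The grid idea in the reply above could be sketched like this. It is a minimal illustration that assumes "12x16" means 12 columns by 16 rows, that boxes are `(x0, y0, x1, y1)` tuples in page units, and that a box lying exactly on a cell boundary counts as touching the cell on both sides. The names `touched_cells` and `may_match` are hypothetical.

```python
def touched_cells(box, page_width, page_height, cols=12, rows=16):
    """Return the set of (row, col) grid cells that a box touches.

    Assumption: the grid has 12 columns and 16 rows covering the whole page.
    """
    x0, y0, x1, y1 = box
    cell_w = page_width / cols
    cell_h = page_height / rows

    def clamp(value, upper):
        # Keep indices inside the grid even if the box overshoots the page.
        return max(0, min(upper - 1, value))

    col_start = clamp(int(min(x0, x1) // cell_w), cols)
    col_end = clamp(int(max(x0, x1) // cell_w), cols)
    row_start = clamp(int(min(y0, y1) // cell_h), rows)
    row_end = clamp(int(max(y0, y1) // cell_h), rows)
    return {
        (row, col)
        for row in range(row_start, row_end + 1)
        for col in range(col_start, col_end + 1)
    }


def may_match(box_a, box_b, page_width, page_height):
    """Cheap pre-filter: boxes that share no grid cell cannot be a match."""
    cells_a = touched_cells(box_a, page_width, page_height)
    cells_b = touched_cells(box_b, page_width, page_height)
    return bool(cells_a & cells_b)
```

Only pairs that pass `may_match` would then need the more expensive coordinate or image comparison, which is where the speed benefit would come from.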
  • ITPyramid85
    • 10 months ago

    First, I would like to check whether the PDF quality is good enough for image processing. Can you provide the PDF files you have?

    1. sunnyguptahotels
      Contest Holder
      • 10 months ago

      Assume that all the PDFs are generated by the same creation utility. The most obvious example is a bank statement. But I think image comparison is missing the point: we want comparison by bounding box coordinates. So the first step is to find the algorithm that Adobe uses to obtain the bounding boxes. Most of the open-source utilities treat every character as a separate coordinate.

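
One way to get from per-character coordinates to larger bounding boxes is to merge neighbouring character boxes that sit on the same line. The sketch below is a simple heuristic offered as an assumption; it is not Adobe's algorithm, which this thread does not describe. Boxes are `(x0, y0, x1, y1)` tuples, and the name `merge_char_boxes` and the `max_gap` and `line_tolerance` thresholds are hypothetical.

```python
def merge_char_boxes(char_boxes, max_gap=2.0, line_tolerance=2.0):
    """Greedily merge character boxes into word-level or line-level boxes.

    Two boxes are merged when their top and bottom edges agree within
    `line_tolerance` and the horizontal gap between them is at most `max_gap`.
    """
    groups = []
    # Process left to right so each character extends a group on its left.
    for box in sorted(char_boxes, key=lambda b: b[0]):
        x0, y0, x1, y1 = box
        for i, (gx0, gy0, gx1, gy1) in enumerate(groups):
            same_line = (
                abs(y0 - gy0) <= line_tolerance
                and abs(y1 - gy1) <= line_tolerance
            )
            close_enough = x0 - gx1 <= max_gap
            if same_line and close_enough:
                # Grow the existing group to cover the new character.
                groups[i] = (
                    min(gx0, x0), min(gy0, y0), max(gx1, x1), max(gy1, y1)
                )
                break
        else:
            groups.append(box)
    return groups
```

Raising `max_gap` would make the same routine produce line-level rather than word-level boxes; the right thresholds depend on the font size and would need tuning per document type.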
  • sunnyguptahotels
    Contest Holder
    • 10 months ago

    Hi everyone, please ask your questions here so that everyone can see them. If you don't know what a bounding box is in a PDF document, then you should not attempt this contest. I don't have time to educate, sorry. There is no point explaining your experience: this is a guaranteed contest, and if you understand the concepts in the brief then you may submit an entry. It's as simple as that. If you don't understand it, then do the basic work first and return with specific questions.

  • sunnyguptahotels
    Contest Holder
    • 10 months ago

    Hi Everyone

  • Codeitsmarts
    • 10 months ago

    Hi, I have read your project description. I have a few queries before I can begin the work. Can we discuss them through chat? I shall endeavor to exceed your expectations.

    I have 5 years of experience in PHP, MySQL, CodeIgniter, WordPress, jQuery, HTML, CSS, Python and many more. Please see my portfolio for artwork samples and my clients' feedback.

    1. http://www.astrologyindubai.com/
    2. http://www.sweetspace9.com/
    3. http://www.ngotiator.com/
    4. http://www.shypon.com/
    5. https://www.pixbrand.in/
    6. http://www.etfmodelsolutions.com/
    7. http://wricitieshub.org/worldtodresource/

    I'm confident that I can complete your project on time and within your budget, and that I can achieve the results you are asking for.
    Please initiate a chat for further discussion. I will do my best for you. Regards

  • ITPyramid85
    • 10 months ago

    Also, if you want to do image searching, the images will have to be normalized to a specific size, so the image quality and the number of PDF pages will affect the search speed.

  • sprlabs9
    • 10 months ago

    Hi, I would like to discuss. Please drop me a message.

  • dev681999
    • 10 months ago

    I am probably wrong, so feel free to correct me.

  • dev681999
    • 10 months ago

    From reading the description, this is what I have understood: you want a website where people can upload PDF files. Each PDF is then converted to a PNG that shows bounding boxes. These bounding boxes are matched against the boxes from the other uploaded files. The user can then select bounding boxes to download.

  • sunnyguptahotels
    Contest Holder
    • 10 months ago

    It can be in PHP, Python, or C#. There must be a web front end to accept the file uploads, so Java/VB are not suitable.

  • a6jack
    • 10 months ago

    Dear,
    May we know which language (PHP, Python, C#, Java ...) this software should be written in, and whether it will be a website or a desktop app?

  • sunnyguptahotels
    Contest Holder
    • 10 months ago

    Please submit a blank entry; that will allow me to message you.

  • desmondmile03
    • 10 months ago

    Hi, please message me so I can discuss my proposed methodology. Thanks

  • ahsanfaheem3
    • 10 months ago

    Dear contest holder, kindly message me so I can discuss my proposed methodology. Thanks.
