Program to create XML from OCR Document

I need a talented programmer to write a program that runs the following process:

1. A PDF document is OCR'd to produce an editable document.

2. The program locates certain strings of words in the OCR'd document.

3. When those strings appear, the program creates an XML file, namely an answer file for HotDocs, a document generation program ([url removed, login to view]). It should also be possible to save the same "answers" to an SQL database if necessary.
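The optional SQL step could be sketched as below. This is only an illustration under assumptions: it uses SQLite from the Python standard library as a stand-in for whichever SQL database is chosen, and the table name, column names and the functions `save_answers` / `load_answers` are hypothetical.

```python
import sqlite3


def save_answers(conn, document_id, answers):
    """Store each extracted answer as one row, keyed by document and answer name."""
    # Hypothetical schema: one row per (document, answer name) pair.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS answers ("
        "document_id TEXT NOT NULL, "
        "name TEXT NOT NULL, "
        "value TEXT, "
        "PRIMARY KEY (document_id, name))"
    )
    # Re-running the extraction on the same contract overwrites earlier values.
    conn.executemany(
        "INSERT OR REPLACE INTO answers (document_id, name, value) VALUES (?, ?, ?)",
        [(document_id, name, value) for name, value in answers.items()],
    )
    conn.commit()


def load_answers(conn, document_id):
    """Return the stored answers for one document as a dict of name -> value."""
    rows = conn.execute(
        "SELECT name, value FROM answers WHERE document_id = ? ORDER BY name",
        (document_id,),
    )
    return dict(rows.fetchall())
```

Keeping the answers as name/value rows means the same data can feed both the database and the XML answer file without a schema change each time a new field is extracted.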

HotDocs will use the XML answer file to generate a written report. A DLL can be created for HotDocs through which answers are "absorbed" into the HotDocs system so that a report can be prepared. See [url removed, login to view]

The "absorbed answers" written to the HotDocs XML answer file will be the words / phrases output by your software after the OCR process has run.
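Writing the answers out as XML might look roughly like the sketch below. The element names (`AnswerSet`, `Answer`, `TextValue`) and the `version` attribute are an assumption about the HotDocs XML answer file layout and should be checked against the HotDocs documentation before use; the answer names and the function `build_answer_file` are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_answer_file(answers):
    """Serialise a dict of answer name -> text value as an XML string."""
    # Assumed root element and attributes; verify against a real HotDocs answer file.
    root = ET.Element("AnswerSet", {"title": "", "version": "1.1"})
    for name, value in answers.items():
        answer = ET.SubElement(root, "Answer", {"name": name})
        # Every value is written as text here; dates and numbers may need other value types.
        ET.SubElement(answer, "TextValue").text = value
    return ET.tostring(root, encoding="unicode")
```

A call such as `build_answer_file({"Agent Name": "L J Hooker Ashfield"})` returns a single XML string that can be written to disk for HotDocs to read.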

From a PDF document (see attached dummy contract) I want, for example, to be able to do the following on the front page:


1. Extract who the Real Estate Agent is (1st box - L J Hooker Ashfield) and put those details into the answer file.

2. Extract 'identifier' details of the address of the property being bought (middle of page), including street number / name / suburb (1 Smith Street Smithville).

3. Extract details of when the contract is due to be completed (12 weeks after the date of the Contract).

4. Extract details of the date of the contract, which will usually be a handwritten date in the last box of the first page.
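Once the front page has been OCR'd to text, locating the fields above could be sketched with regular expressions. The labels used here ("Agent:", "Property:", "Completion date:") are invented for illustration and would have to be replaced with the wording actually printed on the contract; the answer names and `extract_fields` are likewise hypothetical. The handwritten contract date is left out because OCR engines tend to be unreliable on handwriting, so that field may need manual entry or review.

```python
import re

# Hypothetical labels; replace with the text that really precedes each box.
FIELD_PATTERNS = {
    "Agent Name": re.compile(r"^\s*agent\s*:\s*(.+?)\s*$", re.IGNORECASE | re.MULTILINE),
    "Property Address": re.compile(r"^\s*property\s*:\s*(.+?)\s*$", re.IGNORECASE | re.MULTILINE),
    "Completion Period": re.compile(r"^\s*completion\s+date\s*:\s*(.+?)\s*$", re.IGNORECASE | re.MULTILINE),
}


def extract_fields(ocr_text):
    """Return the first match for each known label; labels not found are omitted."""
    found = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            found[name] = match.group(1)
    return found
```

Fields that are missing from the result can be flagged for a person to fill in rather than written to the answer file as empty values.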

The info extracted would be placed as an answer in the HotDocs XML answer file. The answer file would then be used to produce a document. For example:

"Dear Sir

You as the Purchaser are required to complete the Contract and finalise the purchase of [details extracted from no 2.] in [details extracted from no 3.] days after the date of the Contract, which is [details extracted from no 4.].

You also promise to the Vendor that you have only been introduced to [details extracted from no 2.] by the Agent shown on the front page of the Contract being [details extracted from no 1.]. If you have been introduced to the property by another agent, please advise us immediately, as you may be liable to pay the commission of another Agent."
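The letter expresses the completion period in days, while the contract states it in weeks, so a small conversion is needed before the answer is written. A minimal sketch, assuming the contract date has already been captured as a date value; the function name `completion_details` and the example date are hypothetical:

```python
from datetime import date, timedelta


def completion_details(contract_date, weeks=12):
    """Return (number of days, completion date) for a period given in weeks."""
    days = weeks * 7
    return days, contract_date + timedelta(days=days)


# Example with an invented contract date of 1 May 2013.
days, due = completion_details(date(2013, 5, 1))
print(days, due.isoformat())
```

With the default of 12 weeks the period is 84 days.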

In the dummy Contract, some of the information I require is shown not only in text but also in diagrams (see the plans attached to the dummy contract). I need to extract info from the diagrams to confirm that it matches what is on the front page of the Contract.
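The comparison step could be sketched as follows, assuming the text on the plan has already been read out of the diagram by OCR (which is the harder part and is not shown). The check is deliberately crude: it only confirms that every word of the front-page address appears somewhere in the plan text, so a street number could be confused with a lot number. The function names are hypothetical.

```python
import re


def normalise(text):
    """Lower-case the text and replace punctuation with spaces."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


def address_matches(front_page_address, diagram_text):
    """True when every word of the front-page address occurs in the diagram text."""
    wanted = normalise(front_page_address).split()
    available = set(normalise(diagram_text).split())
    return all(word in available for word in wanted)
```

A mismatch would be reported to the user for checking rather than treated as a definite error, since OCR of plans is likely to be noisy.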

These contracts are usually 30 - 40 pages long. They will be about 2 - 3 MB in size but can be bigger or smaller. I am happy to consider using any OCR process but would prefer an open source one to minimise cost.
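One open source route, offered only as an example and not as a requirement of this project, is to rasterise the PDF with `pdftoppm` (from Poppler) and run the Tesseract command-line tool on each page image. The sketch below assumes both tools are installed and on the PATH; the function names and the 300 dpi setting are assumptions.

```python
import subprocess
from pathlib import Path


def rasterise_command(pdf_path, out_prefix, dpi=300):
    """Command that renders every PDF page to <out_prefix>-<page>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(out_prefix)]


def ocr_command(image_path, out_base, language="eng"):
    """Command that makes Tesseract write recognised text to <out_base>.txt."""
    return ["tesseract", str(image_path), str(out_base), "-l", language]


def ocr_pdf(pdf_path, work_dir):
    """Return the OCR text of each page, in page order."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterise_command(pdf_path, work_dir / "page"), check=True)
    pages = []
    # pdftoppm zero-pads page numbers, so a lexical sort keeps page order.
    for image in sorted(work_dir.glob("page-*.png")):
        out_base = image.with_suffix("")
        subprocess.run(ocr_command(image, out_base), check=True)
        pages.append(out_base.with_suffix(".txt").read_text(encoding="utf-8"))
    return pages
```

Processing one page at a time keeps memory use modest for 30 - 40 page contracts and makes it easy to re-run a single page that scanned badly.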

Skills: OCR, PDF, SQL, XML


About the Employer:
(1 review) Five Dock, Australia

Project ID: #4422500

16 freelancers are bidding an average of $1822 for this job


I would be happy to create something custom made for you to read the document correctly and generate exactly what you need. Let me know if you have a fair budget for a top quality work with full rights on the code and eve

Bid: 5150
(5 reviews)

I can help in your project, please check PMB and our ratings/reviews to get idea of our experience. Please let me know if you have any queries.

Bid: 500
(27 reviews)

Will discuss more if you like.

Bid: 1030
(16 reviews)

Hello, I am an OCR expert and I have a similar solution. Please check PMB.

Bid: 1500
(6 reviews)

Hi, I have gone through your requirements and am glad to say that we can accomplish this task. Please give us the opportunity to work with you. Please check PM. Thanks, TTS Team

Bid: 1000
(14 reviews)

Hi, I have a good OCR engine and very good experience dealing with XML files.

Bid: 1060
(11 reviews)

Hi. I'm a programmer with experience in parsing text documents and XML handling. Please tell me how many document types there are, i.e. documents with the same text structure. Regards.

Bid: 1500
(1 review)

We are very eager to work on your project. Kindly check your private messages for links to understand the quality of our past work.

Bid: 1050
(0 reviews)

Please find your PMB

Bid: 1287
(0 reviews)

Please refer to the PMB

Bid: 1430
(0 reviews)

Hello! Your offer sounds good. I want to work with you.

Bid: 1100
(0 reviews)

Hi, I'm an expert at PDF manipulation. Please, let's discuss in PMB.

Bid: 4738
(0 reviews)

This is a great project and I am ready to work on it, but the budget is not enough. Let me know if you can reconsider the budget.

Bid: 3500
(0 reviews)

Hi, please read the PM.

Bid: 2000
(0 reviews)

Thank you for your project. I reviewed your requirements. We have an excellent team of programmers and designers to work on your project efficiently and complete the job in [url removed, login to view] check your PMB.

Bid: 1400
(0 reviews)

Ready to start now.

Bid: 1200
(1 review)

Hi. I'm a C++ programmer with skills in image processing, PDF extraction and the Tesseract OCR API. Best Regards.

Bid: 1100
(0 reviews)