PDF intranet search engine

I'm building what could eventually be a *massive* collection of government documents in PDF format. I'm looking for a sharp, experienced and professional programmer to develop what will be the heart of a new website: a fast, indexed, smooth, robust and scalable internal search engine capable of doing keyword searches on a potentially *very large* collection of PDF documents (all text-searchable, some converted via OCR).

While I'm fine with the idea of using a public license search engine product (mnogo for instance), I absolutely need the following key features, some of which I haven't seen on any public products:

1. Able to do both sophisticated fuzzy natural language searches AND complex boolean searches, including phrase searches, AND/OR/NOT, wildcards, etc. Most importantly as to the latter, it must be capable of doing complex proximity searches (for instance, two words in the same sentence, paragraph, or within a certain number of words.)
2. In addition to keyword searching on the full document text, the search engine must be able to limit its search based on certain meta-data fields associated with each file (for instance, date authored, authoring agency, etc.) Some of these fields must themselves be keyword searchable (e.g., "author"), while others would be numerical (e.g., "date within 2 years").
3. When the search results are displayed, must include snippets of text under each document link (ala Google) showing how the keyword hits appear.
4. Also like Google, needs to be capable of viewing the documents as either PDF or text/html (I assume this requires a separate module to convert the PDFs), and the keyword "hits" need to be highlighted when the document is opened. (I presume this would only be possible with the text version?)
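
To make features 1 to 3 more concrete, here is a minimal Python sketch of the behavior I have in mind: a word-distance proximity search, a filter on meta-data fields, and a result snippet with highlighted hits. It is only an illustration, not a proposed design. The names `TinyIndex`, `near` and `snippet`, and the `agency` and `authored` fields, are all hypothetical, and it assumes the text has already been extracted from the PDFs. A real solution would presumably sit on top of an existing search engine rather than an in-memory dictionary.

```python
import html
import re
from collections import defaultdict
from datetime import date

TOKEN = re.compile(r"[A-Za-z0-9]+")


def tokenize(text):
    """Lowercase alphanumeric tokens, in document order."""
    return [m.group(0).lower() for m in TOKEN.finditer(text)]


class TinyIndex:
    """Toy positional inverted index: term -> doc_id -> list of word positions."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))
        self.meta = {}   # doc_id -> dict of meta-data fields (hypothetical names)
        self.text = {}   # doc_id -> extracted plain text

    def add(self, doc_id, text, **meta):
        self.text[doc_id] = text
        self.meta[doc_id] = meta
        for pos, term in enumerate(tokenize(text)):
            self.postings[term][doc_id].append(pos)

    def near(self, a, b, max_gap, agency=None, since=None):
        """Docs where terms a and b occur within max_gap words of each other,
        optionally limited by authoring agency and earliest authored date."""
        docs_a = self.postings.get(a.lower(), {})
        docs_b = self.postings.get(b.lower(), {})
        hits = []
        for doc_id in sorted(set(docs_a) & set(docs_b)):
            meta = self.meta[doc_id]
            authored = meta.get("authored")
            if agency is not None and meta.get("agency") != agency:
                continue
            if since is not None and (authored is None or authored < since):
                continue
            if any(abs(pa - pb) <= max_gap
                   for pa in docs_a[doc_id] for pb in docs_b[doc_id]):
                hits.append(doc_id)
        return hits


def snippet(text, terms, width=40):
    """HTML snippet around the first hit, with every hit wrapped in <b> tags."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE)
    first = pattern.search(text)
    if first is None:
        return html.escape(text[:2 * width])
    start = max(0, first.start() - width)
    end = min(len(text), first.end() + width)
    fragment = text[start:end]
    out, last = [], 0
    for m in pattern.finditer(fragment):
        out.append(html.escape(fragment[last:m.start()]))
        out.append("<b>" + html.escape(m.group(0)) + "</b>")
        last = m.end()
    out.append(html.escape(fragment[last:]))
    return "".join(out)
```

For example, `index.near("budget", "request", 3, agency="EPA", since=date(2003, 1, 1))` would return only documents from that agency, authored on or after that date, where the two words fall within three words of each other. Sentence-level and paragraph-level proximity would need sentence and paragraph boundaries stored alongside the word positions, which this sketch leaves out.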

## Deliverables

1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Installation and implementation on the platform(s) specified.
3) Complete ownership and distribution copyrights to all work purchased. To be clear, this is a work-for-hire proposal, and all copyrights inure to the purchaser, not to the programmer.


I'm pretty much starting from scratch, so it would be helpful if the programmer were willing to consult on languages to be used, site hosting solutions, etc. Would also need to do the installation, and assist with implementation. Best scenario would be someone willing to be involved on an ongoing basis.

Also, there's an entirely separate project involving the creation of spidering software to actually capture all these PDFs off of other certain public sites. I'm trying to do that myself, but would consider adding that to the project. In any case would need to coordinate with the search engine programmer to make sure the meta-data is captured in a usable way.
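
As a rough idea of what "captured in a usable way" could mean, the spider might write one small meta-data record per downloaded PDF that the search engine then reads at indexing time. The Python sketch below is just one possible shape. The `PdfRecord` class and all of its field names are my own assumptions, not a fixed format.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional


@dataclass
class PdfRecord:
    """Hypothetical per-document record written by the spider."""
    source_url: str
    local_path: str
    agency: Optional[str] = None          # authoring agency, if known
    author: Optional[str] = None          # keyword-searchable field
    date_authored: Optional[date] = None  # range-searchable field
    is_ocr: bool = False                  # True if the text came from OCR

    def to_json(self):
        data = asdict(self)
        if self.date_authored is not None:
            # Store dates as ISO 8601 strings so they sort and parse reliably.
            data["date_authored"] = self.date_authored.isoformat()
        return json.dumps(data, sort_keys=True)

    @classmethod
    def from_json(cls, raw):
        data = json.loads(raw)
        if data.get("date_authored"):
            data["date_authored"] = date.fromisoformat(data["date_authored"])
        return cls(**data)
```

Keeping dates as real date values (rather than free text) is what would make a filter like "date within 2 years" possible later.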

Finally, the start-up money for this site is limited, and there may be a delay between the posting of bids and my actual ability to commission the work.

Thanks in advance for any bids!

## Platform


Skills: Database Administration, Engineering, MySQL, PHP, Software Architecture, Software Testing, SQL, Web Hosting, Website Management, Website Testing


About the Employer:
(0 reviews) United States

Project ID: #2960399

2 freelancers are bidding an average of $234 for this job


See private message.

$170 USD in 90 days
(34 reviews)

See private message.

$297.5 USD in 90 days
(5 reviews)