The Parser needs to extract all key information from a candidate resume, which may arrive in multiple formats (e.g. Word, PDF, etc.), and put it into an HR-XML file.
The INPUT to the Parser can be any one of the following sources:
1. Single Document in Word/PDF/TXT format
2. Folder containing multiple documents of type mentioned in (1)
3. Single email in an Outlook email folder
4. Outlook Folder containing multiple emails
5. HTML page being viewed inside a web browser (meant to parse pages)
6. Raw Text
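As a rough illustration of how the six input sources might be told apart before parsing, here is a minimal sketch. It is written in Python purely for brevity (the deliverable itself is specified as a C# component), and every name in it (`InputKind`, `classify_input`, the extension sets) is hypothetical. It also assumes Outlook items have been exported to `.msg` or `.eml` files; reading live Outlook folders would need the Outlook API and is not shown.

```python
import os
from enum import Enum


class InputKind(Enum):
    DOCUMENT = "document"
    FOLDER = "folder"
    EMAIL = "email"
    HTML = "html"
    RAW_TEXT = "raw_text"


# Assumed extension sets: the spec names only Word, PDF and TXT documents.
DOCUMENT_EXTENSIONS = {".doc", ".docx", ".pdf", ".txt"}
EMAIL_EXTENSIONS = {".msg", ".eml"}  # assumes emails exported from Outlook
HTML_EXTENSIONS = {".html", ".htm"}


def classify_input(source: str) -> InputKind:
    """Guess the input kind from a path-like string; fall back to raw text."""
    # os.path.isdir returns False (rather than raising) for non-path strings.
    if os.path.isdir(source):
        return InputKind.FOLDER
    extension = os.path.splitext(source)[1].lower()
    if extension in DOCUMENT_EXTENSIONS:
        return InputKind.DOCUMENT
    if extension in EMAIL_EXTENSIONS:
        return InputKind.EMAIL
    if extension in HTML_EXTENSIONS:
        return InputKind.HTML
    # Anything that does not look like a known file is treated as raw text.
    return InputKind.RAW_TEXT
```

A real component would more likely expose one entry point per source type than guess from a string, but the enum shows the set of cases the Parser has to cover.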
The OUTPUT of the Parser will be a structured HR-XML document/stream.
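To make the output side concrete, the sketch below serializes a couple of parsed fields into an XML string. It is in Python for illustration only, `build_resume_xml` and the dictionary keys are hypothetical, and the element names are only loosely modeled on the HR-XML Resume schema; they would need to be checked against the actual schema version the project targets.

```python
import xml.etree.ElementTree as ET


def build_resume_xml(fields: dict) -> str:
    """Serialize parsed fields to XML (element names are assumptions)."""
    root = ET.Element("Resume")
    structured = ET.SubElement(root, "StructuredXMLResume")
    contact = ET.SubElement(structured, "ContactInfo")

    # Only emit elements for fields that were actually parsed.
    if fields.get("full_name"):
        person = ET.SubElement(contact, "PersonName")
        ET.SubElement(person, "FormattedName").text = fields["full_name"]
    if fields.get("email"):
        method = ET.SubElement(contact, "ContactMethod")
        ET.SubElement(method, "InternetEmailAddress").text = fields["email"]

    # ElementTree escapes special characters such as "&" in text nodes.
    return ET.tostring(root, encoding="unicode")
```

For example, `build_resume_xml({"full_name": "Jane Doe", "email": "jane@example.com"})` returns a single `Resume` element that can be written to a file or returned as a stream.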
Key Features:
• The Parser should filter out documents and emails that are malformed.
• The Parser should filter out documents that carry viruses, Trojans, or complex macros.
• The Parser will intelligently decide whether to process the body of the email, the attachment, or both (based on the type of attachment and the sizes of the body and the attachment).
• If needed, the Parser should be able to use spam whitelists to provide a basic spam filter while iterating over emails.
• The Parser should be able to handle multiple attachments to emails and circular attachments (an attachment within an attachment, etc.).
• The Parser should be able to process at least 50 emails/documents per second.
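Two of the features above, handling attachments within attachments and choosing between body and attachment, can be sketched roughly as follows. This is Python for illustration only and assumes emails are available in RFC 822 (`.eml`) form; Outlook's own `.msg` format would need a different reader. `MAX_DEPTH`, `MIN_BODY_CHARS`, `collect_attachments` and `choose_sources` are hypothetical names and thresholds, not part of the requirements.

```python
import hashlib
from email.message import EmailMessage

MAX_DEPTH = 5          # assumed cap on attachment nesting
MIN_BODY_CHARS = 500   # assumed size above which the body is worth parsing
RESUME_EXTENSIONS = (".doc", ".docx", ".pdf", ".txt")


def collect_attachments(msg: EmailMessage, depth: int = 0, seen=None):
    """Return (filename, bytes) for each unique attachment, following attached emails."""
    if seen is None:
        seen = set()
    found = []
    if depth > MAX_DEPTH:
        return found  # stop descending into suspiciously deep nesting
    for part in msg.iter_attachments():
        if part.get_content_type() == "message/rfc822":
            # An attached email: recurse into it instead of returning it.
            found.extend(collect_attachments(part.get_content(), depth + 1, seen))
            continue
        payload = part.get_payload(decode=True) or b""
        digest = hashlib.sha256(payload).hexdigest()
        if digest in seen:
            continue  # same content already collected elsewhere in the chain
        seen.add(digest)
        found.append((part.get_filename() or "unnamed", payload))
    return found


def choose_sources(body_text: str, attachments) -> list:
    """Decide whether to parse the body, the attachments, or both."""
    has_resume_attachment = any(
        name.lower().endswith(RESUME_EXTENSIONS) and len(payload) > 0
        for name, payload in attachments
    )
    sources = []
    if len(body_text.strip()) >= MIN_BODY_CHARS:
        sources.append("body")
    if has_resume_attachment:
        sources.append("attachment")
    return sources or ["body"]  # nothing else to go on: fall back to the body
```

The depth cap and the content hash are one possible way to keep a forwarded chain from being processed repeatedly; the actual heuristics for "type of attachment, size of body and attachment" are left to the implementer.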
Parsed Fields: The following fields need to be parsed:
• Full Name
• Email Address
• Work Phone
• Home Phone
• Mobile Phone
• Total Experience (yrs)
• Current Company
• Education details
• Work Permit
• Job Type
• Willingness to Travel
• Required Salary
• Hourly Rate
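A few of the fields listed above lend themselves to simple pattern matching. The sketch below, in Python for illustration only, pulls out an email address, labeled phone numbers and total experience with regular expressions. The patterns, the `extract_fields` name and the output keys are assumptions; real resumes vary far more than these patterns allow, and fields such as Full Name or Current Company would need more than regexes.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches lines such as "Mobile: +1 (555) 010-2030" or "Home Phone: 555-010-9999".
PHONE_RE = re.compile(
    r"(Work|Home|Mobile)\s*(?:Phone)?\s*[:\-]\s*(\+?\d[\d ().-]{6,}\d)",
    re.IGNORECASE,
)
EXPERIENCE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*\+?\s*(?:years|yrs)", re.IGNORECASE)


def extract_fields(text: str) -> dict:
    """Extract a handful of resume fields from plain text (best effort)."""
    fields = {}
    email = EMAIL_RE.search(text)
    if email:
        fields["email"] = email.group(0)
    for label, number in PHONE_RE.findall(text):
        # Keep the first number seen for each label (work/home/mobile).
        fields.setdefault(f"{label.lower()}_phone", number.strip())
    experience = EXPERIENCE_RE.search(text)
    if experience:
        fields["total_experience_years"] = float(experience.group(1))
    return fields
```

The resulting dictionary could then be mapped onto the corresponding HR-XML elements.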
This has to be developed as a .NET component in C#. If the developer is experienced with an open-source text-parsing engine such as Lucene, alternative coding languages can be considered, as long as the result is fast and can be wrapped into a .NET component.
If you have completed a similar project, that will be a big plus. Please mention it in PMB, along with references to your other projects. If required, we can negotiate payment by escrow, released on completion of the scope (against defined test cases).
The compiled component and neatly commented source code must be provided.