Bid Requirement: Provide a short description of your proposed approach, including which technology/language you want to use
- Build a web scraper tool that will enter an address into a specified website, pull fields from the resulting page, and then save the data to our local DB (See the attached PowerPoint screenshots for the website and the specific fields required)
- We prefer a solution built as a .NET Windows application, but we are willing to consider other approaches
- The tool must include a scheduling feature to run once per day
- You must deliver a simple install package with clear instructions for installation and testing
Language Requirement: You must be able to communicate by email or Skype chat (text) in English
1) Our environment is Windows Server 2008 with a SQL Server 2008 DB (the tool must also work with Windows 7 and SQL Server 2008 Express)
2) You must comment all functions in English
3) Tool must include a variable to set max number of records to scrape per run. Example: if we set this to 100, and we schedule the scraper to run once per day, then the scraper will attempt to process 100 records per day
4) Tool must include a time delay variable to set how often we attempt to scrape a record. Example: if we set this to 10 seconds, then the scraper will delay 10 seconds between each address it tries to process
5) The addresses to process will be stored in our local DB in a table like the schema shown below. The results should also be written to this table. Note: you can modify the schema if necessary, but you must get our approval for the changes.
create table [url removed, login to view] (
AddressID INT NOT NULL IDENTITY (1,1) PRIMARY KEY,
Status VARCHAR(max) NOT NULL, --Status of the record: 'Queued', 'Completed' or 'Error'
Input_StreetNumber VARCHAR(100), --contains the street number to enter on the website
Input_StreetName VARCHAR(1000), --contains the street name to enter on the website
Result_ErrorDetails VARCHAR(MAX) --text from any error messages received on the website
7) If an address has not been processed, the Status field should say 'Queued'. If a record is processed without an error, the Status should say 'Completed'. If a record is processed but there is an error, the Status should say 'Error'.
8) Only addresses with County_And_State = 'Prince William, VA' should be processed. NOTE: in the future we want to add additional counties to this project and each county will have a different website to scrape
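To illustrate how requirements 3, 4, 7 and 8 fit together, here is a minimal sketch in Python. It is not a required design, only one possible shape for the batch loop. Everything named here is an assumption made for illustration: the table name Addresses (the real name is not shown above), the helper names run_batch, make_demo_db and fake_scrape, and the use of an in-memory SQLite database as a stand-in for SQL Server 2008. The County_And_State column is taken from requirement 8 and is assumed to exist in the table. The actual website scraping is left out and replaced by a placeholder function.

```python
import sqlite3
import time


def run_batch(conn, scrape, max_records, delay_seconds,
              county="Prince William, VA", sleep=time.sleep):
    """Process up to max_records queued addresses for one county.

    Returns the number of records attempted in this run.
    """
    # SQLite uses LIMIT; on SQL Server the equivalent would be TOP (n).
    rows = conn.execute(
        "SELECT AddressID, Input_StreetNumber, Input_StreetName "
        "FROM Addresses "
        "WHERE Status = 'Queued' AND County_And_State = ? "
        "ORDER BY AddressID LIMIT ?",
        (county, max_records),
    ).fetchall()

    for index, (address_id, street_number, street_name) in enumerate(rows):
        if index > 0:
            sleep(delay_seconds)  # pause between addresses, not before the first
        try:
            scrape(street_number, street_name)
            status, error = "Completed", None
        except Exception as exc:
            status, error = "Error", str(exc)
        # Write the outcome back to the same row the input came from.
        conn.execute(
            "UPDATE Addresses SET Status = ?, Result_ErrorDetails = ? "
            "WHERE AddressID = ?",
            (status, error, address_id),
        )
        conn.commit()
    return len(rows)


def make_demo_db():
    """Build an in-memory table (hypothetical name) with four queued rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Addresses ("
        "AddressID INTEGER PRIMARY KEY AUTOINCREMENT, "
        "Status TEXT NOT NULL, "
        "County_And_State TEXT, "
        "Input_StreetNumber TEXT, "
        "Input_StreetName TEXT, "
        "Result_ErrorDetails TEXT)"
    )
    conn.executemany(
        "INSERT INTO Addresses "
        "(Status, County_And_State, Input_StreetNumber, Input_StreetName) "
        "VALUES ('Queued', ?, ?, ?)",
        [
            ("Prince William, VA", "100", "Main St"),
            ("Prince William, VA", "", "Oak Ave"),
            ("Fairfax, VA", "7", "Elm St"),
            ("Prince William, VA", "12", "Pine Rd"),
        ],
    )
    conn.commit()
    return conn


def fake_scrape(street_number, street_name):
    """Placeholder for the real website lookup; fails on a blank number."""
    if not street_number:
        raise ValueError("street number is missing")
    return {}
```

Calling run_batch(make_demo_db(), fake_scrape, max_records=100, delay_seconds=10) would attempt only the queued 'Prince William, VA' rows, waiting 10 seconds between them. The once-per-day schedule could then be handled outside the loop, for example by a Windows scheduled task that starts the tool once a day.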
20 freelancers are bidding an average of $175 for this job
Hi, I can do this job well. I have good experience in web scraping/mining, web automation and HTML parsing in .NET, and very good experience with scraping data from property sites like [url removed, login to view], [url removed, login to view]
I don't think it is a hard job. To achieve excellent performance, I would prefer to use the Twisted framework (Python). If you wish, I can provide a demo.