I need a Python programmer to write a simple web-scraping program that extracts option prices from Yahoo Finance and MSN. Your Python library should take a symbol, download the option pages for that symbol from Yahoo Finance and MSN, and extract the option chains for the different expiration dates.
For the delivery, please use regular expressions or any other robust method to parse the data, and raise an exception if anything goes wrong (for example, your code fails to download the page or your regular expression fails to parse the data). Finally, please document your code (inputs, outputs, exceptions raised, etc.) and structure your parsing code so that it is easy to adapt to future changes.
I'll define the details of the return data we expect, so this should take a good Python programmer only one or two hours. I'd spend only around $40 for this scraping code.
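To illustrate the kind of error handling described here, a minimal sketch follows. It is not a required design: the exception names, the `download` and `parse_strike` helpers, and the sample HTML pattern are all hypothetical, and the real page markup will differ.

```python
import re
import urllib.request


class OptionScrapeError(Exception):
    """Base class for scraper errors (hypothetical name)."""


class DownloadError(OptionScrapeError):
    """Raised when a page cannot be downloaded."""


class ParseError(OptionScrapeError):
    """Raised when page content does not match the expected layout."""


# Hypothetical pattern for a table cell holding a strike price.
# Keeping patterns in module-level constants makes them easy to update
# when the page layout changes.
STRIKE_RE = re.compile(r"<td[^>]*>\s*([\d,]+(?:\.\d+)?)\s*</td>")


def download(url, timeout=10):
    """Return the page at url as text; raise DownloadError on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError) as exc:
        # URLError and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        raise DownloadError("could not download {!r}: {}".format(url, exc)) from exc


def parse_strike(cell_html):
    """Return the strike price in cell_html as a float; raise ParseError if absent."""
    match = STRIKE_RE.search(cell_html)
    if match is None:
        raise ParseError("no strike price in {!r}".format(cell_html))
    return float(match.group(1).replace(",", ""))
```

With a shared base class, callers can catch `OptionScrapeError` to handle every scraper failure in one place, or catch the subclasses to tell download problems from layout changes.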
Please include unit tests with your code for the following cases:
- test your code on three symbols with options: yhoo, msft, goog
- one symbol that has no options: grvy. In this case, you should return an empty OrderedDict, not None
- any other important function needs at least one test to show it works
You can check these links to get an idea of the data you will be downloading:
Yahoo: [url removed, login to view]
MSN: [url removed, login to view]
By default, these links download only the option chains for the latest expiration date, so you will need to fetch the pages for all the other dates as well. This should be easy: to get the option chains for other months, you only need to find the months that have options and construct the link by adding the month and year to it:
for MSN: [url removed, login to view]
for yahoo: [url removed, login to view]
I expect three functions I can call to get the option chains, but you should add other helper functions to make your code more modular.
# return the number of days from today to the expiration date.
# For options, the expiration date for any given month and year is the 3rd Friday of that month.
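The third-Friday rule in the comment above could be computed along these lines. This is only a sketch: the function names `third_friday` and `days_to_expiration` and the optional `today` parameter (added so the result can be tested) are assumptions, not the required signatures.

```python
import datetime


def third_friday(year, month):
    """Return the date of the third Friday of the given month."""
    first = datetime.date(year, month, 1)
    # date.weekday(): Monday is 0, Friday is 4.
    days_to_first_friday = (4 - first.weekday()) % 7
    return first + datetime.timedelta(days=days_to_first_friday + 14)


def days_to_expiration(year, month, today=None):
    """Return the number of days from today to the month's expiration date.

    The result is negative if the expiration date has already passed.
    """
    if today is None:
        today = datetime.date.today()
    return (third_friday(year, month) - today).days
```

For example, `third_friday(2010, 8)` gives `datetime.date(2010, 8, 20)`.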
The return value of both functions should contain all the option chains for the symbol across the different expiration dates. Here are the requirements for the data structure that stores the option chains:
Please use OrderedDict in Python 3.
In the OrderedDict,
the keys are expiration dates in the form "2010-08" for Aug 2010, "2010-12" for Dec 2010, etc.
each value is a list of two lists, [calls, puts]: calls holds all the call options for that expiration date, and puts holds all the put options for that expiration date
Details of the calls and puts lists in the values of the OrderedDict:
calls: [call options at different strikes], a list of Option objects (a class defined below), one per strike price. The list is sorted by strike price from low to high.
puts: [put options at different strikes], a list of Option objects (a class defined below), one per strike price. The list is sorted by strike price from low to high.
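As a rough sketch of this container, the following builds the OrderedDict from already-parsed data. The helper names `expiration_key` and `build_chains` and the `(year, month, calls, puts)` input shape are hypothetical. Inserting the keys in chronological order is also an assumption, since the key order is not specified above.

```python
from collections import OrderedDict


def expiration_key(year, month):
    """Format an expiration month as a key, e.g. (2010, 8) -> "2010-08"."""
    return "{:04d}-{:02d}".format(year, month)


def build_chains(parsed):
    """Build the result from an iterable of (year, month, calls, puts) tuples.

    calls and puts are lists of objects with a strike_price attribute.
    Returns an empty OrderedDict (never None) when parsed is empty.
    """
    chains = OrderedDict()
    # Assumption: keys are inserted in chronological order.
    for year, month, calls, puts in sorted(parsed, key=lambda p: (p[0], p[1])):
        chains[expiration_key(year, month)] = [
            sorted(calls, key=lambda o: o.strike_price),
            sorted(puts, key=lambda o: o.strike_price),
        ]
    return chains
```

A symbol with no options then falls out naturally: `build_chains([])` returns an empty OrderedDict.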
The Option object used in calls and puts has the following fields:
type: 'call' or 'put', indicating whether it is a call or a put option
strike_price: strike price of the option
last: last trading price of the option
chg: change of the option price
bid: bid price of the option
ask: ask price of the option
vol: volume for the option
open_int: open interest for the option
All quantitative fields in the Option object should be floats. For any value shown on Yahoo Finance or MSN as "N/A" or "NA", please replace it with -1; we do not want strings in quantitative fields.
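One possible shape for the Option class and the "N/A" handling is sketched below. The field names come from the list above. The `to_float` helper, the constructor argument name `option_type`, and the stripping of thousands separators are assumptions for illustration.

```python
MISSING = -1.0  # value used for "N/A" or "NA", per the requirement above


def to_float(text):
    """Convert a scraped cell to a float; "N/A" and "NA" become -1.0.

    Raises ValueError for any other text that is not a number.
    """
    cleaned = str(text).strip().replace(",", "")  # assume "1,234" means 1234
    if cleaned.upper() in ("N/A", "NA"):
        return MISSING
    return float(cleaned)


class Option:
    """One option quote at a single strike price."""

    def __init__(self, option_type, strike_price, last, chg, bid, ask, vol, open_int):
        if option_type not in ("call", "put"):
            raise ValueError("type must be 'call' or 'put', got {!r}".format(option_type))
        self.type = option_type
        # Every quantitative field is stored as a float, never a string.
        self.strike_price = to_float(strike_price)
        self.last = to_float(last)
        self.chg = to_float(chg)
        self.bid = to_float(bid)
        self.ask = to_float(ask)
        self.vol = to_float(vol)
        self.open_int = to_float(open_int)
```

For example, `Option("call", "25.00", "1.10", "N/A", "1.05", "1.15", "1,234", "NA")` would store `chg` and `open_int` as -1.0 and `vol` as 1234.0.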