Purpose:
--------
Currently the free geocoders (Google, Yahoo, etc.) do not allow an unlimited number of geocoding requests per 24-hour period. Given the number of listings Nestoria must geolocate each day, we need a means of overcoming this limitation by using multiple sources for our geocoding efforts.
The motivating factor behind this project is Australia: we don't have street data and are thus unable to geocode street addresses in-house.
Methodology:
------------
Many sources provide free geocoding; however, each places a limit on the number of free lookups that can be performed each day. We can overcome this limitation by combining the sources, which lets us perform a far greater number of lookups per 24-hour period.
Using multiple sources, ETL can geocode addresses to maximum effect without failing due to source limitations. By caching the results, ETL can further avoid hitting the source limits.
An algorithm will rank the sources according to their request limits, and each will be queried according to this ranking. For example, if source A has a limit of 10 and source B has a limit of 100, source A will be queried roughly once in every 10 requests.
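One possible way to realise this ranking is sketched below in Python, purely to illustrate the logic (the module itself is planned for CPAN). It assumes a proportional scheme: each request goes to the source that has used the smallest fraction of its daily limit, with ties going to the source with the larger limit. The function name `pick_source` and the dictionaries `limits` and `used` are hypothetical and not part of any existing module.

```python
from fractions import Fraction

def pick_source(limits, used):
    """Return the source that has used the smallest share of its limit.

    Returns None once every source has reached its limit.
    """
    candidates = [name for name in limits if used.get(name, 0) < limits[name]]
    if not candidates:
        return None
    # Exact fractions avoid floating-point ties; on a tie the larger limit wins.
    return min(
        candidates,
        key=lambda name: (Fraction(used.get(name, 0), limits[name]), -limits[name]),
    )

# The example from the text: source A allows 10 lookups, source B allows 100.
limits = {"A": 10, "B": 100}
used = {}
order = []
for _ in range(110):
    name = pick_source(limits, used)
    used[name] = used.get(name, 0) + 1
    order.append(name)
```

Under this scheme the requests sent to A are spread through the day rather than spent up front, and both sources reach their limits at about the same time.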
A scorecard of the requests made to each source will be maintained as part of the cache, so that the counts persist between restarts.
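A minimal sketch of such a scorecard follows, again in Python for illustration only. It assumes the scorecard is stored as a small JSON file and that the counts reset when the calendar day changes; the real storage backend and the exact reset rule for each source are not specified here. The names `load_scorecard` and `save_scorecard` are hypothetical.

```python
import json
import os
import tempfile
from datetime import date

def load_scorecard(path, today=None):
    """Load per-source request counts, starting afresh on a new day."""
    today = today or date.today().isoformat()
    try:
        with open(path) as fh:
            data = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        data = {}
    if data.get("day") != today:
        # Missing, unreadable, or stale scorecard: begin a new one.
        return {"day": today, "used": {}}
    return data

def save_scorecard(path, scorecard):
    """Write to a temporary file and rename it, so a crash cannot leave a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(scorecard, fh)
    os.replace(tmp, path)

# Simulate a restart on the same day and another on the following day.
path = os.path.join(tempfile.mkdtemp(), "scorecard.json")
card = load_scorecard(path, today="2010-01-01")
card["used"]["google"] = 3
save_scorecard(path, card)
same_day = load_scorecard(path, today="2010-01-01")
next_day = load_scorecard(path, today="2010-01-02")
```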
Sources will not be hard-coded and instead can be added to the module as required.
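The idea of registering sources at run time could look something like the following Python sketch. Everything here is an assumption made for illustration: the class `MultiGeocoder`, the method `add_source`, and the stub geocoders are hypothetical and do not describe the interface of the planned module. For brevity, sources are tried in the order they were added; the ranking step is left out.

```python
class MultiGeocoder:
    """Holds any number of geocoding sources registered at run time."""

    def __init__(self):
        self.sources = {}

    def add_source(self, name, geocode, daily_limit):
        """Register a callable that maps an address to a result dict or None."""
        if daily_limit <= 0:
            raise ValueError("daily_limit must be positive")
        self.sources[name] = {"geocode": geocode, "limit": daily_limit, "used": 0}

    def geocode(self, address):
        """Try each source that still has quota; return the first result found."""
        for name, src in self.sources.items():
            if src["used"] >= src["limit"]:
                continue  # quota exhausted for today
            src["used"] += 1
            result = src["geocode"](address)
            if result is not None:
                return {"source": name, **result}
        return None

# Stub geocoders standing in for real services.
def fake_empty(address):
    return None

def fake_fixed(address):
    return {"lat": -33.8688, "long": 151.2093}

coder = MultiGeocoder()
coder.add_source("empty", fake_empty, daily_limit=1)
coder.add_source("fixed", fake_fixed, daily_limit=2)
first = coder.geocode("1 Example St, Sydney")
second = coder.geocode("2 Example St, Sydney")
third = coder.geocode("3 Example St, Sydney")
```

Because a source is just a name, a callable, and a limit, adding a new provider does not require changes to the lookup code.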
Proposed Sources:
-----------------
Google:
- CPAN: [login to view URL]~miyagawa/Geo-Coder-Google-0.06/lib/Geo/Coder/[login to view URL]
- Docs: [login to view URL]
- Max: 15k (returns status code 620 when the limit is reached)
Multimap:
- CPAN: [login to view URL]~gray/Geo-Coder-Multimap-0.01/lib/Geo/Coder/[login to view URL]
- Docs: [login to view URL]
- Max: 50k
Yahoo:
- CPAN: [login to view URL]~abh/Geo-Coder-Yahoo-0.44/lib/Geo/Coder/[login to view URL]
- Docs: [login to view URL]
- Max: 5k (returns XML with a 'limit exceeded' message)
Bing:
- CPAN: [login to view URL]~gray/Geo-Coder-Bing-0.05/lib/Geo/Coder/[login to view URL]
- Docs: [login to view URL]
- Max: 30k
Code:
-----
A CPAN contribution under the namespace 'Geo::Coder::Multiple'.
The module will use existing CPAN modules for geocoding (e.g. Geo::Coder::Google).
Cached entries will not be exact copies of the responses from the sources. Instead, each response will be reformatted to include only the relevant information (lat, long, source name, etc.). This will allow for easier cache retrieval/deletion.
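As a rough illustration of the reformatted cache entry, the Python sketch below stores the same small set of fields regardless of which source answered. The field names, the key normalisation (lower-casing and collapsing whitespace), and the in-memory dictionary used as the cache are all assumptions; the raw response shown is a made-up shape, not the format of any real service.

```python
def cache_key(address):
    """Normalise an address so trivially different spellings share one entry."""
    return " ".join(address.lower().split())

def make_cache_entry(source_name, address, lat, long):
    """Keep only the fields we care about, in one shape for every source."""
    return {
        "address": address,
        "lat": float(lat),
        "long": float(long),
        "source": source_name,
    }

def purge_source(cache, source_name):
    """Drop every entry that came from one source."""
    return {k: v for k, v in cache.items() if v["source"] != source_name}

# Hypothetical raw response from an imaginary source called "sourceA".
raw = {"status": 200, "point": {"lat": "-33.8688", "lng": "151.2093"}, "debug": "..."}

cache = {}
address = "1 Example St, Sydney"
cache[cache_key(address)] = make_cache_entry(
    "sourceA", address, raw["point"]["lat"], raw["point"]["lng"]
)

hit = cache.get(cache_key("1  example st,  SYDNEY"))
after_purge = purge_source(cache, "sourceA")
```

Recording the source name in each entry is what makes the purge a simple filter, for example when one provider's results turn out to be unreliable.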
Tests:
------
Contained within the CPAN module.