My new project requires a PHP 'automator' script that reads a list of URLs from a MySQL database and attempts to subscribe to any newsletter or email sign-up form available on each site. We are collecting data on the emails that popular retailers send, so we need a fast, automated way to subscribe to these retailers' newsletters and promotional emails.
Any tips or ideas on the best approach? In short, we have a MySQL database of URLs and want to subscribe to all of their newsletters as quickly as possible. Thanks for any advice!
The scraper that handles the email subscriptions can be run automatically at a predetermined interval, or triggered manually.
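One way to let a single PHP CLI script serve both the scheduled and the manual trigger is to treat "no arguments" as a scheduled run and "row IDs as arguments" as a manual run. This is a minimal sketch under that assumption; the script name `automator.php` and the argument convention are hypothetical, not part of the spec above.

```php
<?php
// Hypothetical entry-point helper for automator.php (name assumed).
// Cron runs the script with no arguments; a person can run it by hand
// with one or more database row IDs to process only those rows.

/**
 * Work out what this invocation should process.
 *
 * @param array $argv The raw CLI arguments ($argv[0] is the script name).
 * @return array{mode: string, ids: int[]}
 */
function parseRunOptions(array $argv): array
{
    $ids = [];
    foreach (array_slice($argv, 1) as $arg) {
        // Only plain digit strings are treated as row IDs; anything else is ignored.
        if (preg_match('/^\d+$/', (string) $arg) === 1) {
            $ids[] = (int) $arg;
        }
    }

    return [
        'mode' => $ids === [] ? 'scheduled' : 'manual',
        'ids'  => $ids,
    ];
}
```

With this convention, a crontab entry such as `0 * * * * php /path/to/automator.php` would run a scheduled pass at the top of every hour, while `php automator.php 12 15` would re-process only rows 12 and 15.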
Once the script is run, it attempts to perform the following tasks on each URL in the database:
1) SIMPLE SIGNUP: Scan the page's HTML for a text input whose label matches keywords such as 'newsletter', 'e-mail', 'email', 'signup', 'sign up', 'sign-up' or 'subscribe'. An email address is entered and the form is submitted. If additional fields are required, the scraper supplies dummy values for name, phone number, address, gender, etc.
2) PROFILE SIGNUP: If a user account is required to get the newsletter subscription, the scraper attempts to create one with a username, email address and password, then attempts to log in with those credentials.
3) LINK: The scraper scans the page for hyperlinks pointing to pages that mention keywords such as 'newsletter', 'e-mail list', 'mailing list' or 'email sign-up'. It follows each link, looking for a newsletter input box. If one is found, go to step 1. If not, the scraper continues navigating through the site up to 3 levels deep, looking for hyperlinks or 'newsletter' text input boxes.
4) CAPTCHA REQUIRED: If a CAPTCHA or other verification code is required, the company is flagged in the database as needing a manual sign-up.
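Step 1 can be approached with PHP's built-in DOM extension: parse the page, walk each form, and look for a text or email input whose label or attributes contain one of the keywords. The sketch below is one possible take, assuming that `name`, `id`, `placeholder` and `aria-label` count as "label" text alongside a real `<label for=...>`, and that any `type="email"` input qualifies on its own. The function names and the returned array shape are illustrative only.

```php
<?php
// Sketch of step 1: locate a form with an email-like text input.
// Keyword list and the shape of the returned arrays are assumptions.

const SIGNUP_KEYWORDS = ['newsletter', 'e-mail', 'email', 'signup', 'sign up', 'sign-up', 'subscribe'];

/** True if any signup keyword appears in $text (case-insensitive). */
function matchesSignupKeyword(string $text): bool
{
    $text = strtolower($text);
    foreach (SIGNUP_KEYWORDS as $keyword) {
        if (strpos($text, $keyword) !== false) {
            return true;
        }
    }
    return false;
}

/**
 * Return one entry per <form> that contains a likely newsletter email field:
 * the form's action and method, the email input's name, and the names of any
 * other required fields that would need dummy values.
 */
function findSignupForms(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep parser warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $found = [];
    foreach ($xpath->query('//form') as $form) {
        $emailInput = null;
        foreach ($xpath->query('.//input', $form) as $input) {
            $type = strtolower($input->getAttribute('type') ?: 'text');
            if ($type !== 'text' && $type !== 'email') {
                continue;
            }
            // Gather every hint a site might use to label the field.
            $hints = [
                $input->getAttribute('name'),
                $input->getAttribute('id'),
                $input->getAttribute('placeholder'),
                $input->getAttribute('aria-label'),
            ];
            $id = $input->getAttribute('id');
            if ($id !== '') {
                foreach ($xpath->query('//label') as $label) {
                    if ($label->getAttribute('for') === $id) {
                        $hints[] = $label->textContent;
                    }
                }
            }
            if ($type === 'email' || matchesSignupKeyword(implode(' ', $hints))) {
                $emailInput = $input;
                break; // one email field per form is enough
            }
        }
        if ($emailInput === null) {
            continue;
        }
        // Other required fields (name, phone, ...) would be filled with dummy data.
        $otherRequired = [];
        foreach ($xpath->query('.//input[@required] | .//select[@required]', $form) as $field) {
            if (!$field->isSameNode($emailInput) && $field->getAttribute('name') !== '') {
                $otherRequired[] = $field->getAttribute('name');
            }
        }
        $found[] = [
            'action'         => $form->getAttribute('action'),
            'method'         => strtolower($form->getAttribute('method') ?: 'get'),
            'email_field'    => $emailInput->getAttribute('name'),
            'other_required' => $otherRequired,
        ];
    }
    return $found;
}
```

The returned `action` is often relative, so it would still need to be resolved against the page URL before the form is actually posted (for example with cURL); that part is left out here.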
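Step 3 is essentially a breadth-first crawl with a depth limit. This sketch assumes the crawl only follows links whose text or URL contains a newsletter keyword and stays on the starting host; page fetching and form detection are passed in as callables (`$fetch` and `$hasSignupForm`, both hypothetical) so the crawl logic can be exercised without network access. The URL resolver is deliberately minimal and does not normalise `../` segments.

```php
<?php
// Sketch of step 3: follow newsletter-looking links up to $maxDepth levels.
// Keyword list, same-host rule and helper names are assumptions.

const LINK_KEYWORDS = ['newsletter', 'e-mail list', 'email list', 'mailing list',
                       'email sign-up', 'email signup', 'subscribe'];

/** Resolve $href against $base; returns null for mailto:, javascript:, fragments, etc. */
function resolveUrl(string $href, string $base): ?string
{
    if ($href === '' || $href[0] === '#') {
        return null;
    }
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return null; // some other scheme, e.g. mailto: or javascript:
    }
    $p = parse_url($base);
    if (!isset($p['scheme'], $p['host'])) {
        return null;
    }
    if (substr($href, 0, 2) === '//') {
        return $p['scheme'] . ':' . $href; // protocol-relative link
    }
    $root = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    if ($href[0] === '/') {
        return $root . $href;
    }
    $dir = preg_replace('#/[^/]*$#', '/', $p['path'] ?? '/');
    return $root . $dir . $href;
}

/** Absolute, same-host URLs of links whose text or href mentions a keyword. */
function findNewsletterLinks(string $html, string $pageUrl): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $host = parse_url($pageUrl, PHP_URL_HOST);
    $links = [];
    foreach ((new DOMXPath($doc))->query('//a[@href]') as $a) {
        $haystack = strtolower($a->textContent . ' ' . $a->getAttribute('href'));
        foreach (LINK_KEYWORDS as $keyword) {
            if (strpos($haystack, $keyword) === false) {
                continue;
            }
            $url = resolveUrl(trim($a->getAttribute('href')), $pageUrl);
            if ($url !== null && parse_url($url, PHP_URL_HOST) === $host) {
                $links[$url] = true; // array keys de-duplicate URLs
            }
            break;
        }
    }
    return array_keys($links);
}

/** Breadth-first search; returns the first URL whose page has a signup form, or null. */
function crawlForSignup(string $startUrl, callable $fetch, callable $hasSignupForm, int $maxDepth = 3): ?string
{
    $queue = [[$startUrl, 0]];
    $seen = [$startUrl => true];
    while ($queue !== []) {
        [$url, $depth] = array_shift($queue);
        $html = $fetch($url); // expected to return the page HTML, or null on failure
        if ($html === null) {
            continue;
        }
        if ($hasSignupForm($html)) {
            return $url; // hand this page over to step 1
        }
        if ($depth >= $maxDepth) {
            continue; // do not expand links past the depth limit
        }
        foreach (findNewsletterLinks($html, $url) as $next) {
            if (!isset($seen[$next])) {
                $seen[$next] = true;
                $queue[] = [$next, $depth + 1];
            }
        }
    }
    return null;
}
```

Following only keyword links keeps the crawl small; following every link 3 levels deep would turn into a full site crawl on large retail sites.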
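For step 4, a heuristic scan for well-known CAPTCHA widgets is probably enough to decide when to flag a site for manual sign-up. The marker strings below come from the usual embed code for Google reCAPTCHA (`g-recaptcha` class, `google.com/recaptcha` script), hCaptcha (`h-captcha` class, `hcaptcha.com` script) and Cloudflare Turnstile (`cf-turnstile` class, `challenges.cloudflare.com` script). The function name and the fallback rule for inputs named like `captcha` are assumptions.

```php
<?php
// Sketch of step 4: heuristic detection of common CAPTCHA widgets.
// The marker list is an assumption and will miss custom challenges that
// do not use an input named like "captcha".

const CAPTCHA_MARKERS = [
    'g-recaptcha',               // reCAPTCHA widget container class
    'google.com/recaptcha',      // reCAPTCHA script URL
    'h-captcha',                 // hCaptcha widget container class
    'hcaptcha.com',              // hCaptcha script host
    'cf-turnstile',              // Cloudflare Turnstile widget class
    'challenges.cloudflare.com', // Turnstile script host
];

/** Returns the marker that matched, or null if no CAPTCHA was detected. */
function detectCaptcha(string $html): ?string
{
    $lower = strtolower($html);
    foreach (CAPTCHA_MARKERS as $marker) {
        if (strpos($lower, $marker) !== false) {
            return $marker;
        }
    }
    // Home-grown challenges often use an input named like "captcha_code".
    if (preg_match('/<input\b[^>]*\bname\s*=\s*["\']?[^"\'\s>]*captcha/i', $html) === 1) {
        return 'captcha-input';
    }
    return null;
}
```

This only inspects the HTML the server returns, so a widget injected later by JavaScript would be missed; returning the matched marker rather than a bare boolean makes it easy to log why a site was flagged.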
We would like this written in PHP if possible, connected to a MySQL database, with the script marking each URL in the database once it has been processed.
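The bookkeeping could look roughly like this with PDO. The table name `sites`, its columns and the status values (`pending`, `subscribed`, `manual`, `failed`) are hypothetical; `manual` corresponds to the CAPTCHA case in step 4.

```php
<?php
// Sketch of the processed-URL tracking. Table and column names are assumptions,
// e.g. in MySQL:
//
//   CREATE TABLE sites (
//     id INT AUTO_INCREMENT PRIMARY KEY,
//     url VARCHAR(2048) NOT NULL,
//     status ENUM('pending','subscribed','manual','failed') NOT NULL DEFAULT 'pending',
//     processed_at DATETIME NULL
//   );

/** Fetch a batch of URLs that have not been processed yet. */
function fetchPendingSites(PDO $db, int $limit = 50): array
{
    $stmt = $db->prepare("SELECT id, url FROM sites WHERE status = 'pending' ORDER BY id LIMIT :lim");
    $stmt->bindValue(':lim', $limit, PDO::PARAM_INT); // bind as int so LIMIT is not quoted
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

/** Record the outcome for one URL; 'manual' flags a site that needs a human (step 4). */
function markProcessed(PDO $db, int $id, string $status): void
{
    $allowed = ['subscribed', 'manual', 'failed'];
    if (!in_array($status, $allowed, true)) {
        throw new InvalidArgumentException("Unknown status: $status");
    }
    $stmt = $db->prepare('UPDATE sites SET status = :status, processed_at = CURRENT_TIMESTAMP WHERE id = :id');
    $stmt->execute([':status' => $status, ':id' => $id]);
}
```

Connecting is the usual PDO call, e.g. `new PDO('mysql:host=localhost;dbname=automator;charset=utf8mb4', $user, $pass)` (database name assumed). The two queries avoid MySQL-only syntax, so the same functions can also be exercised against an in-memory SQLite database in tests.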