Directory Site Scraper

by Thriftynick

The client needed a directory site of financial advisors scraped for all names, companies, suburbs, and emails. The site stored emails as .jpg image files, which had to be converted back to text using OCR (Optical Character Recognition). I wrote custom object-oriented software in PHP to scrape the site using cURL. The target data was extracted with regular expressions. The email image files were downloaded and passed to Tesseract OCR to convert them back to text. Once all target data had been gathered for a profile, it was added as a new record to a local MySQL database. After the scrape finished, I cleaned up the data in MySQL before exporting it to a nicely formatted Excel spreadsheet. This was my first web scraping project and I very much enjoyed it. I've since started learning a few tools that make this type of job much easier (Goutte, Laravel Dusk, Guzzle, etc.).
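The pipeline described above (regex extraction, OCR of the email image, one database record per profile, then export) can be sketched roughly as follows. This is a minimal illustration in Python rather than the original PHP, and it rests on several assumptions: the HTML markup and class names are hypothetical, `sqlite3` stands in for MySQL, CSV stands in for the Excel export, and the OCR step is passed in as a callable (`ocr`) instead of calling Tesseract directly.

```python
import csv
import io
import re
import sqlite3
from typing import Callable, Iterable, Iterator

# Hypothetical markup: the real directory's HTML is not shown in the write-up.
PROFILE_RE = re.compile(
    r'<div class="advisor">\s*'
    r'<h2>(?P<name>[^<]+)</h2>\s*'
    r'<span class="company">(?P<company>[^<]+)</span>\s*'
    r'<span class="suburb">(?P<suburb>[^<]+)</span>\s*'
    r'<img class="email" src="(?P<email_img>[^"]+\.jpg)"',
    re.IGNORECASE,
)


def parse_profiles(html: str, ocr: Callable[[str], str]) -> Iterator[dict]:
    """Yield one record per profile; `ocr` turns an image path into text."""
    for m in PROFILE_RE.finditer(html):
        yield {
            "name": m["name"].strip(),
            "company": m["company"].strip(),
            "suburb": m["suburb"].strip(),
            # OCR output often carries stray whitespace, so normalize it.
            "email": ocr(m["email_img"]).strip().lower(),
        }


def store(conn: sqlite3.Connection, rows: Iterable[dict]) -> None:
    """Insert each profile as a new record (sqlite3 here, MySQL in the project)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS advisors "
        "(name TEXT, company TEXT, suburb TEXT, email TEXT)"
    )
    conn.executemany(
        "INSERT INTO advisors VALUES (:name, :company, :suburb, :email)",
        list(rows),
    )
    conn.commit()


def export_csv(conn: sqlite3.Connection) -> str:
    """Dump the table as CSV text, which a spreadsheet program can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["name", "company", "suburb", "email"])
    writer.writerows(
        conn.execute(
            "SELECT name, company, suburb, email FROM advisors ORDER BY name"
        )
    )
    return buf.getvalue()
```

In a real run, the `ocr` callable would download the image and hand it to Tesseract; keeping it as a parameter lets the parsing and storage steps be exercised without network access or an OCR install.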

Thriftynick, Gobles, United States

About Me

I have a solid understanding of the foundations of computer science and strong problem-solving skills. Check my portfolio for examples of the wide range of projects I've completed. I'm looking forward to getting started on your project!

Specializations:
- Web applications in Laravel
- WordPress plugins
- Database design, implementation, optimization
- Linux sysadmin
- Web stack

Experience:
- 6+ years OOP (including college and hobby-level game development)
- 4+ years relational database design & implementation (entity-relationship diagrams, normalization, optimization)
- 3+ years web stack (HTML5, CSS3, JS, PHP, Vue.js, Laravel, WordPress)
- 2+ years Linux sysadmin

$60 USD/hr

21 reviews
5.6

Tags