Job Description

Surgogroup is looking for a software engineer to lead our efforts to systematically retrieve data from potentially hundreds of web sources and to maintain the entire process, from building and testing the scripts to storing the data and scheduling the processes to run automatically. You will lead a small team of developers who will help build and troubleshoot the web scraping scripts. You will be responsible for organizing the overall web scraping process and will collaborate with Surgo’s data team to design the databases that will store the data. Data quality and reliability will be the top priorities for this role, so being a proactive problem-solver is a crucial job requirement.

Responsibilities

  • Web Scraping: develop web scrapers to systematically retrieve data from hundreds of different websites, both on demand and on specified schedules; manage a team of developers who will help write and test scripts; proactively monitor each scraper for any adjustments that need to be made; ensure data quality by running a set of data tests before committing data to the database
  • Data Storage: design the databases or data warehouses necessary for reliably storing and extracting data for further analysis; design a system that allows data to be checked for errors before it is committed to the final database
  • Monitoring/Maintenance: consistently monitor data quality by regularly reporting on statistical aggregations of each data pull; design an automatic alerting system for instances when a particular data pull results in poor data quality; maintain Git repositories for all scrapers

Skillsets

Required: bachelor’s degree in computer science or another technical or engineering field; proven experience in web scraper development; deep knowledge of SQL and database management techniques; strong Python skills

Preferred: several years of experience doing web scraping in a professional capacity; experience managing a team of developers; experience with several types of database technologies (MySQL, NoSQL, relational, analytical, etc.); experience with AWS and/or Google Cloud Platform; experience with R; understanding of web and mobile-based analytics; understanding of statistical analysis and data science