How we learnt to stop worrying and love web scraping
Nicholas J. DeVito, Georgia C. Richards & Peter Inglesby
For Nicholas DeVito, Georgia Richards and Peter Inglesby, custom web scrapers have driven both their research and their collaborations.

In research, time and resources are precious. Automating common tasks, such as data collection, can make a project efficient and repeatable, leading in turn to increased productivity and output. You will end up with a shareable and reproducible method for data collection that can be verified, used and expanded on by others; in other words, a computationally reproducible data-collection workflow.

In a current project, we are analysing coroners’ reports to help to prevent future deaths. It has required downloading more than 3,000 PDFs to search for opioid-related deaths, a huge data-collection task. In discussion with the larger team, we decided that this task was a good candidate for automation. With a few days of work, we were able to write a computer program that could quickly, efficiently and reproducibly collect all the PDFs and create a spreadsheet documenting each case.
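To give a flavour of what such a program can look like, here is a minimal sketch in Python using the requests and BeautifulSoup libraries. It is illustrative rather than our actual code: the index URL, the assumption that each report appears on the page as a link ending in “.pdf”, and the spreadsheet columns are all hypothetical.

```python
import csv
from pathlib import Path

import requests
from bs4 import BeautifulSoup

# Hypothetical page listing the reports; not the real source used in our project.
INDEX_URL = "https://example.org/coroner-reports"


def scrape_reports(index_url, out_csv="cases.csv", pdf_dir="pdfs"):
    """Download every PDF linked from the index page and log one CSV row per case."""
    Path(pdf_dir).mkdir(exist_ok=True)

    page = requests.get(index_url, timeout=30)
    page.raise_for_status()
    soup = BeautifulSoup(page.text, "html.parser")

    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["case_title", "pdf_url", "local_file"])

        # Assumption: each report is an <a> tag whose href ends in ".pdf".
        for link in soup.select("a[href$='.pdf']"):
            pdf_url = requests.compat.urljoin(index_url, link["href"])
            local_file = Path(pdf_dir) / pdf_url.rsplit("/", 1)[-1]

            pdf = requests.get(pdf_url, timeout=30)
            pdf.raise_for_status()
            local_file.write_bytes(pdf.content)

            writer.writerow([link.get_text(strip=True), pdf_url, str(local_file)])


if __name__ == "__main__":
    scrape_reports(INDEX_URL)
```

Running the script once produces a folder of PDFs and a spreadsheet that records where each file came from, so the whole collection step can be repeated or checked by anyone with the code.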