Participation is via Zoom; please use the link here.
This interactive workshop will cover how to use the Helena programming-by-demonstration tool (http://helena-lang.org/) to collect data from the web. Helena is a recent research project out of UW and UC Berkeley that focuses on automating web scraping from webpages with predictable structures, for example a list of movies on IMDb plus the list of actors for each movie, or a list of Craigslist posts plus the details from each post's own webpage.
We will demonstrate how to use Helena to collect datasets that are spread over hundreds or thousands of webpages. The tool allows users to program a web scraper in a browser, by clicking on target data. Then, based on the demonstration of how to find the sample data, Helena writes the program for collecting the whole dataset from webpages that have similar structures. This makes Helena a relatively low-code option for web data collection.
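To give a rough sense of the kind of program involved, the sketch below shows the nested-loop shape of a "list page plus detail pages" scrape in plain Python. It is only an illustration: it is not Helena's output or API, and the `FAKE_SITE` data, its URLs, and the `scrape` function are all hypothetical stand-ins that run in memory without touching the web.

```python
# Hypothetical in-memory stand-in for a website: one list page that links to
# one detail page per movie. All titles, names, and URLs are invented.
FAKE_SITE = {
    "/movies": ["/movies/1", "/movies/2"],
    "/movies/1": {"title": "Movie A", "actors": ["Actor 1", "Actor 2"]},
    "/movies/2": {"title": "Movie B", "actors": ["Actor 3"]},
}


def scrape(site, list_url):
    """Collect one (title, actor) row per actor, across all detail pages."""
    rows = []
    # Outer loop: every item found on the list page.
    for detail_url in site[list_url]:
        page = site[detail_url]
        # Inner loop: every repeated element on that item's detail page.
        for actor in page["actors"]:
            rows.append((page["title"], actor))
    return rows


rows = scrape(FAKE_SITE, "/movies")
for row in rows:
    print(row)
```

With a demonstration-based tool, the user would show how to reach the first row or two by clicking, and the tool would generalize that demonstration into loops of this shape over the remaining pages.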
Helena is a Chrome extension, so please come with the Chrome browser installed. We’ll install the scraping tool during the workshop.