Octoparse: scrape the web into data

I wanted to compare zero sugar syrup pricing from several websites. However, typing or copy-pasting the data would take a lot of time. So I asked ChatGPT how to get data from web pages into a Google Sheet. After suggesting several Google import types that need well-structured web pages, it mentioned scrapers. So I tried out Octoparse.

Octoparse has a free tier that is limited to 10 tasks, which have to be run from your own computer, and that caps exports at 10,000 records. Enough for what I’m looking for.

So I visited the supermarket website and performed a product search for zero sugar syrups. Then I created a new Octoparse task on my local install, pasting in the URL of the search results page.

In a couple of minutes, I was able to scrape the web page’s data, perform some cleaning, and gather the data in a table to my liking. Now that I had the data, I wanted to import it into my Google Sheet. That was possible! It took a lot of steps in Google’s web services and APIs, but the help was excellent and I was able to create an export connection. This enabled me to push the gathered data into the second sheet of my Google Sheet. I used a second sheet because I want to review the data before it is added to my main table on sheet 1.

Example: Scraping AH products into usable data

You can perform actions on any field. For instance, these actions are applied to the Inhoud (content) field:

    1. Replace “l” with “”, which removes the lowercase letter L (for liter)

    2. Remove all leading and trailing spaces

For the Prijs (price) field, I only needed to replace the dots with commas to make it a valid value.
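To make the cleaning steps concrete, here is a minimal Python sketch of the same transformations. This is not how Octoparse works internally; it only mirrors the steps described above. The function names and the sample rows are made up for illustration, and only the field names Inhoud and Prijs come from the scraped table.

```python
def clean_content(raw: str) -> str:
    """Mirror the two Inhoud steps: drop every lowercase "l", then trim spaces."""
    return raw.replace("l", "").strip()


def clean_price(raw: str) -> str:
    """Mirror the Prijs step: replace the decimal dot with a comma."""
    return raw.replace(".", ",")


# Hypothetical sample rows, not real scraped data.
rows = [
    {"Inhoud": " 0,75 l ", "Prijs": "2.49"},
    {"Inhoud": "1 l", "Prijs": "3.15"},
]

cleaned = [
    {"Inhoud": clean_content(row["Inhoud"]), "Prijs": clean_price(row["Prijs"])}
    for row in rows
]

for row in cleaned:
    print(row)
```

Note that a plain replace of “l” removes every lowercase L in the value, so a content value given in “ml” would end up as “m”. That was fine for my liter-sized syrups, but it is worth checking on other products.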

There are loads more possibilities in Octoparse, for instance looping through the found objects and running a subtask on each of them.
