homecourse
 
   
🔍

Web scraping with cheerio

January 19, 2024

Web scraping means extracting data from websites. This post covers extracting data from the page's HTML tags.

Prerequisites

  • cheerio package is installed

  • HTML page is retrieved via an HTTP client

Usage

  • create a scraper object with load method by passing HTML content as an argument

    • set decodeEntities option to false to preserve encoded characters (like &) in their original form
    const $ = load('<div><!-- HTML content --></div>', { decodeEntities: false });
  • find DOM elements by using CSS-like selectors

    const items = $('.item');
  • iterate through found elements using each method

    items.each((index, element) => {
    // ...
    });
  • access element content using specific methods

    • text - $(element).text()

    • HTML - $(element).html()

    • attributes

      • all - $(element).attr()
      • specific one - $(element).attr('href')
    • child elements

      • first - $(element).first()
      • last - $(element).last()
      • all - $(element).children()
      • specific one - $(element).find('a')
    • siblings

      • previous - $(element).prev()
      • next - $(element).next()

Disclaimer

Please check the website's terms of service before scraping it. Some websites may have terms of service that prohibit such activity.

Course

Build your SaaS in 2 weeks - Start Now