Željko Šević | Node.js Developer

Web scraping with cheerio

January 19, 2024

Web scraping means extracting data from websites. This post covers extracting data from the page's HTML tags.

Create a scraper object with the load method by passing the HTML content as an argument.
- Set the decodeEntities option to false to preserve encoded characters (like `&amp;`) in their original form.
```
import { load } from 'cheerio';

const $ = load('<div></div>', { decodeEntities: false });
```
Find DOM elements by using CSS-like selectors.
```
const items = $('.item');
```

Iterate through the found elements using the each method.
```
items.each((index, element) => {
  // element is a raw DOM node; wrap it as $(element) to use cheerio methods
  // ...
});
```

Please check the website's terms of service before scraping it. Some websites may have terms of service that prohibit such activity.

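jsdom can run the page's scripts, which helps when the data is assigned to a JavaScript variable instead of being rendered in the HTML tags. The snippet below assumes the page's script assigns to someVariable.someField and exposes that value as window.data.
```
import jsdom from 'jsdom';

// URL is a placeholder for the address of the page to scrape
fetch(URL)
  .then((res) => res.text())
  .then((response) => {
    // name of the variable as it is assigned in the page's script
    const dataVariable = 'someVariable.someField';
    // prepend a global declaration so the assigned value is also available as window.data
    const html = response.replace(dataVariable, `var data=${dataVariable}`);
    // run the page's scripts; a virtual console without listeners discards their output
    const dom = new jsdom.JSDOM(html, {
      runScripts: 'dangerously',
      virtualConsole: new jsdom.VirtualConsole()
    });
    console.log('data', dom?.window?.data);
  });
```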

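When the data is stored as JSON in the value of an element, read the value by the element's id and parse it.
```
import jsdom from 'jsdom';

// URL is a placeholder for the address of the page to scrape
fetch(URL)
  .then((res) => res.text())
  .then((response) => {
    const dom = new jsdom.JSDOM(response, {
      runScripts: 'dangerously',
      virtualConsole: new jsdom.VirtualConsole()
    });
    // someId is the id of the element whose value holds the JSON string
    const data = dom?.window?.document?.getElementById('someId')?.value;
    console.log('data', JSON.parse(data));
  });
```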