Web scraping with jsdom
December 14, 2023
Web scraping means extracting data from websites. This post covers extracting data from a page's HTML when the data is stored in a JavaScript variable or as stringified JSON.
The prerequisite for scraping is retrieving the HTML page with an HTTP client.
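As a minimal sketch of that retrieval step, the helper below wraps the global fetch available in Node 18+ and rejects on non-2xx responses. The name fetchHtml and the fetchImpl parameter are hypothetical; the parameter is an assumption added here so the function can be tried without a network connection.

```javascript
// Hypothetical helper: fetch a page and return its HTML as a string.
// `fetchImpl` defaults to the global fetch (Node 18+); injecting it is an
// assumption made for this sketch, not something the examples below require.
async function fetchHtml(url, fetchImpl = fetch) {
  const res = await fetchImpl(url);
  // fetch only rejects on network errors, so HTTP error statuses are checked here.
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.text();
}
```

The returned string is what gets passed to jsdom in the examples that follow.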
Examples
The example below rewrites the page source so that the data is also assigned to a global variable, executes the page scripts, and reads the data from that global variable.
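import jsdom from 'jsdom';

// URL is a placeholder for the address of the page to scrape.
fetch(URL)
  .then((res) => res.text())
  .then((response) => {
    // Expression in the page scripts that holds the data.
    const dataVariable = 'someVariable.someField';
    // Prefix its first occurrence so the value is also assigned to a global `data`.
    const html = response.replace(dataVariable, `var data=${dataVariable}`);
    // 'dangerously' makes jsdom execute the scripts embedded in the page.
    const dom = new jsdom.JSDOM(html, {
      runScripts: 'dangerously',
      virtualConsole: new jsdom.VirtualConsole(),
    });
    console.log('data', dom?.window?.data);
  });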
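The key step in the first example is the string rewrite, so it can help to look at it in isolation. The sketch below is a hypothetical helper (exposeAsGlobal and the sample page string are assumptions, not part of the original example) that fails loudly when the expression is missing. Note that replace with a string pattern only touches the first occurrence.

```javascript
// Hypothetical helper: prefix the first occurrence of `expression` in the page
// source with `var <globalName>=` so the assigned value also lands in a global.
function exposeAsGlobal(html, expression, globalName = 'data') {
  if (!html.includes(expression)) {
    throw new Error(`"${expression}" not found in the page source`);
  }
  // A function replacer keeps "$" sequences in the expression from being
  // interpreted as replacement patterns.
  return html.replace(expression, () => `var ${globalName}=${expression}`);
}

// Made-up page source, for illustration only.
const page = '<script>someVariable.someField = {"items":[1,2,3]};</script>';
console.log(exposeAsGlobal(page, 'someVariable.someField'));
// prints: <script>var data=someVariable.someField = {"items":[1,2,3]};</script>
```

This only works when the first occurrence is the assignment itself and sits at the top level of a script; a var inside a function would stay local to that function.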
The example below runs the page scripts and reads stringified JSON data from an element's value.
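import jsdom from 'jsdom';

// URL is a placeholder for the address of the page to scrape.
fetch(URL)
  .then((res) => res.text())
  .then((response) => {
    // 'dangerously' makes jsdom execute the scripts embedded in the page.
    const dom = new jsdom.JSDOM(response, {
      runScripts: 'dangerously',
      virtualConsole: new jsdom.VirtualConsole(),
    });
    // Read the stringified JSON from the element with the given id.
    const data = dom?.window?.document?.getElementById('someId')?.value;
    console.log('data', JSON.parse(data));
  });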
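In the second example, JSON.parse throws a SyntaxError when the element is missing (data is undefined) or when the value is not valid JSON. A small guard, sketched below with the hypothetical name parseEmbeddedJson, returns null in those cases instead.

```javascript
// Hypothetical helper: parse stringified JSON read from the page,
// returning null when the input is missing, empty, or malformed.
function parseEmbeddedJson(raw) {
  if (typeof raw !== 'string' || raw.trim() === '') {
    return null;
  }
  try {
    return JSON.parse(raw);
  } catch {
    return null;
  }
}

console.log(parseEmbeddedJson('{"plan":"free"}')); // parsed object
console.log(parseEmbeddedJson(undefined)); // prints: null
```

Also note that value exists on form controls such as input and textarea. If the JSON is stored in a script tag instead, textContent is the property to read.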
Disclaimer
Please check the website's terms of service before scraping it. Some websites may have terms of service that prohibit such activity.