homeresume
 
   
🔍

Browser automation with Puppeteer

August 26, 2023

Puppeteer is a headless browser for automating browser tasks. Here's the list of some of the features:

  • Turn off headless mode

    const browser = await puppeteer.launch({
    headless: false
    // ...
    });
  • Resize the viewport to the window size

    const browser = await puppeteer.launch({
    // ...
    defaultViewport: null
    });
  • Emulate screen how it's shown to the user via the emulateMediaType method

    await page.emulateMediaType('screen');
  • Save the page as a PDF file with a specified path, format, scale factor, and page range

    await page.pdf({
    path: 'path.pdf',
    format: 'A3',
    scale: 1,
    pageRanges: '1-2',
    printBackground: true
    });
  • Use preexisting user's credentials to skip logging in to some websites. The user data directory is a parent of the Profile Path value from the chrome://version page.

    const browser = await puppeteer.launch({
    userDataDir: 'C:\\Users\\<USERNAME>\\AppData\\Local\\Google\\Chrome\\User Data',
    args: [],
    });
  • Use Chrome instance instead of Chromium by utilizing the Executable Path from the chrome://version URL. Close Chrome browser before running the script

    const browser = await puppeteer.launch({
    executablePath: puppeteer.executablePath('chrome'),
    // ...
    });
  • Get value based on evaluation in the browser page

    const shouldPaginate = await page.evaluate((param1, param2) => {
    // ...
    }, param1, param2);
  • Get HTML content from the specific element

    const html = await page.evaluate(
    () => document.querySelector('.field--text').outerHTML,
    );
  • Wait for a specific selector to be loaded. You can also provide a timeout in milliseconds

    await page.waitForSelector('.success', { timeout: 5000 });
  • Manipulate with a specific element and click on some of the elements

    await page.$eval('#header', async (headerElement) => {
    // ...
    headerElement
    .querySelectorAll('svg')
    .item(13)
    .parentNode.click();
    });
  • Extend execution of the $eval method

    const browser = await puppeteer.launch({
    // ...
    protocolTimeout: 0,
    });
  • Manipulate with multiple elements

    await page.$$eval('.some-class', async (elements) => {
    // ...
    });
  • Wait for navigation (e.g., form submitting) to be done

    await page.waitForNavigation({ waitUntil: 'networkidle0', timeout: 0 });
  • Trigger hover event on some of the elements

    await page.$eval('#header', async (headerElement) => {
    const hoverEvent = new MouseEvent('mouseover', {
    view: window,
    bubbles: true,
    cancelable: true
    });
    headerElement.dispatchEvent(hoverEvent);
    });
  • Expose a function in the browser and use it in $eval and $$eval callbacks (e.g., simulate typing using the window.type function)

    await page.exposeFunction('type', async (selector, text, options) => {
    await page.type(selector, text, options);
    });
    await page.$$eval('.some-class', async (elements) => {
    // ...
    window.type(selector, text, { delay: 0 });
    });
  • Press the Enter button after typing the input field value

    await page.type(selector, `${text}${String.fromCharCode(13)}`, options);
  • Remove the value from the input field before typing the new one

    await page.click(selector, { clickCount: 3 });
    await page.type(selector, text, options);
  • Expose a variable in the browser by passing it as the third argument for $eval and $$eval methods and use it in $eval and $$eval callbacks

    await page.$eval(
    '#element',
    async (element, customVariable) => {
    // ...
    },
    customVariable
    );
  • Mock response for the specific request

    await page.setRequestInterception(true);
    page.on('request', async function (request) {
    const url = request.url();
    if (url !== REDIRECTION_URL) {
    return request.continue();
    }
    await request.respond({
    contentType: 'text/html',
    status: 304,
    body: '<body></body>',
    });
    });
  • Intercept page redirections (via interceptor) and open them in new tabs rather than following them in the same tab

    await page.setRequestInterception(true);
    page.on('request', async function (request) {
    const url = request.url();
    if (url !== REDIRECTION_URL) {
    return request.continue();
    }
    await request.respond({
    contentType: 'text/html',
    status: 304,
    body: '<body></body>',
    });
    const newPage = await browser.newPage();
    await newPage.goto(url, { waitUntil: 'domcontentloaded', timeout: 0 });
    // ...
    await newPage.close();
    });
  • Intercept page response

    page.on('response', async (response) => {
    if (response.url() === RESPONSE_URL) {
    if (response.status() === 200) {
    // ...
    }
    // ...
    }
    });

Boilerplate

Here is the link to the boilerplate I use for the development.