Repository
Links
- https://pptr.dev
- https://github.com/puppeteer/puppeteer
- https://npmjs.com/package/puppeteer
- https://github.com/puppeteer/puppeteer/blob/master/docs/api.md
Videos
- Web Scraping With NodeJS and Puppeteer
- The power of Headless Chrome and browser automation (Google I/O ‘18)
- Scraping Reddit with Puppeteer & NodeJs
Examples
Headless
// headless: false opens a visible browser window instead of running headless
const browser = await puppeteer.launch({
  headless: false
})
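For context, a launch call usually sits inside a launch, navigate, close cycle. The sketch below is one possible arrangement, assuming the `puppeteer` package is installed. `buildLaunchOptions`, `run`, the `debug` flag, and the 50 ms `slowMo` value are illustrative choices, not part of Puppeteer itself.

```javascript
// Hypothetical helper: pick launch options depending on a debug flag.
function buildLaunchOptions({ debug = false } = {}) {
  return debug
    ? { headless: false, slowMo: 50 } // visible window, operations slowed by 50 ms
    : { headless: true }
}

// Hypothetical wrapper: launch, visit a URL, and always close the browser.
async function run(url, options = {}) {
  // Required lazily so the helper above can be used without Puppeteer installed.
  const puppeteer = require('puppeteer')
  const browser = await puppeteer.launch(buildLaunchOptions(options))
  try {
    const page = await browser.newPage()
    await page.goto(url)
    return await page.title()
  } finally {
    // Runs even if navigation fails, so no browser process is left behind.
    await browser.close()
  }
}
```

Calling `run('https://pptr.dev', { debug: true })` would open a visible window; without the flag the browser stays headless.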
Log in
await page.focus('#username')
await page.keyboard.type(config.username)
await page.focus('#password')
await page.keyboard.type(config.password)
await page.click('#log-in')
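If the log-in button triggers a full page navigation, the script probably needs to wait for it before continuing. A minimal sketch of that, assuming the same selectors as above; `logIn` and the `credentials` shape are hypothetical names introduced for illustration:

```javascript
// Hypothetical helper: fill the form and wait for the post-login navigation.
// Assumes clicking '#log-in' causes a full page navigation.
async function logIn(page, credentials) {
  // page.type focuses the element and types into it in one call.
  await page.type('#username', credentials.username)
  await page.type('#password', credentials.password)
  // Start waiting before clicking so the navigation is not missed.
  await Promise.all([
    page.waitForNavigation(),
    page.click('#log-in')
  ])
}
```

For a single-page app that logs in without navigating, waiting for a selector that only exists after login may be the better fit.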
Evaluate
Note: The return value must be serializable.
const result = await page.evaluate(() => {
  const elements = document.querySelectorAll('.item')
  const items = Array.from(elements).map((element) => {
    const property = {
      url: element.href,
      description: element.textContent
    }
    return property
  })
  return items
})
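Values can also travel in the other direction: extra arguments given to `page.evaluate` after the function are serialized and handed to it inside the page. A sketch of a parameterized version of the snippet above, where `collectItems` is a hypothetical name:

```javascript
// Hypothetical helper: collect { url, description } pairs for any selector.
async function collectItems(page, selector) {
  // `selector` is passed as an extra argument because the callback runs in
  // the browser and cannot see variables from the Node.js scope.
  return page.evaluate((sel) => {
    return Array.from(document.querySelectorAll(sel)).map((element) => ({
      url: element.href,
      description: element.textContent
    }))
  }, selector)
}
```

Usage would look like `const items = await collectItems(page, '.item')`.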
Shortcut to document.querySelectorAll:
const result = await page.$$eval('.item', (elements) => {
  const items = elements.map((element) => {
    const property = {
      url: element.href,
      description: element.textContent
    }
    return property
  })
  return items
})
See also: $eval, which is the same as $$eval but uses document.querySelector.
Intermediate results:
const elements = await page.$$('.item')
const items = []
for (const element of elements) {
  const url = await element.evaluate((element) => element.href)
  const description = await element.evaluate((element) => element.textContent)
  const property = { url, description }
  items.push(property)
}
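Each `evaluate` call on a handle is a separate round trip to the browser, so reading several fields at once can be cheaper. A possible variant of the loop above, with `readItems` as a hypothetical name:

```javascript
// Hypothetical helper: read both fields of each handle in one evaluate call.
async function readItems(page, selector) {
  const handles = await page.$$(selector)
  const items = []
  for (const handle of handles) {
    // One round trip to the browser instead of two.
    const item = await handle.evaluate((element) => ({
      url: element.href,
      description: element.textContent
    }))
    items.push(item)
    // Release the handle once it is no longer needed.
    await handle.dispose()
  }
  return items
}
```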
See also: $, which is the same as $$ but uses document.querySelector.
Waiting for something
await page.waitForSelector('#something-taking-time-to-appear')
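// page.waitFor is deprecated in newer Puppeteer versions; waitForFunction is the replacement for predicates
await page.waitForFunction(() => {
  return document.querySelector('#something') !== null
})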
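These waits reject when their timeout elapses (30 seconds by default), so a script that treats the element as optional has to catch that case. A minimal sketch, assuming Puppeteer's timeout errors carry the name `TimeoutError`; `appearsWithin` and the 5000 ms default are hypothetical choices:

```javascript
// Hypothetical helper: resolve to true if the selector appears within
// `timeoutMs`, false if the wait times out; other errors are rethrown.
async function appearsWithin(page, selector, timeoutMs = 5000) {
  try {
    await page.waitForSelector(selector, { timeout: timeoutMs })
    return true
  } catch (error) {
    // Assumption: Puppeteer's timeout errors have the name 'TimeoutError'.
    if (error.name === 'TimeoutError') return false
    throw error
  }
}
```

With this, `if (await appearsWithin(page, '#cookie-banner', 2000)) { ... }` can branch on an element that may never show up (`#cookie-banner` is an invented selector).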