Tyler Perkins c950b38b26 Add base scraper		2023-04-25 22:42:02 -04:00

WebDriver BiDi for Chromium

This is an implementation of the WebDriver BiDi protocol with some extensions (BiDi+) for Chromium, implemented as a JavaScript layer translating between BiDi and CDP, running inside a Chrome tab.

Current status can be checked at WPT WebDriver BiDi status.

BiDi+

"BiDi+" is an extension of the WebDriver BiDi protocol. In addition to WebDriver BiDi it has:

Command `cdp.sendCommand`

CdpSendCommandCommand = {
  method: "cdp.sendCommand",
  params: CdpSendCommandParameters,
}

CdpSendCommandParameters = {
   cdpMethod: text,
   cdpParams: any,
   cdpSession?: text,
}

CdpSendCommandResult = {
   result: any,
   cdpSession: text,
}

The command runs the specified CDP command and returns its result, together with the CDP session it ran in.
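A client builds this command as an ordinary BiDi message. The Python sketch below serializes a `cdp.sendCommand` call. The helper name `cdp_send_command` and the example CDP method are illustrative assumptions, not part of this project's API.

```python
import json
from typing import Any, Optional


def cdp_send_command(command_id: int, cdp_method: str, cdp_params: Any,
                     cdp_session: Optional[str] = None) -> str:
    """Serialize a BiDi+ cdp.sendCommand command as a JSON string (hypothetical helper)."""
    params = {"cdpMethod": cdp_method, "cdpParams": cdp_params}
    if cdp_session is not None:
        # cdpSession is optional in the schema, so include it only when given.
        params["cdpSession"] = cdp_session
    return json.dumps({"id": command_id, "method": "cdp.sendCommand", "params": params})
```

For example, `cdp_send_command(1, "Runtime.evaluate", {"expression": "1 + 2"})` produces a command without a `cdpSession` field.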

Command `cdp.getSession`

CdpGetSessionCommand = {
   method: "cdp.getSession",
   params: CdpGetSessionParameters,
}

CdpGetSessionParameters = {
   context: BrowsingContext,
}

CdpGetSessionResult = {
   cdpSession: text,
}

The command returns the default CDP session for the selected browsing context.
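A typical flow is to request the session of a browsing context first and then pass it as `cdpSession` to `cdp.sendCommand`. This Python sketch, with hypothetical helper names, builds the `cdp.getSession` command and reads `cdpSession` from the matching response. It assumes responses follow the `CommandResponse` / `ErrorResponse` shapes described under Field `channel`; the context id used in practice comes from the browser.

```python
import json
from typing import Any, Dict


def cdp_get_session(command_id: int, context: str) -> str:
    """Serialize a cdp.getSession command for one browsing context."""
    return json.dumps({
        "id": command_id,
        "method": "cdp.getSession",
        "params": {"context": context},
    })


def session_from_response(raw: str, command_id: int) -> str:
    """Return cdpSession from the matching response; raise on an error response."""
    message: Dict[str, Any] = json.loads(raw)
    if message.get("id") != command_id:
        raise ValueError(f"unexpected response id: {message.get('id')!r}")
    if "error" in message:
        raise RuntimeError(f"{message['error']}: {message.get('message', '')}")
    return message["result"]["cdpSession"]
```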

Event `cdp.eventReceived`

CdpEventReceivedEvent = {
   method: "cdp.eventReceived",
   params: CdpEventReceivedParameters,
}

CdpEventReceivedParameters = {
   cdpMethod: text,
   cdpParams: any,
   cdpSession: text,
}

The event carries a CDP event: its method, its parameters, and the CDP session that emitted it.
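Clients usually route these events by `cdpMethod`. The Python sketch below, with hypothetical class and method names, dispatches incoming `cdp.eventReceived` messages to registered handlers and ignores every other message.

```python
import json
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

# A handler receives (cdpParams, cdpSession).
Handler = Callable[[Any, str], None]


class CdpEventRouter:
    """Route cdp.eventReceived messages to handlers keyed by CDP method."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def on(self, cdp_method: str, handler: Handler) -> None:
        self._handlers[cdp_method].append(handler)

    def dispatch(self, raw: str) -> bool:
        """Return True if the message was a cdp.eventReceived event."""
        message = json.loads(raw)
        if message.get("method") != "cdp.eventReceived":
            return False  # a command response or another BiDi event
        params = message["params"]
        for handler in self._handlers.get(params["cdpMethod"], []):
            handler(params["cdpParams"], params["cdpSession"])
        return True
```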

Field `channel`

Each command can be extended with a channel:

Command = {
   id: js-uint,
   channel?: text,
   CommandData,
   Extensible,
}

If `channel` is provided and is a non-empty string, the same channel is added to the response:

CommandResponse = {
   id: js-uint,
   channel?: text,
   result: ResultData,
   Extensible,
}

ErrorResponse = {
  id: js-uint / null,
  channel?: text,
  error: ErrorCode,
  message: text,
  ?stacktrace: text,
  Extensible
}
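On the server side, this rule amounts to copying `channel` from the command into whichever response is sent. A minimal Python sketch with hypothetical helper names:

```python
from typing import Any, Dict


def _with_channel(response: Dict[str, Any], command: Dict[str, Any]) -> Dict[str, Any]:
    channel = command.get("channel")
    # Only a provided, non-empty string channel is copied to the response.
    if isinstance(channel, str) and channel:
        response["channel"] = channel
    return response


def command_response(command: Dict[str, Any], result: Any) -> Dict[str, Any]:
    return _with_channel({"id": command["id"], "result": result}, command)


def error_response(command: Dict[str, Any], error: str, message: str) -> Dict[str, Any]:
    # The schema allows id to be null, so a missing id is reported as None (null).
    return _with_channel({"id": command.get("id"), "error": error, "message": message}, command)
```

An empty string is treated the same as an absent channel, so no `channel` field is emitted for it.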

When the client uses the session.subscribe and session.unsubscribe commands with a channel, subscriptions are handled per channel, and the corresponding channel field is added to the event message:

Event = {
  channel?: text,
  EventData,
  Extensible,
}
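Per-channel handling can be pictured as a separate subscription set for each channel. The Python sketch below is a hypothetical model, not the Mapper's code: it tracks plain event names only, ignores module-wide names (such as `log`) and browsing-context filters, and uses the empty string to mean "no channel".

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterator, List, Set


class ChannelSubscriptions:
    """Track subscribed event names separately for each channel ("" = no channel)."""

    def __init__(self) -> None:
        self._by_channel: DefaultDict[str, Set[str]] = defaultdict(set)

    def subscribe(self, events: List[str], channel: str = "") -> None:
        self._by_channel[channel].update(events)

    def unsubscribe(self, events: List[str], channel: str = "") -> None:
        self._by_channel[channel].difference_update(events)

    def messages_for(self, method: str, params: Dict) -> Iterator[Dict]:
        """Yield one event message per channel subscribed to `method`."""
        for channel, events in self._by_channel.items():
            if method in events:
                message = {"method": method, "params": params}
                if channel:
                    message["channel"] = channel
                yield message
```

With this model, one underlying event can produce several messages, one for each channel that subscribed to it.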

Dev Setup

`npm`

This is a Node.js project, so install dependencies as usual:

npm install

pre-commit.com integration

Refer to the documentation at .pre-commit-config.yaml.

Starting the Server

This will run the server on port 8080:

npm run server

Use the PORT= environment variable or --port= argument to run it on another port:

PORT=8081 npm run server
npm run server -- --port=8081

Use the DEBUG environment variable to see debug info:

DEBUG=* npm run server

Use the CLI argument --headless=false to run the browser in headful mode:

npm run server -- --headless=false

Use the CHANNEL=... environment variable or the --channel=... argument with one of the following values to run a specific Chrome channel: stable, beta, canary, dev.

The requested Chrome channel must already be installed.

CHANNEL=dev npm run server
npm run server -- --channel=dev

Use the CLI argument --verbose to have CDP events printed to the console. Note: you also have to enable the bidiMapper:mapperDebug:* debug output, for example:

DEBUG=bidiMapper:mapperDebug:* npm run server -- --verbose

DEBUG=* npm run server -- --verbose

Starting on Linux and Mac

TODO: verify if it works on Windows.

You can also run the server using the script ./runBiDiServer.sh. It writes its output to the file log.txt:

./runBiDiServer.sh --port=8081 --headless=false

Running

Unit tests

Running:

npm test

E2E tests

The E2E tests are written using Python, in order to learn how to eventually do this in web-platform-tests.

Installation

Python 3.6+ and some dependencies are required:

python3 -m pip install --user -r tests/requirements.txt

Running

The E2E tests require a BiDi server running on the same host (see Starting the Server). By default, the tests try to connect to port 8080. Run the tests from the project root:

npm run e2e

Use the PORT environment variable to connect to another port:

PORT=8081 npm run e2e
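A Python E2E helper could derive the server address with the same default. This sketch assumes the server listens on `ws://localhost:<port>` with no path; the helper name `bidi_server_url` is hypothetical.

```python
import os
from typing import Mapping, Optional


def bidi_server_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the WebSocket URL of the local BiDi server.

    Uses port 8080 unless the PORT environment variable is set.
    """
    env = os.environ if env is None else env
    port = int(env.get("PORT", "8080"))  # int() rejects non-numeric values
    return f"ws://localhost:{port}"
```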

Examples

Refer to examples/README.md.

WPT (Web Platform Tests)

WPT is added as a git submodule. To run WPT tests:

Check out and set up WPT

1. Check out WPT

git submodule update --init

2. Go to the WPT folder

cd wpt

3. Set up virtualenv

Follow the System Setup instructions.

4. Set up `hosts` file

Follow the hosts File Setup instructions.

4.a On Linux, macOS or other UNIX-like system

./wpt make-hosts-file | sudo tee -a /etc/hosts

4.b On Windows

This must be run in a PowerShell session with Administrator privileges:

python wpt make-hosts-file | Out-File $env:SystemRoot\System32\drivers\etc\hosts -Encoding ascii -Append

If you are behind a proxy, you also need to make sure the domains added to the hosts file are excluded from your proxy lookups.

5. Set `WPT_BROWSER_PATH`

Set the WPT_BROWSER_PATH environment variable to a Chrome, Edge or Chromium binary to launch. For example, on macOS:

# Chrome
export WPT_BROWSER_PATH="/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary"
export WPT_BROWSER_PATH="/Applications/Google Chrome Dev.app/Contents/MacOS/Google Chrome Dev"
export WPT_BROWSER_PATH="/Applications/Google Chrome Beta.app/Contents/MacOS/Google Chrome Beta"
export WPT_BROWSER_PATH="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
export WPT_BROWSER_PATH="/Applications/Chromium.app/Contents/MacOS/Chromium"

# Edge
export WPT_BROWSER_PATH="/Applications/Microsoft Edge Canary.app/Contents/MacOS/Microsoft Edge Canary"
export WPT_BROWSER_PATH="/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge"
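Before starting a long WPT run, it can help to confirm that the variable points at a real executable. The Python sketch below is a hypothetical pre-flight check; the function name is illustrative and it is not part of wpt or this project.

```python
import os
from typing import Mapping, Optional


def check_browser_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Return WPT_BROWSER_PATH if it names an executable file, else raise."""
    env = os.environ if env is None else env
    path = env.get("WPT_BROWSER_PATH")
    if not path:
        raise RuntimeError("WPT_BROWSER_PATH is not set")
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        raise RuntimeError(f"WPT_BROWSER_PATH is not an executable file: {path}")
    return path
```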

Run WPT tests

1. Make sure you have Chrome Dev installed

https://www.google.com/chrome/dev/

2. Build Chromedriver BiDi

One-shot:

npm run build

Continuously:

npm run watch

3. Run

./wpt/wpt run \
  --webdriver-binary runBiDiServer.sh \
  --binary "$WPT_BROWSER_PATH" \
  --manifest wpt/MANIFEST.json \
  --metadata wpt-metadata/mapper/headless \
  chromium \
  webdriver/tests/bidi/

Update WPT expectations if needed

1. Run WPT tests with custom `log-wptreport`:

./wpt/wpt run \
  --webdriver-binary runBiDiServer.sh \
  --binary "$WPT_BROWSER_PATH" \
  --manifest wpt/MANIFEST.json \
  --metadata wpt-metadata/mapper/headless \
  --log-wptreport wptreport.json \
  chromium \
  webdriver/tests/bidi/

2. Update expectations based on the previous test run:

./wpt/wpt update-expectations \
  --product chromium \
  --manifest wpt/MANIFEST.json \
  --metadata wpt-metadata/mapper/headless \
  wptreport.json
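To get a quick overview of a run before updating expectations, the report can be summarized. This Python sketch assumes the usual wptreport layout (a top-level `results` list whose entries carry `subtests`, each with a `status`); verify this against your own wptreport.json. The function names are hypothetical.

```python
import json
from collections import Counter
from typing import Any, Dict


def load_wptreport(path: str) -> Dict[str, Any]:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_wptreport(report: Dict[str, Any]) -> Counter:
    """Count subtest statuses (PASS, FAIL, ...) across all tests in a report."""
    counts: Counter = Counter()
    for test in report.get("results", []):
        for subtest in test.get("subtests", []):
            counts[subtest["status"]] += 1
    return counts
```

For example, `summarize_wptreport(load_wptreport("wptreport.json"))` returns a `Counter` mapping each status to its number of subtests.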

How does it work?

The architecture is described in the WebDriver BiDi in Chrome Context implementation plan.

There are 2 main modules:

backend WebSocket server in src. It runs a WebSocket server and, for each WebSocket connection, launches a browser instance running the BiDi Mapper.
front-end BiDi Mapper in src/bidiMapper. It receives BiDi commands from the backend and maps them to CDP commands.
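To make the Mapper's role concrete, the Python sketch below translates a single BiDi command into a CDP message. It is a hypothetical, heavily simplified illustration rather than the Mapper's actual logic: it handles only browsingContext.navigate, assumes the BiDi context id can be reused as the CDP frame id, and ignores the BiDi `wait` parameter.

```python
import json
from typing import Any, Dict


def bidi_to_cdp(bidi_command: Dict[str, Any], cdp_id: int, session_id: str) -> str:
    """Translate one BiDi command into a CDP message (navigate only)."""
    if bidi_command["method"] != "browsingContext.navigate":
        raise NotImplementedError(bidi_command["method"])
    params = bidi_command["params"]
    return json.dumps({
        "id": cdp_id,
        "sessionId": session_id,  # routes the message to a CDP session (flattened mode)
        "method": "Page.navigate",
        # Simplification: reuse the BiDi context id as the CDP frame id.
        "params": {"url": params["url"], "frameId": params["context"]},
    })
```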

Contributing

The BiDi commands are processed in src/bidiMapper/commandProcessor.ts. To add a new command, add a case for it to _processCommand, then write a processor for the command and call it from there.
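The dispatch pattern described here (one processor per command, selected by method name) can be sketched in Python. The class and method names are hypothetical and only mirror the idea behind _processCommand; unknown methods produce the BiDi `unknown command` error.

```python
from typing import Any, Callable, Dict

Processor = Callable[[Dict[str, Any]], Dict[str, Any]]


class CommandProcessor:
    """Python analogue of a method -> processor dispatch (hypothetical names)."""

    def __init__(self) -> None:
        self._processors: Dict[str, Processor] = {}

    def register(self, method: str, processor: Processor) -> None:
        self._processors[method] = processor

    def process_command(self, command: Dict[str, Any]) -> Dict[str, Any]:
        processor = self._processors.get(command.get("method", ""))
        if processor is None:
            return {"id": command.get("id"), "error": "unknown command",
                    "message": f"Unknown command '{command.get('method')}'."}
        return {"id": command["id"], "result": processor(command.get("params", {}))}
```

Adding a command then means writing one processor function and registering it under its method name.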

Publish new `npm` release

Open a PR bumping the chromium-bidi version number in package.json for review:
```
npm version patch -m 'Release v%s' --no-git-tag-version
```
Instead of patch, use minor or major as needed.
After the PR is reviewed, create a GitHub release specifying the tag name matching the bumped version. Our CI then automatically publishes the new release to npm based on the tag name.
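`npm version` with patch, minor, or major increments the corresponding part of the x.y.z version and resets the lower parts to zero. The Python sketch below mirrors that rule for plain versions only; it is an illustration, not npm's implementation, and it does not handle prerelease suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next x.y.z version for part in {'major', 'minor', 'patch'}."""
    major, minor, patch = (int(n) for n in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unsupported part: {part}")
```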
