Node.js + Selenium + PlayStation.Store (part 1 of 3)

I will write here about my experience with Node.js, Selenium WebDriver (Chrome) and the PlayStation.Store site, since the blogging platform where I used to post technical articles is going straight to hell 🙂

Why did I do it? Have you ever tried to find a PS®Move game for PS4 online? As of today, the PlayStation.Store site search returns just ONE result, and it is not a game.

If you Google around, you will find a bunch of PS3 games and, surprise MF!, they are not compatible with PS4. So I basically ended up with two useless PS®Move controllers.

But wait! Browsing around the PlayStation.Store, I noticed that some games are compatible with PS®Move. However, there is no filter for it whatsoever. The pages load dynamically with JavaScript, and the PS®Move mention can be spelled "PS Move", "PS®Move" or something else and can appear anywhere on the page. So what? Selenium WebDriver to the rescue!
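Because the spelling varies, a plain string search will miss some games. Below is a minimal sketch of a more tolerant matcher. It assumes the variants differ only in case, spacing, a hyphen and the ® sign. The `mentionsPsMove` name and the pattern are my own illustration; the store provides nothing like this.

```javascript
// Hypothetical helper: true if the text mentions PS Move in one of the
// spellings seen so far ("PS Move", "PS®Move", "PlayStation Move", ...).
// Assumption: variants differ only in case, spacing, a hyphen and the ® sign.
function mentionsPsMove(text) {
  const pattern = /\b(?:ps|playstation)\s*(?:®|\(r\))?\s*-?\s*move\b/i;
  return pattern.test(text);
}

console.log(mentionsPsMove('Requires PS®Move motion controller')); // true
console.log(mentionsPsMove('PlayStation Move compatible'));        // true
console.log(mentionsPsMove('Movement speed increased'));           // false
```

Any new spelling found on the store would need to be added to the pattern.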

I will write 3 short articles:

  1. How to get all game URLs (this one)
  2. How to get game description
  3. How to look for PS®Move

How to get all game URLs?

Simple! 🙂 I use macOS and decided to use JavaScript with Selenium. I already had the Homebrew package manager installed, so I just installed Node.js with:

brew install node

Then I downloaded ChromeDriver, Google's Chrome driver for Selenium. I also installed the Node.js libraries for MySQL, where I will store my data, and for Selenium WebDriver:

npm install mysql
npm install selenium-webdriver

Finally, I dropped the downloaded chromedriver binary into the same directory where the mysql and selenium-webdriver Node packages were installed. Now we are ready to start playing with the code:
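If Selenium later complains that it cannot find the driver, a quick pre-flight check can help. The sketch below uses a hypothetical `findChromedriver` helper, which is not part of selenium-webdriver. It looks for an executable `chromedriver` in the current directory first and then in every directory on `PATH`.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical pre-flight check: returns the full path of the first
// executable file named `name` in the current directory or on PATH, else null.
function findChromedriver(name = 'chromedriver', dirs = null) {
  const searchDirs = dirs || [process.cwd(), ...(process.env.PATH || '').split(path.delimiter)];
  for (const dir of searchDirs) {
    if (!dir) continue; // skip empty PATH entries
    const candidate = path.join(dir, name);
    try {
      if (fs.statSync(candidate).isFile()) {
        fs.accessSync(candidate, fs.constants.X_OK); // throws if not executable
        return candidate;
      }
    } catch (e) {
      // missing or not executable here: keep looking
    }
  }
  return null;
}

const found = findChromedriver();
console.log(found ? 'chromedriver found at ' + found : 'chromedriver not found');
```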

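const mysql = require('mysql');
const { Builder, By, until } = require('selenium-webdriver');

const baseURL = 'https://store.playstation.com/#!/en-us/all-ps4-games/cid=STORE-MSF77008-PS4ALLGAMESCATEG/';
const totalPages = 55;
const pageTimeout = 60 * 1000; // ms to wait for one page's game grid to render

// CREATE TABLE game_urls ( id int(11) AUTO_INCREMENT, url varchar(256), PRIMARY KEY (id));
const connection = mysql.createConnection({
  host     : 'localhost',
  user     : 'tolik',
  password : 'i_love_psn',
  database : 'psn'
});

// Wrap the callback-style mysql query in a Promise so it can be awaited
function saveUrl(url) {
  return new Promise((resolve, reject) => {
    connection.query('INSERT INTO game_urls(url) VALUES(?)', [url], (err, result) => {
      if (err) reject(err); else resolve(result);
    });
  });
}

// Game links look like: a class="permalink" href="..."
async function scrapePage(i) {
  const url = baseURL + i;
  console.log('Processing: ', url);
  // A fresh Chrome per page: the page URLs differ only after "#!",
  // so reusing one tab might still show the previous page's grid
  const browser = await new Builder().forBrowser('chrome').build();
  try {
    await browser.get(url);
    // The grid is rendered by JavaScript: wait for the first permalink
    await browser.wait(until.elementLocated(By.className('permalink')), pageTimeout);
    const links = await browser.findElements(By.className('permalink'));
    console.log('Game links found on page ' + i + ': ', links.length);
    for (const link of links) {
      const href = await link.getAttribute('href');
      console.log(href);
      await saveUrl(href);
    }
  } finally {
    await browser.quit(); // close Chrome even if the page timed out
  }
}

async function main() {
  try {
    // One page at a time, so only one Chrome instance runs at once
    for (let i = 1; i <= totalPages; i++) {
      await scrapePage(i);
    }
  } finally {
    connection.end(); // close the DB connection so node can exit
  }
}

main().catch(err => {
  console.error(err);
  process.exitCode = 1;
});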
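Callbacks created inside a `for` loop usually run after the loop has already moved on, and this is easy to trip over. With `var`, all callbacks share a single `i` and see its final value, which is totalPages + 1. As a result, a check like `i == totalPages` inside such a callback never fires. With `let`, every iteration gets its own binding. Here is a standalone demonstration, with `setTimeout` standing in for the WebDriver promises:

```javascript
// Collect the loop counter as seen by callbacks that run after the loop ends.
function collectLater(useLet) {
  return new Promise(resolve => {
    const seen = [];
    const total = 3;
    if (useLet) {
      for (let i = 1; i <= total; i++) {
        setTimeout(() => { seen.push(i); if (seen.length === total) resolve(seen); }, 0);
      }
    } else {
      for (var i = 1; i <= total; i++) {
        setTimeout(() => { seen.push(i); if (seen.length === total) resolve(seen); }, 0);
      }
    }
  });
}

collectLater(false).then(seen => console.log('var:', seen)); // var: [ 4, 4, 4 ]
collectLater(true).then(seen => console.log('let:', seen));  // let: [ 1, 2, 3 ]
```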

The code is pretty straightforward:

1. Load the packages and connect to the DB.
2. Build the URL for each of the current 55 pages of PS4 games.
3. Load the pages one by one and wait until each page renders the game grid.
4. Grab the permalinks to the individual game pages and put them in the DB.

End of story.
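The "one by one" part matters. Starting a Chrome instance for every page at once, for example with `Promise.all`, means dozens of browsers competing for memory. Below is a minimal sketch of the sequential pattern. A hypothetical `fakeScrapePage` stands in for the real Selenium and MySQL work, and the sketch records how many pages are in flight at the same time.

```javascript
// Run `worker` for pages 1..totalPages strictly one after another.
async function forEachPageSequentially(totalPages, worker) {
  for (let page = 1; page <= totalPages; page++) {
    await worker(page); // the next page starts only after this one finishes
  }
}

// Hypothetical stand-in for loading a page and saving its links.
let inFlight = 0;
let maxInFlight = 0;
const visited = [];
async function fakeScrapePage(page) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise(resolve => setTimeout(resolve, 5)); // simulated page load
  visited.push(page);
  inFlight--;
}

forEachPageSequentially(5, fakeScrapePage).then(() => {
  console.log('visited:', visited, 'max in flight:', maxInFlight);
  // visited: [ 1, 2, 3, 4, 5 ] max in flight: 1
});
```

Sequential loading is slower than running pages in parallel, but a single browser at a time keeps memory use predictable.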

node list.js

You guessed it right: I named the script above list.js 🙂

The test run loaded all 1637 current PS4 games from the 55 pages.

Next is how to get the game description.
