Nike Agent Demo

Hi, welcome. Today I'm going to go over how we built the Nike agent to scrape data from nike.com. First, we're going to load nike.com as our starting URL.

You can press the navigating web browser option first to navigate manually to some of the shoes that we want. Let's go to men's lifestyle.

Under this category you can see that we currently have 234 products. If I scroll down the page, you can see that more products get loaded as I scroll, so let's check our activity module to see what is going on behind the scenes. The activity module is in the bottom right corner, and we can use it to see whether any data is being loaded asynchronously. I'm going to scroll down.

You can see that there's data being returned. We're just going to click on that to see if it's the data that we actually want. And yep, it's the call that returns the products: items, title (Nike Air Force), price, employee price. Okay, so this is the call that we want to make in order to return all of the products. We can click on the URL to see what URL and headers we need to return this data.

Under the web request, for CSV and simple inputs, we can press the test button to see if this call actually works, and it returns the data that we actually need. Then right here we can just copy this call to our clipboard, exit out of this, create a new Navigate URL command, and paste it in. Let's edit: under Action, Discover Action, and for the browser choose JSON Parser. Press save and navigate.

Here we can scroll down until we find a list of products. It seems to be under items, so we can left click to select one of the items, hold down Shift, and left click again, and you can see that the selection count is now 60. We can click once again to capture the entire list or table, and you can see that Content Grabber has automatically generated all of the fields within this category: in-wall content card, raw price, local price. Some of these don't seem to be necessary, but for this demonstration we'll just leave everything in there for now.
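Outside Content Grabber, the same idea (find the list under items and turn every key into a column) can be sketched in a few lines of Python. This is only an illustration: the sample payload, the nesting under "sections", and the field names localPrice and pdpUrl are assumptions, not the actual response from the site.

```python
import json

# Hypothetical sample payload; the real response has more fields and other values.
SAMPLE_RESPONSE = json.dumps({
    "sections": [{
        "items": [
            {"title": "Nike Air Force 1", "localPrice": "$90",
             "pdpUrl": "https://example.com/product/1"},
            {"title": "Example Shoe", "localPrice": "$120",
             "pdpUrl": "https://example.com/product/2"},
        ]
    }]
})


def find_items(node):
    """Return the first list stored under an "items" key anywhere in the JSON."""
    if isinstance(node, dict):
        if isinstance(node.get("items"), list):
            return node["items"]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return []
    for child in children:
        found = find_items(child)
        if found:
            return found
    return []


def flatten_rows(items):
    """Build one row per item, with a column for every key seen in any item."""
    columns = sorted({key for item in items for key in item})
    return [{col: item.get(col, "") for col in columns} for item in items]


rows = flatten_rows(find_items(json.loads(SAMPLE_RESPONSE)))
for row in rows:
    print(row["title"], row["localPrice"], row["pdpUrl"])
```

Keeping every key as a column mirrors what we did in the demo: capture everything first, and delete the fields you don't need later.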

We want to navigate directly to the product itself, because this URL only navigates to the listings page. You can scroll down to PDP URL, which we already captured, and see that this is the link to the product page. So we can scroll down, press Add Command, and add a Navigate URL command. Unselect Use Default Input; for the data provider select Capture Data, and under data columns select PDP URL. Press save. Edit again: under Action, Discover Action, and for the browser we're going to open a new browser, but this time a dynamic one, just to see what happens initially.

As you can see, the product page gets loaded along with the available sizes, and along with some other fields that we might need, such as color and style, so we'll capture those on this page right now.

Exit this out. Left click once, left click twice, capture text, and now we can rename this to color. It's showing white. We can also highlight that and press Generate Transformation, and you can see that Content Grabber automatically generates a regular expression to parse out this information. Also click on Style, left click again, capture text, rename this to style, and perform the same action: Generate Transformation, press save.

We also want to capture the available sizes, but since this is using a dynamic browser it's not going to be efficient, so we'll try to optimize this agent. Move back to the second tab, to our Navigate URL command. Under Action we could rename this to "navigate to product", and for the browser use the HTML Parser instead. Press save and navigate.

Here, using the HTML parser, you can see that the sizes aren't actually available on this page, but we're going to create a new capture command. Left click the Add Command button, add a Web Content command, and change the XPath to html. Press save, press edit, come back here, and set it to extract HTML. Now I'm going to press Transformation Script to see the entire HTML that's being loaded onto the page. It's quite a lot. Let's go through it.

There is one script that actually runs to give us the availability of the sizes. No, that's not the script. The name of the script was window.INITIAL_REDUX_STATE, so we can just change our XPath to point directly to it: contains, HTML, window.INITIAL_REDUX_STATE. Press apply. You can see that there's one selection, so this script actually exists. Press save, edit, configure, and then under Transformation Script you can see that the script is loading all of the availability of the shoe sizes.

It will be easier to visualize this in a JSON parser in a second, so I'm going to close this out. I could add a Navigate Link command and copy the XPath over, but actually we're going to use the Navigate URL command instead, so delete that real quick. Add a Navigate URL command, and for the data provider select Capture Data and web content. I should probably rename that. Press save. For now, rename this web content to script. Right click and select Don't Export Data, because we don't actually want this in our export data. Then we can rename the Navigate URL command to "navigate to sizes".

Press edit. In the transformation script here, you can see that the data is actually returned in the form of JSON, so we just need to cut this data out correctly and then parse it as JSON. In our regular expression we could return everything after window.INITIAL_REDUX_STATE equals. Press Test Transformation, and you can see that we have the opening bracket. But the data here also has a closing bracket and the closing script tag, which we also want to remove, so just type that in explicitly. Press Test Transformation, and you can see that the data is now returned fully as JSON. So instead of just returning this data, we're going to add json:-- in front of it. Test Transformation, press save. Now under Action, unselect Discover Action, choose a new browser, and load this up in a JSON parser. Press save, then execute.
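For readers who want to see the same extraction outside Content Grabber, here is a minimal Python sketch: a regular expression cuts out everything between window.INITIAL_REDUX_STATE= and the closing script tag, and the result is parsed as JSON. The sample HTML and the shape of the state object (products, skus, skuId, nikeSize) are assumptions made up for the example.

```python
import json
import re

# Hypothetical page fragment; the real page embeds a much larger object.
SAMPLE_HTML = """
<html><head>
<script>var unrelated = {"a": 1};</script>
<script>window.INITIAL_REDUX_STATE={"products": {"315123001": {"skus": [{"skuId": "1001", "nikeSize": "8"}]}}};</script>
</head></html>
"""

# Capture the object literal that follows the assignment, stopping at the
# closing brace (and optional semicolon) right before </script>.
STATE_PATTERN = re.compile(
    r"window\.INITIAL_REDUX_STATE\s*=\s*(\{.*?\})\s*;?\s*</script>",
    re.DOTALL,
)


def extract_state(html):
    """Return the embedded state as a dict, or None if the script is missing."""
    match = STATE_PATTERN.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


state = extract_state(SAMPLE_HTML)
print(sorted(state["products"]))
```

Anchoring the pattern on the closing script tag is what removes the trailing characters, the same way we typed the closing bracket and script tag explicitly in the transformation.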

Now in this data we can see all of the sizes that are available. You can scroll down until we find the data that we're looking for: products, product ID. So this 315123001 seems to be the style of the shoe, and different shoes would have different styles as well. For this shoe here, 315123001, I'm just going to check out the shoe, and you see that it's the black one. So I'm going to create a list: right click, Add Command, Web Element List. Here I'm going to navigate directly to products, because it seems like the first product has 315123001 as a node and the second product has 31512311 as a node, which corresponds to the white shoe. I'm going to create an XPath that captures the white shoe, because that was what the previous page was capturing. So for this web element list, let's press edit.

Let's try to build the XPath. It was product... let me scroll back up again, since this next node changes dynamically. Products, yes, sorry about that: products. Now we need to get the local name of the next node, and then we need to say that it is equal to the find data of the data that we captured previously. In our navigate to product command we have style: press edit, configure, and it was this. I could just copy that number over, but style is the name of the field that we captured previously, so: find data, style. Then slash, and we want skus. Yep, skus. Press apply.

I think this is not showing any selection count right now because it's evaluated dynamically, when the agent is running, but we'll leave this in for now. To explain the selection count: in the bottom left corner there's a selection count, and that's how many nodes are currently being selected. Right now you can see that the selection count is zero, but I think that's because this find data style runs dynamically: it needs to capture style from the parent command. That's why it's not showing any selection count right now; it isn't resolved at design time, but when we run it, it should work.

So I'm going to rename this list to "list of skus", and we're going to capture SKU ID from here: change the XPath directly to skuId and rename it to skuId. And probably the sizes as well, so nikeSize: press add, rename it to size, and change the XPath to nikeSize. Press save. This will give us the SKU and the size of all of the shoes, but it will not give us the availability yet, because that is actually at the bottom, so I'm going to continue scrolling down.

In this section you can see that there's available skus. These are SKUs that are actually available, as in in stock, so if the SKU that we captured before is not in this list, then it's not available. As a default value I'm going to set this to "not in stock". Press save. I'm going to add a web capture command and also name it availability.

Go to XPath. For this XPath we want to find out whether the SKU actually exists in this available SKU list or not. We need to go back to our root, because this is within a different parent node than the list of skus. So we go to ancestor, back to the default root command, so root. Then we can start from available skus, slash skuId, that's what we want, s-k-u-i-d, equals what we found before: find data (oops, no quotes) of the SKU ID that we captured before. Press save.

Then in our transformation script we want to return the fact that it is in stock. So if the SKU exists, it's going to execute this transformation script and return "in stock"; otherwise it's going to be left blank. If it's left blank, it's going to execute the next availability command, which will override the previous one, if it's empty, with "not in stock". And that's how we get the availability for the shoes. All right, I will save this agent. Press save.
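The availability logic boils down to a membership test: a size is in stock only if its SKU ID appears in the available SKU list, and "not in stock" is the fallback. A small Python sketch of that rule follows; the key names skus, availableSkus, skuId, and nikeSize, and the sample values, are assumed for illustration.

```python
# Hypothetical slice of the parsed state; IDs and sizes are made up.
SAMPLE_STATE = {
    "skus": [
        {"skuId": "1001", "nikeSize": "8"},
        {"skuId": "1002", "nikeSize": "8.5"},
        {"skuId": "1003", "nikeSize": "9"},
    ],
    "availableSkus": [
        {"skuId": "1001"},
        {"skuId": "1003"},
    ],
}


def size_availability(state):
    """Label every SKU "in stock" if it is listed as available, else "not in stock"."""
    in_stock_ids = {entry["skuId"] for entry in state.get("availableSkus", [])}
    rows = []
    for sku in state.get("skus", []):
        # "not in stock" plays the role of the default value set in the agent.
        status = "in stock" if sku["skuId"] in in_stock_ids else "not in stock"
        rows.append({
            "skuId": sku["skuId"],
            "size": sku.get("nikeSize", ""),
            "availability": status,
        })
    return rows


for row in size_availability(SAMPLE_STATE):
    print(row["skuId"], row["size"], row["availability"])
```

Collecting the available IDs into a set first keeps the lookup cheap even when a product has many sizes.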

Navigate back to see if I'm missing anything. Right now it only does one category, the one that I navigated to directly before, so let's make sure that it goes through all the categories. I'm going to press edit for my Navigate URL command and rename it "navigate to listings". Press edit; let's not use the default one, just press Transformation Script to see what we can change in the call to navigate to the other categories as well. You can see http, Nike, and that the gridwall path is men's lifestyle shoes followed by this number, followed by pn, the page. That's probably the second page we're on, so I'm going to change that to one for the first page, and then we can set up pagination as well. And then there's the referer, which is the previous page that it came from; I have to change that as well. Okay, so it seems like what we need is to capture the gridwall path from the parent page itself, so let's go back to nike.com.

From here we want the XPath for all the categories, like men's (running, training and gym, basketball, Jordan), women's, and kids: these three categories. So let's press Tools, Browser Tools. From here we can try to get the XPath of our navigation to these links. Navigation menu: click on our first navigation item and you can see that this li corresponds to the entire link. There are also li elements that correspond to New Releases and Customize; we don't want those.

Okay, let's come back to our agent. Let's add a Web Element List. Call it list of products, or list of navigations, maybe; it doesn't really matter what you name it. XPath, edit: we want to start with the XPath of the first link, so a div by class, then an li by class, with @class equals. Press apply, and you can see that it selects all five; in the bottom left corner it says selection count five. So it's selecting all five of these, but we don't actually want all five, we only want the middle three. We can specify that position is not equal to one (position is a function, so position() not equal to one) and position() not equal to five. Now you can see that the selection count is three and those two have been unselected. So let's go further down.
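The effect of the two position() predicates can be sketched in plain Python. Note that this is a stand-in, not XPath itself: Python's built-in ElementTree does not support position() inequalities, so the filter is written as a list comprehension over 1-based positions. The menu markup and the order of its entries are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical navigation menu with five top-level entries.
MENU = ET.fromstring("""
<ul>
  <li>New Releases</li>
  <li>Men</li>
  <li>Women</li>
  <li>Kids</li>
  <li>Customize</li>
</ul>
""")


def exclude_positions(elements, excluded):
    """Keep elements whose 1-based position is not in `excluded`.

    Mirrors an XPath predicate such as [position() != 1 and position() != 5].
    """
    return [el for pos, el in enumerate(elements, start=1) if pos not in excluded]


items = MENU.findall("li")
kept = exclude_positions(items, {1, 5})
print(len(items), [el.text for el in kept])
```

As in XPath, positions start at one, which is why enumerate is given start=1.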

This div has the ID navigation menu 0, and another one here has the ID navigation menu 1, so this div ID actually changes for each navigation menu. So I'm just going to select it with contains: @id contains navigation menu. It seems to be followed by another div, and this one here leads to the new men's items, so it's two divs in, followed by two more divs. Press apply. Invalid; oops, I think I mistyped something. That's not supposed to be an equal sign. Now you can see that these boxes are all of the ones that are being selected, so let's get on to the second div.

In our main category menu we don't actually want all of the columns either. We want the ones like clothing and accessories and equipment; the shop collection column is basically a subset of all of these, so I think it's going to give us duplicates, and the same goes for the first one, so we're just going to exclude those two. Come back to this navigations XPath: position() not equal to one and position() not equal to four. That's going to eliminate those two.

Now all we want is the link itself, the a element at the end, so let's add a with @class equals nav menu item. Press apply, and like I said, there are 91 selections at the bottom, which is all the categories. But there are some categories that we don't want, like All Shoes under men's and the sneaker launch calendar, because that page seems to be a different kind of page, with items that aren't actually available yet. So we're going to exclude those two as well. Press edit: not contains "all" and not contains "launch". Press save, and you can see that our selection count is now down to 74. That's exactly what we need. So now we can add a Web Capture command.

Come over here to configure it: instead of extracting text, we'll extract the URL. Press save, rename this to navigation URL, and select Don't Export Data; we don't actually want that in the end. Now we can edit, and under Transformation Script we don't want anything that comes before the pw, so match everything up to pw and return the rest. And this is what we want, so press save.

Now for navigate to listings, move it within the list of navigations command. Press edit, and for the data provider select Capture Data, navigation URL. That's what we wanted from before, but I'm going to copy the call that we had from here, come over, edit, and paste that in here, so that we can see that we need men's lifestyle shoes and this number after the gridwall path. So delete the stuff after the gridwall path and type in $1, which replaces it with the current selection. For the referer as well, we can eliminate that and put in $1. Now we can return this. Test Transformation, press save, save, and now navigate to listings.
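As a rough sketch of what that transformation does, here is the same substitution in Python: take the part of the category link after /pw/ and drop it into the listings call as the gridwall path, with the page number set to one. The endpoint, the parameter names gridwallPath and pn, and the sample link are all placeholders chosen for the example, not the real call copied in the demo.

```python
import re
from urllib.parse import urlencode

# Hypothetical listings endpoint; the real call is copied from the activity module.
LISTING_ENDPOINT = "https://example.com/listing-data"


def gridwall_path(navigation_url):
    """Return everything after the /pw/ segment of a category link, or None."""
    match = re.search(r"/pw/(.+)$", navigation_url)
    return match.group(1) if match else None


def listing_url(navigation_url, page=1):
    """Build the listings call for one category link and one page number."""
    path = gridwall_path(navigation_url)
    if path is None:
        raise ValueError(f"no /pw/ segment in {navigation_url!r}")
    # safe="/" keeps the slash inside the path readable instead of %2F.
    query = urlencode({"gridwallPath": path, "pn": page}, safe="/")
    return f"{LISTING_ENDPOINT}?{query}"


print(listing_url("https://example.com/us/en_us/pw/mens-lifestyle-shoes/abc123"))
```

The captured group plays the same role as $1 in the transformation: one template, filled in once per category link.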

Now this navigation will actually navigate through all of the categories listed above, and our list will capture everything that is required on the listing page; then it will navigate to the products. Well, this listing is actually only the first page; we still need the pagination. So let's come back to navigate to listings, Transformation, and make it so that it navigates to the first page first: under pn, pn=1. Test Transformation, save. So now this will navigate to the first page, and we want to set up pagination to the second page. Nike is pretty nice: they already give us the path for the next page directly, under next page data service. So you can just click on that, press add, and add a Navigate Pagination command, and then we can set that XPath directly to this next page data service. Click on that, press save.

Under Action, URL, you can see that the text here now becomes the second page, but we still need the information in the front, which is store.nike.com; what we get here starts at /html-services. So press Transformation Script, and we just need to return https://store.nike.com in front of it. Press Test Transformation, and this should lead to the second page. Press save, save, execute. Now the third page. Now we need to see whether our pagination actually ends or not. This is men's lifestyle shoes, and you can see that after one of the pages it just stops: there's nothing on that page. So the pagination will end by itself. That's great; that's exactly what we need.
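The pagination behavior can be sketched as a loop: read a page, collect its items, prepend the site root to the relative next-page path, and stop when a page comes back empty. The example below runs against an in-memory stand-in for the site; the URLs, the nextPageDataService key, and the fetch function are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = "https://example.com"  # placeholder for the site root

# Hypothetical responses keyed by URL, standing in for real web requests.
FAKE_PAGES = {
    "https://example.com/listing?pn=1": {
        "items": ["a", "b"], "nextPageDataService": "/listing?pn=2"},
    "https://example.com/listing?pn=2": {
        "items": ["c"], "nextPageDataService": "/listing?pn=3"},
    "https://example.com/listing?pn=3": {"items": []},
}


def fetch(url):
    """Stand-in for an HTTP request; unknown URLs behave like an empty page."""
    return FAKE_PAGES.get(url, {"items": []})


def crawl(first_url, max_pages=100):
    """Follow next-page links until a page has no items or no next link."""
    collected = []
    url = first_url
    for _ in range(max_pages):
        page = fetch(url)
        items = page.get("items", [])
        if not items:
            break  # an empty page means the pagination has ended
        collected.extend(items)
        next_path = page.get("nextPageDataService")
        if not next_path:
            break
        # The next-page value is relative, so the site root goes in front.
        url = urljoin(BASE_URL, next_path)
    return collected


print(crawl("https://example.com/listing?pn=1"))
```

The max_pages cap is an extra safety net that the demo does not need, since the pagination ends by itself on the first empty page.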

Navigate to products goes to the sizes, and now, all right, it seems like this agent is ready for debugging. So save the agent, press Debug, and start. I'm going to slow down the debugging process: up here there's a debug speed, which controls the speed of the debugger.

Sales channel 2: that's not always going to be available for all the products, so we're going to select Optional Command. Same for sales channel 4: Optional Command. Error processing script: text was longer than 4000 characters. Right, so we're going to press Stop Debugging. We can see that the script here is over four thousand characters, so press edit on the script, go to its properties, scroll down, and under Data Type, instead of Short Text, change it to Long Text to handle over 4000 characters. Then press Debug and start again.

I think the availability command is selecting something that doesn't exist. Select Optional Command, because the availability might not always be available. NikeiD: select Optional Command. It seems like for some products none of these fields actually exist, and for these you can press Optional Command for all of them. I think this is happening because some of the items directly on the site are not actually for sale yet; it's just an ad there. That's why none of these fields exist, and we can't navigate to that product if it doesn't exist, so I'm just going to ignore that error for now.

I can show you directly on the Nike page. I'm going to pause debugging and go directly to the Nike page. Under men's lifestyle, you can see that it's this ad over here that's causing none of these fields to be found. It's able to get the data for these three shoes, but this one isn't actually a shoe, just an ad, and that's why it's causing those problems. But now we can view our export data, and we'll see that everything has been extracted, including the availability of the shoes: not in stock, in stock, yep. A lot of these fields are actually not required, so we could just delete a lot of them as well. And that is how we built this Nike agent initially.