Sorry, I seem to be having trouble with my sound... there we go. Great.
We've got a bunch of people on the webinar today; thank you for joining us. If you've been following our webinar series this quarter, we've been covering the broader topics related to Sequentum's best-in-class web data extraction platform. Today we're going to dive into one of our key differentiators, something people don't always notice right away about our tech. It's a subtler point, but it makes a huge difference in your overall operation: with Sequentum's CG Enterprise software you can mix and match your protocols, meaning for one request you can launch a full browser, and for the next request you can launch a lower-level parser, which consumes far fewer resources on your servers. This is a really critical differentiator of our software.
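To make the idea concrete, here is a hedged sketch of what mixing protocols looks like in plain Python, using Playwright for the full browser and requests for the lightweight fetch. The URLs and function names are illustrative only; in CG Enterprise this is a point-and-click setting per command, not code you write.

```python
# A minimal sketch of the mix-and-match idea: pick the cheapest fetcher
# that works for each request instead of paying for a full browser every
# time. URLs are placeholders.
import requests
from playwright.sync_api import sync_playwright

def fetch_with_browser(url: str) -> str:
    # Full browser: expensive, but runs JavaScript and gets past blocking.
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        content = page.content()
        browser.close()
        return content

def fetch_with_parser(url: str) -> str:
    # Plain HTTP parser path: a fraction of the memory and CPU.
    return requests.get(url, timeout=30).text

# First request needs the browser; follow-up requests do not.
first_page = fetch_with_browser("https://example.com/")
detail_page = fetch_with_parser("https://example.com/product/123")
```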
We built our software with a custom version of Chrome inside our tool, rather than putting our tool inside a browser. It's a key architectural choice that we made and a differentiator for us. The problem people have when they're running large-scale scraping operations is often that their server infrastructure requires a lot of memory, a lot of CPU, and a lot of disk space, and the servers are expensive. So we wanted to architect our solution to be as efficient as possible when it comes to those server resources, while maintaining the incredible ease of use our software offers to people running large-scale operations.
Now, when you're dealing with a website, let's say a retail site: there's a lot of scraping going on in the retail sector, because you're pretty much not going to compete online unless you have real-time competitive price scanners running. So retail sites have become savvy about this and have added blocking, which often requires that you launch a full browser at some point at the beginning of your scrape. If you think about what it takes to launch a full browser, you've got something orchestrating that browser, you're launching it, and you're monitoring whether it's running. There are all kinds of attributes of that browser you need to set, like cookies and headers; there's session and state; there's a client-side JavaScript engine. There's a lot of complexity in launching a full browser and pulling down web pages inside it. Rich content sites have JavaScript code that has to get downloaded, compiled, instantiated, and rendered; there's a lot going on in pulling that page down. We have our own version of Chrome that we've put inside our software to manage this process, and it works seamlessly.
Now, once you launch that full browser, and just to put this in context: if you're doing this using open-source technologies, for example trying to launch a full browser using Selenium, you're going to have to stand up a Selenium Grid, define a lot of information about your browser state, send all of that information over a network, and then launch the browser through Selenium Grid. You're going to have all kinds of resources monitoring the state of that browser in case it crashes. Browsers are notorious for leaking memory, web pages are notorious for leaking memory, and there are problems with browsers crashing, so you have to have monitors in place looking for this. It's just an incredible amount of memory and state that you have to keep track of on your servers. And once you've launched that Selenium browser, if you're using open source, it's really hard to take all of that state and pass it to some lower-level protocol. It's hard to do correctly, it's hard to do nimbly, and there are a lot of different libraries you have to orchestrate to do it. There's a lot of overhead and a lot of coding; it's going to be a senior engineer putting all this together for you, you're not going to be able to run very many of these on your servers, and it's going to be a very expensive operation to staff and operate.
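For comparison, here is a minimal sketch of what that open-source handoff involves, assuming a hypothetical Selenium Grid endpoint and a placeholder site; even this simplified version skips the crash monitoring and retry logic you would also need.

```python
# Launch a browser through a (hypothetical) Selenium Grid, then hand its
# session state off to a lightweight HTTP client.
import requests
from selenium import webdriver

GRID_URL = "http://selenium-grid.internal:4444/wd/hub"  # hypothetical grid

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
driver = webdriver.Remote(command_executor=GRID_URL, options=options)

try:
    driver.get("https://example.com/")  # placeholder site

    # Copy browser state (cookies, user agent) into a requests session
    # so later requests can skip the full browser entirely.
    session = requests.Session()
    for cookie in driver.get_cookies():
        session.cookies.set(cookie["name"], cookie["value"],
                            domain=cookie.get("domain"))
    session.headers["User-Agent"] = driver.execute_script(
        "return navigator.userAgent")
finally:
    driver.quit()  # tear the heavy browser down as soon as possible

# Subsequent requests reuse the state at a fraction of the resource cost.
resp = session.get("https://example.com/api/products")  # placeholder API
```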
So with Content Grabber, what we do is allow you to load a page in a full browser, which is extremely simple to configure, and then your next request can easily be in a JSON or HTML parser. You immediately tear down that browser. It's completely seamless to the engineer who's writing and maintaining these agents; they don't have to worry about any orchestration overhead. All the monitoring of that browser is done automatically in the software. If it does crash, and you know just from manually browsing the web that sometimes your browser tab crashes on a particular site, the software notices, restarts it, and picks up where it left off. All of this is automated for you.
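Outside the tool, that monitor-and-restart behavior is something you would hand-roll yourself; a minimal sketch, using Playwright with illustrative names, might look like this.

```python
# Retry a crashed browser fetch with backoff; this is a rough stand-in for
# the monitoring Content Grabber is described as doing automatically.
import time
from playwright.sync_api import sync_playwright, Error as PlaywrightError

def fetch_with_retry(url: str, attempts: int = 3) -> str:
    # If the browser or tab dies, notice it, restart, and try again.
    for attempt in range(1, attempts + 1):
        try:
            with sync_playwright() as p:
                browser = p.chromium.launch(headless=True)
                page = browser.new_page()
                page.goto(url, timeout=60_000)
                content = page.content()
                browser.close()
                return content
        except PlaywrightError as exc:
            print(f"attempt {attempt} failed: {exc}")
            time.sleep(2 * attempt)  # back off before restarting
    raise RuntimeError(f"could not load {url} after {attempts} attempts")
```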
So while this seems like a very detailed topic and a lower-level point to make about our software, it's really a key differentiator. It's going to save endless labor, time, and effort; it's going to allow you to be much more efficient with your server resources; and it's going to help get you that high-quality data on the schedule your business analysts need. With that, I'm going to hand it over to our engineer, Zijang, who's going to demonstrate how to mix and match your protocols when writing an agent. Thanks. All right, Zijang, take it away.
Hi guys.
So over here I have a Nike agent open that mixes and matches protocols in order to optimize the agent itself. The Nike agent first loads the entire first page inside a dynamic browser. If you ever want to change the protocol being used, just navigate to your navigation link; under Action there's a Browser tab you can click on, where you can specify the browser type you want the agent to use. For the first page we're going to use a dynamic browser, because on Nike there is sometimes blocking, and sometimes it requires you to load a full browser instead of just an HTML parser.
Afterwards we have this navigation list of all products, where we grab each navigation URL and then use it for a JSON call in order to retrieve the data as JSON. If we press the Execute button, you can see that the information here is loaded as JSON. Under Action, Browser you could also just use the default; by default a command falls back to its parent browser, so if the parent browser is dynamic, this one will be dynamic too. If I execute it that way, you can see the layout will be different, and it might be harder for you to write a script to extract information from it directly. The JSON parser makes it easier to point and click on the elements you want to extract.
So from here I could point and click on this pagination element up at the top if I wanted to. From the data being returned, you can see there's a list of products, along with some additional fields such as price, sale price, and colors.
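As a hedged illustration of why the JSON route is cheaper (the endpoint and field names below are placeholders, not Nike's actual API), the equivalent step in plain Python is a single request with no rendering at all.

```python
# Once a listing endpoint returns JSON, the product list is plain data:
# no rendering, no scripting against a live DOM.
import requests

resp = requests.get("https://example.com/api/products?page=1", timeout=30)
resp.raise_for_status()

for product in resp.json().get("products", []):
    print(product.get("title"),
          product.get("price"),
          product.get("salePrice"),
          product.get("colors"),
          product.get("pdpUrl"))  # detail-page URL used in the next step
```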
From here I also want to navigate directly to the product detail page to get some additional information that wasn't returned from this page, like available sizes. I can navigate there just using this PDP URL that's being returned here.
Now, for this Navigate to Product Detail Page command, I wouldn't have to load everything in a dynamic browser. Under Action, Browser I'm actually loading it in an HTML parser, because all the information is present on that page in plain HTML. If I navigate to the product page, you can see that it loads, and there are some additional fields here that we're collecting, such as item group ID, description, image link, and condition.
Then on this page itself there's a script that runs that returns all of the available sizes. If I come directly to this page, I'm just going to copy the URL and open the page in Chrome: you can see here on the side that there are some additional images and sizes available, like Select Sizes and so forth. If I load this up directly in an HTML parser, that information does not actually get loaded in, but there's a script that runs on their backend that loads it up, so I can just execute that script directly. From here I can go under Action, Browser, and load up the script directly in an HTML parser, or just press Execute here.
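A closely related trick, sketched here with assumed tag names and structure (not necessarily what Nike does): many product pages embed that script's payload as JSON inside the HTML itself, so a plain HTML parser can reach the same data without executing any JavaScript.

```python
# Read an embedded JSON payload out of a product page's HTML.
import json

import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com/product/123", timeout=30).text
soup = BeautifulSoup(html, "html.parser")

# Hypothetical embedded-JSON tag; real pages use various ids and formats.
tag = soup.find("script", id="__NEXT_DATA__")
data = json.loads(tag.get_text())

# Pull the available SKUs/sizes out of the (assumed) payload shape.
skus = data.get("availableSkus", [])
print([sku.get("size") for sku in skus])
```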
Now you can see there's a list of sizes under availableSkus. Let me find that real quick... availableSkus. These are the SKUs available on this page. The SKU itself, I think, we actually extracted directly in this Navigate to Product Detail Page command, and afterwards we can compare the SKUs against the ones we extracted on the previous page to see which ones are available and which are not.
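With both lists extracted, that comparison is a simple set operation (the SKU values below are made up for illustration).

```python
# Compare SKUs listed on the detail page against those the sizes script
# reports as in stock.
listed_skus = {"SKU-100", "SKU-200", "SKU-300"}   # from the detail page
available_skus = {"SKU-100", "SKU-300"}           # from the sizes script

in_stock = listed_skus & available_skus   # {"SKU-100", "SKU-300"}
sold_out = listed_skus - available_skus   # {"SKU-200"}
```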
So, back on the details page: sometimes this page will not load successfully inside an HTML parser and you'll get an Access Denied. In this case it loaded successfully, but for the cases where an HTML parser doesn't work, you can add an If condition to reload the page inside a dynamic browser so that it loads successfully despite the blocking. So now I'll just run the agent real quick.
Okay, this agent actually takes a while to run, so I'm going to let it run in the background and show you guys a second agent I built as well: the Victoria's Secret agent.
In this Victoria's Secret agent, I'm loading the first page inside a dynamic browser, and then I select some cookies that I'm receiving on the site, which I'll need for my requests. I'm also collecting this collection ID that's available inside the dynamic browser; it's the collection ID of this clearance section. I'll need it for this URL command that I'm going to navigate to next, which is also a JSON parser. If I go to Action, Browser: yep, JSON browser, and under Common you can see I'm using a Victoria's Secret API directly in order to load the next page. So if I press Execute, you can see that with both the cookie and the collection ID that I collected on the first page inside the dynamic browser, I'm able to successfully use the API to return JSON data.
From here it just collects all the data that was available on the previous page. So instead of scraping that page directly, scrolling and loading more elements from the bottom, I can load everything at once and collect ID, name, rating, price, and sale price.
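A rough open-source analogue of that flow, for orientation only (the URL, selector, and API shape below are assumptions, not Victoria's Secret's real endpoints): render the first page once to capture the cookies and collection ID, then do everything else over cheap JSON calls.

```python
# Render one page in a real browser to harvest state, then switch to JSON.
import requests
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com/clearance")  # placeholder listing page
    collection_id = page.get_attribute(
        "[data-collection-id]", "data-collection-id")  # hypothetical attr
    cookies = {c["name"]: c["value"] for c in page.context.cookies()}
    browser.close()

# Everything after the first page is a cheap JSON call, not a live browser
# scrolling to trigger lazy loading.
resp = requests.get(
    "https://example.com/api/collections",  # placeholder API endpoint
    params={"collectionId": collection_id, "limit": 200},
    cookies=cookies,
    timeout=30,
)
for item in resp.json().get("items", []):
    print(item.get("name"), item.get("price"), item.get("salePrice"))
```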
I'll just run this agent real quick as well; it runs faster than the Nike one because there are fewer products.
The agent has completed, so I can check out the data, and you can see that it has successfully extracted all of the clearance items from the Victoria's Secret section, along with price, sale price, rating, names, and so forth. This is more optimal than loading the dynamic browser, scrolling through all the products, and capturing the sale prices from there.
Back to the Nike one; it seems like it's still running a bit. It still has to get all the data.
Okay, it seems like it's taking a while for this agent to run, so let me go back to the Victoria's Secret one and show you guys how we actually got this request from the API. When we load the first page inside a dynamic browser, we can click on the Activity module at the bottom right to see all of the calls being made to load the page dynamically. In here, one of the API calls basically returns all the data as JSON. To look through them, you can always press the Test button to see the test results.
So I think it's this one... actually, I can just take a look at the request and see which one it is. It's this stacks v6 brands one. Since that's the one, we press Test, and I can see that this is all the information being returned.
From here we can just make this request ourselves. We copy it to our clipboard; the request comes with a bunch of headers as well, so we copy all of it. Then, to add a new command, I come to the agent, add a Navigate URL command, under Actions select the action, set the browser to JSON parser, head back to Common, paste in the request I just copied, and press Execute.
And from here, this is all the data we extracted previously. That's how we go about finding API calls on certain sites after loading a page in a dynamic browser.
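The Activity module is Content Grabber's view of the page's network traffic; you can get a similar listing from any browser's DevTools Network tab, or programmatically, for example with Playwright's response events (sketch below; the URL is a placeholder).

```python
# List every network call on a page that returned JSON; these are the
# candidate API endpoints to replay directly.
from playwright.sync_api import sync_playwright

def log_json_calls(response):
    if "application/json" in response.headers.get("content-type", ""):
        print(response.status, response.url)

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.on("response", log_json_calls)
    page.goto("https://example.com/clearance", wait_until="networkidle")
    browser.close()
```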
It seems like the Nike agent has extracted some data, so I'm going to stop it real quick. Why is it taking so long? The agent is exporting data right now... okay, the agent is now complete, so I can view the data it extracted. You can see it's gone through the first page, where it got the item group ID, ID, title, and description; this was all found under the API call directly under Navigate JSON. Then, when it navigated to the product page, it got some additional information such as link, brand, GTIN, MPN, and color. And from the last request, the one we got from the script, under this Load JSON Script command, it got the availability of all the colors and sizes of the shoes and other products on Nike.
And that's how we go about mixing and matching protocols in order to optimize resources on our servers. Does anyone have any questions, or anything else they'd like us to cover today? I see we have a bunch of folks on the webinar today... I don't see any questions.
All right, well, if we don't have any questions, then we will... looks like maybe Chris has a question. Oh, Mary does, okay. Let's see.
Can you show the If Access Denied command? Yeah, so for the If Access Denied command, all we're doing is checking whether this particular XPath, under Access Denied, exists or not. If it exists, then we reload the page inside a dynamic browser; under Action, Browser you can see that it reloads the same URL inside a dynamic browser.
Thanks, Zijang.
Can you also show how you reload the page? So for reloading the page, all it's doing is capturing the current URL, HTML document.url, which is the URL we're currently on, and reloading it in a dynamic browser. Now, if I press Execute... it seems like it's not actually happening here in debug mode, since I'm not getting the Access Denied response. If it does get Access Denied, that is, if this XPath exists, then it would execute this command and reload the page in a dynamic browser.
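Put together, those two answers describe a fallback pattern you could sketch in plain Python like this (the XPath marker and URL are assumptions): try the cheap fetch first, check for the access-denied marker, and only pay for a full browser when it appears.

```python
# Fetch with a lightweight client, check an access-denied marker via
# XPath, and fall back to a full browser only when blocked.
import requests
from lxml import html
from playwright.sync_api import sync_playwright

URL = "https://example.com/product/123"  # placeholder detail page

resp = requests.get(URL, timeout=30)
tree = html.fromstring(resp.text)

# Equivalent of the "If Access Denied" condition: does the marker exist?
blocked = bool(tree.xpath("//*[contains(text(), 'Access Denied')]"))

if blocked:
    # Reload the same URL in a dynamic (full) browser, as in the demo.
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(URL)
        page_html = page.content()
        browser.close()
else:
    page_html = resp.text
```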
All right, great. Are there any other questions based on what you've seen today? Chris, it looked like you might have a question. There is a way to type in your question, or I could try taking you off of mute. Okay, Chris, you're off mute; did you have a question? Oh no, you muted yourself. Okay, so you don't have a question. Excellent.
Well, I think that's it for today. Thank you so much for joining. Please send any further questions you have to support@contentforever.com and we'll be happy to answer them for you. Thanks again; we look forward to our webinar next week, which is all about website blocking and how our software gets you past those blocks. Thanks again, and talk to you soon. Bye-bye.