Certainly can.
Okay, yep.
All right, excellent.
So after you install the Agent Control Center, this is the first screen you're going to see, which is the dashboard. Obviously, when it's first installed it's going to be empty; I've just been running a few agents throughout the day to populate some data into it. The first thing you need to do, to be able to run agents from the Agent Control Center, is to add servers. I obviously have one already, because I've been running agents before, but if I go and have a look at the servers...
...you can see I have one server here, and I can basically add servers from anywhere. What I'll do right now is add one which is a cloud instance from Amazon. I've set up an image in Amazon which is just a generic image: it has Content Grabber Enterprise installed on the image, but it's not activated, so it's easy to share around. So if I try and add that instance now, you need to specify the license key, because as I said, the Content Grabber Enterprise on that cloud instance is not activated yet.

Yep.
And then the IP address. If you go with Amazon, for example, then you would also need to associate a fixed IP address with the instance. This is the same for all cloud instances, no matter if it's Amazon or Azure.
Internally, though, we mostly use dedicated servers.
And now I've got the instance, or the VM, in my Agent Control Center, but I haven't associated it with any organization. The Agent Control Center is organized into organizations; I've set up one organization called Demo in this Agent Control Center. I'm just going to select that one, and you can see I have only one server available to me. All organizations can share the servers that are available to the Agent Control Center.
So I can select the one I've just added and add it to my organization, and now this server is available for me and I can start running agents on it. But what you would normally do is go into Clusters. You can see I have a cluster here called Test Cluster, and there's only one server in the cluster. So what I would normally do is go in and edit the cluster here, and then add this server to my cluster. You would normally want to have at least two servers in the cluster, so that if one server in the cluster goes down, you still have another server that can run agents; and also, the Agent Control Center will share the load, or split the load, between the servers in the cluster. And finally, you can take a server out of the cluster if you need to do maintenance on the server, for example if you want to update Windows or something like that.
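As a rough sketch of what that cluster behavior amounts to (the Control Center's actual scheduling algorithm isn't shown in the demo, so the structure and names here are purely illustrative):

    # Purely illustrative: runs go to the least-busy online server, and
    # a server pulled out for maintenance is simply skipped. This is a
    # sketch of the idea, not the Agent Control Center's algorithm.
    def pick_server(servers):
        candidates = [s for s in servers if s["online"] and s["in_cluster"]]
        if not candidates:
            raise RuntimeError("no servers available in the cluster")
        return min(candidates, key=lambda s: s["running_agents"])

    cluster = [
        {"name": "US2",   "online": True, "in_cluster": True, "running_agents": 5},
        {"name": "AWS-1", "online": True, "in_cluster": True, "running_agents": 2},
    ]
    print(pick_server(cluster)["name"])  # -> AWS-1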
Awesome.
Right now I have a lot of agents sitting on this US2 server, and as soon as I add another server to the cluster, it's going to synchronize that server with the cluster, so all the agents are installed on all the servers in the cluster. And because this is a very small cloud instance, I'm not quite sure how well that's going to go.
And of course you can go back to the servers and remove your instance as well. That just removes it from the organization; it's still sitting in my Agent Control Center. And then, if I want to release the license, I can go in and delete it, and deactivate Content Grabber on the instance.

Okay.

And then you're basically back to where we started.
So if we go back to the agents now, you can see I have all these agents here in my repository, and the repository is basically organized like a normal file system, so you have directories you can organize your agents into. You can see I have run all these agents here; some of them have failed and some of them have succeeded. There are two ways you can run an agent: you can run it as a job, or you can run it as a single run. A job is basically just a collection of single runs that are tracked as a single entity. So, for example, if a job runs 10 single runs, the job will not complete before all 10 runs have completed, and it will not succeed before all 10 runs have succeeded.
Okay, got it.
And this one, you see, has failed, so you can go and have a look at the job history and see what happened. In this case the job has not satisfied the success criteria. I've set a success criteria, and I can actually go and have a look at it by going into the job settings. You see here I've set the success criteria to 500 data count, so it needs to extract 500 data items for this to succeed. And if I go into the run history, or the job history again, you see this particular job here only extracted 200 data count, so that's why it's failing. We basically just forced the failure for this demo.
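To make those job semantics concrete, here is a minimal sketch under the rules described above; the Run structure and function names are invented for illustration, only the rules come from the demo:

    # Illustrative sketch: a job is a collection of single runs tracked
    # as one entity. It completes only when all runs complete, and it
    # succeeds only if the success criteria (a minimum data count) holds.
    from dataclasses import dataclass

    MIN_DATA_COUNT = 500  # the success criteria set in the job settings

    @dataclass
    class Run:
        completed: bool
        data_count: int  # data items extracted by this run

    def job_status(runs: list[Run]) -> str:
        if not all(r.completed for r in runs):
            return "running"  # not complete until every run completes
        total = sum(r.data_count for r in runs)
        return "succeeded" if total >= MIN_DATA_COUNT else "failed"

    # The demo job extracted only 200 data items, so it fails:
    print(job_status([Run(True, 120), Run(True, 80)]))  # -> failed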
Is there any kind of extraction or reporting on this that our team would be able to utilize at all?

There is an API where you can get this from, but there is no actual reporting built into this yet.

Okay, no problem, thank you.
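A hedged sketch of what pulling this over the API might look like; the base URL, endpoint path, and response fields are placeholders, since the actual API isn't shown in the demo:

    # Hypothetical sketch only: endpoint and field names are placeholders.
    import requests

    BASE_URL = "https://acc.example.com/api"  # placeholder host

    resp = requests.get(
        f"{BASE_URL}/jobs/history",                   # assumed path
        headers={"Authorization": "Bearer <token>"},
        timeout=30,
    )
    resp.raise_for_status()
    for job in resp.json():  # assuming a JSON list of job records
        print(job.get("jobId"), job.get("status"), job.get("dataCount"))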
What we have in here, and the next thing I will show you, is the tickets. If something fails, and the directory is configured to generate tickets, you would get a ticket, and it will then explain what has gone wrong. Staff can come in and have a look at the agents and see if there's something that needs to change. And obviously emails may be sent out as well to the people who are assigned to this particular directory.
Okay, so let's try and check in and run an agent from scratch. I've got a directory here called Amazon, and I've got an agent here called Amazon Demo. I'll just check it into the repository; I'm going to go in and select the Amazon directory. So now I've checked it into the repository, and it should show up here shortly. Now I have the agent sitting in the repository, but it's not yet deployed anywhere, so if I go in there, I can't run it. What I can do is deploy it to my cluster. I'll go ahead and do that; I only have this one cluster with this one server. And now you can see the agent has been deployed: version 1.0.0 has been deployed to the Test Cluster.
Now, what you normally do when you run jobs is add schedules to your job. One job can consist of one or more schedules; in this case I'm just going to add a single schedule to this job. You have the option of a basic schedule or a cron schedule. I'll just set the log level to high. And here, under sessions, I'm going to run this particular agent in three sessions, which basically just means that the Agent Control Center will start up three instances of the same agent to process all the data that's available. You don't always need to start up multiple instances of an agent, because each instance will also process data in parallel; it's just that sometimes, especially when you have websites where you need to use full dynamic browsers, it's more efficient to have separate processes processing the data.

Okay.

It all depends on the target website. In this case I'll just run three of them. So now I have the schedule here that runs every day.
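As an illustration of the two schedule types and the session count, assuming standard five-field cron syntax; the field names here are made up, not the Control Center's actual settings:

    # Illustrative only: field names are assumptions.
    basic_schedule = {
        "type": "basic",
        "every": "day",             # run once per day
        "sessions": 3,              # start three instances of the agent
        "log_level": "high",
    }
    cron_schedule = {
        "type": "cron",
        "expression": "0 6 * * *",  # standard cron: every day at 06:00
        "sessions": 3,
        "log_level": "high",
    }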
I'm just going to start it manually this time, right now. Let's get it running, and go back and have a look at the run history. You see I have these three instances of the agent running now, and you can see they're all tracked by the same run ID, so the job is not complete before they've all completed. We'll wait a little bit for it to complete; it shouldn't take too long. Okay, so they're done now, and you see the extracted data has been consolidated into one single data file that has delivered 234 data rows.
I can go ahead and download the data if I want to have a look at it, but normally your agents would be configured to distribute the data somewhere, like an FTP or SFTP drive, or an S3 bucket, or a database, or anywhere really. But you will still have a backup of the data file of the extracted data, which you can download and have a look at, even if you are delivering data to a database.

So the data file that you're downloading now, is that the internal database that's mentioned in the instructions and the manual, or is that stored online?

That's separate to the internal one, James; this is just a backup.
Okay, awesome.
We normally configure it so it creates a CSV backup file of everything that's happening, and with a retention period of 30 days we have 30 days of data in CSV format on our servers. That gives us the ability to look at the data without having to go to the database if we are exporting to a database, or to review the exported data if we are exporting to, say, a stream or cloud storage.

Sure.
With this data, though, just out of curiosity: each time that you do a backup, is it appending each time, or, say, if you had 30 days of scrapes, would there be 30 CSVs sitting somewhere?

With the 30 days retention, each run creates a backup of the data. It basically exports to CSV and puts it in a zip file, and then it sits there; you can do whatever you want with it, you can copy it somewhere else. But what we do right now here on the demo server is that it just sits on the server for 30 days and then gets deleted. That means I have 30 days to look back in my history and look at the data that I extracted.
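A minimal sketch of that retention behavior, assuming one zip archive per run in a backup directory; the path and naming are placeholders:

    # Illustrative 30-day retention sweep: each run's CSV export sits in
    # its own zip file, and archives older than 30 days are deleted.
    import time
    from pathlib import Path

    RETENTION_SECONDS = 30 * 24 * 60 * 60
    backup_dir = Path("/data/acc-backups")  # placeholder path

    now = time.time()
    for archive in backup_dir.glob("*.zip"):
        if now - archive.stat().st_mtime > RETENTION_SECONDS:
            archive.unlink()  # past retention, remove the backup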
That's good, very handy to know.
And you can also have a look at the logs. I may have set the logging level a little too high, but if something goes wrong, you can download the logs and have a look. And of course, if any of the runs fails, you can restart it or retry the errors.
Another thing you can do: this particular agent here that I've uploaded goes through an input file of URLs. If you want the agent to extract something else, or process another input file, for example, you can go in and upload another file. This is just a small file of URLs, but it could be anything really that you wanted to use as input data; this one is just URLs. Then I can go in and run the agent directly on the cluster, and I'll just change the input to the file I just uploaded. It shouldn't take long to finish, because I think I only had like five, or seven, URLs. And again, you can download the data, but it will be the same kind of data, only for the seven URLs I put in my input file.
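For reference, an input file like the one swapped in here can be as simple as one URL per line; these addresses are made up:

    https://www.example.com/widgets/page-1
    https://www.example.com/widgets/page-2
    https://www.example.com/widgets/page-3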
And that's really the basics of the Control Center. Then there are other things, as Sarah has already explained, like rate limits. If you want to make sure that you don't hit a website too hard, for example, you can put in a rate limit, which you can then assign to all the agents you have that hit that particular website. For example, if I want to go in and make a rate limit for Amazon, I could go in and say I don't want to allow more than 10 concurrent sessions to hit the website at a time, and maybe I want to limit it to 100,000 page loads over a 24-hour duration. Then I'll go back into my agent here, and for all the agents that hit the Amazon website, go in and assign that rate limit to the agents.
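To show what those two limits mean in practice, here is a conceptual sketch of enforcing 10 concurrent sessions and 100,000 page loads per 24 hours; this is not the Control Center's implementation, and it simplifies the 24-hour window to a fixed one:

    # Conceptual sketch of the rate limit described above.
    import threading
    import time

    class RateLimit:
        def __init__(self, concurrent=10, pages_per_day=100_000):
            self.slots = threading.Semaphore(concurrent)
            self.pages_per_day = pages_per_day
            self.count = 0
            self.window_start = time.time()
            self.lock = threading.Lock()

        def fetch(self, do_request):
            with self.slots:                      # caps concurrency at 10
                with self.lock:
                    if time.time() - self.window_start > 24 * 3600:
                        self.count, self.window_start = 0, time.time()
                    if self.count >= self.pages_per_day:
                        raise RuntimeError("24-hour page quota reached")
                    self.count += 1
                return do_request()               # the actual page load

The point of the two limits is that they cap different things: the semaphore caps how many sessions hit the site at once, while the counter caps total volume over the day.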
Then we have proxies, of course; very important when it comes to web scraping. This is just the management of proxies. You can see here I have only one proxy provider in here, but internally we would have, you know, hundreds of providers. This one is really just a list of proxies. Sometimes it's an API; it doesn't have to be a list of proxies, it could also be a definition of an API. Obviously some proxy providers deliver their proxies through an API, so it can be a definition of that API to get the list of proxies. And then you just assign your agent to use this particular provider as its proxy provider.
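Since a provider can be either a static list or an API definition, a sketch of both shapes follows; the addresses and the endpoint are invented for illustration:

    # A proxy provider can be a static list of proxies...
    proxy_list = [
        "203.0.113.10:8080",   # documentation-range addresses, not real
        "203.0.113.11:8080",
    ]

    # ...or a definition of an API that returns the list. The URL and
    # response format are placeholders, not a real provider's API.
    import requests

    def fetch_proxies(api_url="https://proxies.example.com/v1/list"):
        resp = requests.get(api_url, timeout=30)
        resp.raise_for_status()
        return resp.text.splitlines()  # assuming one proxy per line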
And the last thing is the connections. These are external storage connections: if you want to deliver data to an S3 bucket or to Azure Storage, you can define those connections and upload them to the Control Center. They will then automatically be distributed to all your servers in the organization, and all your agents in the organization will then have access to those external storages.
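As an illustration of what such a connection ends up doing, a minimal boto3 sketch of delivering an extracted data file to an S3 bucket; the bucket, key, and file name are placeholders:

    # Minimal sketch: deliver an extracted data file to an S3 bucket.
    import boto3

    s3 = boto3.client("s3")  # credentials come from the environment
    s3.upload_file(
        Filename="amazon-demo-run.zip",   # the consolidated data file
        Bucket="my-extraction-bucket",    # placeholder bucket
        Key="backups/amazon-demo-run.zip",
    )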
Okay, so I just want to clarify that. If I download, say, a thousand PDFs from 10 websites, I can actually designate a cloud storage bucket location for them to download to, straight from the ACC?

Well, yes; the data you're extracting would be uploaded to the storage.
That's what I meant, sorry. Yeah, that's awesome. Do I believe correctly that you don't have Google Cloud yet? Does it still have to be an S3?

We don't have Google Cloud yet; it's on the plans. We have Google Drive, but not the cloud storage.

Do you have SFTP?

Sorry?

Do you have FTP at all?

FTP and SFTP as well, yeah.

Okay, excellent. That's not a problem.

And Google can be done too, James; we just have to set up a script for it. It's just not a default setting, that's all.
Okay. I will absolutely tell you that I'm still learning all of this, but we do have some people here that would be able to help me with that anyway, and we still have a default S3 that I can stick them in for now, so that's not a consideration at all. That's perfect.

Okay, cool.
That's all I have, Tony.
Sorry, James.
I was going to say thank you very much for taking the time to show me that. It's definitely something that we're going to have to put some extra thought into. I think this is actually a lot easier, and it means a lot more people can be making use of it, rather than just little old me.
Yeah, I think the thing that's really powerful about it is its simplicity, because you can see everything in one place and what's happening. You can upgrade versions of Content Grabber on one server and not others, you can upgrade agents, and you can move them between servers, and it really just adds a lot of efficiency to what you're doing. Obviously the version control part's important, but just having the compliance ticked off as well, an audit trail of everything in one place, saves you a lot of pain, particularly for big operations. And I don't know that we looked at this, but you've also got control over who can access what as well, from a user rights point of view, so that's pretty important too. And that's defined under the organization, right?
Yeah, that's true. And one of the things that also becomes really difficult when you have a lot of agents scheduled on a lot of different servers is working out how to schedule them on particular servers to utilize the power of the servers equally, for example. That's something we have had a lot of issues with anyway, and that's where the clusters come in very handy, because they will spread the load automatically, rather than you having to figure out where you should schedule your agents to get the best value out of your servers.
So you don't have to architect that solution yourself for load balancing; it's built into the software.

Yeah.

That's awesome. I'm going to have to go back and ask my boss for more money.
Yeah, that's all good. I mean, we did use to sell this software for 75k by itself, so we've completely changed our model, and you now get it as part of buying a server license, which is much more accessible to most customers now. So yeah, we're quite excited, and you're obviously seeing the new release that's about to come out late this month, and it's working really well.
Wow, that's awesome. I didn't see it, and I'm assuming it's not in here, but Tony and I were discussing something a little while ago, and I don't know if it might be on the future roadmap; if it's not, it's not a huge deal, but it would be cool to see: database mapping. To give you gentlemen an example: I can scrape, say, 10 websites that are all exactly the same, and I can actually get the same targeted information from each of those 10 websites, so I can get, say, an address column, a date-submitted column, and so forth. One of the good things that my current software does is that, in their version of a control center, I can flag those multiple agents and then map all of their columns to go to the same column. So while one agent might have "address1" and the next website might have "address2", I can actually tell the agents to put both of those into the same column inside the same database, so you just get a bit of parity and uniformity in what you're doing. I don't know if that's something you can put in the suggestion box, but it's really powerful, and for someone who's doing very repetitive work, having to go back in and remap multiple agents over and over, it is a huge time saver. But I'm still in love with this thing anyway.
We actually discussed that a while back. The problem with data mapping is that it can very quickly become extremely complicated; you have big software packages that do nothing else than data mapping. So I think you're right, it would be good to have something that at least does simple mapping, but the problem is that if you have child tables and other things, the data mapping can really easily become extremely complicated to do, in a visual way anyway.
At this point in time we haven't been able to build a visual package that can do the more complicated things, so that's why we said, okay, let's just stay with scripting for now. Because if you build something that does simple mapping, then you would have users saying, right, but why can't it do this and this and this? It can easily become extremely complicated, and many of our users won't understand why it can do this mapping but not that mapping. So that's why we have stayed with scripting for now, but yeah, I agree it would be good to have the data mapping as well.

Yeah, and as I said, that's just a future suggestion or something like that. I've already figured out how I'm going to get by without it anyway, so that's not a problem.
At least there is an option, though, to use the same external table from multiple agents, as long as they have the same fields.

Okay.

You can have commands with one name and export to a field with another name, so you can make those very simple mappings. And as long as all the agents export the same fields with the same names, then they can all export to the same table.
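A small sketch of that simple mapping idea: per-agent renames onto one shared field name, so several agents can export to the same table; all names here are illustrative:

    # Two agents capture the same data under different command names; a
    # simple per-agent rename maps them onto one shared field, so both
    # can export to the same table. All names here are illustrative.
    field_maps = {
        "agent_site_a": {"address1": "address"},
        "agent_site_b": {"address2": "address"},
    }

    def normalize(agent, row):
        return {field_maps[agent].get(col, col): val for col, val in row.items()}

    print(normalize("agent_site_a", {"address1": "1 Main St"}))
    print(normalize("agent_site_b", {"address2": "2 High St"}))
    # both rows now use the shared "address" column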
Okay, that's good to know, because that is literally what we do. Every single website that we scrape for a particular field has to have the same column names anyway; it's a mandatory practice on our end. So that's good to know; I'll go and have a bit more of an explore with that and see if I can figure it out.

Yeah. Otherwise just send me a ticket; it might be buried down somewhere in the settings, but it's hard to find.
Awesome. All right, well, I won't take up any more of anyone's time, because it is Friday night. Tony, I've actually spent most of this week just doing some tests.

No problem. Thanks, James.

You too. Thanks, have a great weekend. Bye.