CG Enterprise ACC demo

Certainly can.

Okay, yep.

All right, excellent.

So after you install the Agent Control Center, this is the first screen you're going to see, which is the dashboard. Obviously when you first install it, it's going to be empty; I've just been running a few agents throughout the day to populate some data into it.

The first thing you need to do, to be able to run agents from the Agent Control Center, is to add servers. I obviously have one already, because I've been running agents before. But if I go and have a look at the servers,

you can see I have one server here, and I can basically just add servers from anywhere. What I'll do right now is add one, which is a cloud instance from Amazon.

I've set up an image in Amazon which is just a generic image, so it has CG Enterprise installed on the image, but it's not activated, so it's easy to share around.

So if I try and add that instance now, you then need to specify the license key, because as I said, the Content Grabber Enterprise on that cloud instance is not activated yet.

Yep.

And then the IP address. If you go with Amazon, for example, then you also need to associate a fixed IP address with the instance.

This is the same for all cloud instances, no matter if it's Amazon or Azure; on an internal network, of course, you would typically use dedicated servers.
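
For reference, a fixed IP on Amazon means an Elastic IP. Here is a minimal boto3 sketch of allocating one and attaching it to an instance — the instance ID is a placeholder, and this is something you would do yourself on the AWS side, not something the ACC does for you:

```python
# Hedged sketch: allocate an Elastic IP and attach it to an EC2
# instance so the ACC can reach the server at a stable address.
import boto3

ec2 = boto3.client("ec2")                    # credentials from the environment
alloc = ec2.allocate_address(Domain="vpc")   # reserve a fixed public IP
ec2.associate_address(
    InstanceId="i-0123456789abcdef0",        # placeholder instance ID
    AllocationId=alloc["AllocationId"],
)
print("fixed IP:", alloc["PublicIp"])        # the address to register in the ACC
```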

And now I've got the instance, or the VM, in my Agent Control Center,

but I haven't associated it with any organization. The Agent Control Center is organized into organizations; I've set up one organization called Demo in this Agent Control Center. I'm just going to select that one, and you can see I have only one server available to me.

All organizations can share the servers that are available in the Agent Control Center.

So I can select the one I've just added and add it to my organization, and now this server is available to me and I can start running agents on it. But what you would normally do is go into clusters.

You can see I have a cluster here called Test Cluster, and there's only one server in the cluster. So what I would normally do is go in and edit the cluster here, and then add this server to my cluster.

You would normally want to have at least two servers in the cluster, so that if one server goes down you still have another server that can run agents, and also so that the Agent Control Center can share the load, or split the load, between the servers in the cluster.

And finally, you can take a server out of the cluster if you need to do maintenance on the server, for example if you want to update Windows or something like that.

Awesome.

Right now, because I have a lot of agents sitting on this US2 server, as soon as I add another server to the cluster it's going to synchronize that server with the cluster, so all the agents are installed on all the servers in the cluster. And because this is a very small cloud instance, I'm not quite sure how well that's going to go.


And of course you can go back to the servers and remove your servers as well. This just removes them from the organization; they're still sitting in my Agent Control Center. And if I want to release the license, I can go in, delete the server, and deactivate Content Grabber on the instance.

Okay.

And then you're basically back to where we started.

So if we go back to the agents now, you can see I have all these agents here in my repository, and the repository is basically organized like a normal file system, so you have directories that you can organize your agents into.

You can see I have run all these agents here; some of them have failed, some of them have succeeded. There are two ways you can run an agent: you can run it as a job, or you can run it as a single run.

A job is basically just a collection of single runs that are tracked as a single entity. So, for example, if a job runs 10 single runs, the job will not complete before all 10 runs have completed, and it will not succeed before all 10 runs have succeeded.

Okay, got it.

And this one, you see, has failed, so you can go and have a look at the job history and see what happened. In this case the job has not satisfied the success criteria. I've set success criteria on it, and I can actually go and have a look at them by going into the job settings.

You see here I've set the success criterion to 500 data rows, so it needs to extract 500 data items for the job to succeed. And if I go into the run history — or the job history again — you see this particular job here only extracted 200 data rows, so that's why it's failing. We've basically just forced this failure for the demo.
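
To make the job-level rule concrete, here is a minimal sketch — not the ACC's actual code — of "complete when all runs complete, succeed when all runs succeed and the success criterion is met", using a minimum row count like the demo's 500-row setting:

```python
# Illustrative only: job status derived from its runs plus a minimum
# row-count success criterion, mirroring the demo's 500-row setting.
from dataclasses import dataclass

@dataclass
class Run:
    completed: bool
    succeeded: bool
    rows_extracted: int

def job_status(runs: list[Run], min_rows: int = 500) -> str:
    if not all(r.completed for r in runs):
        return "running"                      # the job waits for every run
    rows = sum(r.rows_extracted for r in runs)
    if all(r.succeeded for r in runs) and rows >= min_rows:
        return "succeeded"
    return "failed"

# The demo job extracted only 200 rows against a 500-row criterion:
print(job_status([Run(completed=True, succeeded=True, rows_extracted=200)]))
# -> "failed"
```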

Is there any kind of extraction or reporting on this that our team would be able to utilize at all?

There is an API where you can get this from, but there is no actual reporting built into this yet.

Okay, no problem, thank you.
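
For a team that wants its own reporting, the API mentioned above could be polled and fed into whatever dashboard you use. A hedged sketch — the endpoint path, auth scheme, and field names here are hypothetical, so check the ACC API documentation for the real ones:

```python
# Hypothetical example of pulling job history out of the ACC API
# for external reporting; endpoint and field names are assumptions.
import requests

ACC_URL = "https://acc.example.com"               # your ACC host (assumption)
API_KEY = "..."                                   # your real API key

resp = requests.get(
    f"{ACC_URL}/api/jobs/history",                # hypothetical endpoint
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
for job in resp.json():                           # hypothetical field names
    print(job["jobId"], job["status"], job["rowsExtracted"])
```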

The next thing I'll show you in here is the tickets. If something fails, and the directory is configured to generate tickets, you would get a ticket, and it will explain what has gone wrong. Staff can then come in, have a look at the agents, and see if there's something that needs to change. And obviously emails may be sent out as well to the people who are assigned to this particular directory.

Okay, so let's go outside and check in and run an agent from scratch. I've got a directory here called Amazon, and I've got an agent here called Amazon Demo. I'll just check it into the repository;

I'll go in and select the Amazon directory. So now I've checked it into the repository, and it should show up here when I refresh.

Now I have the agent sitting in the repository, but it's not yet deployed anywhere, so if I go in there I can't run it. What I can do is deploy it to my cluster.

I'll go ahead and do that — I only have this one cluster, with this one server — and now you can see the agent, version 1.0.0, has been deployed to the Test Cluster.

What you normally do when you run jobs is add schedules to your job; one job can consist of one or more schedules. In this case I'm just going to add a single schedule to this job,

and you have the option of just a basic schedule or a cron schedule.
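
Cron schedules are worth a quick illustration, since they're more expressive than the basic kind. Standard five-field cron syntax applies; this sketch uses the third-party croniter package just to show what an expression like "every day at 02:00" expands to:

```python
# "0 2 * * *" = minute 0, hour 2, every day -- a daily 02:00 run.
# pip install croniter
from datetime import datetime
from croniter import croniter

it = croniter("0 2 * * *", datetime(2023, 1, 1, 12, 0))
print(it.get_next(datetime))   # 2023-01-02 02:00:00
print(it.get_next(datetime))   # 2023-01-03 02:00:00
```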

I'll just set the log level to high.

And here, under sessions, I'm going to run this particular agent in three sessions, which basically just means that the Agent Control Center will start up three instances of the same agent to process all the data that's available. You don't always need to start up multiple instances of an agent, because each instance will also process data in parallel; it's just that sometimes, especially when you have websites where you need to use full dynamic browsers, it's more efficient to have separate processes processing the data.

Okay.

It all depends on the target website; in this case I'll just run three of them.
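
The idea behind sessions — splitting the same workload across separate OS processes because browser-heavy agents parallelize better that way — can be sketched generically. This is not ACC code, just the shape of the technique:

```python
# Generic illustration: split a URL list across three worker
# processes, mirroring the "three sessions" idea from the demo.
from multiprocessing import Pool

def process_url(url: str) -> int:
    # stand-in for "load the page in a dynamic browser and extract data"
    return len(url)

if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(30)]
    with Pool(processes=3) as pool:        # three parallel "sessions"
        results = pool.map(process_url, urls)
    print(f"processed {len(results)} pages")
```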

So now I have the schedule here that runs every day, but I'm just going to start it manually this time, right now. Let's get it running

and go back and have a look at the run history. You see I have these three instances of the agent running now, and you can see they're all tracked by the same run ID, so the job is not complete before they've all completed.

We'll wait a little bit for it to complete; it shouldn't take too long.

Okay, so they're done now, and you see the extracted data has been consolidated into one single data file that has delivered 234 data rows.

I can go ahead and download the data if I want to have a look at it, but normally your agents would be configured to distribute the data somewhere, like an FTP or SFTP drive, or an S3 bucket, or a database — anywhere, really. But you will still have a backup of the extracted data file, which you can download and have a look at, even if you are delivering data to a database.

So the data file that you're downloading now — is that the internal database that's mentioned in the instructions and the manual, sorry, or is that stored online?

That's separate to the internal one, James; this is just a backup.

Okay, awesome.

Yeah — we normally configure it so it creates a CSV backup file of everything that's happening, and with a retention period of 30 days we have 30 days of data in CSV format on our servers. That gives us the ability to look at the data without having to go to the database, even if we're exporting the data to a database, or to a stream, or to cloud storage.

Sure.

Just out of curiosity — each time that you do a backup, is it appending each time, or, say if you had 30 days of scrapes, would there be 30 CSVs somewhere?

The 30-day retention is per run — each run creates a backup of the data.

Okay.

Yes — it basically exports to CSV, puts it in a zip file, and then it sits there. You can do whatever you want with it — you can copy it somewhere else — but what we do right now on the demo server is it just sits on the server for 30 days and then gets deleted. That means I have 30 days to look back in my history and look at the data that I extracted.
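
A retention sweep like that is simple to picture; here is a hedged sketch of the policy (the directory layout is an assumption, not the ACC's real one):

```python
# Illustrative 30-day retention sweep: each run leaves a zipped CSV
# backup, and anything older than 30 days gets deleted.
import time
from pathlib import Path

RETENTION_SECONDS = 30 * 24 * 3600
backup_dir = Path("/data/acc-backups")     # hypothetical location

cutoff = time.time() - RETENTION_SECONDS
for zf in backup_dir.glob("*.zip"):
    if zf.stat().st_mtime < cutoff:
        zf.unlink()                        # expired backup removed
```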

That's handy to know.

And you can also have a look at the logs — maybe I've set the logging a little too high — but if something goes wrong, you can download the logs and have a look.

And of course, if any of the runs fails, you can restart or retry the errors.

Another thing you can do: this particular agent I've uploaded here goes through an input file of URLs. If you want the agent to process something else — another input file, for example — you can go in and upload another file. This is just a small file of URLs, but it could be anything, really, that you wanted to use as input data; this is just an example.

Then I can go in and run that directly on the cluster, and I'll just change the input to the file I just uploaded. It shouldn't take long to finish, because I think I only had five or seven URLs in there.

And again, you can download the data; it will be the same kind of data, but only for these seven URLs I put in my input file.
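
If you're preparing such an input file yourself, it might look like this — though the exact format an agent expects (CSV with a header, a plain list, the column name) depends on how the agent's input is configured, so treat this as an assumption:

```python
# Write a small URL input file like the one uploaded in the demo.
import csv

urls = [
    "https://www.example.com/product/1",
    "https://www.example.com/product/2",
    "https://www.example.com/product/3",
]

with open("input_urls.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["url"])               # assumed header column
    writer.writerows([u] for u in urls)
```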

And that's really the basics of the Control Center. Then there are other things that Sarah has already explained, like rate limits.

If you want to make sure that you don't hit a website too hard, for example, you can put in a rate limit, which you can then assign to all the agents you have that hit that particular website.

For example, if I want to make a rate limit for Amazon, I could go in and say I don't want to allow more than 10 concurrent sessions to hit the website at a time, and maybe I want to limit it to 100,000 page loads over a 24-hour duration.

Then I would go back into my agents here, and for all the agents that hit the Amazon website, go in and assign that rate limit to the agents.
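
What a limit like "10 concurrent sessions, 100,000 page loads per 24 hours" enforces can be sketched generically — note that 100,000 loads over 24 hours averages out to roughly 1.2 page loads per second. This is an illustration of the mechanism, not the ACC's implementation:

```python
# Illustrative rate limit: a semaphore caps concurrency at 10 and a
# counter caps the 24-hour page budget at 100,000.
import threading

MAX_CONCURRENT = 10
MAX_PAGES_PER_DAY = 100_000        # ~1.2 page loads/second on average

concurrency = threading.Semaphore(MAX_CONCURRENT)
budget_lock = threading.Lock()
pages_today = 0

def fetch_page(url: str) -> None:
    global pages_today
    with budget_lock:
        if pages_today >= MAX_PAGES_PER_DAY:
            raise RuntimeError("daily page budget exhausted")
        pages_today += 1
    with concurrency:              # at most 10 requests in flight
        pass                       # actual page load would go here
```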

Then we have proxies, of course — very important when it comes to web scraping.

This is just proxy management. You can see here I have only one proxy provider, but internally we would have, you know, hundreds of proxy providers. This one is really just a list of proxies. Sometimes it's an API — it doesn't have to be a list of proxies, it could also be a definition of an API. Obviously some proxy providers deliver their proxies through an API, so it can be a definition of that API to get the list of proxies.

And then you just assign your agent to use this particular provider as its proxy provider.
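
The two provider shapes described — a static list versus an API that returns the current list — look roughly like this. The provider URL and response format are hypothetical; every real provider has its own API:

```python
# A provider can be a static proxy list...
import requests

static_proxies = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
]

# ...or an API that returns the current list (hypothetical endpoint).
def fetch_proxies(api_url: str = "https://proxy-provider.example.com/v1/list"):
    resp = requests.get(api_url, timeout=30)
    resp.raise_for_status()
    return resp.json()["proxies"]          # hypothetical response field
```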

And the last thing is the connections. These are external storage connections. If you want to deliver data to an S3 bucket or to Azure storage, you can define those connections and upload them to the Agent Control Center, and they will then automatically be distributed to all your servers in the organization, and all your agents in the organization will then have access to those external storages.
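
Behind the scenes, delivery to an S3 connection amounts to uploading the run's data file to a bucket. A sketch using the standard boto3 client — bucket, key, and filename are placeholders:

```python
# Roughly what delivery to an S3 connection amounts to.
import boto3

s3 = boto3.client("s3")   # credentials would come from the connection
s3.upload_file(
    Filename="amazon_demo_run.zip",          # the run's data backup
    Bucket="my-extraction-bucket",           # placeholder bucket name
    Key="acc/amazon-demo/2023-01-01.zip",    # placeholder object key
)
```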

Okay, so I just want to clarify that: if I download, say, a thousand PDFs from 10 websites, I can actually designate a cloud storage bucket location for them to download straight to from the ACC?

Yeah — the data you're extracting would be uploaded to the storage.

That's what I meant, sorry. Yeah, that's awesome. Do I understand correctly that you don't have Google Cloud yet — does it still have to be S3?

We don't have Google Cloud yet; it's on the plans. We have Google Drive, but not the cloud storage.

Do you have SFTP?

Sorry?

Do you have FTP at all?

FTP and SFTP as well, yeah.

Okay, excellent — that's not a problem, then.

And Google can be done too, James; we just have to set up a script for it. It's just not a default setting, that's all.

Okay. I will absolutely tell you that I'm still learning all of this, but we do have some people here that would be able to help me with that anyway. And we still have a default S3 that I can stick them in for now, so that's not a consideration at all.

That's perfect.

Okay, cool. That's all I have, Tony.

Sorry, James —

I was going to say thank you very much for taking the time to show me that. It's definitely something that we're going to have to put some extra thought into. I think this is actually a lot easier, and it means a lot more people can be making use of it, rather than just little old me.

Yeah, I think the thing that's really powerful about it is its simplicity, because you can see everything in one place and what's happening. You can upgrade versions of Content Grabber on one server and not others, and you can upgrade agents and move them between servers, and it really just adds a lot of efficiency to what you're doing.

Obviously the version control part's important, but just having the compliance ticked off as well — an audit trail of everything in one place — saves you a lot of pain, particularly for big operations.

And I don't know that we looked at this, but you've also got control over who can access what, from a user rights point of view, so that's pretty important as well — and that's defined under the organization, right?

Yeah, that's true. And one of the things that also becomes really difficult when you have a lot of agents scheduled on a lot of different servers is working out how to schedule them on particular servers to utilize the power of the servers equally, for example — that's something we have had a lot of issues with, anyway. That's where the clusters come in very handy, because they will spread the load automatically, rather than you having to figure out where you should schedule your agents to get the best value out of your servers.

So you don't have to architect that solution yourself for load balancing; it's built into the software management?

Yeah.

That's awesome. I'm going to have to go back and ask my boss for more money.

Yeah, that's all good. We did used to sell this software for 75k by itself, so we've completely changed our model, and you now get it as part of buying a server license, which is much more accessible to most customers now. So yeah, we're quite excited, and you're obviously seeing the new release that's about to come out late this month. It's working really well.

Wow, that's awesome.

I didn't see it, and I'm assuming it's not in here, but Tony and I were discussing something a little while ago, and I don't know if it might be on the future roadmap — if it's not, it's not a huge deal, but it would be cool to see: database mapping. To give you gentlemen an example: I can scrape, say, 10 websites that are all exactly the same, and I can actually get the same targeted information from each of those 10 websites — say an address column, a date-submitted column, and so forth. One of the good things that my current software does do is that, in their version of a control center, I can flag those multiple agents and then map all of their columns to go to the same column. So while one agent might have address1 and the next website might have address2, I can actually tell the agents to put both of those into the same column inside the same database, so you just get a bit of parity and uniformity in what you're doing. I don't know if that's something that you can put in the suggestion box, but it's really powerful, and for someone who's doing very repetitive work, having to go back in and remap multiple agents over and over — it's a huge time saver.

But I'm still in love with this thing anyway.

We actually discussed that a while back. The problem with data mapping is that it can very quickly become extremely complicated — you have big software packages that do nothing else than data mapping. So I think you're right, it would be good to have something that at least does simple mapping, but the problem is that if you have child tables and other things, the data mapping can really easily become extremely complicated to do in a visual way, anyway.


At this point in time we haven't been able to build a visual package that can do the more complicated things, so that's why we say, okay, let's just stay with scripting for now. Because if you do something that only handles simple mapping, then you would have users saying, right, but why can't it do this and this and this — and it can easily become extremely complicated.


And many of our users wouldn't understand why it can't do this mapping when it can do that mapping. So that's why we have stayed with scripting for now — but yeah, I agree it would be good to have the data mapping as well.

Yeah, and as I said, that's just a future suggestion or something like that. I've already figured out how I'm going to get by without it anyway, so that's not a problem.

At least there is an option, though, to use the same external table from multiple agents, as long as they have the same fields.

Okay.

Yeah — you can have commands with one name and export to a field with another name, so you can make those very simple mappings. And as long as all the agents export the same fields with the same names, then they can all export to the same table.
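
That simple rename is easy to picture: each agent maps its own command names onto the shared field names before export. A sketch with illustrative names (address1/address2 from the example above):

```python
# Per-agent rename so differently-named commands land in one shared
# table column. Each agent carries its own map; names are illustrative.
FIELD_MAP = {
    "address1": "address",            # agent A's command name
    "address2": "address",            # agent B's command name
    "date_submitted": "submitted_date",
}

def to_export_row(raw: dict) -> dict:
    return {FIELD_MAP.get(k, k): v for k, v in raw.items()}

print(to_export_row({"address1": "1 Main St", "date_submitted": "2023-01-01"}))
# -> {'address': '1 Main St', 'submitted_date': '2023-01-01'}
```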

Okay, that's good to know, because that is literally what we do. Every single website that we scrape for a particular field has to have the same column names anyway; it's a mandatory practice on our end. So that's good to know — I'll go and have a bit more of an explore with that and see if I can figure it out.

Yeah.

Otherwise just send me a ticket — it might be buried down somewhere in the settings where it's hard to find.

Awesome. All right, well, I won't take up any more of anyone's time, because it is Friday night. Tony, I'll — I've actually spent most of this week just doing some tests.

No problem, thanks James.

You too, thanks — have a great weekend. Bye.