CG Enterprise Webinar

Center in a little bit. So let's go ahead on over to the Agent Control Center. I think I'm running a little bit long, so I'm not going to spend too much time here, but basically you've got this Agent Control Center where you set up your organizations and your users and who has access to what. Then you check in your agents. Again, it's a purpose-built version control repository. Think of all the myriad things that go into your agent: your inputs, if you're running the same job every day; all your reference data; your third-party libraries; your SQL schemas. Those are all things that your engineer otherwise has to track independently and somehow coordinate, so that all the changes to these agents move forward in lockstep. We have built a purpose-built version control repository that allows you to package up all of those different things into a single agent version and check it in to the agent control repository, and you can actually see the versions of that.

Let me see if I can show you versions. This doesn't really have a lot of versions, so let me see. Download the latest... no, that's not what I want.

No, I only have one version in here. So you can basically track all the different versions of your agent, see who made those changes, and have comments on each change. You can get the latest version, or you can deploy an earlier version, just by right-clicking. It's brilliant.

On top of that, we can set schedules and rate limits for these agents. What we do as a service is go to a website like similarweb.com, which tells you what the average monthly visits are for a website and what the page loads are per visit. Then we just calculate an average daily volume, and we stay under one percent of that volume for our customers. So we set rate limits and make sure that we stay within those ranges, and that's how we make sure that we never put excessive volume on a website.
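As a rough illustration of the rate-limit arithmetic described here, the sketch below turns monthly visits and page loads per visit into a daily page budget capped at one percent. The function names, the 30-day month, and the traffic figures are assumptions made for the example; they are not taken from the product.

```python
def daily_page_budget(monthly_visits: float, pages_per_visit: float,
                      share: float = 0.01, days_per_month: float = 30.0) -> int:
    """Return the maximum page loads per day that stays under `share` of a site's traffic."""
    # Average daily page loads for the whole site (assumes a 30-day month).
    average_daily_page_loads = monthly_visits * pages_per_visit / days_per_month
    # Cap our own traffic at the chosen share (one percent by default).
    return int(average_daily_page_loads * share)


def min_seconds_between_requests(budget_per_day: int) -> float:
    """Spread the daily budget evenly over 24 hours."""
    if budget_per_day <= 0:
        raise ValueError("budget_per_day must be positive")
    return 86_400 / budget_per_day


# Hypothetical site: 3 million visits a month, 4 page loads per visit.
budget = daily_page_budget(monthly_visits=3_000_000, pages_per_visit=4)
delay = min_seconds_between_requests(budget)
print(f"budget: {budget} pages/day, one request every {delay:.1f} s")
```

A per-agent rate limit derived this way can then be recorded alongside the agent version, so the audit log shows which limit was in force for each run.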

Because we're tracking version control and we have an audit log through our Agent Control Center, we can prove to a compliance group, a manager, or anyone else that these rate limits have been followed in the operation.

You can also set up your servers and your clusters, and the Agent Control Center will even add servers to the cluster if you're starting to get really busy, which is great if you're in a cloud setting. Again, you can add your providers and your pools, and there are tickets for each agent. One thing I didn't show you is that when you're creating your agent, you also configure the ticketing system into the agent, so that when something goes wrong it can go ahead and create a ticket. We have our own ticketing system that ships with the Agent Control Center, but you don't have to use it. You can use Jira or whatever your internal system is. If it has a REST API that you can configure this to communicate with, you can do it that way. Whatever works for your operation; you are not hemmed in. What else can I show you?
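For the Jira case mentioned above, a failure ticket boils down to one POST against Jira's create-issue endpoint (`/rest/api/2/issue`). The sketch below only builds that request and does not send it. How the agent itself is wired to a ticketing system is not shown in the webinar, so the function name, project key, issue type, and base URL here are all hypothetical, and the right `Authorization` header depends on your Jira installation.

```python
import json
import urllib.request


def build_jira_issue_request(base_url: str, project_key: str, agent_name: str,
                             error_text: str, auth_header: str) -> urllib.request.Request:
    """Build (but do not send) a create-issue request for a failed agent run."""
    payload = {
        "fields": {
            "project": {"key": project_key},          # hypothetical project key
            "summary": f"Agent {agent_name} failed",
            "description": error_text,
            "issuetype": {"name": "Bug"},             # assumes a "Bug" issue type exists
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/rest/api/2/issue",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": auth_header},
        method="POST",
    )


request = build_jira_issue_request(
    base_url="https://jira.example.com",              # placeholder host
    project_key="SCRAPE",
    agent_name="price-monitor",
    error_text="Next button not found on page 3",
    auth_header="Bearer <token>",                     # placeholder credential
)
print(request.get_method(), request.full_url)
```

Sending it would be a call to `urllib.request.urlopen(request)`; it is left out so the example runs without a Jira server.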

So that's basically it. And then you can see what all the runs are in the Agent Control Center. You can kick off a run, or you can look at the job settings. Here are more ways to set a page load limit and success criteria. You can set the level of the job, you can look at the job schedules, you can look at the job history, et cetera.

And then when you actually go into the job itself, you can see... let's see, what have I got here? I should be able to get some data. There we go. Sorry, the Agent Control Center is brand new this month, and this is really my first webinar showing it. So I can download the data directly from this web portal, which is incredibly useful for teams that have diverse skill sets, where some are programmers and some are just ops managers. Basically everyone has access to the data and everyone has access to the files. We're not creating any silos here. Everyone can see it.

So everyone has access. If this were outputting to 10 different formats, you would get it in 10 different formats, and that's fine. And if you were writing it to S3, for example, it would show up there, so it'd be very easy for everyone to access the data.

The other thing about this product is that we have our manual written in Zendesk. So in addition to having a knowledge base and an FAQ, we have a very detailed help manual. It's actually public now to anyone who has the link, but it's not officially supposed to be done until June 30, so you're getting a little preview. In here there's a section on the API, which I want to share with you. Let's go back to this diagram. The way that the desktops communicate with the servers is through the Agent Control Center. You saw the portal and the ticketing system and all these components of the Agent Control Center, but actually everything is connected via API, and you also have access to make those API calls.

So you configure that agent for API access, you get your API key, your token, and this basically walks you through the steps you need to do to make every single API call work. This is written out for you in great detail, but if you prefer you can just open those calls in Postman and test them out and run them against your evaluation ACC version or against your own local installation of the server. The ACC comes bundled with the server, so any CG Enterprise server will have this capability, and you'll be able to use these API calls to integrate very deeply with it.

So with that, I'm going to look again to see if there are any questions, and I'm going to unmute everyone, somehow. I don't see any questions. Let's see, how do I unmute everyone? Okay, there we go. All right, everyone is no longer muted. Looks like James MacArthur has a question. James, why don't you go ahead?

The question is about the SQL integration. Yes. So it's actually not an ACC-to-SQL integration. In our paradigm, your SQL integration comes from how you configure your database. If your export target is an actual database, then you can enable database export to any of these different databases: SQL Server, MySQL, Oracle, any OLE DB source like Postgres or whatever, MongoDB, Azure Cosmos, or of course you can write any script that you want. There's also an internal database, which is what it's using on the fly. So for example, if you're in Azure, you're going to want to use Azure SQL Server, and you're going to have to make sure that you've got enough DTUs in there to handle the volume of agents that are all pointing to the same database, and then you're good to go. So there's an internal database and there's an external database, and that's basically how you do it. You can integrate with either or both.

"Can you show a little more on the ACC SQL integration?"

Let me see. Oh gosh, I don't know how to make this bigger. It's tiny. Hello?

Hi, yeah, we have a question here.

Okay.

We're wondering, is this hosted? Can it be hosted in the cloud, or is it software as a service?

Well, yes, all of the above. It can be installed on premises, it can be hosted by us in the cloud, or it can be hosted by you in the cloud. The software is licensed, so if we sell you a license you'll pay a single annual fee for that license and then you'll operate it on your own infrastructure. Or you can have us set it up for you and then we'll manage it for you, and that's the SaaS offering.

So for the SaaS offering, that means the entire thing, the desktop version as well, will need to be hosted, I mean managed, on your side, right?

Yeah. So we have three pillars to our business. One of them is software licensing. The second is services, where we provide turnkey services.

Okay.

And the third is data products.

Okay.

If you wanted us to write everything for you and provide services, we do that on a daily basis. If you wanted us to consult, we would charge 150 an hour, and as you see it would probably take us two to four hours to write a simple agent end to end. Then we could write the agents and you could run them, or the other way around: we could license you the desktop software, you could write the agents, and then we could run them in our cloud. There's a lot of flexibility.

Right. All right, thank you.

And the other thing about the cloud is that it's basically just Windows software, so it can run in any cloud. It can run in Azure, in Google Cloud, in AWS; we have customers in all of those clouds. What we normally do is run the Agent Control Center in the cloud on highly available, highly redundant infrastructure, and then we run our servers on physical data center servers. There's a huge cost difference. Then we have our databases and file systems in the cloud, and that really helps us. So we have full redundancy with rollover, because the Agent Control Center is managing that, and we have our low-cost hardware. Plus we have hardware in the region of the world where we need proxies, so we'll set some up in Australia, in China, or in the UK, and we won't have to spend a lot on proxies.

So here's a question from James: "A little more on this ACC SQL engine integration." James, did I answer your question adequately on the SQL integration? For some reason I'm having a hard time opening this question box.

Hi, how you going? This is James here. How are you doing today?

Hi James, how are you?

Good, thank you. I was just trying to get a bit more of a follow-up on the SQL integration. I've built 20 agents in a test profile now, and each of them seems to go to a different table. What I'm trying to figure out is whether there is a way to create multiple agents but then have all of the data going back to a single table, because the multiple agents are all getting the same information, just from different websites.

Oh yeah.

Yeah, what I would recommend is, first of all, when you set up your internal database, I would actually set up a separate export database. Because as you see with this... oh sorry, I never ran this, is that right? I have to run it to create the tables. But when it creates the tables, it's creating data structures behind every parent-child tab in the workflow, and then it's creating tables to hold the data collected from each capture command. So you've got... I don't know, it thinks it has proxies. No, it's not running. I did set up proxies, and yes, I did stop that, so it didn't fully set them up.

So there are the internal tables that are set up, and when the internal tables are exported they go to that single export database. For the export database, you can either configure a different export database and point them all at the same database, or you can write a custom export script that puts them all in the same database. The thing that you're wrestling with is that by default the tool assumes that you don't want to mix data from multiple agents, even though they may have the same structure. So it's keeping track of the unique agent ID internally and it's keeping those things separate. In that case you're probably going to want to just write a custom script to put them all into the same table, and our support can supply a sample of that.

That's right, we can still do it externally anyway. It was just a matter of whether there was an easy option somewhere. That's not a humongous problem.

No, it sounds like it could be a feature request, and we could add in a screen to configure that.
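One way to picture the "custom script" route is a post-export step that copies each agent's table into one shared table and tags every row with the agent it came from. The sketch below uses Python's built-in `sqlite3` purely so it runs anywhere. The table and column names are invented for the example, and it is not the sample script that support would supply.

```python
import sqlite3


def merge_agent_tables(conn: sqlite3.Connection, agent_tables: list[str],
                       target: str = "all_products") -> None:
    """Copy rows from each per-agent table into one shared table, tagged by agent."""
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{target}" (agent TEXT, name TEXT, price TEXT)'
    )
    for table in agent_tables:
        # The agent/table name is stored in the first column so rows stay traceable.
        conn.execute(
            f'INSERT INTO "{target}" (agent, name, price) '
            f'SELECT ?, name, price FROM "{table}"',
            (table,),
        )
    conn.commit()


# Hypothetical per-agent export tables with the same structure.
conn = sqlite3.connect(":memory:")
sample_data = {
    "agent_site_a": [("Widget", "$10.00")],
    "agent_site_b": [("Gadget", "$12.50"), ("Gizmo", "$8.00")],
}
for table, rows in sample_data.items():
    conn.execute(f'CREATE TABLE "{table}" (name TEXT, price TEXT)')
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)

merge_agent_tables(conn, list(sample_data))
row_count = conn.execute("SELECT COUNT(*) FROM all_products").fetchone()[0]
agent_count = conn.execute("SELECT COUNT(DISTINCT agent) FROM all_products").fetchone()[0]
print(row_count, agent_count)
```

Keeping the agent name as a column preserves the separation the tool enforces by default, while still giving downstream users a single table to query.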

It is something we run into every now and then. Obviously I can't say too much about a competitor, but with the current system that we use now, if we use their version of an ACC for each agent that we manage, we go into the portal and we say, okay, the job ID number from this agent goes into this column of the SQL table, the product description from each of the agents goes into the product column of that SQL table, and so on and so forth. So it was just a matter of whether there was some sort of parity. If there's a feature that could be worked on later, that'd be great, but it's not life-threatening to our product or services.

Ah, so it sounds like there are some data mapping screens, like pipes.

Yep, exactly.

Where you're actually configuring how the data flows downstream and where it ends up.

Correct. So mapping's probably the perfect word for it. We map where each of the data points from each agent goes into a SQL database and its associated tables.
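The mapping idea described here, where each agent's fields are routed to named columns of a shared table, can be sketched as a plain lookup table. The agent names, field names, and column names below are hypothetical, and this is not an existing feature of the product.

```python
# For each agent: source field name -> column in the shared SQL table.
FIELD_MAPPINGS = {
    "agent_site_a": {"job_id": "job_id", "title": "product", "cost": "price"},
    "agent_site_b": {"job_id": "job_id", "product_description": "product",
                     "price_text": "price"},
}


def map_record(agent: str, record: dict) -> dict:
    """Rename one scraped record's fields to the shared table's column names."""
    mapping = FIELD_MAPPINGS[agent]
    return {column: record[field] for field, column in mapping.items()}


row = map_record(
    "agent_site_b",
    {"job_id": 7, "product_description": "Gadget", "price_text": "$12.50"},
)
print(row)
```

With a mapping like this, agents that capture the same information under different field names can all feed one table without changing the agents themselves.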

Oh yeah, that's nice. That's very nice.

It would be a great feature to have, but it was just more that if it already existed, that'd be great. If not, we can just write some script internally. That's not a problem at all.

Right. I mean, this tool is automating a lot of that for you, right?

It's, yes...

This thing is very powerful, and you don't even have to think about that step, because it's automatically exporting your data in the format of the capture commands. So let's say I didn't like the way that these fields were getting captured and I wanted to change the order. I'm just dragging and dropping the order, and that's going to change the way that it gets written out at the end.

And I could just click here and change the name, and it's automatically tracking that on the back end in SQL, and it's going to come out that way. So there's a lot of stuff that you can do that you just don't realize. It's so intuitive and obvious in the UI that you don't realize you're achieving the same purpose without having a big mapping exercise.

Yes. I wish we'd found this software about three years ago, before we wrote everything out. As a new customer, I can tell you that we're very impressed with what it's done so far.

Oh, we're so excited. I literally was talking to a customer of ours earlier today who was so thrilled he asked if he could be an investor. He's been growing a team and buying like five new licenses every other month. I mean, it's just really exciting. I joined this company in 2017, but our founders are in Australia and they've been at it since 2010, and it's definitely very exciting to see.

Very mature, yeah.

Sometimes in subtle ways. But anyway.

Let me look at these other questions. I saw that earlier... oh yeah, this is your question. I can't seem to get this window of questions to get any bigger for some reason, but: "Is there a way to send multiple agents to a single table?" Okay, we covered that one. It doesn't look like there are any other questions.

I have one more, if you have some time.

Yeah, absolutely.

The only other sort of feedback or problem that I've had with this software is what to do if the pagination option doesn't come up. I've got a website at the moment with 10 pages. It's got just a normal next button at the bottom of the screen, but when I click it, the pagination option in the pop-up window doesn't appear, and I can't seem to find a way to ask it to consider a pagination option. When I debug it, it basically just starts running.

So then you can just go into the top-level parent on that page and click this plus sign, Add Command, and then you get this window that pops up, and you can scroll down and click on any of these commands, including navigation.

I know how to do that. It's not doing it. I'm just wondering if there's an extra way, or something a little bit more advanced, for when it's not working. So I can add a pagination, but it's not detecting the XPath properly or uniquely.

So there isn't, of course, pagination on this particular page, but let's say there is, and I've gone and clicked on a single next link. I'm just pointing and clicking on that next button. You can go over here and see what the XPath is that's been collected, and then you can go down here and see in the selection count how many elements on that page are mapping to that XPath. If the XPath is not properly or cleanly identifying that single next button, then you can look at it in visual debug mode, and you can go page to page, and you can pause, and you can see if it's actually getting the correct next button when it goes to the next page. Because sometimes they'll do that; they'll make it just a little bit harder for you. They'll change the next button from page to page, they'll change the XPath, so you actually have to use a different identifier than the default that the tool uses.

Okay, so sometimes it'll click next on the first page, but the second one has something slightly different, so you'll have to look for the next button image, or something like that.

Okay, awesome.
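The "selection count" check can be reproduced outside the tool to see why a next-button XPath misbehaves: count how many elements each candidate expression matches, and prefer one that matches exactly one. The sketch below uses the limited XPath subset in Python's standard `xml.etree.ElementTree` on a small, well-formed, invented snippet. Real pages are rarely well-formed XML, so in practice an HTML-aware parser would be needed.

```python
import xml.etree.ElementTree as ET

# Invented pager markup: two links share a class, only one is the next button.
PAGE = """
<html><body>
  <div class="pager">
    <a class="nav" href="?page=1">Previous</a>
    <a class="nav" id="next" href="?page=3">Next</a>
  </div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)


def selection_count(xpath: str) -> int:
    """Number of elements on the page that the XPath expression selects."""
    return len(root.findall(xpath))


by_class = selection_count(".//a[@class='nav']")   # ambiguous: matches both links
by_id = selection_count(".//a[@id='next']")        # unique, if the id is stable
by_text = selection_count(".//a[.='Next']")        # unique, keyed on the link text

for name, count in [("class", by_class), ("id", by_id), ("text", by_text)]:
    print(f"{name}: {count} match(es)")
```

If the site changes the button's attributes from page to page, an identifier that stays constant, such as the link text or the button image, is the safer thing to key on.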

I can definitely...

Yeah, so you can basically play with that, and if you have any trouble, please send your questions to support@contentgrabber.com.

Yes, I've already spoken with your support team for a bit of code. And if anyone else is listening, this company really does look after you with their support as well. It was in about six hours.

Oh, that's great. Yeah, they're really much better than I am, and they're just very committed. I mean, it's a nice tool. People kind of fall in love with the products they work on, and I think with our support staff, we look after them and they seem to look after everyone else. So let us know if that's ever not the case.

As I said, as a new customer I can say I'm very, very happy with the server so far.

Oh, I'm so glad. What company are you from?

CoreLogic, in Brisbane, Australia.

All right, very nice. That's fantastic.

What a company. Well, what else can I answer for you today? I think I've given you kind of a broad tour. Are there any other things that you're working on that we can help with?

Well, I don't know how many other people are listening, but for me it would be handy if you had any instructions or a read-over of the ACC. I've got the instructions for the desktop software, but not the ACC as of yet. We're just trying to figure out if there's a business case for that to be useful for us.

Sure. This is our support guide.

So this isn't technically going live for another 12 days, so they'd kill me for sending it to you, but I think it looks pretty done. There are obviously things we know of that we're still adding and massaging, but it's going to give you a good idea of where we're headed with our support. If there are any topics that you feel are missing or questions that are not answered, let us know and we will just add them. We've got a growing knowledge base, a growing FAQ, and the manual has gotten the lion's share of our focus over the last couple of months.

But yeah, there's a section in here at the end for the Agent Control Center. It explains how to set up your organizations, how to set up your servers and clusters, your cluster deployment, and what to do with jobs. There's a concept of runs: there are multiple concurrent sessions in a run, and then you can have multiple runs that go into a job. So that's an important sort of structure to understand. And then there's single-server deployment, schedules, ticketing, all of that kind of thing.

We run very large-scale web scraping operations, so the Agent Control Center is critical to us, and the reason why it's critical is really because of this agent control repository. So let me just pause this for a second and get into the agent control repository.

Yeah, that's fine. I just never know exactly what folder I'm in. So for this folder, I'm going to show that there are multiple versions of this agent that have been checked in. There's a little comment with each of these, and I can roll back, get a previous version, compare with the local version, et cetera. Once I have my versions in there, I can configure all of these things. I can configure job settings, including rate limits and schedules. I can look at job history and all the information for the run. I can deploy the latest version to a cluster.

So for example, if you're like us, you have thousands of agents that you're running, and you've got all of these individual agent developers, and you have a whole process you want to follow with a dev, a QA, and a production server. You want to make sure that all those risks, all those areas where your data collection could go wrong, are mitigated. This tool is basically helping to manage that whole process. You package up your entire agent, check it into the version control repository, and then you have controls over what version gets deployed where, along with all the dependencies that go with that agent: the changes to your database schema, the newer version of the Tesseract OCR library or whatever you're using. Maybe you have inputs, a list of cities or zip codes, and you've updated them; those inputs can be packaged up with the same agent. Those are all areas that really go wrong, and then managing your servers and deployments is the next big thing.

There's an audit trail, so that if anything did go wrong you can go back and see in detail exactly what happened. You can move these agents from one repository to another, or from one folder to another folder, et cetera. You can also go in and configure... let's see. I'm logged in as demo, so I can't see a lot of information. There are servers behind here that I can't actually manage. Let's see.

Okay, I'm sorry, the tool is a bit new. What I'm trying to show you, and failing to show you, is this. Imagine you have huge beefy servers and they're running hundreds of agents each, and then you get to a place where you need to upgrade your server software. How do you manage that process? Because it's all database-backed, you can literally, on a per-server basis, right-click and pause collection on that server, right-click and upgrade the Content Grabber software (it supports remote upgrades of your software), and then again just right-click and resume. So for us it's incredibly easy to manage our operation.

And the proxies: the proxy providers all have different problems at different times, so we always configure them all separately, and then we set up all of our pools specific to what the agent needs. If agents need AU residential, then there'll be an AU residential pool, and separately we'll be managing all the providers and the proxies. Sometimes certain providers go down, and we'll just take them out of the pool. We make that change in one place and all of the agents are updated.
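The "change it in one place" point about providers and pools can be modeled as a small registry: pools reference providers by name, and disabling a provider centrally removes it from every pool that agents resolve. The class, pool, and provider names below are hypothetical, and this is a sketch of the idea rather than how the Agent Control Center implements it.

```python
from dataclasses import dataclass, field


@dataclass
class ProxyPool:
    name: str
    providers: list[str] = field(default_factory=list)


class ProxyRegistry:
    """Central place where pools are defined and providers are switched off."""

    def __init__(self) -> None:
        self.pools: dict[str, ProxyPool] = {}
        self.disabled_providers: set[str] = set()

    def add_pool(self, pool: ProxyPool) -> None:
        self.pools[pool.name] = pool

    def disable_provider(self, provider: str) -> None:
        # One change here affects every pool that lists this provider.
        self.disabled_providers.add(provider)

    def active_providers(self, pool_name: str) -> list[str]:
        pool = self.pools[pool_name]
        return [p for p in pool.providers if p not in self.disabled_providers]


registry = ProxyRegistry()
registry.add_pool(ProxyPool("au-residential", ["provider_a", "provider_b"]))
registry.add_pool(ProxyPool("uk-datacenter", ["provider_b", "provider_c"]))

registry.disable_provider("provider_b")  # provider_b is having an outage
print(registry.active_providers("au-residential"))
print(registry.active_providers("uk-datacenter"))
```

Agents only name the pool they need, so they never have to be edited when a provider is taken out or brought back.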

So the way that we've componentized everything and centralized the management of each of the different pieces really helps us.

The other thing I wanted to say is, if you look in this folder, we've got an agent and we've got a config file. Let me show you. In your config file you can set up all the types of things that you'll want to set up. Maybe there are 50 agents in this folder, but they're all using the same config. So you can basically set up a config file per folder. This really helps you manage your environment, standardize your environment, and enforce standards, if this is how you're working, because if there is a config file, the agent is going to inherit from the config file.
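Per-folder inheritance of this kind can be pictured as a layered lookup: the folder's config supplies the defaults, and an agent sees those values unless it sets its own. The setting names below are invented, and whether an individual agent can override an inherited value is an assumption made for the example, since the webinar only says that agents inherit from the folder's config file.

```python
from collections import ChainMap

# Settings shared by every agent in the folder (hypothetical names and values).
folder_config = {
    "page_load_limit": 5000,
    "export_target": "s3",
    "proxy_pool": "au-residential",
}

# Settings one agent defines itself (assumed to take precedence).
agent_overrides = {"page_load_limit": 1000}

# ChainMap looks in agent_overrides first, then falls back to folder_config.
effective = dict(ChainMap(agent_overrides, folder_config))
print(effective)
```

With 50 agents in a folder, changing `export_target` once in the folder config would change it for all of them.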

So this is a really nice thing as well to help manage a large-scale operation. And then of course any credentials. This agent happens to be writing to an S3 bucket, so I've got my S3 credentials stored here in the agent repository, but none of my agent developers actually know what they are. We have these hedge fund clients, for example, and they get tremendous value out of the data that we provide, but only the person who first puts these credentials in here is going to be the one who knows how to access that S3 bucket to get all the data. For those guys we don't store the data; we just offload it every day. So we don't have to worry about anybody making a bad choice. We just don't have to think about it. We have the credentials in the agent repository, they're encrypted, and that's it. The agent is going to use that.

Yeah, thank you.

It's just all these really convenient things, things just to support a large-scale operation. We launched it originally last August and we've had an amazing response. This is our second version; it's basically taking the ACC out of the desktop software and putting it in the browser, and that has had a great response. We're just going to keep innovating and keep stretching ourselves and growing. So if you have ideas on what areas we could help you with... and we took in your feedback on mapping and pipes. That seems like an interesting module that might be useful.

I've got lots of feedback but no time, unfortunately.

Yeah, well, anytime you have feedback, just send it over and we'll keep track of it.

Just quickly before I go, and thank you so much for the demo. One thing that would be incredibly handy to have, and I don't know if it's just because I'm still on a trial before I talk with Tony, is, how do I say this, some common transformations in the transformation tool. Just a couple of common ones, like split on a dollar symbol to another column. Just some very basic transformations.

You can do that right now. Let me get this out of the way. Are you still seeing my screen?

Yes.

Well, first of all, let me get rid of this navigation group so it doesn't matter. Actually it does, because I want it to run. Well, I'll show you. Let's go down to something... there's no price in here, is there? Let's see if we can do something else, something simple with a price.

I'm looking at all these. So I know about highlighting certain things that you keep, and there is sort of some magic in the background that happens. It's more when you go to the transformation screen itself, with your regex and C# and everything like that, if there was...

You don't have to do that. You don't have to do C# or anything like that. Let me see if I can just load these.

Now, this is old. Basically what you're going to do is, let me open this Amazon thing again, you're going to go into one of these things. Let's see if there's anything in here. Let's work with the job path one. First of all I'm going to copy, and then I'm going to paste. I have to paste it up here, don't I? I want to paste it, and then I'm going to put them next to each other so I don't get confused.

Okay, and in the first one I'm going to go in and write a simple transformation. Again, if you want to just pull this out, you can basically write the regex for that. Now, if you wanted to strip out all of the slashes from here and just have the text, I think you have to write some simple regex, which I don't actually have at hand. So one of these fields could pull out just the currency symbol, or just the slash, and the next one would be without it. You'd basically duplicate the currency field. In one of them you would highlight the currency symbol and generate the transformation pulling that out, and in the next one you would highlight only the price without the currency symbol and generate the regex automatically for that. So you'd be splitting it into two fields, extracting the two different portions of the price.

Yeah, so the feedback would be that all of that technically works. It would be good if we had a repository of just commonly used ones, sort of like examples. One of the ones that I've had to write by hand is to join two strings together to create a file name, so I had to do a bit of playing around. But if there was somewhere, a repository of code that your guys have used in the past, or even string and code examples on the FAQ or development page, that would help. I have found that out in the greater world you can get C# and you can get regex, but it sometimes doesn't quite work with what we've got here. So if you had some internalized scripts and samples somewhere, that'd be fantastic as well.

Yep.
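The two transformations discussed here, splitting a price into its currency symbol and amount and joining two strings into a file name, are sketched below as small functions. They are written in Python to show the logic only. The tool's own transformation screen uses regex and C#, so the pattern may need adjusting for its regex flavor, and the function names and sample values are invented.

```python
import re

# Leading currency marker (anything that is not a digit, space, dot, or comma),
# followed by the numeric amount with optional thousands separators and decimals.
PRICE_PATTERN = re.compile(
    r"^\s*(?P<symbol>[^\d\s.,]+)\s*(?P<amount>\d[\d,]*(?:\.\d+)?)\s*$"
)


def split_price(text: str) -> tuple[str, str]:
    """Split a captured price such as '$1,299.99' into (symbol, amount)."""
    match = PRICE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognizable price: {text!r}")
    return match.group("symbol"), match.group("amount")


def make_file_name(prefix: str, identifier: str, extension: str = "csv") -> str:
    """Join two captured strings into a file name."""
    return f"{prefix}_{identifier}.{extension}"


symbol, amount = split_price("$1,299.99")
file_name = make_file_name("products", "2023-06-18")
print(symbol, amount, file_name)
```

The split mirrors the approach described in the demo: duplicate the price field, keep only the symbol in one copy and only the amount in the other.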

Will do. Anything else?

Unfortunately no. Unfortunately I have to run for another meeting, but thank you so much. The demo was fantastic. We're very, very happy.

Oh great, I'm so glad. Please keep the questions coming and let us know if there's any other way we can help you.

Not a problem at all. Thank you very much for taking the time to show me that.

Yep, thank you, glad to do it. Heidi, how about you? Do you have anything else, any other questions?

So far no. Thank you for your demo.

Okay, thank you so much. Let us know if you have any follow-on questions.

Yeah, sure. Okay, good night.

Good night.