Guardium Reports Platform understanding (1)

Part I – timestamps, main entity and entity relations in Access domain

I receive many questions about report definitions that stem from a misunderstanding of the data relations in the audit database and of main entity selection.

The simplest answer is: the main entity defines the relation between all entities inside a Guardium query, but I think that is still not clear for most readers 🙂

A Guardium query presents values from a reporting domain. Please remember that a query can refer to only one domain (for instance Access or Exception). If a report should present data from more domains, Guardium allows that through a custom domain (not important here).

Inside a domain, the audited data are stored in fields, which are grouped into entities.

For example, the Access domain contains 19 entities, and each of them can contain dozens of fields. Simplifying, we can imagine the fields as columns in tables (entities) that belong to a tablespace (domain), where a query can refer to one tablespace only. The comparison is quite accurate, because there are strict relations between entities: 1-1, 1-N or N-1.

Example 1 – Timestamps and 3 main entity relations

Please assume that the Guardium policy audits SQL activity using the “LOG FULL DETAILS” action in all examples here.

I created a new query in the Access domain with the main entity set to FULL SQL.
Report definition

Then I connected to the PostgreSQL database three times and executed a simple “select now()” command.
Here the report shows this activity based on my query (filtered by user name). I put four timestamps in my report: from the Access (Client/Server), Session (Timestamp and Session Start) and FULL SQL entities. Notice that the Timestamp from the Access domain has the same value in every row and is not strictly related to the execution time of my SQL statements. What exactly does this value point to?
To understand this, we need to treat the Client/Server entity of the Access domain as a dictionary (referential data) of tuples that contain Client IP, Server IP and DB user name, where the Timestamp records when the particular tuple was registered (appeared for the first time) on the appliance. So, if I am focusing on SQL activity, this timestamp has no value for me, because it carries no information about connection time or SQL execution. The “foreign key” that links other entities to Client/Server is named Access Id.
GN – Example 3 is a report based on the Client/Server entity and illustrates its connection with the Session entity.

The same relation exists between the Session and FULL SQL entities, based on the “foreign key” Session ID.

Finally, we can describe the relations among the 3 main entities as follows.
When a new connection (session) starts, information about it is registered as a new record in the Session entity. The access-related information (IP addresses, user name, port, etc.) is referenced from Client/Server (Access ID), and each new SQL statement from the session stream is stored in FULL SQL with a reference to the session by Session ID.
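To make these relations concrete, here is a minimal Python sketch of the three main entities as plain records. The class and attribute names are hypothetical simplifications, not Guardium's internal schema; the point is that a FULL SQL row reaches its access tuple only through Session ID and then Access Id, and that all three sessions from my test share one Client/Server Timestamp.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional

# Hypothetical, simplified model of the three main entities in the Access domain.
# Attribute names are illustrative only, not Guardium's internal schema.


@dataclass(frozen=True)
class ClientServer:
    access_id: int
    client_ip: str
    server_ip: str
    db_user: str
    timestamp: datetime  # when this tuple was first registered on the appliance


@dataclass(frozen=True)
class Session:
    session_id: int
    access_id: int  # N:1 reference to ClientServer
    session_start: datetime
    session_end: Optional[datetime]  # None while the session is still open


@dataclass(frozen=True)
class FullSql:
    sql_id: int
    session_id: int  # N:1 reference to Session
    timestamp: datetime  # when the statement was processed
    sql: str


def access_for_statement(stmt: FullSql,
                         sessions: Dict[int, Session],
                         accesses: Dict[int, ClientServer]) -> ClientServer:
    """Resolve FULL SQL -> Session -> Client/Server by following the two keys."""
    session = sessions[stmt.session_id]
    return accesses[session.access_id]


# One access tuple reused by three connections, each running "select now()".
access = ClientServer(1, "10.0.0.5", "10.0.0.9", "postgres", datetime(2017, 9, 1, 8, 0))
sessions = {
    sid: Session(sid, 1, datetime(2017, 10, 5, 16, minute), None)
    for sid, minute in [(11, 30), (12, 35), (13, 39)]
}
# For simplicity each statement runs at its session's start time.
statements = [FullSql(sid * 10, sid, sessions[sid].session_start, "select now()") for sid in sessions]
accesses = {access.access_id: access}

for stmt in statements:
    owner = access_for_statement(stmt, sessions, accesses)
    print(stmt.session_id, stmt.timestamp, owner.db_user, owner.timestamp)
```

Every printed row shows a different statement time but the same Client/Server Timestamp, which is the behavior seen in the report above.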

Now it should be clear that the Timestamp from FULL SQL is the exact time when the SQL statement was processed by the database (my NTP configuration works well 😉).
What about the Timestamp in the Session entity? Hmm, even the Guardium documentation suggests not focusing on it: “When tracking Session information, you will probably be more interested in the Session Start and Session End attributes than Timestamp attribute”.
I agree and suggest using Session Start and Session End, which record when the connection began and when it was closed, respectively.

Timestamps summary:

  • Timestamp from Client/Server entity – not related to sessions or SQL; records when the access tuple first appeared on the Guardium appliance
  • Timestamp from Session entity – changes during the session lifetime, so it is of limited value for pinpointing when a particular activity in the session happened
  • Session Start from Session entity – records when the session started
  • Session End from Session entity – records when (and if) the session was closed
  • Timestamp from FULL SQL entity – records when the SQL statement was executed

Example 2 – main entity selection

Now I have created a query with Client/Server as the main entity.
This means that the report shows data from the Client/Server entity, and fields from other entities are resolved through the appropriate relations. The relations Client/Server->Session and Client/Server->FULL SQL (indirect) are 1:N, so we cannot present the values themselves, and a count of events is suggested for the Session Id and FULL SQL fields. Looks good, but we will face two problems at once 🙂
This report suggests that the displayed Access ID tuple was referenced in 4 sessions and that only two SQL statements are related to them. That is strange; how is it possible? Technically it is, but not here.
The fact is that this tuple is related to 2 sessions and four SQL statements, the opposite of the values shown in my report!?

We got this output because my report counted values from two external entities (Session and FULL SQL), and there is no direct relation between Client/Server and FULL SQL. The indirect relation was counted first with a DISTINCT clause (value 2), and then the sessions were counted without DISTINCT (value 4).
This kind of problem is common when we do not understand the relations between entities.
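The sketch below does not reproduce the SQL that Guardium generates for a report; with hypothetical data, it only illustrates the general fan-out effect behind a wrong session counter. Once a query joins down to FULL SQL, every session row is repeated once per statement, so counting session IDs without DISTINCT returns the number of statements.

```python
# Hypothetical data for one access tuple (Access Id 1): 2 sessions, 4 statements.
session_to_access = {101: 1, 102: 1}                  # session_id -> access_id
full_sql = [(1, 101), (2, 101), (3, 102), (4, 102)]   # (sql_id, session_id)

# A Client/Server -> Session -> FULL SQL join yields one row per statement.
joined = [(session_to_access[sid], sid, qid) for qid, sid in full_sql]

# Counting session IDs over the joined rows counts rows, not sessions.
session_count_without_distinct = len([sid for _, sid, _ in joined])
session_count_distinct = len({sid for _, sid, _ in joined})
sql_count = len({qid for _, _, qid in joined})

# Without the FULL SQL join, sessions are counted directly on the Session side.
sessions_only = sum(1 for access_id in session_to_access.values() if access_id == 1)

print(session_count_without_distinct, session_count_distinct, sql_count, sessions_only)  # 4 2 4 2
```

The value 2 that my report showed for FULL SQL depends on how Guardium counted the indirect relation and is not modelled here; the sketch only explains why a non-distinct session counter grows with the number of statements.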
I modified my query, removed the FULL SQL counter, and now the report shows the correct number of sessions.

Now we can switch to the second challenge. Most of you have probably wondered why most reports have a time-based data filter.
And now everything should be clear :). The time selection is based on the main entity’s Timestamp and appears only if the entity contains one.
I noted before that the Timestamp in the Client/Server entity records when the access tuple was registered in the audit database on the appliance (not relevant in 99% of situations). So, if I am looking for the number of sessions in the last hour, my report returns an empty result set, because the tuple was created much earlier.
This is the effect of selecting the wrong main entity for our purpose. If we want to display session (connection) related information, our query should be based on that entity.
However, as mentioned before, the Timestamp in the Session entity is not valuable: it changes during the session and does not provide a well-defined point in time in the session lifetime. That is why Guardium provides two virtual main entities in the Access domain, corresponding to Session Start and Session End in the Session entity. Now I can create a report that counts sessions in a defined period based on the Session Start timestamp.
My query does not contain any field from the Session entity, because my goal is to count sessions, so there is no point in putting session details in it. I added a session counter using the Add Count flag, which indirectly adds a fourth field (Count of Sessions).
You can see that my report based on the Session entity lists all sessions started in the last hour (left), while the report based on the Client/Server entity is empty, because the referential data (Access ID) were stored earlier, when the particular connection information was identified for the first time (right).
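A toy Python model of the two period filters, with hypothetical data: applying the last-hour period to Session Start finds both sessions, while applying it to the Client/Server Timestamp finds nothing, because that tuple was registered weeks earlier. It only mimics the filtering logic, not Guardium's query engine.

```python
from datetime import datetime, timedelta

# Hypothetical data: the access tuple was first registered weeks ago,
# but both sessions started within the last hour.
now = datetime(2017, 10, 5, 23, 0)
period_start = now - timedelta(hours=1)

access = {"access_id": 1, "timestamp": datetime(2017, 9, 1, 8, 0)}
sessions = [
    {"session_id": 101, "access_id": 1, "session_start": datetime(2017, 10, 5, 22, 15)},
    {"session_id": 102, "access_id": 1, "session_start": datetime(2017, 10, 5, 22, 40)},
]


def in_period(ts):
    return period_start <= ts <= now


def report_main_entity_client_server():
    """Period filter applied to the Client/Server Timestamp (first appearance of the tuple)."""
    if not in_period(access["timestamp"]):
        return []
    return [s["session_id"] for s in sessions if s["access_id"] == access["access_id"]]


def report_main_entity_session_start():
    """Period filter applied to each session's Session Start."""
    return [s["session_id"] for s in sessions if in_period(s["session_start"])]


print(report_main_entity_client_server())  # []
print(report_main_entity_session_start())  # [101, 102]
```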

My examples may suggest that queries based on the Client/Server entity are not valuable at all. They definitely are, but for well-defined cases. For instance, we may want to identify new connection profiles on a database system: new tuples that have never connected to our system before. Based on this information, we can identify anomalies in access to a protected resource, such as database clients not seen before.
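Because the Client/Server Timestamp marks the first appearance of a tuple, filtering on it is what a "new connection profiles" report needs. A minimal Python sketch with hypothetical rows (the helper name and data are made up for illustration):

```python
from datetime import datetime

# Hypothetical Client/Server rows: (client_ip, server_ip, db_user, first-seen Timestamp)
access_rows = [
    ("10.0.0.5", "10.0.0.9", "app_user", datetime(2017, 6, 1, 9, 0)),
    ("10.0.0.7", "10.0.0.9", "backup", datetime(2017, 8, 12, 2, 0)),
    ("192.168.5.44", "10.0.0.9", "postgres", datetime(2017, 10, 5, 21, 13)),
]


def new_access_profiles(rows, since):
    """Tuples first registered on or after `since`, i.e. clients never seen before."""
    return [(client, server, user) for client, server, user, first_seen in rows if first_seen >= since]


print(new_access_profiles(access_rows, datetime(2017, 10, 1)))
```

Here only the 192.168.5.44 tuple is reported, since the other two were registered before the chosen date.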

The Client/Server, Session and FULL SQL entities are the basis of most reports focused on detailed SQL activity. I hope this article helps you create the required query faster and deliver the expected results.

In the next article about reporting I will explain the difference between the FULL SQL and SQL entities.


DAM in GDPR context

The buzz around GDPR has led to a situation where customers hear that every existing security solution has “something” for it :). It is a good sales strategy, but a painful one for Security Officers with limited budgets and a hard nut to crack before 25 May 2018.

Here I would like to review the GDPR requirements (AS-IS, because the European Data Protection Board has not yet provided certification guidelines) from the DAM perspective and go through the most popular questions about DAM in the GDPR context.

Where does DAM cover GDPR requirements?

  • Article 5.1(f) – Data protection principles require protection against unauthorized or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organizational measures
    DAM is a dedicated solution for monitoring the SQL stream: granular policies can narrow access to accepted vectors only, behavioral analysis identifies anomalies, prevention rules block suspicious activity, and SQL analysis dynamically masks data and can even stop the execution of dangerous commands.
    Administrative fines and possible civil actions should change the approach of PI administrators, who should no longer consider manual organizational measures sufficient.
  • Article 5.2 – Demonstrating data protection with manual processes is neither efficient nor sufficient.
    Only proactive solutions, or solutions that react automatically to correlated events, can cover GDPR requirements. In addition to SQL logging, DAM provides information about the activity context (who, when, what), strong reporting capabilities to review an analyzed incident quickly, policies that identify PI processing, quantitative analysis that makes abnormal behavior easier to spot, and a self-learning engine that discovers anomalies in standard access to the monitored system.
    DAM blocking capabilities are unique in providing full control over privileged accounts and in implementing access control that covers the segregation of duties requirement.
  • Article 9 – Processing of special categories of personal data
    Sensitive personal data (racial or ethnic origin, political opinions, religious beliefs, genetic and biometric data, health condition or sexual orientation) stored in the PI administrator’s databases tighten GDPR requirements in 2 places:

    • Article 30.5 – even if the company employs fewer than 250 workers, PI processing has to be recorded
    • Article 83.5(a) – administrative fines for non-compliance on a silo with sensitive information are doubled

The DAM data classification engine can identify sensitive information with a minimal number of false positives, based on catalog, regular expression, dictionary or custom searches. The results allow you to focus on the most critical assets from the GDPR perspective.
Classification and database discovery processes executed on a schedule quickly identify changes in database schemas and network assets.
Knowing where sensitive data are located is crucial to confirm the efficiency of working processes for data pseudonymization and minimization.
Full data lake monitoring is affordable only for the largest corporations; knowing what should be protected is the first step before we spend a limited budget.
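To illustrate why pattern searches are usually combined with validation to keep false positives down, here is a hypothetical Python sketch. It is not Guardium's classification engine: the rules, labels and 80% threshold are assumptions. A column is tagged only when most sampled values match a regular expression, and candidate card numbers must also pass the Luhn checksum.

```python
import re

# Hypothetical rules; a real classification engine offers many more rule types.
RULES = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "card_number": re.compile(r"^\d{13,19}$"),
}


def luhn_ok(number: str) -> bool:
    """Luhn checksum, used here to drop digit strings that only look like card numbers."""
    checksum = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0


def classify_column(sample, min_ratio=0.8):
    """Return labels whose rule matches at least min_ratio of the sampled values."""
    labels = []
    for label, pattern in RULES.items():
        matched = [v for v in sample if pattern.match(v)]
        if label == "card_number":
            matched = [v for v in matched if luhn_ok(v)]
        if sample and len(matched) / len(sample) >= min_ratio:
            labels.append(label)
    return labels


print(classify_column(["anna@example.com", "jan.kowalski@example.org"]))  # ['email']
print(classify_column(["4111111111111111", "4111111111111112"]))          # []
```

In the second call both values match the digit pattern, but only one passes the checksum, so the column stays below the threshold and is not tagged.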

  • Article 24.1 and 24.2 – Data administrator duties
    These two articles oblige the data controller to protect data as an auditable and controlled process. For databases, data warehouses, big data and file repositories, DAM was created exactly for this.
  • Article 28 – Data processor duties
    When data are processed on behalf of the data controller (a very common situation), the processor must guarantee that access to PI takes place only on the administrator’s written authorization. Only data access monitoring can provide a real access registry.
  • Article 30 – Records of processing activities – requires accountability for access to personal information
    Small companies will meet this goal by creating a simple registry based on manual descriptions of data access, sometimes enriched by an approval workflow.
    However, such a low-cost solution brings complex reporting and lacks a non-repudiable registry, so you should consider a better mechanism to register access to GDPR-protected data.
  • Article 32.1(d) – Security of processing – points to vulnerability assessment and system hardening
    Popular vulnerability assessment platforms tend to neglect relational databases. DAM, which originated in the RDBMS world, provides rich checks and does not focus only on CVEs and standards (CIS, STIG). Based on years of experience, it also includes analysis of SQL traffic, the influence of configuration changes on the risk score, authorization snapshots and identification of excessive rights.
    For the most critical systems, extending the existing VA solution in your environment with DAM can be very helpful.
  • Article 33.3(a) – Data breach notification – imposes on the subject more than the requirement for prompt notification (within 72 hours).
    The breach notification should contain information about the scale of the leak or other type of incident. Only DAM solutions can identify this scope (SQL audit) and minimize the damage related to notifying data owners and possible fines.
    Be aware that:

    • DLPs (agent and network) cover only data on workstations and remote accesses. What about local sessions on servers? Are you sure that your DLP provides the same SQL structure and session context analysis as DAM solutions specialized for this purpose?
    • PIMs monitor privileged users’ access to production systems. They are not aware of SQL syntax and session context. PIM should be considered in a GDPR compliance program, but its real value becomes visible when DAM and PIM are integrated (directly or at the SIEM level).
  • Article 34 – Communication of a personal data breach to the data subject
    Technically, DAM solutions are able to parse the output of SELECTs, but the usability of this functionality is limited. The size of the outgoing stream is unpredictable and can lead to a situation where the monitoring system needs more hardware resources than the monitored one (especially for data warehouses).
    However, DAM can provide the list of SQL statements executed inside a suspicious session and simplify recognition of the attack range. In case of data modification (DMLs), the audited SQL activity can directly identify the changes and the required remediation.
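To show how an audit trail narrows the attack range, here is a hypothetical Python sketch, not a DAM API: it groups the statements of one suspicious session by their leading command and pulls out the DML that may need remediation. The records are made up, and the verb detection is deliberately naive (it ignores comments and CTEs).

```python
from collections import defaultdict

# Hypothetical audit records: (session_id, timestamp, sql)
audit = [
    (501, "2017-10-05 22:01:10", "SELECT * FROM customers"),
    (501, "2017-10-05 22:01:45", "UPDATE customers SET email = NULL WHERE id = 7"),
    (502, "2017-10-05 22:02:00", "SELECT now()"),
    (501, "2017-10-05 22:03:12", "DELETE FROM orders WHERE id = 9"),
]

DML_VERBS = {"INSERT", "UPDATE", "DELETE", "MERGE"}


def session_scope(records, session_id):
    """Group one session's statements by their leading verb."""
    scope = defaultdict(list)
    for sid, ts, sql in records:
        if sid == session_id:
            verb = sql.split(None, 1)[0].upper()
            scope[verb].append((ts, sql))
    return dict(scope)


scope = session_scope(audit, 501)
changes = {verb: stmts for verb, stmts in scope.items() if verb in DML_VERBS}
print(sorted(scope))    # ['DELETE', 'SELECT', 'UPDATE']
print(sorted(changes))  # ['DELETE', 'UPDATE']
```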

Does DAM provide protection for applications in GDPR context?

The 3-tier architecture of most applications (web client, application server, data store) anonymizes access to data at the silo level. We cannot identify the application user at the SQL level, because the database user name only points to an account from the connection pool. However, DAM can be configured to extract this information from SQL, JDBC encapsulation messages, web server logs and other streams. In most cases this kind of integration requires additional implementation effort, in the worst case including changes to the application code.
So, if the application user context is visible at the DAM level, we can use it exactly the same way as described earlier, with two caveats:

  • Never kill a session in the connection pool, because the SQL stream inside it belongs to many application users. A killed session will raise exceptions in the application layer and reinitialize application sessions for thousands of clients.
  • Never mask data or rewrite SQL in the connection pool. Masked data will in most cases have an inappropriate format and lead to application exceptions. Even if the masked data have an accepted format (data tokenization), the recipient will not know about it and can make business or legal decisions based on incorrect information; data masking for applications should be implemented in the application or presentation layer.
    SQL rewritten inside a transaction can change its essence and lead to loss of data consistency.

DAM without application user context is still valuable for this stream, to identify anomalies, errors and behavioral fluctuations using quantitative analysis.

Can DAM implement data pseudonymization?

Hmm, we should start with a basic question: what is pseudonymization?
I have seen many web articles that directly equate this word with data masking, but I disagree with this approach.

GDPR defines pseudonymization as the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.

I treat this definition as a consequence and continuation of the data minimization process. Briefly, if PI data are separated from transactional data (data minimization), the natural relation between these two stores (for example customerID) should be used throughout the whole data process flow. Only on demand and with approval can the customerID be translated into a form that identifies the person.
Can DAM help here? – NOPE.
However, implementing data minimization and pseudonymization in existing systems means a complete application redevelopment, and who can afford that? So only new, GDPR-ready applications will come with this kind of functionality on board.
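DAM cannot help here, but the split described above is easy to picture. A minimal, hypothetical Python sketch: transactions carry only the customer ID, personal data sit in a separate store, and re-identification requires an explicit approval. Class, method and approver names are illustrative only.

```python
class PseudonymizedStore:
    """Hypothetical sketch: transactional records carry only customer_id,
    while personal data live in a separate, protected store."""

    def __init__(self):
        self._personal = {}     # customer_id -> personal data (kept separately)
        self.transactions = []  # (customer_id, amount): no direct identifiers

    def add_customer(self, customer_id, name, email):
        self._personal[customer_id] = {"name": name, "email": email}

    def add_transaction(self, customer_id, amount):
        self.transactions.append((customer_id, amount))

    def total_for(self, customer_id):
        # Business processing works on the pseudonym alone.
        return sum(amount for cid, amount in self.transactions if cid == customer_id)

    def reidentify(self, customer_id, approved_by=None):
        # Translation to an identifiable person happens only on demand and with approval.
        if not approved_by:
            raise PermissionError("re-identification requires an approval")
        return self._personal[customer_id]


store = PseudonymizedStore()
store.add_customer("C-1001", "Anna Nowak", "anna@example.com")
store.add_transaction("C-1001", 120)
store.add_transaction("C-1001", 80)
print(store.total_for("C-1001"))                               # 200
print(store.reidentify("C-1001", approved_by="DPO")["name"])   # Anna Nowak
```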

For existing systems, we try to prevent identification of personal information using data masking, and here DAM can also be helpful:

  • preproduction (test) data – why DAM instead of data tokenization?
  • access outside application stream:
    • masking of SELECT output – most DAMs provide this functionality, but efficiency is the main problem
    • query rewrite – very suitable, and provides the possibility to tokenize and encrypt data instead of simple masking
  • access from the application stream – as I mentioned earlier, masking for applications should be implemented only in the application or presentation layer
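The difference between simple masking and tokenization can be sketched in a few lines of Python. This is a generic illustration, not a DAM feature: mask() keeps only the last characters, while tokenize() derives a deterministic keyed token with HMAC-SHA256, so equal values still join and group. The key is a hypothetical placeholder, and the token is not format-preserving, which is exactly the kind of format change that can break an application.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be kept outside the database.
TOKEN_KEY = b"example-key-not-for-production"


def mask(value: str, keep_last: int = 4) -> str:
    """Static masking: readable shape, but the original value is lost."""
    cut = max(len(value) - keep_last, 0)
    return "*" * cut + value[cut:]


def tokenize(value: str, length: int = 16) -> str:
    """Deterministic keyed token: equal inputs give equal tokens, so joins and
    grouping still work, but the original value is not readable."""
    digest = hmac.new(TOKEN_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


print(mask("4111111111111111"))      # ************1111
print(tokenize("anna@example.com"))  # 16 hex characters, same on every call
```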

Member states implementation of GDPR

GDPR is a regulation and unifies law across the European Union, but a few of its articles allow derogations. A good example is Article 9.4, under which health records can be managed differently according to member state decisions. Does this mean that my decision about the scope and type of protection should be postponed until my parliament implements the law?
Definitely not. You should not wait, because your data may contain personal information about citizens of another EU country, and you can be sued under that state’s law.

DAM and “right to be forgotten”

It is a common question raised during DAM discussions.
Article 17 introduces the data subject’s right to have their personal information removed on request. DAM is a monitoring solution: it does not cooperate directly with the DB engine (to cover the SoD requirement) and has no authorization to modify data. This simple explanation leads to only one correct answer to the question in the title: DAM is not a component that helps implement the citizen’s right to be forgotten.
By the way, who will agree to data removal on a system with thousands of relations, which can lead to a loss of data consistency discovered a year later? I think that only new systems with fully implemented data minimization and pseudonymization principles will be able to implement this right easily. If all personal information is separated from transactions, removing or simply encrypting the PI will be a suitable solution without any additional effort.

Administrative fines mantra

Have you seen any GDPR-related article without a remark about “huge fines of up to 20 million euro or 4% of company turnover”?
Do you believe that your government will decide to kill local, average or small companies because of GDPR?
If your answers are negative, consider a much more interesting case. In Article 82, GDPR introduces the citizen’s right to compensation, with a body of appeal attached directly to the EU Council. Many organizations are seriously considering the costs of civil actions and their possible influence on business.

New type of ransomware
Standard ransomware based on data encryption is not efficient, because victims rarely pay (private individuals cannot pay large amounts of money, backups exist, bitcoin accounts get blocked).
With GDPR the stolen data can be a simple way to force ransom from an organization wishing to avoid penalties and massive civil actions.
I think that data being gathered right now from unaware companies are stored somewhere on the darknet as a starter package for a new type of “business” next year. 😦


DAM should definitely be considered an important element of any GDPR compliance program because of:

  • PI processing monitoring
  • data classification
  • data masking
  • unauthorized data access protection
  • vulnerability assessment

and it achieves the best value when integrated with PIM, IAM, Encryption and SIEM.


GIM video guideline

This video covers Guardium Installation Manager installation, configuration and administration.

Chapters timeline:

  1. Introduction 0’00”
  2. GIM installation 3’14”
  3. GIM self-upgrade 7’36”
  4. GIM deinstallation 12’07”
  5. GIM failover configuration 13’42”
  6. GIM deinstallation from data node 17’09”
  7. GIM in listener mode 18’14”
  8. GIM listener discovery and group activation 20’41”
  9. GIM reconfiguration 23’46”
  10. GIM report and GRDAPI calls 27’19”
  11. GIM Authentication 34’12”
  12. Installation with auto_set_tapip 42’35”
  13. Modules management 44’03”
  14. GIM on Windows 47’30”
  15. GIM troubleshooting – network problems 53’52”
  16. GIM troubleshooting – GIM restart 54’37”
  17. GIM troubleshooting – configuration file modification 55’12”
  18. GIM troubleshooting – central log 57’03”
  19. GIM troubleshooting – managing standalone STAP installation by GIM 59’13”
  20. GIM troubleshooting – global parameters 63’00”
  21. GIM troubleshooting – process respawn 64’19”
  22. GIM troubleshooting – IP-PR status 66’26”
  23. Dynamic groups in GIM – 67’45”



GIM is a very useful service. It eases Guardium implementation and administration and lowers TCO. It is implemented securely in a client-server architecture.

Some portal pages are still waiting to be rebuilt on the new framework, module parameter settings especially.


Guardium definitions – GIM reports (Clients Status and Installed Modules) with assigned GRDAPI functions and mapped attributes, plus a GIM Dashboard (all 4 important reports together; refer to the reports attached in the first position)

Network ports list used in the Guardium communication –

GIM module states (Querying module states) –

Using sudo during GIM installation –

Agent convention naming –

Operating system upgrade –

How To Install GIM Client On Unix Server? –

Uninstall Guardium UNIX S-TAP and GIM manually –

GIM Server Allocation –


GIM is not available on z/OS (zLinux is supported) and iSeries (aka AS/400).

GIM reporting domains are hidden, so it is not possible to modify the reports or create alerts based on them.



I use sudo to install GIM – the sudoers file configuration is not part of Guardium.


There are 2 GRDAPI commands responsible for module uninstallation:

gim_uninstall_module – removes the specified module from clientIP on a defined date

If the date is omitted, the module is only marked for uninstallation, and the second command, gim_schedule_uninstall, can trigger it on a defined date.


It is also possible to install S-TAP from the command line with self-registration in the GIM service. This article describes it and assumes that the GIM client has been installed and registered before –



Appliance installation and configuration video guideline

This video contains a set of appliance (collector, aggregator) installation scenarios and covers Guardium configuration in standalone and enterprise architectures.

I did not want to split it into many small parts, so the specific tasks are listed below with their start times:

  • Introduction – 0’00”
  • VM Template – 2’47”
  • Simple collector installation – 4’38”
  • Installer boot options – 8’05”
  • Appliance with software disk encryption – 9’36”
  • Appliance with software RAID – 12’20”
  • Simple aggregator installation – 15’44”
  • Basic network configuration – 16’38”
  • Time and timezone configuration – 20’03”
  • Hostname and domainname setup – 21’50”
  • VMWare tools installation – 22’58”
  • License installation in standalone configuration – 24’41”
  • Personal administration account creation – 28’20”
  • Manual appliance patching – 30’49”
  • Central Manager configuration – 40’22”
  • License installation on CM – 41’50”
  • CM backup configuration – 43’48”
  • Shared Secret – 45’31”
  • Unit registration – 46’36”
  • Remote patching from Central Manager – 48’11”
  • Summary – 52’18”

If you are looking for guidelines in other areas, leave me a message.

Direct link:


  • On hardware appliances (delivered by IBM) the default passwords have been changed from “guardium” to the ones mentioned in this technote (added 17-01-2017)
  • The largest disk space manageable by appliance is 16 TB (added 27-01-2017)

Guardium 10.1.2 – review

GPU installation

Like most patches, it has to be installed top down within the existing Guardium domain:

  1. Central Manager
  2. Backup Central Manager (do synchronization)
  3. Aggregators
  4. Collectors
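The top-down order above can be expressed as a simple sort key. A hypothetical Python helper (role labels and hostnames are made up, not Guardium identifiers) that turns an inventory of units into a patching sequence:

```python
# Top-down rollout order within one Guardium management domain.
ROLLOUT_ORDER = ("central_manager", "backup_central_manager", "aggregator", "collector")


def patch_plan(units):
    """Order (hostname, role) pairs so that higher tiers are patched first;
    hostnames break ties inside a tier. Role names are hypothetical labels."""
    rank = {role: i for i, role in enumerate(ROLLOUT_ORDER)}
    return [host for host, role in sorted(units, key=lambda u: (rank[u[1]], u[0]))]


units = [
    ("coll2", "collector"),
    ("agg1", "aggregator"),
    ("cm-backup", "backup_central_manager"),
    ("coll1", "collector"),
    ("cm-primary", "central_manager"),
]
print(patch_plan(units))
```

The backup Central Manager still needs to be synchronized after it is patched, as the list notes; the helper only fixes the order.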

GPU 200 requires the healthcheck patch 9997 to be installed. The 10.1.2 update can be installed on top of any Guardium 10 version.

The GPU reboots the appliance. Existing VM Tools will be automatically assigned to the new Red Hat kernel.

Note: Consider rebuilding the appliance if you want to use the EXT-4 filesystems introduced with the new ISO installer.

View/Edit Mode in Dashboards

Now each dashboard opened in the GUI session works in View mode.


Dashboard in View mode

The View mode is useful to make better use of the GUI space for data, especially when a dashboard is informational only.
From my point of view, Guardium administrators will not be happy with that, because it is not ergonomic for data investigation. However, if a dashboard has been switched to Edit mode, this setting is kept for the current session.

Much more useful would be the possibility to store this setting permanently per dashboard.

Deployment Health Dashboard extensions

Each new GPU adds more to the Deployment Health view. Besides the existing views:
Deployment Health Table – shows the overall appliance status in a simple way


Deployment Health Table

Deployment Health Topology – shows connectivity and topology


Deployment Health Topology

Enterprise S-TAP View – displays information about S-TAPs across the whole Guardium infrastructure


Enterprise S-TAP view

the new GPU provides:

System Resources – located in Manage->Central Management which collates information about key resources on appliances.


System Resources

Deployment Health Dashboard – customizable dashboard focused on appliance resources and performance statistics


Deployment Health Dashboard

Thanks to Managed Unit Groups, it is possible to create dynamic views filtered by a group of appliances or focused on a selected one. Statistics cover Analyzer and Logger queues, buffer space, memory and disk usage, and sniffer restarts.
Additionally, the Events timeline report presents discovered issues and can be enriched by alerts gathered from appliances. The alert definition contains additional fields to set up the result for the dashboard:


Alert definition

Data Classification Engine – task parallelization

In large environments with hundreds of databases, the limitation of the Guardium classification engine to executing only one job from the queue at a time was very painful. The current version allows these tasks to run in parallel on the appliance. In most cases classification is managed on aggregators or the central manager, where CPU utilization is low, so now, with a new flag configured by GRDAPI, we can review data content faster and more frequently.

grdapi set_classification_concurrency_limit limit=<your_limit>

The limit has to be lower than 100 and not higher than the number of CPU cores available on the appliance multiplied by 2.
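Reading this rule literally (strictly lower than 100, and at most twice the appliance's CPU core count), the highest acceptable value is min(99, 2 × cores). A small Python helper that builds the command line under that reading; it does not query the appliance, so the core count must be supplied, and clamping the value to at least 1 is my own assumption.

```python
def max_classification_concurrency(cpu_cores: int) -> int:
    """Highest limit under the stated rule: below 100 and at most 2 x CPU cores."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be a positive integer")
    return min(99, 2 * cpu_cores)


def concurrency_command(requested: int, cpu_cores: int) -> str:
    """Clamp the requested value (assumed minimum 1) and build the GRDAPI call."""
    limit = max(1, min(requested, max_classification_concurrency(cpu_cores)))
    return f"grdapi set_classification_concurrency_limit limit={limit}"


print(concurrency_command(requested=24, cpu_cores=8))
# grdapi set_classification_concurrency_limit limit=16
```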

If you created classification policy based on many databases like this:


Classification datasources

you should change it to set of separate policies executed concurrently:


Separated datasources to different policies

Then, if you start a few classification processes together, they will be executed in parallel:


Classification Job Queue
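The effect of splitting one policy into several can be pictured with a bounded worker pool: one policy per datasource, with at most `limit` jobs running at the same time. This is only a Python analogy with a hypothetical job function and simulated scan time, not how the appliance schedules classification jobs.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

limit = 3  # plays the role of the concurrency limit set via GRDAPI
policies = [f"PI classification - datasource {n}" for n in range(1, 7)]

_lock = threading.Lock()
_running = 0
peak = 0  # highest number of jobs observed running at the same time


def run_classification(policy_name):
    """Hypothetical stand-in for one classification job on one datasource."""
    global _running, peak
    with _lock:
        _running += 1
        peak = max(peak, _running)
    time.sleep(0.05)  # simulated scan time
    with _lock:
        _running -= 1
    return f"{policy_name}: done"


with ThreadPoolExecutor(max_workers=limit) as pool:
    results = list(pool.map(run_classification, policies))

print(len(results), "jobs finished, peak concurrency:", peak)
```

With a single policy covering all datasources there would be only one job to run, so the limit could not help; splitting gives the pool several independent jobs.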

File Activity Monitoring update

The policy builder for files allows creating many actions per monitored resource. Now we can define different behavior for file read, modification or deletion.


File policy rule

The UID chain field from the Session entity provides the context of the user and process responsible for the file operation.


File Activity Report

At last we have File Activity reports available out of the box


File Activity Reports

but I suggest cloning the File Activities report and sorting values in descending order by timestamp and sqlid (the session timestamp does not ensure that events will be displayed in the correct order)


File Activity query definition
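The two-key ordering suggested above, sketched in Python with hypothetical rows: events are listed newest first, and when two events share a timestamp, the higher sqlid (assumed here to grow with arrival order) comes first.

```python
from datetime import datetime

# Hypothetical file activity rows: (timestamp, sqlid, operation, path)
rows = [
    (datetime(2017, 10, 5, 10, 0, 1), 17, "read", "/data/hr/salaries.csv"),
    (datetime(2017, 10, 5, 10, 0, 1), 18, "modify", "/data/hr/salaries.csv"),
    (datetime(2017, 10, 5, 9, 59, 58), 16, "read", "/data/hr/salaries.csv"),
]

# Newest first; sqlid breaks ties between events with the same timestamp.
ordered = sorted(rows, key=lambda r: (r[0], r[1]), reverse=True)
for ts, sqlid, op, path in ordered:
    print(ts, sqlid, op, path)
```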

New appliance installer

The new ISO installer simplifies the installation of new appliances (no need to apply GPU 100 and 200). It also removes the problem with support for new GDP licenses on appliances below GPU 100.

The 10.1.2 installer creates EXT-4 Linux filesystems and extends the maximum size of supported storage. If you would like to use larger disks on the appliance, the rebuild procedure is needed (GPU 200 does not convert EXT-3 to EXT-4).

FSM driver deactivation on Linux/Unix

New S-TAPs for Linux/Unix support a new TAP section parameter in guard_tap.ini:


where 0 means that the FSM driver is not activated.

Only manual guard_tap.ini modification is supported at this moment.

Outlier detection (behavioral analysis) – new capabilities

Outlier detection is now available for file activity. Only one of the two functionalities, DAM or FAM, can be activated on an appliance.

Behavioral analysis can be enabled on aggregators, which allows analyzing user behavior from a wider perspective.

New views, reports and anomaly types were introduced – a significant update.

Entitlement Optimization

This GPU introduces a completely new user authorization analysis engine. Besides the old Entitlement Reports, we can use the Entitlement Optimization tool, which retrieves user roles and privileges based on a direct connection to the database and on identified DDL commands. The tool presents changes in the database authorizations,


Entitlement Optimization – What’s New

reports all existing users and their authorizations,


Entitlement Optimizations – Users & Roles

recommends changes and points out vulnerabilities,


Entitlement Optimizations – Recommendations

shows entitlements per user, object or DML operation and provides possibility to analyze what-if scenarios.

A very promising extension that clarifies the view of authorizations. It supports MSSQL and Oracle (in the first release), and the analysis is performed from the collector perspective.

GDPR Accelerator

The new GDPR accelerator simplifies configuring Guardium for compliance with the new EU regulation, which focuses on EU citizens’ rights to the protection of their personal data.

For GDPR, Guardium helps with:

  • personal data identification
  • monitoring of the personal data processing
  • vulnerabilities identification
  • identification of breaches
  • active protection of access by unauthorized users or suspicious sessions
  • keeping the whole compliance policy updated and working as a process


GDPR Accelerator

New Data Nodes support

GPU 200 introduced S-TAP support for the HP Vertica Big Data platform, Cloudera Navigator monitoring using a Kafka cluster, and Hortonworks with Apache Ranger – another step toward making Guardium supreme in Big Data platform monitoring.

MemSQL, a very fast in-memory database, is now supported as well.

Data in-sight

A new type of audited data representation, Data In-Sight, is available in the Investigation Board (formerly QuickSearch). It shows data access in motion in a 3D view: simple example

Summary: an important step toward making data access monitoring easier to manage and more transparent for non-technical users. This GPU mainly focuses on extending existing functionalities and making them more usable and stable.



Central Manager in HA configuration

Central Management is one of the key functionalities that simplify Guardium implementation and lower TCO. The ability to patch, update, reconfigure and report across hundreds of monitored databases is a strong advantage.

Guardium implements this feature by selecting one of the aggregators as the Central Manager (CM). All other Guardium infrastructure units communicate with it and synchronize information. However, when the CM is inaccessible, this process is disrupted and normal environment management is not possible. To address this problem, Guardium introduced the CM backup feature in version 9.

It covers two main problems:

  • planned CM shutdown (patching, upgrade)
  • CM failure

The CM backup configuration and the switching between primary and secondary units must be managed correctly to avoid problems at the collector and aggregator layers.

General considerations for the backup CM:

  • the main CM (primary) and the backup CM (secondary) must be accessible by all appliances in the administration domain
  • quick search and outlier detection configuration should be checked after changes at the CM level
  • switching between CMs sometimes requires reassigning licenses

Note: the examples in this article refer to a simple Guardium infrastructure with 4 units:

  • CM Primary (cmp,
  • CM Backup (cmb,
  • Collector 2 (coll2,
  • Collector 3 (coll3,

CM Backup registration

This procedure sets one of the aggregators belonging to the Guardium management domain as the backup CM and sends this information to all units.

Only an aggregator with the same patch level as the primary CM can be defined as the backup CM. This means that the same general, hotfix, sniffer and security patches should be installed on both machines.


Patch list on CM primary (cmp)


Patch list on aggregator (cmb)

The screenshots above show that both units have exactly the same patches installed. If the patch levels differ, the aggregator cannot be promoted to the backup CM role.

Note: the patch level refers to the same version of Guardium services, MySQL, Red Hat and the sniffer. If one unit was patched in the sequence 1, 4, 20, 31, 34 and the second with 20, 31, 34, they are on the same patch level because patches 1 and 4 are included in patch 20.
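The reasoning in the note can be sketched as a small Java model. This is a hypothetical illustration, not Guardium's patch logic: the class PatchLevel and the map describing which cumulative patch contains which earlier patches are assumptions introduced here.

```java
import java.util.*;

// Hypothetical model (not Guardium code): a cumulative patch supersedes the earlier
// patches it contains, so those earlier patches do not change the patch level.
class PatchLevel {
    // Installed patches minus the ones already contained in another installed patch
    static Set<Integer> effective(Collection<Integer> installed, Map<Integer, Set<Integer>> contains) {
        Set<Integer> result = new TreeSet<>(installed);
        for (Integer patch : installed) {
            result.removeAll(contains.getOrDefault(patch, Collections.<Integer>emptySet()));
        }
        return result;
    }

    // Two units are on the same patch level when their effective patch sets are equal
    static boolean samePatchLevel(Collection<Integer> unitA, Collection<Integer> unitB,
                                  Map<Integer, Set<Integer>> contains) {
        return effective(unitA, contains).equals(effective(unitB, contains));
    }
}
```

With a map stating that patch 20 contains patches 1 and 4, the sequences 1, 4, 20, 31, 34 and 20, 31, 34 reduce to the same effective set, while a unit with 1, 4, 31, 34 (no patch 20) does not.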

To designate an aggregator as the backup CM, on the primary CM go to Manage->Central Management->Central Management and push the Designate Backup CM button


Central Management view (cmp)

The pop-up window displays all aggregators with the same patch level as the CM. Select an aggregator and push the Apply button


backup CM selection (cmp)

A simple message informs you that the backup CM task has started and that the process can be monitored

Unfortunately, the “Guardium Monitor” dashboard does not exist in version 10. A simple summary of this process can be monitored in the “Aggregation/Archive Log”, or you can create a report without any filters to see all messages.

Here is the link to the query definition: Query Definition

The same information is stored in the turbine_backup.log file on the CM

mysql select SQLGUARD_VERSION result is 10.0
logme   act_name= 'CM Backup' act_success='1' act_comment='Starting system backup with CM_SYNC 0'  act_day_num='now()' act_dumpfile='' act_header='1' 
****** Sun May 22 10:40:00 CEST 2016 ************
function do_cm_sync
write md5 to cm_sync_file.tgz.md5
scp: /opt/IBM/Guardium/scripts/scp.exp cm_sync_file.tgz aggregator@

Synchronization can also be monitored on the backup CM aggregator in import_user_tables.log

Sun May 22 12:56:05 CEST 2016 - Import User Tables started
unit  is secondary CM
 move /var/IBM/Guardium/data/importdir/cm_sync_file.tgz.tmp to /var/IBM/Guardium/data/importdir/cm_sync_file.tgz 
number of table in DIST_INT and DATAMART tables = 19
calling /opt/IBM/Guardium/scripts/
Sun May 22 12:56:13 CEST 2016 - Handle agg tables started
Sun May 22 12:56:14 CEST 2016 - Handle agg tables finished
Sun May 22 12:56:14 CEST 2016 - Import User Tables done

Synchronization with the backup CM is repeated according to the schedule defined under Managed Unit Portal User Synchronization

From this perspective, it makes sense to repeat synchronization every few hours. In case of planned CM downtime, I suggest invoking synchronization manually using the Run Once Now button.

If the process finishes successfully, all units except the backup CM show the HA configuration in the Managed Unit list: the IP addresses of both CMs.

Important: to avoid “split brain” problems, ensure that all managed units are able to refresh the list of CMs every time the IP address pair changes.

Information about the list of managed units and their health status is available on the primary CM in the Central Management view

or in the Managed Units report

Promoting backup CM as a primary

Note: switching the CM functionality to the secondary server is a manual task, but it can be triggered remotely using GRDAPI.

This task can be invoked from the portal on the backup CM via Setup->Central Management->Make Primary CM


Confirmation of the CM promotion to primary server

or from the CLI using the GRDAPI command

grdapi make_primary_cm

The output of this task is written to load_secondary_cm_sync_file.log on the backup CM

2016-05-20 22:56:11 - Import CM sync info. started
2016-05-20 22:56:11 -- invoking last user sync. 
2016-05-20 22:56:22 -- unit  is secondary CM, continue 
2016-05-20 22:56:27 -- file md5 is good, continue
2016-05-20 22:58:33 -- file decrypted successfuly, continue 
2016-05-20 22:59:10 -- file unzipped successfuly, continue 
2016-05-20 22:59:10 -- unzipped file is from version 10 beforeFox=0  
2016-05-20 22:59:28 -- Tables loaded to turbine successfully
2016-05-20 22:59:28 -- not before fox  
2016-05-20 22:59:48 - copied custom classes and stuff 
2016-05-20 22:59:50 -- Import CM sync info done

After a while, the portal on all managed units, including the promoted aggregator, is restarted, and we can see the new location of the primary CM (the old CM disappears from this list).

Synchronization activity is also visible on the new CM.

The list of units on the new CM does not contain the old CM, to avoid “split brain”.

Warning: I occasionally noticed missing licenses on the promoted CM, although all previously licensed features remained active. If the license keys disappear, they should be reapplied immediately.

Finally, the new CM has been defined and all managed units have updated this information.

Reconfiguring the old primary CM for the backup CM role

If the new CM was promoted while the old primary CM was still active and communicating with the appliances, the old CM stops synchronization and its list of managed appliances becomes empty.

If the promotion was caused by a CM failure, the old CM, after restart, communicates with the new one and refreshes the information about the current status of the administration domain; after a few minutes its list of managed units is cleared too.

Guardium does not provide automatic role switching between CMs; it requires a sequence of steps.

To remove the CM functionality from the orphaned CM, the following CLI command needs to be executed

delete unit type manager

It changes the appliance configuration to a standalone aggregator. Then we can join it to the administration domain again, but this time the domain is managed by the new CM (below is an example of registration from the CLI on cmp)

register management <new_CM_ip_address> 8443

Now the old CM works as an aggregator and can be designated as the backup CM


backup CM selection

After this task, both CMs have swapped roles.

Units patching process

Guardium administration tasks require moving the CM role only in critical situations. There is no need to switch to the backup CM for standard patching (especially when hundreds of appliances would have to switch between CMs). Even if a patch forces a system reboot or stops critical services on the updated unit for minutes, the temporary unavailability of the unit does not stop any crucial Guardium environment functions (except for the temporary unavailability of the managed units portal). So a realistic patching process should look like this:

  1. patch CM
  2. patch CM backup
  3. synchronize CM and CM backup
  4. patch other appliances in the CM administration domain.

“Split brain” situation management

Primary CM failure is not handled automatically. However, this situation is reported on all nodes when the portal is accessed.

I suggest using your existing IT monitoring system to check the health of the CM units via SNMP or other existing Guardium interfaces, so that problems are identified faster and the new CM promotion can be invoked remotely by GRDAPI.

The standard flow for managing a CM failure is:

  1. Analyze CM failure
  2. If the system can be restored, restore it instead of switching to the CM backup (especially in large environments)

If the system cannot be restored:

  1. Promote backup CM to primary role
  2. Set up another aggregator as the CM backup

Despite the limited portal functionality on orphaned nodes, the backup CM can also be promoted from the GUI

I have tested two “split brain” scenarios (in a small test environment):

  • CM failure and reassignment of its role to the backup CM
  • starting a stopped collector after the backup CM has been promoted while the old one is still unavailable

In both cases, after a few minutes, the primary CM and the collector identified the situation and correctly handled the connection to the infrastructure.


Central Manager HA configuration is an important feature for avoiding breaks in monitoring. Its design and implementation are good; however, some issues with license management and the new quick search features should be addressed in new releases.

Data classification (Part 2) – Classification policy rules

Continuation of the article – Data classification (Part 1)

Classification policy builder

Here we can create a new classification policy, which is an element of a classification process. One policy can be a member of many different processes.

A classification policy groups rules and manages the relationships between them. To add a new policy, go to Discover->Classifications->Classification Policy Builder, which opens the Classification Policy Definition window

Classification process structure

where the policy Name and Description can be specified.

Tip: a policy is not directly related to the database on which it will be executed. Use a name that describes the analysis logic (for example: Find Sensitive Data in SAP environments).

Tip: the Category and Classification labels are part of the event content generated by Action rules. Use them to make events easier to distinguish at this level.

Info: the list of categories is managed by the Categories group (Group Type: Category).

Select a Category, define the Classification literal and push the Apply button

New Classification Policy

New Classification Policy

then push the now activated Edit Rules button (Roles allows defining which group of users can access this policy; Add Comments makes it possible to add remarks when the policy changes)

New Rule invocation

New Policy

Classification Policy Rules manages the current list of rules inside a particular policy. We will focus on this in another section of this article.


List of classification rules

Classification policy management

The Classification Policy Finder window displays a list of all existing policies. For each policy we can add a comment or go to rule editing.


Policy list

The four icons above the policy list allow adding a new policy, editing, copying or removing the selected one, respectively. Copying a policy opens the Classification Policy Clone window, where the name of the source policy is preceded by the literal Copy of. The Save Clone button adds the new policy to the list.


Policy clone

We can remove a policy that is not attached to a classification process. When we try to remove a policy related to a process, a message is displayed. In this situation you must first remove the process related to this policy or change the policy reference in the process to another one.

Policies followed by a timestamp in square brackets originate from the end-to-end discovery process scenario.

Classification policy rules in detail

Each rule contains some identification fields: Name, Category, Classification and Description. A classification rule is an atomic element, and its name should precisely describe its functionality (for example: e-mail address, US zip code). Classification Rule Type defines the type of data that will be analyzed using this rule.


Rule description and type selection

In most cases, our DAM classification policies will use the Search for Data rule type.

Rule types:

  • Search for Data – analysis of table, view and synonym content
  • Catalog Search – checks for the existence of a particular table or column name
  • Search for Unstructured Data – CSV, text, HTTP/HTTPS and Samba sources outside DAM audit data (not related to the FAM functionality)

Info: do not mix rule types in a classification policy. It is not forbidden, but it does not make sense in most cases.

This simple rule matches AMEX credit card numbers using a regular expression in all tables, views and synonyms, inside columns defined as text (any text type supported by the database). The Apply button adds the rule to the policy.


Simple rule definition

It activates the New Action button in the Classification Rule Action section. Actions are described in the third part of this article. The Back button returns to the list of rules in the policy.


Rules Action section

Each rule visible in the rule list can be quickly reviewed using the small plus icon (show details).


Classification Policy Finder


Rule review – show details

To modify an existing rule, select the pencil icon


Edit rule icon

The balloon icon allows adding comments to a rule (very useful for the change management process).


Add comment icon

The order of rules in the policy can be changed easily using the move up/move down icons. These icons are active when the policy contains at least two rules.


Policy list

By default, the policy processes rules from top to bottom and reaches a verdict when a rule matches the pattern. Once a rule is matched, the remaining rules are not evaluated for the current object (table). Additional rule parameters can change this behavior.

The Select All and Unselect All buttons select or deselect all rules in the view, which is useful for removing rules (Delete Selected button).

Collapse All and Expand All help to quickly review all rules.

Rule parameters review

Logically, we can split the parameters into 3 groups:

  • search scope
  • pattern
  • search behaviour

Search scope parameters

Table Type – defines types of objects included in the analysis:

  • Tables
  • Views (consider the performance impact on production environments when a huge number of unused and complex views exists)
  • Synonyms (not available for some database types)
  • System Tables (includes system objects)

Table Name Like – limits the search scope to object names matching the defined pattern. Two wildcards are allowed: % means a string of any length, _ means exactly one character. Examples:

  • CARS – object with the exact name CARS
  • C% – object names starting with C
  • CAR_ – four-character object names starting with CAR followed by any single character (CARS, CARO, CARP)

If this parameter is empty all tables are analyzed.
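A minimal Java sketch of how such a name pattern can be evaluated, assuming a plain translation of % and _ into a regular expression. LikePattern is a hypothetical helper, not part of Guardium; it ignores escape characters and compares names case-sensitively, while real database engines may differ on both points.

```java
import java.util.regex.Pattern;

// Hypothetical helper: evaluates a LIKE-style name pattern (% = any string, _ = one character).
// Assumptions: no escape character support, case-sensitive comparison.
class LikePattern {
    static Pattern toRegex(String like) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%' || c == '_') {
                // quote the plain text collected so far so regex metacharacters stay literal
                regex.append(Pattern.quote(literal.toString()));
                literal.setLength(0);
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        regex.append(Pattern.quote(literal.toString()));
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // The whole object name must match the pattern
    static boolean matches(String like, String name) {
        return toRegex(like).matcher(name).matches();
    }
}
```

For example, `LikePattern.matches("CAR_", "CARP")` is true, while `LikePattern.matches("CAR_", "CAR")` is false because _ requires exactly one more character.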

Data Type – defines the data types of the columns that will be analyzed. They correspond to the data types supported by the particular database engine (binary object types are not analyzed at all).

  • Date
  • Number
  • Text

Column Name Like – limits the scope to column names matching the defined pattern. Two wildcards are allowed: % and _. An empty field means all columns in the table.

Minimum Length, Maximum Length – refer to the defined size of a column (not to the length of the data stored in a particular row). They are sometimes used together to target a particular column size. Defining a minimum length is good practice for reducing the number of analyzed columns when the minimum length of the searched value is known (for example, 16 characters for a credit card number).

Exclude Schema – restricts the scan area defined by the data source at the schema level. The parameter value points to a group (Application Type – Classifier or Public, Group Type – Schema) containing the list of schemas excluded from the search.

In this example, credit cards have been detected in 3 columns in the dbo and glo schemas


Classification report

A rule modification excludes the glo schema from the search scope


Classification rule and schema exclusion group

and changes the classification results (no objects from the glo schema)


Classification report

Exclude Table – restricts the list of scanned tables defined by the data source (if the Table Name Like parameter is used in the rule, it is evaluated on the table list created after the Exclude Table evaluation). Exclusions are defined by a group reference (Application Type – Classifier or Public, Group Type – Object).

The classification returns 3 columns in 2 tables


Classification report

and after a rule modification that excludes the CC_NOK table


Classification rule and table exclusion group

the results report contains only two records from one table


Classification report

Exclude Table Column – restricts the list of scanned columns defined by the data source (if the Column Name Like parameter is used in the rule, it is evaluated on the column list created after the Exclude Table Column evaluation). Exclusions are defined by a group reference (Application Type – Classifier or Public, Group Type – Object/Field).

The classification returns 3 columns, including column CC in table CC_1


Classification report

and after a rule modification that excludes the CC column from the CC_1 table


Classification rule and table column exclusion group

the excluded column disappears from the results report


Classification report

Limitation: the wildcards % and _ are prohibited in all exclusion groups

Pattern parameters

Info: only one pattern parameter can be used in a rule. Search behavior parameters make it possible to analyze the same column using different patterns.

Search Like – a simple pattern based on two wildcards (% and _). Useful for constants, specific values or as part of a more complex analysis based on a set of rules.

Search Expression – analysis based on a regular expression compliant with the POSIX 1003.2 specification. A description and some examples are available in the internal Guardium Help system: https://<appliance_IP:>8443/guardhelp/topic/

The expression can be typed directly into the field or validated using the Regular Expression builder, invoked by the RE icon


Regular Expression builder icon

In the Regular Expression field we can enter a pattern and check its correctness: put a value in the Text to match against area and press Test


Regular expression builder

The message Match Found indicates that the evaluated expression matches the string; otherwise the message No Match Found is displayed.
The Accept button adds the expression to the rule


Regular expression in rule builder
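The Test step can be approximated in Java to try a pattern outside the appliance. This is a sketch under stated assumptions: java.util.regex is not identical to POSIX 1003.2 (this simple pattern only uses bracket expressions and intervals, which both support), the AMEX expression is an illustrative one rather than the pattern from the screenshot, and the builder is assumed to search for a match anywhere in the text unless the pattern is anchored. RegexCheck is a hypothetical name.

```java
import java.util.regex.Pattern;

class RegexCheck {
    // Illustrative AMEX pattern (assumption): 34 or 37 followed by 13 digits, 15 digits in total
    static final String AMEX = "^3[47][0-9]{13}$";

    // Returns a builder-style message; find() lets an unanchored pattern match anywhere in the text
    static String test(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find() ? "Match Found" : "No Match Found";
    }
}
```

Because the pattern is anchored with ^ and $, a 16-digit value such as 3484057858101867 does not match even though it starts with 34.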

The Regular Expression builder also offers predefined patterns for credit cards and citizen identification numbers (for several countries). Select a category


Predefined expression categories

and then select one of the predefined expressions


List of predefined expressions


Selected expression

Guardium also offers special pattern tests for a limited set of data types that carry a parity or checksum control, for example checking a credit card number with the Luhn algorithm. This functionality is switched on through special naming of the classification rule: the name has to start with the string guardium://CREDIT_CARD.

For example, consider the two tables CC_OK and CC_NOK:

CC_OK             CC_NOK
4556237137622336  4556237137622335
4929697443528339  4929697443528338
3484057858101867  3484057858101866
4824520549635491  4824520549635490
3767010431320650  3767010431320659
4532861697794380  4532861697794389
5352437717676479  5352437717676478
4539522376654625  4539522376654624
5547728204654151  5547728204654150
5292779270461374  5292779270461373

Both contain strings representing 16-digit numbers. Table CC_OK contains credit card numbers with a correct Luhn checksum, unlike table CC_NOK.
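For reference, the Luhn check itself is short. The sketch below is the standard Luhn (mod 10) algorithm in Java, not Guardium's internal code; the class name Luhn is hypothetical. For example, it accepts 4556237137622336 from CC_OK and rejects 4556237137622335 from CC_NOK.

```java
// Standard Luhn (mod 10) check; an illustration, not Guardium's implementation
class Luhn {
    static boolean isValid(String number) {
        String digits = number.trim();
        if (digits.isEmpty()) {
            return false;
        }
        int sum = 0;
        boolean doubleIt = false; // the rightmost (check) digit is not doubled
        for (int i = digits.length() - 1; i >= 0; i--) {
            char c = digits.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
            int d = c - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9; // same as adding the two digits of the product
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }
}
```

Since each CC_NOK value differs from its CC_OK counterpart only in the last (check) digit, a single wrong digit is enough to break the sum modulo 10.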

A policy based only on a regular expression


Find Credit Card (regexp only)

identifies both tables as containing credit card numbers


Classification process structure

For a policy with an additional Luhn checksum check


Find Credit Card (with checksum)

only the CC_OK table is recognized as containing valid credit card numbers


Classification process structure

Other special patterns for rule names are described in the Guardium Help system: https://<appliance_IP:>8443/guardhelp/topic/

Evaluation Name – the most powerful option in classification analysis. It allows you to create your own validation function coded in Java (1.7 in the initial G10 release) and implement any checks that cannot be covered by regular expressions.

For example, we would like to find bank account numbers in IBAN notation (widely used in Europe) and verify their checksum (modulo 97 of the transformed number). This task cannot be handled by a regular expression at all.

More about IBAN is available on Wikipedia: IBAN
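The checksum step can be worked through without building a huge number. The Java sketch below follows the standard ISO 13616 procedure (move the first four characters to the end, map letters A=10 ... Z=35, require a remainder of 1 modulo 97) but computes the remainder digit by digit. It is an alternative formulation for illustration, not the Guardium Evaluation class from this article; IbanMod97 is a hypothetical name, and unlike the article's class it also strips spaces from printed IBANs.

```java
// Hypothetical helper (not a Guardium class): ISO 13616 IBAN check with a running remainder
class IbanMod97 {
    static final int MIN_SIZE = 15; // same bounds as IBANNUMBER_MIN_SIZE / MAX_SIZE in the article's class
    static final int MAX_SIZE = 34;

    // Remainder modulo 97 of the rearranged, letter-expanded IBAN, or -1 for bad input
    static int mod97(String iban) {
        String s = iban.replace(" ", "").toUpperCase();
        if (s.length() < 4) {
            return -1;
        }
        String rearranged = s.substring(4) + s.substring(0, 4); // country code and check digits go to the end
        int remainder = 0;
        for (char c : rearranged.toCharArray()) {
            if (c >= '0' && c <= '9') {
                remainder = (remainder * 10 + (c - '0')) % 97;
            } else if (c >= 'A' && c <= 'Z') {
                remainder = (remainder * 100 + (c - 'A' + 10)) % 97; // a letter expands to two digits
            } else {
                return -1;
            }
        }
        return remainder;
    }

    static boolean isValid(String iban) {
        int length = iban.replace(" ", "").length();
        return length >= MIN_SIZE && length <= MAX_SIZE && mod97(iban) == 1;
    }
}
```

For the widely used sample IBAN GB82 WEST 1234 5698 7654 32, the running remainder ends at 1, so the number is valid. Changing any single digit moves the remainder away from 1.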

We need to create and compile a class in the com.guardium.classifier.custom package that implements the Evaluation interface, which has one method, evaluate(), returning true or false.

This is an example of code for IBAN evaluation:

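package com.guardium.classifier.custom;
import java.math.BigInteger;

public class iban implements Evaluation {
    public static final int IBANNUMBER_MIN_SIZE = 15;
    public static final int IBANNUMBER_MAX_SIZE = 34;
    public static final BigInteger IBANNUMBER_MAGIC_NUMBER = new BigInteger("97");
    public boolean evaluate(String accountNumber) {
        if (accountNumber == null) return false;
        String newAccountNumber = accountNumber.trim().toUpperCase();
        // reject values outside the allowed IBAN length range
        if (newAccountNumber.length() < IBANNUMBER_MIN_SIZE || newAccountNumber.length() > IBANNUMBER_MAX_SIZE) {
            return false;
        }
        // move the country code and check digits (first 4 characters) to the end
        newAccountNumber = newAccountNumber.substring(4) + newAccountNumber.substring(0, 4);
        // convert letters to two-digit numbers (A=10 ... Z=35), keep digits as they are
        StringBuilder numericAccountNumber = new StringBuilder();
        for (int i = 0; i < newAccountNumber.length(); i++) {
            char c = newAccountNumber.charAt(i);
            if ((c < '0' || c > '9') && (c < 'A' || c > 'Z')) {
                return false; // spaces or other characters: not a compact IBAN
            }
            numericAccountNumber.append(Character.getNumericValue(c));
        }
        // a valid IBAN leaves remainder 1 when divided by 97
        BigInteger ibanNumber = new BigInteger(numericAccountNumber.toString());
        return ibanNumber.mod(IBANNUMBER_MAGIC_NUMBER).intValue() == 1;
    }
}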

The compiled class must be uploaded to the appliance (Setup->Custom Classes->Evaluations->Upload). Enter the class Description and point to the file with the compiled class. Approve the upload using the Apply button


Custom class upload

A confirmation message should be displayed. In my database I have the table glottery.glo.bank_accounts, which contains American (non-IBAN) and Polish (IBAN) bank accounts


glottery.glo.bank_accounts table

Now we can create a new rule to find IBANs (using the full class name)


Classification rule

which correctly identifies the bank accounts, including checksum validation



Tip: use self-designed evaluations to build the best-fit policy for identifying sensitive data.

Compare to Values in SQL – compares the values in the sample against a dictionary defined by an SQL query.

Limitation: the dictionary has to exist on the database server where the classification process is executed

For example, we would like to find columns that contain the short names of US states. The table dbo.CC_MAIL_STATE contains a STATE column


The same database engine contains the table glo.STATES with a list of all states


This classification rule uses the list defined by the SQL statement:

SELECT short_name FROM Glottery.glo.States WHERE country=1

Classification rule

and identifies the STATE column


Classification results

Please notice that the classification process worked on the CLEXAMPLES database only (scope defined by the data source), and the dictionary source table is not in the results because it is located in the GLOTTERY database.

The SQL statement used here has some limitations:

  • it must start with SELECT (you cannot send DML or DDL)
  • it should not contain a semicolon (you cannot group statements)
  • referenced objects must use fully qualified names (for example database.schema.object for MS SQL)
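Before pasting a dictionary query into a rule, the limitations above can be pre-checked. The Java sketch below is a hypothetical helper, not Guardium's own validation: it only looks at the statement prefix, the presence of a semicolon and the number of dots in the first object name after FROM (three-part names, as in MS SQL), and it does not parse SQL.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-check (not Guardium's validation) for a Compare to Values in SQL dictionary query
class DictionarySqlCheck {
    // First object name after FROM, allowing dots and MS SQL style [brackets]
    private static final Pattern FROM_OBJECT =
            Pattern.compile("\\bFROM\\s+([\\w.\\[\\]]+)", Pattern.CASE_INSENSITIVE);

    static boolean looksValid(String sql) {
        String s = sql.trim();
        if (!s.toUpperCase(Locale.ROOT).startsWith("SELECT")) {
            return false; // only SELECT statements, no DML or DDL
        }
        if (s.contains(";")) {
            return false; // no grouped statements
        }
        Matcher m = FROM_OBJECT.matcher(s);
        if (!m.find()) {
            return false;
        }
        // database.schema.object => at least two dots in the object name
        return m.group(1).chars().filter(ch -> ch == '.').count() >= 2;
    }
}
```

With this check the query from the example, SELECT short_name FROM Glottery.glo.States WHERE country=1, passes, while the same query with glo.States alone is flagged because the database part is missing.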

Compare to Values in Group – compares column values to the list stored in a Guardium group. The group must belong to Application Type PUBLIC or CLASSIFIER and Group Type OBJECTS. A small icon on the right side of the group list allows creating or modifying the dictionary


Create/Modify group

In this example, the group GL_US_STATES is a list of all US states


Dictionary group

referenced in the classification rule


Classification rule

returns the list of columns where US states appear


Classification results

Search behavior parameters

“Fire only with” Marker – allows identifying tables where two or more columns fulfill certain conditions together.

For example, we have two tables: CC_MAIL with credit cards and mail addresses


and the table CC_NAME, where user names appear instead of mail addresses


If we create two independent rules looking for credit cards and mail addresses


Classification policy

the classification process returns only CC columns from both tables


Classification results

because the first rule matched in each table and the second one was not evaluated.

This time the Continue on Match flag has been switched on


Rules list

and all credit card and mail columns have been identified


Classification results

In the next policy, both rules have been updated with the same Marker: CC_AND_MAIL


Rules list

and the classification policy returns the credit card and mail address columns from the CC_MAIL table only, because only this table contains both patterns together


Classification process structure
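The marker behavior from this example can be modeled in a few lines of Java. This is a hypothetical sketch based on the observed results, not Guardium's implementation: each column is represented by a single sample value, Continue on Match is assumed to be on for all rules, and the names MarkerDemo and Rule are illustrative.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical model of the "Fire only with" Marker for one table (not Guardium code)
class MarkerDemo {
    static final class Rule {
        final String name;
        final String marker; // null means the rule has no marker
        final Predicate<String> matcher;

        Rule(String name, String marker, Predicate<String> matcher) {
            this.name = name;
            this.marker = marker;
            this.matcher = matcher;
        }
    }

    // table maps column name -> sample value; returns "rule:column" hits that survive the marker check
    static List<String> classify(Map<String, String> table, List<Rule> rules) {
        // 1. evaluate every rule (Continue on Match assumed on for all rules)
        Map<Rule, List<String>> matchedColumns = new LinkedHashMap<>();
        for (Rule rule : rules) {
            List<String> columns = new ArrayList<>();
            for (Map.Entry<String, String> column : table.entrySet()) {
                if (rule.matcher.test(column.getValue())) {
                    columns.add(column.getKey());
                }
            }
            if (!columns.isEmpty()) {
                matchedColumns.put(rule, columns);
            }
        }
        // 2. a rule with a marker fires only if every rule sharing that marker matched in this table
        List<String> hits = new ArrayList<>();
        for (Map.Entry<Rule, List<String>> entry : matchedColumns.entrySet()) {
            Rule rule = entry.getKey();
            boolean groupComplete = rule.marker == null || rules.stream()
                    .filter(other -> rule.marker.equals(other.marker))
                    .allMatch(matchedColumns::containsKey);
            if (groupComplete) {
                for (String column : entry.getValue()) {
                    hits.add(rule.name + ":" + column);
                }
            }
        }
        return hits;
    }
}
```

With both rules marked CC_AND_MAIL, a CC_MAIL-like table yields the credit card and mail columns, while a CC_NAME-like table yields nothing, because the mail rule never matches there.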

Hit Percentage – determines the percentage of values in a sample that must match the pattern for the rule to be satisfied. If this field is empty, the column is classified even if only one value in the sample matches the pattern.

Important: this parameter helps minimize the number of false positives in the data classification process.

Using this parameter also adds to the results the number of unique values in the sample that fulfill the requirements of the rule


Classification results
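The threshold logic can be sketched as follows. Assumptions, labeled as such: the percentage is computed over all sampled values (the exact Guardium formula is not described here), any pattern (regular expression or dictionary lookup) is modeled as a Predicate, and HitPercentage is a hypothetical class name.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical sketch of the Hit Percentage decision for one column (not Guardium code)
class HitPercentage {
    // threshold in percent; null means no Hit Percentage set, so one matching value is enough
    static boolean satisfied(List<String> sample, Predicate<String> pattern, Integer threshold) {
        long hits = sample.stream().filter(pattern).count();
        if (threshold == null) {
            return hits >= 1;
        }
        if (sample.isEmpty()) {
            return false;
        }
        return hits * 100.0 / sample.size() >= threshold; // assumption: share of all sampled values
    }

    // number of distinct sampled values that satisfy the pattern
    static long uniqueMatches(List<String> sample, Predicate<String> pattern) {
        return sample.stream().filter(pattern).distinct().count();
    }
}
```

For a sample of 10, 11, 11, 12, 0 checked against the dictionary {10, 11, 12, 13}, four of five values match (80%): the rule is satisfied at an 80% threshold but not at 98%, and three unique matching values are reported.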

Show Unique Values, Unique Value Mask – attach the matched values to the classification report. Only unique values are displayed, and at most 2000 of them per column can be included in the report


Classification rule


Classification report

If the attached values are sensitive, the Unique Value Mask field allows masking this data.

The mask must be a regular expression that covers the expected values and precisely defines the part that should be visible. The Regular Expression builder is also available to define the mask and check its correctness. The part of the regexp inside parentheses () defines the content of the value that will be displayed in the report (for example, .*([0-9]{4})[ ]{0,20}$ means that only the last four meaningful digits will be displayed)


Classification rule


Classification report
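A short Java sketch of how a capture-group mask reduces a value to its visible part, using the example mask above. Only the group extraction follows the description; rendering non-matching values as *** is an assumption, since the article does not show how Guardium displays them, and UniqueValueMask is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: keep only the first capture group of a mask regex
class UniqueValueMask {
    static final String HIDDEN = "***"; // assumption: placeholder for values the mask does not match

    static String apply(String maskRegex, String value) {
        Matcher m = Pattern.compile(maskRegex).matcher(value);
        return m.matches() && m.groupCount() >= 1 ? m.group(1) : HIDDEN;
    }
}
```

Because .* is greedy and backtracks only as far as needed, the group captures the last four digits even when the value has trailing spaces, so 4929697443528339 followed by spaces is shown as 8339.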

Continue on Match, One match per column – by default, the classification process flow focuses on identifying tables with sensitive data. Consider a table with a credit card, a mail address and a state


and a classification policy with 4 rules (Continue on Match switched off)


Classification policy

Only one column from the CC_MAIL_STATE table has been identified


Classification results

because the first rule met the requirements and the policy moved on to the next table. To change this, the Continue on Match flag must be switched on for the rules in the sequence


Classification policy

which leads to the expected behavior: all sensitive columns in the CC_MAIL_STATE table have been discovered


Classification results

Notice also that the STATE column has been matched twice because two rules meet the requirements for it (which was expected here). However, we can suppress multiple matches on one column using the One match per column flag. To do that, set it in the first rule in the sequence that works on that column


Classification policy

This time the Find State PL rule did not match the STATE column


Classification report

Tip: in most cases, the sensitive data classification procedure should identify all columns where this type of data resides, so the Continue on Match flag should be switched on for all rules in the policy.
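The interplay of Continue on Match and One match per column can be summarized as a hypothetical Java model of the flow for one table. It reproduces the three results described above, but it is an interpretation of the observed behavior, not Guardium's code; the rule names mirror the example, while the matching predicates and the class name PolicyFlow are illustrative.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical model of one table's evaluation (not Guardium code):
// rules run top-down; a matching rule without Continue on Match ends evaluation for the table;
// One match per column hides a matched column from later rules.
class PolicyFlow {
    static final class Rule {
        final String name;
        final Predicate<String> matcher;
        final boolean continueOnMatch;
        final boolean oneMatchPerColumn;

        Rule(String name, Predicate<String> matcher, boolean continueOnMatch, boolean oneMatchPerColumn) {
            this.name = name;
            this.matcher = matcher;
            this.continueOnMatch = continueOnMatch;
            this.oneMatchPerColumn = oneMatchPerColumn;
        }
    }

    // table maps column name -> sample value; returns "rule:column" hits in evaluation order
    static List<String> classify(Map<String, String> table, List<Rule> rules) {
        List<String> hits = new ArrayList<>();
        Set<String> lockedColumns = new HashSet<>();
        for (Rule rule : rules) {
            boolean matched = false;
            for (Map.Entry<String, String> column : table.entrySet()) {
                if (lockedColumns.contains(column.getKey()) || !rule.matcher.test(column.getValue())) {
                    continue;
                }
                hits.add(rule.name + ":" + column.getKey());
                matched = true;
                if (rule.oneMatchPerColumn) {
                    lockedColumns.add(column.getKey());
                }
            }
            if (matched && !rule.continueOnMatch) {
                break; // verdict reached: later rules are skipped for this table
            }
        }
        return hits;
    }
}
```

With all flags off, only the credit card column is reported; with Continue on Match on every rule, STATE is matched by both state rules; adding One match per column to Find State US leaves STATE with a single match.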

Relationship discovery

Using a simple trick, we can also identify relationships between source data and other objects.

I have a source table with users, glottery.glo.users


glottery.glo.users table

where the primary key is the id column, and correct references to users from other tables should point to this value. I have created a rule


Rule classification

looking for numeric columns whose values must match the list of ids from the source table (SELECT id FROM glottery.glo.users WHERE id<>0 AND id<>1). The WHERE clause omits the values 0 and 1, which can be logical values in some referential tables. I have set the Hit Percentage to a very high level, 98%, to ensure a real relationship between the analyzed object and the users table.


The results clearly show that the users table is referenced in 6 other tables


Classification results



Guardium provides many different techniques to identify sensitive data, and a good implementation relies on them. If we know where critical data resides, real-time policies, correlation alerts and SIEM events will work correctly and point to real threats.

Article continuation:

  • Part 3 – Action rules (soon)
  • Part 4 – Classification process and data sources (tbd)
  • Part 5 – End to End scenarios and Classification Automation (tbd)