GDP Policies – Part One

Guardium Data Protection Policy tutorial.

Not only basic stuff inside, should be helpful also for experienced Guardium administrators.

In Part One:

  • 00′:21” – Guardium Policy basics  – selective policy, rules, conditions and actions
  • 04′:46” – Simple policy – create, install, check
  • 08′:50” – Access control – technical account using inappropriate access vector
  • 13′:40” – Access control – identification access from unapproved workstation
  • 18′:30” – Access control – alerting the access of unknown user and DDL execution
  • 22′:43” – What exactly DB Name is?
  • 26′:25” – Access control – alerting DML’s in the application schema
  • 30′:48” – Masking sensitive information in the reports and alerts

The tutorial published on Guardiumnotes youtube channel  – GDE Policies – Part One


Guardium Reports Platform understanding (2)

Full SQL and SQL monitoring, deeper view on Access domain

For better understanding SQL entity we need to describe a little bit deeper the logging actions in Guardium policy.
My audit policy (selective audit trail) contains two rules.
2017-10-09_12-33-50It will log activity of syntaxuser1 using LOG action (LOG ONLY) and other traffic will be audited with details – LOG FULL DETAILS action.
I connected to postgreSQL database two times as a test and syntaxuser1
2017-10-09_13-08-34and these sessions are visible (right) but report based on Full SQL entity does not contain syntaxuser1 activity (left).
The reason is simple and understanding of this is very important to create accurate database monitoring policy and reports. The LOG ONLY action logs SQL constructs and does not audit full SQL body executed inside session.

So what does exactly LOG ONLY log?!

The LOG ONLY (it is also default action for non selective audit trail policies!) removes SQL parameter values from SQL body. For instance 3 SQL’s:

SELECT * FROM table WHERE columnX='value1'
SELECT * FROM table WHERE columnX='value2'
SELECT * FROM table WHERE columnX='value3'

are described as a one SQL construct

SELECT * FROM table WHERE columnX='?'

so audited activity based on LOG ONLY action allows identify syntax but it is not possible to present full body (if SQL contains parameters).
The main purpose of LOG ONLY action use is the meaningful decrease of disk space consumption by audited traffic because we do not need store each SQL and put only reference (Construct ID) to known by collector SQL constructs stored in SQL entity.

It is a good time to introduce very important entity – Access Period. Independently to FULL SQL flow (described in part 1) Guardium stores audited activity inside the hourly based sets named periods – we can visualize them as data partitions. Now we can discuss sense of this kind approach but it was historic decision (more that 10 years ago) based among others on cost of storage and CPU utilization.

Periods describe the all audited traffic on hour basis and simplify data partitioning and point executed in this timeframe SQL’s by Instance ID and Construct ID keys . So we can present data inside entities this way
gn23Data flow in 5 main entities:

  • policy makes decision to log activity (LOG ONLY or LOG FULL DETAILS)
  • if SQL is related to new session – new Session ID is registered and Access ID is attached to it or new connection profile is registered
  • system checks – is Session ID registered in the current period (current hour)?
    • False – new Instance ID is created in Access Period entity
  • if LOG ONLY action is used
    • the SQL is “anonymized” – parameter values are replaced by question mark
    • system checks existence of SQL construct in SQL Entity
      • False – new Construct ID is registered in SQL Entity
    • new record (access) is attached to current period (partition) in the Access Period entity – Access ID, relations to SQL (Construct ID) and Session (Session ID)
  • if LOG FULL DETAILS action is used
    • the SQL is registered in FULL SQL with reference to Session ID
    • SQL is “anonymized” and registered in Access Period (Instance ID) this same way like described for LOG ONLY action
    • Record in FULL SQL stores reference to Instance ID in Access Period

We should be aware some limitations if LOG ONLY action is used to audit session:

  • data are stored without parameter values
  • we cannot identify exact time of SQL execution, we can estimate time by reference to:
    • Session timestamps – between Session Start and Session End
    • Access Period – SQL execution inside partition, between Period Start and Period End
    • Access Period Timestamp – last execution of particular SQL construct inside period instance
  • SQL’s from one session can be located in many periods (partitions) if session involves many hours
  • Both logging actions can be used inside policy to audit activity from this same session (it is powerful) but it can lead to incorrect conclusions if we base on FULL SQL report only
  • The Period Start is used as a timestamp for Access Period Entity (the Timestamp field has not important value)

It should be also stressed that LOG FULL DETAILS action stores SQL in both entities Full SQL and SQL

So, from theory to practice 🙂

Example 1 – No space on disk, no data in reports

It happens when we use LOG ONLY action in our policies and we try to review activity in the report based on FULL SQL entity
SQL counter identifies thousands constructs (report based on SQL entity) but SQL syntax report is empty (based on FULL SQL) – my audit policy uses LOG ONLY action. If you log events using LOG ONLY action somewhere you should not report data based on FULL SQL to report them.

Example 2 – Where is timestamp for SQL entity?

I created a query based on main entity – SQL
and you should notice that there is no timestamp inside the available in SQL entity fields. What does it mean? Can we create report base on it and use time period specification for results?
Yes, we can, because query gathers timestamp from the closest (direct) relation if it does not exists in the main entity.
The direct relation for SQL entity is Access Period which use Period Start field as timestamp. It has a big influence on result 🙂

I connected to database two times and executed simple query in each session
2017-10-09_19-08-54then I tried to display my activity related to second session only (base on Start Date after 19:18:00)
and both sessions are visible. The explanation is simple – timestamp for this report bases on Period Start field what for the input value 19:18 defines the period 19:00-20:00 (7-8 am) where two Test user sessions happened.
There is no possibility to granular time different way (more in Example 5) because it is main entity dependent. How to display SQL’s belonging only to the second session? – we can use Session ID for example as a filter
2017-10-09_19-26-31I put Session ID of interesting me session in the added filter and “voila”2017-10-09_19-28-12

Example 3 – I do not see part of my SQL’s – situation 1

This time I executed 6 SQL’s but only two of them are displayed in my report based on SQL entity.
2017-10-09_22-16-45What I pointed before, the SQL entity stores constructs instead of FULL SQL body so SELECT 1, SELECT 2 and SELECT 3 are visible here as SELECT ? and 3 executions of SELECT now() also point only one SQL construct. Please notice also that Timestamp in Access Period entity points the time closest to the last execution of particular SQL construct.
Does it mean that we cannot identify the exact number of executed SQL’s if LOG ONLY action is used? – of course, we can 🙂 The number of occurrences of constructs are stored in Access Period and we can refer to it from SQL entity using entity counter (Add Count)
2017-10-09_22-29-07Now we have full information that my session contained two SQL constructs and each of them was executed three times.
The SQL entity does not store execution timestamp so order of executed constructs is unknown.

Example 4 – I do not see part of my SQL’s – situation 2

This time I executed 8 SQL’s inside session
2017-10-10_09-39-26and only four appear in the report. You should notice that my session lifetime covers 2 periods (08:00-09:00 and 09:00-10:00) but report time range refers only to the second one. In Session Start column we have information when session started and my period reference has to point to it if I would like to receive full session statistics

Example 5 – One hour granularity is not foxy

Guardium allows decrease the default access period time granularity from one hour to 1 minute even.
2017-10-21_10-55-31Do not forget Apply changes and Restart IE’s before. Here the Logging Granularity has been set to 10 minutes what is visible in the report below

Example 6 – SQL or Access Period as main entity?

The SQL entity as a main entity is used only when information about construct body is needed.  If we focus on quantitative analysis and interesting in the user behave the Access Period domain is much more efficient. Access Period is also the timestamp reference entity for very useful entities like Command and Object (I will focus on them in the next article about Guardium reporting)

Example 7 – How to see all SQL’s

It is common situation that Guardium policy mixes LOG ONLY and LOG FULL DETAILS actions inside rules. So only part of activity can be reported using FULL SQL entity. We do not need create complicated reports to summarize and analyze this diversified type of auditing because each fully monitored SQL is also stored inside SQL entity.
2017-10-21_11-28-35It should be now clear that FULL SQL refers to Access Period using Instance ID key and indirectly we can identify executed Construct ID.

Please remember that any audited activity in Guardium is always visible inside Access Period and SQL entities.
Any quantitative analysis should relies on them especially when not only LOG FULL DETAILS action is used inside policy rules.
The report data extraction works much more efficient if we use Access Period instead of heavy queries on FULL SQL entity.


Guardium Reports Platform understanding (1)

Part I – timestamps, main entity and entity relations in Access domain

I receive many questions about the correct report definition tied with misunderstanding the data relations in audit database and main entity selection.

The simplest answer is: the main entity defines relation between all entities inside Guardium query but I think that is not still clear for most readers 🙂

The Guardium query presents values from reporting domain. Please remember that query can refer only to one domain (for instance Access, Exception). If report should present data from more domains the Guardium allows do that using custom domain (it is not important here).

Inside domain the audited data are stored in fields which are grouped inside entities.

For example, domain Access contains 19 entities, each of them can contain dozens fields. Simplifying, we can imagine this structure as the fields which are the columns in the tables (entities) which are the part of tablespace (domain) and query can refer to one tablespace only. This comparison is very accurate because we have strict relations between entities: 1-1, 1-N, or N-1.

Example 1 – Timestamps and 3 main entity relations

Please assume that the Guardium policy audits SQL activity using “LOG FULL DETAILS” action in all examples here.

I created a new query in the Access domain with the main entity set to FULL SQL
2017-10-05_16-49-29Report defintion

Then I connected 3 times to postgres database and executed simple “select now()” command
2017-10-05_16-39-44Here we have the report points this activity based on my query (filtered by user name)2017-10-05_16-43-17I put in my report four timestamps – from Access, Session (Timestamp and Session Start) and FULL SQL entities. You can notice that Timestamp from Access domain has this same value not strictly related to execution time of my SQL’s. What does exactly points this value?
To understand this we need to treat the Access domain as a dictionary (referential data) of tuples which include information about Client IP, Server IP and DB user name where Timestamp inside points date when the particular tuple has been registered (appears first time) on the appliance. So, if I am focusing on SQL’s activity this timestamp has no value for me because does not point any information about connection time or sql execution. The “foreign key” matching other entities with Client/Server has name Access Id.
2017-10-05_17-45-24GN – Example 3 is the reports based on Client/Server entity and exemplifies connection with Session entity.

This same relation exists between Session and FULL SQL entities based on “foreign key” – Session ID

Finally we can present the relations inside 3 main entities this way
gn1If the new connection (session) is started the information about it is registered as a new record in Session entity. The access related information (IP addresses, user name, port, etc.) are referred from Client/Server (Access ID) and each new SQL from session stream is stored in FULL SQL with reference to Session by Session ID.

Now it should be clear that Timestamp from Full SQL is related to exact time when SQL was processed by database (my NTP configuration works well 😉 )
2017-10-05_20-40-54What about Timestamp is Session entity? Hmmm, even Guardium documentation suggests do not focus on it – “When tracking Session information, you will probably be more interested in the Session Start and Session End attributes that Timestamp attribute“.
I agree with that and suggest use the Session Start and Session End which point information about connection beginning and closure respectively.

Timestamps summary:

  • Timestamp from Client/Server entity – not related to session and SQL, refers to first appearance access description on Guardium appliance
  • Timestamp from Session entity – changeable during session lifetime, limited value to defined exact time of particular activity in the session
  • Session Start from Session entity – points when session started
  • Session End from Session entity – points when/if session was closed
  • Timestamp from FULL SQL entity – points when SQL has been executed

Example 2 – main entity selection

Now I have created the query with main entity – Client/Server
2017-10-05_21-41-22It means – that report will show data from Client/Server entity and any fields from other entities will work in the appropriate relation. You know that relations between Client/Server->Session and Client/Server->FULL SQL (indirect) are 1:N so we cannot present values and counter of events is suggested for Session Id and FULL SQL fields.2017-10-05_21-49-14Looks good but we will face two problems at once 🙂
This report suggests that shown Access ID tuple has been referred in 4 sessions and only two SQL’s are related to them – it is strange, how is it possible? Technically it is, but not here
2017-10-05_21-44-22The fact is that this tuple is related to 2 sessions and four SQL’s – opposite to shown values in my report!?

We received this output because my report counted values from two, external entities (Session and FULL SQL) and there is no direct relation between Client/Server and FULL SQL. The indirect relation was counted first with DISTINCT clause (value 2) and then sessions were summarized without DISTINCT (value 4).
This kind of problems are common situation if we do not understand the relations between entities.
I modified my query and removed FULL SQL counter2017-10-05_22-19-36and now the report shows correct number of sessions.

Now we can switch to the second challenge. Most of you have probably wondered why in most reports there is a time-based data filter
2017-10-05_22-27-31And now everything should be clear :). The time selection bases on the main entity Timestamp and appears if the entity contains it.
I noticed before that Timestamp in Access/Client entity points to information when Access tupple has been registered in the audit database on the appliance (not relevant in 99% situations). So, if I am looking for number of sessions in last hour the result set in my reports will look like this
2017-10-05_22-44-37because the tuple was created much more earlier.
It is the effect when we selected the incorrect main entity for our purpose. So, if we would like to display session (connection) related information our query should relies on this entity.
However, it was mentioned also, the Timestamp in Session entity is not valuable – is changing during session and does not provide well define point of time in the session lifetime. That is why the Guardium provides two virtual main entities in the Access domain
2017-10-05_22-54-27corresponding to Session Start and Session End inside session entity. Now I can create report which will count sessions in defined time based on Session Start timestamp
2017-10-06_15-31-32My query does not contain any field from session entity because my goal is to count sessions, so there is no sense put session details inside. I added sessions counter using the Add Count flag which indirectly adds a fourth field (Count of Sessions)
2017-10-05_23-02-16You see that my report based on the session entity lists all sessions started last hour (left) and report based on Client/Server entity is empty because referential data (Access ID) are stored earlier when particular connection information was identified first time (right).

My examples can lead to opinion that queries based on Client/Server entity are not valuable at all. Definitely there are usable but for well defined cases. For instance we would like to identify new connection profiles on database system – new tuples which were never before connected to our system. Base on this information we can identify anomaly in the access to protected resource –  new database clients have not seen before.

Client/Server, Session and Full SQL entities are base of most reports focused on detailed SQL activity. I hope that this article allows you create the requested query faster and deliver expected results.

In next article about reporting I will explain the difference between FULL SQL and SQL entity