Part I – timestamps, main entity and entity relations in Access domain
I receive many questions about the correct report definition tied with misunderstanding the data relations in audit database and main entity selection.
The simplest answer is: the main entity defines relation between all entities inside Guardium query but I think that is not still clear for most readers 🙂
The Guardium query presents values from reporting domain. Please remember that query can refer only to one domain (for instance Access, Exception). If report should present data from more domains the Guardium allows do that using custom domain (it is not important here).
Inside domain the audited data are stored in fields which are grouped inside entities.
For example, domain Access contains 19 entities, each of them can contain dozens fields. Simplifying, we can imagine this structure as the fields which are the columns in the tables (entities) which are the part of tablespace (domain) and query can refer to one tablespace only. This comparison is very accurate because we have strict relations between entities: 1-1, 1-N, or N-1.
Example 1 – Timestamps and 3 main entity relations
Please assume that the Guardium policy audits SQL activity using “LOG FULL DETAILS” action in all examples here.
I created a new query in the Access domain with the main entity set to FULL SQL
Then I connected 3 times to postgres database and executed simple “select now()” command
Here we have the report points this activity based on my query (filtered by user name)I put in my report four timestamps – from Access, Session (Timestamp and Session Start) and FULL SQL entities. You can notice that Timestamp from Access domain has this same value not strictly related to execution time of my SQL’s. What does exactly points this value?
To understand this we need to treat the Access domain as a dictionary (referential data) of tuples which include information about Client IP, Server IP and DB user name where Timestamp inside points date when the particular tuple has been registered (appears first time) on the appliance. So, if I am focusing on SQL’s activity this timestamp has no value for me because does not point any information about connection time or sql execution. The “foreign key” matching other entities with Client/Server has name Access Id.
GN – Example 3 is the reports based on Client/Server entity and exemplifies connection with Session entity.
This same relation exists between Session and FULL SQL entities based on “foreign key” – Session ID
Finally we can present the relations inside 3 main entities this way
If the new connection (session) is started the information about it is registered as a new record in Session entity. The access related information (IP addresses, user name, port, etc.) are referred from Client/Server (Access ID) and each new SQL from session stream is stored in FULL SQL with reference to Session by Session ID.
Now it should be clear that Timestamp from Full SQL is related to exact time when SQL was processed by database (my NTP configuration works well 😉 )
What about Timestamp is Session entity? Hmmm, even Guardium documentation suggests do not focus on it – “When tracking Session information, you will probably be more interested in the Session Start and Session End attributes that Timestamp attribute“.
I agree with that and suggest use the Session Start and Session End which point information about connection beginning and closure respectively.
- Timestamp from Client/Server entity – not related to session and SQL, refers to first appearance access description on Guardium appliance
- Timestamp from Session entity – changeable during session lifetime, limited value to defined exact time of particular activity in the session
- Session Start from Session entity – points when session started
- Session End from Session entity – points when/if session was closed
- Timestamp from FULL SQL entity – points when SQL has been executed
Example 2 – main entity selection
Now I have created the query with main entity – Client/Server
It means – that report will show data from Client/Server entity and any fields from other entities will work in the appropriate relation. You know that relations between Client/Server->Session and Client/Server->FULL SQL (indirect) are 1:N so we cannot present values and counter of events is suggested for Session Id and FULL SQL fields.Looks good but we will face two problems at once 🙂
This report suggests that shown Access ID tuple has been referred in 4 sessions and only two SQL’s are related to them – it is strange, how is it possible? Technically it is, but not here
The fact is that this tuple is related to 2 sessions and four SQL’s – opposite to shown values in my report!?
We received this output because my report counted values from two, external entities (Session and FULL SQL) and there is no direct relation between Client/Server and FULL SQL. The indirect relation was counted first with DISTINCT clause (value 2) and then sessions were summarized without DISTINCT (value 4).
This kind of problems are common situation if we do not understand the relations between entities.
I modified my query and removed FULL SQL counterand now the report shows correct number of sessions.
Now we can switch to the second challenge. Most of you have probably wondered why in most reports there is a time-based data filter
And now everything should be clear :). The time selection bases on the main entity Timestamp and appears if the entity contains it.
I noticed before that Timestamp in Access/Client entity points to information when Access tupple has been registered in the audit database on the appliance (not relevant in 99% situations). So, if I am looking for number of sessions in last hour the result set in my reports will look like this
because the tuple was created much more earlier.
It is the effect when we selected the incorrect main entity for our purpose. So, if we would like to display session (connection) related information our query should relies on this entity.
However, it was mentioned also, the Timestamp in Session entity is not valuable – is changing during session and does not provide well define point of time in the session lifetime. That is why the Guardium provides two virtual main entities in the Access domain
corresponding to Session Start and Session End inside session entity. Now I can create report which will count sessions in defined time based on Session Start timestamp
My query does not contain any field from session entity because my goal is to count sessions, so there is no sense put session details inside. I added sessions counter using the Add Count flag which indirectly adds a fourth field (Count of Sessions)
You see that my report based on the session entity lists all sessions started last hour (left) and report based on Client/Server entity is empty because referential data (Access ID) are stored earlier when particular connection information was identified first time (right).
My examples can lead to opinion that queries based on Client/Server entity are not valuable at all. Definitely there are usable but for well defined cases. For instance we would like to identify new connection profiles on database system – new tuples which were never before connected to our system. Base on this information we can identify anomaly in the access to protected resource – new database clients have not seen before.
Client/Server, Session and Full SQL entities are base of most reports focused on detailed SQL activity. I hope that this article allows you create the requested query faster and deliver expected results.
In next article about reporting I will explain the difference between FULL SQL and SQL entity