Data classification (Part 2) – Classification policy rules

Continuation of the article – Data classification (Part 1)

Classification policy builder

Here we can create a new classification policy, which is an element of the classification process. One policy can be a member of many different processes.

A classification policy groups rules and manages the relationships between them. To add a new policy, go to Discover->Classifications->Classification Policy Builder, which opens the Classification Policy Definition window

Classification process structure

where the policy Name and Description can be specified.

Tip: A policy is not directly related to the database where it will be executed. Use a name that describes the analysis logic (for example: Find Sensitive Data in SAP environments)

Tip: Category and Classification labels are elements of the event content generated by Action rules. Use them to simplify distinguishing events at this level

Info: The list of categories is managed by the Categories group (Group Type: Category)

Select a Category, define the Classification literal, and push the Apply button

New Classification Policy


then push the activated Edit Rules button (Roles allows defining access to this policy for a defined group of users; Add Comments makes it possible to add remarks in case of policy changes)

New Rule invocation

New Policy

Classification Policy Rules manages the current list of rules inside a particular policy. We will focus on this in another section of this article.


List of classification rules

Classification policy management

The Classification Policy Finder window displays a list of all existing policies. For each policy we can add a comment or go to rule editing.


Policy list

Four icons above the policy list allow adding a new policy, editing, copying, or removing the selected one, respectively. Policy copying opens the Classification Policy Clone window, where the name of the source policy is preceded by the literal Copy of. The Save Clone button adds the new policy to the list.


Policy clone

We can remove a policy which is not attached to a classification process. If we try to remove a policy related to a process, a message will be displayed. In this situation you must first remove the process related to this policy, or change the policy reference in the process to another one.

Policies trailed by a time stamp in square brackets originate from the end-to-end discovery process scenario.

Classification policy rules in detail

Each rule contains identification fields: Name, Category, Classification and Description. A classification rule is an atomic element and its name should strictly define its functionality (for example: e-mail address, US zip code). The Classification Rule Type defines the type of data which will be analyzed using this rule.


Rule description and type selection

In most cases our DAM classification policies will refer to the Search for Data rule.

Rule types:

  • Search for Data – content analysis of tables, views, and synonyms
  • Catalog Search – checks for the existence of a particular table or column name
  • Search for Unstructured Data – CSV, Text, HTTP|S, Samba outside DAM audit data (it is not related to FAM functionality)

Info: Do not mix rule types in a classification policy. It is not forbidden, but in most cases it does not make sense

This simple rule will match AMEX credit card numbers using a regular expression in all tables, views and synonyms, inside columns defined as text (any text type supported by the DB). The Apply button adds the rule to the policy.


Simple rule definition
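
The AMEX pattern itself is an ordinary regular expression. Here is a minimal sketch of this kind of check (the exact regexp used in the rule shown is an assumption; AMEX numbers are 15 digits starting with 34 or 37):

```java
import java.util.regex.Pattern;

// Sketch: a regular expression for AMEX card numbers (15 digits,
// starting with 34 or 37). The exact pattern in the rule may differ.
public class AmexPattern {
    static final Pattern AMEX = Pattern.compile("3[47][0-9]{13}");

    static boolean matches(String value) {
        return AMEX.matcher(value.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("371449635398431"));  // a commonly cited AMEX test number: true
        System.out.println(matches("4556237137622336")); // 16-digit Visa-style number: false
    }
}
```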

It activates the New Action button in the Classification Rule Action section. Actions are described in the third part of this article. The Back button returns to the list of rules in the policy.


Rules Action section

Each rule visible in the rule list can be quickly reviewed using the small plus icon (show details).


Classification Policy Finder


Rule review – show details

To modify an existing rule, select the pencil icon.


Edit rule icon

The balloon icon allows adding comments to a rule (very useful for the change management process).


Add comment icon

The order of rules in the policy can be changed easily using the move up/move down icons. These icons are active when the policy contains at least two rules.


Policy list

The standard policy behavior is to process rules from top to bottom, and the policy makes a verdict as soon as some rule matches its pattern. If a rule is matched, the rest of them are not evaluated for the current object. Additional rule parameters can change this behavior.
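
This first-match flow can be sketched as follows (a simplified illustration only; the rule representation here is hypothetical and not Guardium's internal implementation):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Simplified illustration of top-down, first-match rule evaluation.
public class FirstMatchSketch {

    // Returns the name of the first rule whose predicate matches, or null.
    static String evaluate(Map<String, Predicate<String>> rules, String value) {
        for (Map.Entry<String, Predicate<String>> rule : rules.entrySet()) {
            if (rule.getValue().test(value)) {
                return rule.getKey(); // verdict made; remaining rules are skipped
            }
        }
        return null; // no rule matched this value
    }

    public static void main(String[] args) {
        Map<String, Predicate<String>> rules = new LinkedHashMap<>();
        rules.put("Find Credit Card", v -> v.matches("[0-9]{16}"));
        rules.put("Find Mail", v -> v.contains("@"));
        System.out.println(evaluate(rules, "4556237137622336")); // first rule wins
    }
}
```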

The Unselect All and Select All buttons allow selecting or deselecting rules in the view – used for rule removal (Delete Selected button).

Collapse All and Expand All help with a fast review of all rules.

Rule parameters review

Logically, we can split the parameters into 3 groups:

  • search scope
  • pattern
  • search behaviour

Search scope parameters

Table Type – defines the types of objects included in the analysis:

  • Tables
  • Views (consider the performance impact on a production environment when a huge number of unused and complex views exists)
  • Synonyms (not available for some database types)
  • System Tables (includes system objects)

Table Name Like – limits the scope of the search to a defined object name pattern. Two wildcards are allowed: % means a string of any length, _ refers to one character. Examples:

  • CARS – an object with the exact name CARS
  • C% – object names starting with C
  • CAR_ – object names starting with CAR and ending with any single character (CARS, CARO, CARP)

If this parameter is empty, all tables are analyzed.
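
The wildcards follow SQL LIKE semantics. A short sketch translating such a pattern into a Java regular expression shows how the examples above match (the helper name is ours, for illustration only):

```java
import java.util.regex.Pattern;

// Sketch: SQL LIKE-style wildcard matching as used by Table Name Like
// (% = string of any length, _ = exactly one character).
public class LikePattern {
    static boolean like(String pattern, String name) {
        // Quote the whole pattern as a literal, then re-enable the two wildcards.
        String regex = Pattern.quote(pattern)
                .replace("%", "\\E.*\\Q")
                .replace("_", "\\E.\\Q");
        return name.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(like("CAR_", "CARS")); // true
        System.out.println(like("C%", "CREDIT")); // true
        System.out.println(like("CAR_", "CAR"));  // false: _ requires one character
    }
}
```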

Data Type – defines the data type of columns which will be analyzed. The types correspond to the supported data types of the particular database engine (binary object types are not analyzed at all):

  • Date
  • Number
  • Text

Column Name Like – limits the scope to column names covered by a defined pattern. Two wildcards are allowed: % and _. An empty field refers to all columns in the table.

Minimum Length, Maximum Length – refer to the defined size of a column (not the length of data stored in a particular row). Sometimes they are used together to point to a particular column size. A good practice is to define a minimum length to reduce the number of analyzed columns whenever the minimum length of the searched value can be assumed (for example, 16 characters for a credit card number).

Exclude Schema – restricts the scan area defined by the data source at the schema level. The parameter value points to a group (Application Type – Classifier or Public, Group Type – Schema) containing the list of schemas excluded from the search.

In this example credit cards have been detected in 3 columns in the dbo and glo schemas


Classification report

A rule modification excludes the glo schema from the search scope


Classification rule and schema exclusion group

and changes the classification results (no objects from the glo schema remain).


Classification report

Exclude Table – restricts the list of scanned tables defined by the data source (if the Table Name Like parameter is used in a rule, it is evaluated on the table list created after the Exclude Table evaluation). Exclusions are defined by a group reference (Application Type – Classifier or Public, Group Type – Object).

The classification returns 3 columns in 2 tables


Classification report

and after a rule modification which excludes the CC_NOK table


Classification rule and table exclusion group

the results report contains only two records from one table.


Classification report

Exclude Table Column – restricts the list of scanned columns defined by the data source (if the Column Name Like parameter is used in a rule, it is evaluated on the column list created after the Exclude Table Column evaluation). Exclusions are defined by a group reference (Application Type – Classifier or Public, Group Type – Object/Field).

The classification returns 3 columns, including table CC_1 with column CC


Classification report

and after a rule modification which excludes the CC column from the CC_1 table


Classification rule and table column exclusion group

the excluded column disappears from the results report.


Classification report

Limitation: The wildcards % and _ are prohibited in all exclusion groups.

Pattern parameters

Info: Only one pattern parameter can be used in a rule. Behavioral parameters provide the ability to analyze the same column using different patterns.

Search Like – a simple pattern based on two wildcards (% and _). Useful for constants, specific values, or as part of a more complex analysis based on a set of rules.

Search Expression – analysis based on a regular expression compliant with the POSIX 1003.2 specification. A description and some examples are available in the internal Guardium Help system – https://<appliance_IP:>8443/guardhelp/topic/com.ibm.guardium.doc/discover/regular_expressions.html

The expression can be inserted directly into the field or validated using the Regular Expression builder invoked by the RE icon.


Regular Expression builder icon

In the Regular Expression field we can insert a pattern and check its correctness – put the value in the Text to match against area and press the Test button.


Regular expression builder

The message Match Found indicates that the evaluated expression matches the string; otherwise the message No Match Found is displayed.
The Accept button adds the expression to the rule.


Regular expression in rule builder

The Regular Expression builder also offers predefined patterns for credit cards and citizen identification numbers (for several countries). Select a category


Predefined expression categories

and then select one of the defined expressions.


List of predefined expressions


Selected expression

Guardium also offers special pattern tests for limited types of data related to parity or checksum control – for example, checking a credit card number according to the Luhn algorithm. This functionality can be switched on using a special naming of the classification rule – the name has to start with the guardium://CREDIT_CARD string.

For example, consider the two tables CC_OK and CC_NOK

CC_OK CC_NOK
4556237137622336 4556237137622335
4929697443528339 4929697443528338
3484057858101867 3484057858101866
4824520549635491 4824520549635490
3767010431320650 3767010431320659
4532861697794380 4532861697794389
5352437717676479 5352437717676478
4539522376654625 4539522376654624
5547728204654151 5547728204654150
5292779270461374 5292779270461373

containing strings that represent 16-digit numbers. Table CC_OK contains credit card numbers with a correct checksum according to the Luhn algorithm, in opposition to table CC_NOK.
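
The Luhn check triggered by the guardium://CREDIT_CARD naming convention can be sketched with the standard algorithm (this is the textbook implementation, not Guardium's internal code):

```java
// Sketch of the Luhn checksum used to validate credit card numbers
// (standard algorithm; not Guardium's internal implementation).
public class LuhnCheck {
    static boolean isValid(String number) {
        int sum = 0;
        boolean doubleIt = false;
        // Walk the digits right to left, doubling every second one;
        // doubled values above 9 are reduced by 9 (digit sum).
        for (int i = number.length() - 1; i >= 0; i--) {
            int d = Character.getNumericValue(number.charAt(i));
            if (doubleIt) {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        // A valid number makes the total a multiple of 10.
        return sum % 10 == 0;
    }

    public static void main(String[] args) {
        System.out.println(isValid("4556237137622336")); // from CC_OK: true
        System.out.println(isValid("4556237137622335")); // from CC_NOK: false
    }
}
```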

A policy based on the regular expression only


Find Credit Card (regexp only)

discovers both tables as containing credit card numbers.


Classification process structure

For a policy with an additional check of Luhn algorithm conformity


Find Credit Card (with checksum)

only the CC_OK table has been recognized as an object with valid credit card numbers.


Classification process structure

Other special patterns in rule names are described in the Guardium Help system: https://<appliance_IP:>8443/guardhelp/topic/com.ibm.guardium.doc/protect/r_patterns.html

Evaluation Name – the most powerful option in the classification analysis. It allows creating your own validation function coded in Java (1.7 in the G10 initial release) and implementing any checks which cannot be covered by regular expressions.

For example, we would like to find bank account numbers in IBAN notation (widely used in Europe) with checksum control (modulo 97 of the transformed number). This task cannot be managed by a regular expression at all.

More about IBAN is available on Wikipedia: IBAN

We need to create and compile a class in the package com.guardium.classifier.custom which implements the interface Evaluation, with one method evaluate() returning true or false.

This is an example of code for IBAN evaluation:

package com.guardium.classifier.custom;
import java.math.BigInteger;

public class iban implements Evaluation {
    public static final int IBANNUMBER_MIN_SIZE = 15;
    public static final int IBANNUMBER_MAX_SIZE = 34;
    public static final BigInteger IBANNUMBER_MAGIC_NUMBER = new BigInteger("97");

    public boolean evaluate(String accountNumber) {
        String newAccountNumber = accountNumber.trim();
        // Reject values outside the legal IBAN length range
        if (newAccountNumber.length() < IBANNUMBER_MIN_SIZE || newAccountNumber.length() > IBANNUMBER_MAX_SIZE) {
            return false;
        }
        // Move the country code and check digits (first 4 characters) to the end
        newAccountNumber = newAccountNumber.substring(4) + newAccountNumber.substring(0, 4);
        // Replace each letter with its numeric value (A=10, B=11, ..., Z=35)
        StringBuilder numericAccountNumber = new StringBuilder();
        for (int i = 0; i < newAccountNumber.length(); i++) {
            numericAccountNumber.append(Character.getNumericValue(newAccountNumber.charAt(i)));
        }
        // A valid IBAN leaves remainder 1 when divided by 97
        BigInteger ibanNumber = new BigInteger(numericAccountNumber.toString());
        return ibanNumber.mod(IBANNUMBER_MAGIC_NUMBER).intValue() == 1;
    }
}

The compiled class must be uploaded to the appliance (Setup->Custom Classes->Evaluations->Upload). Insert the class Description, point to the file with the compiled class, and approve the upload using the Apply button.


Custom class upload

A confirmation message about success should be displayed. In my database I have the table glottery.glo.bank_accounts, where both American (non-IBAN) and Polish (IBAN) bank accounts appear.


glottery.glo.bank_accounts table

Now we can create a new rule to find IBANs (use the full name of the class)


Classification rule

which correctly identifies bank accounts, including the checksum.

 

 

Tip: Use self-designed evaluations to build the best-fit policy for identifying sensitive data.

Compare to Values in SQL – allows comparing values in the sample against a dictionary defined by an SQL query.

Limitation: The dictionary has to exist in the database where the classification process is executed.

For example, we would like to find columns which contain the short names of US states. The table dbo.CC_MAIL_STATE contains a STATE column


Inside the same database engine exists the table glo.STATES with a list of all states.


This classification rule uses the list defined by the SQL instruction:

SELECT short_name FROM Glottery.glo.States WHERE country=1

Classification rule

and identifies the STATE column.


Classification results

Please notice that the classification process worked on the CLEXAMPLES database only (the scope defined by the data source), and the dictionary source table is not in the results because it is located in the GLOTTERY database.

The SQL instruction used here has some limitations:

  • it must start with SELECT (you cannot send DML or DDL)
  • it should not contain a semicolon (you cannot group instructions)
  • referred objects must use fully qualified names (for example database.schema.object for MS SQL)

Compare to Values in Group – compares column values to a list stored in a Guardium group. The group must belong to Application Type PUBLIC or CLASSIFIER and Group Type OBJECTS. A small icon at the right side of the group list allows creating or modifying the dictionary.


Create/Modify group

In this example the group GL_US_STATES is a list of all US states


Dictionary group

referred to inside the classification rule


Classification rule

returns the list of columns where US states appear.


Classification results

Search behavior parameters

“Fire only with” Marker – allows identifying tables where two or more columns fulfill certain conditions.

For example, we have two tables: CC_MAIL with credit cards and mail addresses


and the table CC_NAME where user names exist instead of mail addresses.


If we create two independent rules looking for credit cards and mail addresses


Classification policy

the classification process returns only the CC columns from both tables


Classification results

because the first rule matched the table and the second one was not evaluated.

This time the Continue on Match flag has been switched on


Rules list

and all credit card and mail columns have been identified.


Classification results

In the next policy both rules have been updated with the same Marker – CC_AND_MAIL


Rules list

and the classification policy returns the credit card and mail address columns from the CC_MAIL table only, because only this table contains both patterns together.


Classification process structure

Hit Percentage – determines the percentage threshold of values in a sample that must match the pattern for the rule to be classified as satisfied. If this field is empty, the column will be classified even if only one value in the sample matches the pattern.

Important: This parameter allows minimizing the number of false positives in the data classification process.

Using this parameter also adds to the results the number of unique values in the sample that fulfill the requirements of the rule.
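
The threshold logic itself is simple arithmetic; here is a small sketch (an illustration of the concept only, not Guardium's code):

```java
import java.util.List;
import java.util.regex.Pattern;

// Illustration of the Hit Percentage concept: a column qualifies only
// when the share of matching values in the sample reaches the threshold.
public class HitPercentageSketch {
    static boolean qualifies(List<String> sample, Pattern pattern, double hitPercentage) {
        long hits = sample.stream().filter(v -> pattern.matcher(v).matches()).count();
        return 100.0 * hits / sample.size() >= hitPercentage;
    }

    public static void main(String[] args) {
        Pattern digits16 = Pattern.compile("[0-9]{16}");
        List<String> sample = List.of(
            "4556237137622336", "not-a-card", "4929697443528339", "5352437717676479");
        System.out.println(qualifies(sample, digits16, 50.0)); // 3 of 4 match (75%): true
        System.out.println(qualifies(sample, digits16, 90.0)); // 75% < 90%: false
    }
}
```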


Classification results

Show Unique Values, Unique Value Mask – attach the matched values to the classification report. Only unique values are displayed, and a maximum of 2000 of them per column can be included in the report.


Classification rule


Classification report

If the attached values have a sensitive nature, the Unique Value Mask field allows masking this data.

The mask must be a regular expression which covers the expected values and strictly defines the part which should be visible. The Regular Expression builder is also available to define a mask and check its correctness. The part of the regexp inside brackets () defines the content of the value which will be displayed in the report (for example, .*([0-9]{4})[ ]{0,20}$ means that only the last four meaningful digits will be displayed).
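
The masking behavior of that example expression can be reproduced with ordinary Java regex capturing groups (a sketch; the visiblePart helper is ours for illustration, Guardium performs this masking internally):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: only the content of the capturing group () stays visible,
// as with the Unique Value Mask example .*([0-9]{4})[ ]{0,20}$
public class ValueMaskSketch {
    static final Pattern MASK = Pattern.compile(".*([0-9]{4})[ ]{0,20}$");

    static String visiblePart(String value) {
        Matcher m = MASK.matcher(value);
        // When the mask matches, only the group content is displayed.
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(visiblePart("4556237137622336")); // prints 2336
    }
}
```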


Classification rule


Classification report

Continue on Match, One match per column – by default the classification process flow focuses on the identification of tables with sensitive data. Please consider a table with credit card, mail address and state columns


and a Classification Policy with 4 rules (Continue on Match is switched off).


Classification policy

Only one column from the CC_MAIL_STATE table has been identified


Classification results

because the first rule covered the requirements and the policy shifted to the next table. To change this situation, the Continue on Match flag must be switched on across the rule sequence


Classification policy

which leads to the expected behavior. All sensitive columns in the CC_MAIL_STATE table have been discovered.


Classification results

You should also notice that the STATE column has been matched two times, because two rules met the requirements on it (which was expected here). However, we can suppress multiple matching on one column using the One match per column flag. To do that, mark it in the first rule in the sequence that works on that column.


Classification policy

The Find State PL rule has not matched the STATE column this time.


Classification report

Tip: In most cases the sensitive data classification procedure should point out all columns where this type of data resides, and the Continue on Match flag should be switched on for all rules in the policy.

Relationship discovery

Using a simple trick we can also identify relationships between source data and other objects.

I have a source table with users stored in the glottery.glo.users table


glottery.glo.users table

where the primary key is the id column, and a correct reference to users from other tables should refer to this value. I have created a rule


Rule classification

looking for numeric columns whose values match the list of ids from the source table (SELECT id FROM glottery.glo.users WHERE id<>0 AND id<>1). The WHERE clause omits the values 0 and 1, which can be logical values in some referential tables. I have set the Hit Percentage at the very high level of 98% to ensure a real relationship between the analyzed object and the users table.

 

The results clearly show that the users table is referred to in 6 other tables.


Classification results

 

Summary:

Guardium provides many different techniques to identify sensitive data, and a good implementation relies on them. If we know where critical data resides, the real-time policies, correlation alerts, and SIEM events will work correctly and point to real threats.

Article continuation:

  • Part 3 – Action rules (soon)
  • Part 4 – Classification process and data sources (tbd)
  • Part 5 – End to End scenarios and Classification Automation (tbd)