Guardium Insights 2.0.2 – installation cookbook on bare metal

Central reporting of activity in DAM systems has to cope with huge amounts of data, the long retention periods enforced by regulations, and the need to correctly identify anomalies in user behavior through quantitative analysis.
Guardium Insights (GI) is a response to these needs and is designed as a container service implementable in both private and public clouds.
The following procedure brings together the various installation and configuration steps required to install Guardium Insights 2.0.2 from scratch. The whole is divided into 10 main tasks:

1. Infrastructure

Guardium Insights 2.0.2 is a containerized application running on an OpenShift (OCP) 4.3 cluster. For it to work properly we have to use an existing OCP infrastructure or create one from scratch. The following procedure assumes only the availability of DNS and NTP servers in the IT infrastructure. All other elements, including the OCP cluster, will be installed by us.

Previous versions of Guardium Insights worked on OCP 3.11. Because there is no upgrade procedure to version 4.x, the cluster must be installed from scratch. It is, however, possible to transfer data from GI 2.0.1 to version 2.0.2.

We need 9 machines (in my case a virtual environment) with the requirements specified in the table below.
In my environment there is already a Windows machine running, which will act as the LDAP and DNS service provider.
The bootstrap machine is only needed temporarily and its resources will be released.

load balancer – must work on layer 4; for test purposes this installation will use HAProxy
bastion – dedicated workstation for installation, configuration and management of the GI environment (the OpenShift installation utility is supported only on Linux and macOS)
bootstrap – machine responsible for setting up the OpenShift cluster Control Plane; it will be removed after cluster installation
masters – 3 machines that store the OpenShift cluster configuration and provide the cluster management infrastructure
workers – GI requires a minimum of three machines to serve its services
nfs server – data audited by GI are managed in DB2 Data Warehouse, which stores them on an NFS share

role | vCPU | RAM | storage | OS
load balancer (HAProxy) | 1 | 4 GB | 20 GB | any
bastion (oc tools, docker) | 1 | 4 GB | 80-120 GB | linux, mac os
bootstrap | 4 | 16 GB | 80 GB | RHCOS
3 x master | 4 | 16 GB | 120 GB | RHCOS
3 x worker | 24 | 72+ GB | 120 GB + raw 300-400 GB | RHCOS or RHEL 7
nfs server | 2 | 8 GB | 20 GB + raw 200+ GB | any

All commands or command sequences entered on the command line assume that they are executed from the root user’s home directory on the bastion machine. If a different context is required, this will be clearly indicated.

2. DNS setup

Proper name resolution is key to the operation of services in the cluster. Cluster nodes must have a dedicated subdomain, so I suggest using a private DNS server that forwards public queries. For simplicity of configuration I chose a Windows environment, but you can use, for example, a BIND server.
My private domain is guardium.notes where I created ocp4 subdomain.

The following table shows the list of A records of machines used in this procedure:

role | name | domain | ip | has PTR
dns | adp | guardium.notes | 192.168.100.10 | Y
bastion | bastion | guardium.notes | 192.168.50.11 | Y
load balancer | proxy | ocp4.guardium.notes | 192.168.50.10 | Y
load balancer | api | ocp4.guardium.notes | 192.168.50.10 | Y
load balancer | api-int | ocp4.guardium.notes | 192.168.50.10 | Y
load balancer | *.apps | ocp4.guardium.notes | 192.168.50.10 | Y
nfs server | ginfs | guardium.notes | 192.168.100.31 | Y
bootstrap | boot | ocp4.guardium.notes | 192.168.50.200 | Y
master 1 | m1 | ocp4.guardium.notes | 192.168.50.201 | Y
master 1 | etcd-0 | ocp4.guardium.notes | 192.168.50.201 | N
master 2 | m2 | ocp4.guardium.notes | 192.168.50.202 | Y
master 2 | etcd-1 | ocp4.guardium.notes | 192.168.50.202 | N
master 3 | m3 | ocp4.guardium.notes | 192.168.50.203 | Y
master 3 | etcd-2 | ocp4.guardium.notes | 192.168.50.203 | N
worker 1 | w1 | ocp4.guardium.notes | 192.168.100.101 | Y
worker 2 | w2 | ocp4.guardium.notes | 192.168.100.102 | Y
worker 3 | w3 | ocp4.guardium.notes | 192.168.100.103 | Y

Note that each master node has two A records. The names etcd-0, etcd-1 and etcd-2 must point to the machines performing the management function. However, PTR (reverse zone) entries should not be created for these records, as they might then be used to register the node names in the cluster.

Additionally, each master node must have a service locator record (SRV) – _etcd-server-ssl defined.

In DNS Manager, to add such a record you have to right-click on the cluster domain level and select Other New Records… and then select Service Location (SRV). Enter the service name (_etcd-server-ssl), the protocol (_tcp), priority (0), weight (10) and port (2380). In the Host offering this service field, enter the full cluster node name for the configuration service. The operation is repeated for each name (etcd-0.ocp4.guardium.notes, etcd-1.ocp4.guardium.notes, etcd-2.ocp4.guardium.notes).
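Once the records are in place it is worth verifying them, for example with dig from the bastion (dig is provided by the bind-utils package; any machine pointing at this DNS server will do). The reverse lookup should return the m1 name, not etcd-0, and any name under *.apps (test.apps below is just an example) should resolve to the load balancer:

dig +short SRV _etcd-server-ssl._tcp.ocp4.guardium.notes
for n in api api-int etcd-0 etcd-1 etcd-2 boot m1 m2 m3 w1 w2 w3; do dig +short $n.ocp4.guardium.notes; done
dig +short test.apps.ocp4.guardium.notes
dig +short -x 192.168.50.201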

3. Bastion setup

The bastion is a safe place for cluster administration. In Openshift 4.x, most installation and configuration tasks do not require direct login to the nodes.
Additionally in our procedure the bastion will be used as a file server and executor of Ansible automation scripts.
If we used the bastion only for administrative purposes, it could be implemented on a Windows, macOS or Linux operating system. My scenario, however, also uses the bastion to centralize the installation files and, because the workers will run the Red Hat Enterprise Linux 7 (RHEL) operating system, the bastion should be based on the same system.

I recommend that all installed machines should have a simplified filesystem scheme with one large root file system covering the entire disk space.
The procedure for installing RHEL with such settings can be found, for example, in the cookbook for version 2.0.1.

Bastion machine specification:

We will store installation files on the bastion, so 80+ GB for the root filesystem is recommended (to simplify things I suggest setting the filesystem to 200 GB).
So the assumption is that we have a freshly installed RHEL 7 machine with minimal package selection and a configured network.
We will quickly register the machine in the Red Hat network, attach a subscription that provides Ansible, update the system and restart it to use the newest kernel.

subscription-manager register --username <redhat_account>  --password <redhat_account_password>
subscription-manager refresh
subscription-manager list --available --matches '*OpenShift*'
subscription-manager attach --pool=<pool_with_ansible_repo>
subscription-manager repos --disable="*"
subscription-manager repos --enable="rhel-7-server-rpms" --enable="rhel-7-server-extras-rpms" --enable="rhel-7-server-ansible-2.8-rpms" --enable="rhel-7-server-ose-4.3-rpms"
yum -y update
shutdown -r now

Next, we install all the packages that will be useful while building our environment.

yum -y install openshift-ansible openshift-clients jq httpd docker nfs-utils nc

4. Load balancer and NFS server setup

Set up two more machines – I will use the CentOS 7 operating system for both the load balancer and the NFS server.

Machine specifications:

To manage machines outside the OCP cluster we create a new SSH key pair and store it in a non-default path (for example ~/.ssh/aux).

mkdir -p ~/.ssh/aux
ssh-keygen -t rsa -b 4096 -N '' -f ~/.ssh/aux/id_rsa

Then we copy this key onto new machines.

for h in proxy ginfs; do ssh-copy-id -i ~/.ssh/aux/id_rsa.pub $h; done

where “proxy ginfs” represents names of your load balancer and NFS server.

For security reasons we should keep the machines updated. In most cases the update will install a new kernel, so I suggest restarting the machines as well.

for h in proxy ginfs; do ssh -i ~/.ssh/aux/id_rsa $h yum -y update; done
for h in proxy ginfs; do ssh -i ~/.ssh/aux/id_rsa $h shutdown -r now; done

Production environments use highly available, commercial load balancers, and this is a strong recommendation for a production installation.
In test environments, where there are no business continuity requirements, we can use any solution that disperses traffic at TCP/IP Layer 4. I decided to use HAProxy, which is easy to configure.

Now we create HAProxy configuration file to switch traffic for OCP cluster:

cf=haproxy.cfg
masters="<bootstrap_ip> <master1_ip> <master2_ip> <master3_ip>"
workers="<worker1_ip> <worker2_ip> <worker3_ip>"
configs="openshift-api-server machine-config-server ingress-http ingress-https"
ports=(6443 22623 80 443)
echo -e "global\n log 127.0.0.1 local2\n chroot /var/lib/haproxy\n pidfile /var/run/haproxy.pid\n maxconn 4000\n user haproxy\n group haproxy\n daemon\ndefaults\n mode http\n log global\n maxconn 3000\n retries 3\n option forwardfor except 127.0.0.0/8" > $cf
for o in http-server-close redispatch; do echo " option $o" >> $cf; done
for o in "http-request 10s" "queue 1m" "connect 10s" "client 1m" "server 1m" "http-keep-alive 10s" "check 10s"; do echo " timeout $o" >> $cf; done
i=0;for c in $configs; do j=0;for d in frontend backend; do echo "$d $c" >> $cf; if [ $d = "frontend" ]; then echo -e " bind *:${ports[$i]}\n default_backend $c\n mode tcp\n option tcplog" >> $cf; let "i=i+1"; else echo -e " balance source\n mode tcp" >> $cf; if [ $j -le 1  ]; then if [ $i -le 2 ]; then k=0; for f in $masters; do echo -e " server m$k $f:${ports[$i-1]} check" >> $cf; let "k=k+1"; done; else k=1; for f in $workers;do echo -e " server w$k $f:${ports[$i-1]} check" >> $cf; let "k=k+1"; done; fi; fi; fi; done; done

where:
masters – IP address list of all OCP master nodes including bootstrap (put bootstrap first) – "192.168.50.200 192.168.50.201 192.168.50.202 192.168.50.203"
workers – IP address list of all OCP workers – "192.168.100.101 192.168.100.102 192.168.100.103"

Commands above should create a new haproxy.cfg file similar to this:

[root@bastion ~]# cat haproxy.cfg
global
 log 127.0.0.1 local2
 chroot /var/lib/haproxy
 pidfile /var/run/haproxy.pid
 maxconn 4000
 user haproxy
 group haproxy
 daemon
defaults
 mode http
 log global
 maxconn 3000
 retries 3
 option forwardfor except 127.0.0.0/8
 option http-server-close
 option redispatch
 timeout http-request 10s
 timeout queue 1m
 timeout connect 10s
 timeout client 1m
 timeout server 1m
 timeout http-keep-alive 10s
 timeout check 10s
frontend openshift-api-server
 bind *:6443
 default_backend openshift-api-server
 mode tcp
 option tcplog
backend openshift-api-server
 balance source
 mode tcp
 server m0 192.168.50.200:6443 check
 server m1 192.168.50.201:6443 check
 server m2 192.168.50.202:6443 check
 server m3 192.168.50.203:6443 check
frontend machine-config-server
 bind *:22623
 default_backend machine-config-server
 mode tcp
 option tcplog
backend machine-config-server
 balance source
 mode tcp
 server m0 192.168.50.200:22623 check
 server m1 192.168.50.201:22623 check
 server m2 192.168.50.202:22623 check
 server m3 192.168.50.203:22623 check
frontend ingress-http
 bind *:80
 default_backend ingress-http
 mode tcp
 option tcplog
backend ingress-http
 balance source
 mode tcp
 server w1 192.168.100.101:80 check
 server w2 192.168.100.102:80 check
 server w3 192.168.100.103:80 check
frontend ingress-https
 bind *:443
 default_backend ingress-https
 mode tcp
 option tcplog
backend ingress-https
 balance source
 mode tcp
 server w1 192.168.100.101:443 check
 server w2 192.168.100.102:443 check
 server w3 192.168.100.103:443 check

We install additional RH packages on the proxy server.

ssh -i ~/.ssh/aux/id_rsa <load_balancer_FQDN> yum -y install haproxy policycoreutils-python

The following commands configure the load balancer and SELinux.

ssh -i ~/.ssh/aux/id_rsa <load_balancer_FQDN> setsebool -P haproxy_connect_any=1
ssh -i ~/.ssh/aux/id_rsa <load_balancer_FQDN> semanage permissive -a haproxy_t
ssh -i ~/.ssh/aux/id_rsa <load_balancer_FQDN> 'for p in 6443 22623 80 443; do firewall-cmd --permanent --add-port=$p/tcp; done'
ssh -i ~/.ssh/aux/id_rsa <load_balancer_FQDN> firewall-cmd  --reload
ssh -i ~/.ssh/aux/id_rsa <load_balancer_FQDN> firewall-cmd  --list-all

Then move the HAProxy configuration file to a proxy server and start load balancer.

ssh -i ~/.ssh/aux/id_rsa <load_balancer_FQDN> mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.orig
scp -i ~/.ssh/aux/id_rsa haproxy.cfg <load_balancer_FQDN>:/etc/haproxy/haproxy.cfg
rm -rf haproxy.cfg
ssh -i ~/.ssh/aux/id_rsa <load_balancer_FQDN> systemctl start haproxy
ssh -i ~/.ssh/aux/id_rsa <load_balancer_FQDN> systemctl enable haproxy
ssh -i ~/.ssh/aux/id_rsa <load_balancer_FQDN> systemctl status haproxy

Our NFS server on CentOS requires two more software packages (the gdisk package provides the sgdisk utility used below).

ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> yum install -y nfs-utils gdisk

An additional drive on the NFS server must be configured and its space made available to Guardium Insights. It will store the audit data (DB2 DWH and MongoDB backends).
For a production installation I recommend at least an array configured in RAID 5 or 10. It is also possible to configure highly available NFS with synchronization via Distributed Replicated Block Device (DRBD).
Also consider commercially available highly available NFS appliances (e.g. NetApp). In our case we will add a simple raw partition on the ginfs server.
To identify the correct path to our block device we use the lsblk command.

ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> lsblk

The size of the storage space must be adapted to the needs of the Guardium Insights installation. In large implementations we can deal with tens or hundreds of terabytes, while for tests a few hundred gigabytes is enough.
My disk has a capacity of 4 TB and we refer to it as the block device /dev/sdb.

Standard MBR disk partitioning allows partitions of a maximum size of 2 TB.
Therefore we will use a GPT (GUID Partition Table) layout instead.
For this purpose it is worth wiping the entire disk first (an irreversible operation).

ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> sgdisk --zap-all /dev/<device>

Then, using the parted tool, we create a partition that occupies the entire disk.
The command is interactive and we enter:

  • mklabel gpt – sets the partition table type
  • print – display the current settings; the total size of the disk is shown
  • mkpart primary 0GB <disk_size>GB – create a partition by specifying the maximum size displayed above
  • q – quit the parted tool
ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> parted /dev/<device>
ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> lsblk
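If you prefer to avoid the interactive session, parted can also be run in scripted mode; a minimal sketch that turns the whole disk into a single partition:

ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> parted -s /dev/<device> mklabel gpt mkpart primary 0% 100%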

Create a filesystem on a newly created partition (/dev/sdb1 in my case) for NFS server resources.

ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> mkfs.xfs /dev/<partition_device>

We create a mount point for the new drive, set permissions and configure SELinux

ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> mkdir -p /storage/gi
ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> chgrp -R nfsnobody /storage
ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> chmod -R 777 /storage
ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> semanage fcontext -a -t nfs_t \"/storage/gi\(\/\.\*\)\?\"
ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> restorecon -Rv /storage/gi
ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> cat /etc/fstab
ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> 'echo "/dev/<partition_device> /storage/gi xfs defaults 0 0" >> /etc/fstab'
ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> cat /etc/fstab

and then mount it to the NFS server permanently.

ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> mount -a
ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> mount

We configure and run the NFS server so that only the cluster nodes hosting application services (limited by IP address range) can be clients of our shared file system.

remote_accesses="<access_network1> <access_network_n>"
for h in $remote_accesses; do ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> echo "/storage/gi $h\(rw,sync,no_subtree_check,no_root_squash\) >> /etc/exports"; done
ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> 'exportfs -r'
ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> exportfs
ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> systemctl start nfs-server
ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> systemctl enable nfs-server
ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> firewall-cmd --permanent --add-service={mountd,nfs,rpc-bind}
ssh -i ~/.ssh/aux/id_rsa <NFS_server_FQDN> firewall-cmd --reload

We check if the NFS server is visible with its resources.

showmount --exports <NFS_server_FQDN>
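Optionally, verify read/write access by mounting the share temporarily from a machine whose network was listed in remote_accesses (a quick sketch, run on that client):

mount -t nfs <NFS_server_FQDN>:/storage/gi /mnt
touch /mnt/nfs_test && rm -f /mnt/nfs_test
umount /mnt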

5. Setup cluster nodes templates

If we use a virtual environment, we need to prepare machine templates for bootstrap and three master nodes.
The bootstrap server will be removed from the infrastructure after successful installation of the OCP cluster.

Bootstrap template

Master template

In case of three RHEL 7-based workers, virtual machines should be created according to the bastion scenario (with configured network and DNS settings).

Worker template

6. OpenShift setup

OpenShift bootstrap and master nodes must be built using RHCOS images. In a bare metal installation the operating system is installed over the network. We have two possibilities for booting the system (I will describe the first option):

  • from ISO
  • using PXE
OCP installation files

OCP installation on RHCOS servers requires that the configuration of our cluster be delivered in a special format known as ignition files. Follow the next steps to generate them correctly.

Create a folder in your bastion home directory

mkdir ocp-inst

In a browser, log in to your Red Hat account. In the case of a licensed version of OpenShift your account must be entitled to register the cluster. A non-entitled user can freely install the cluster with a 60-day trial license.
https://cloud.redhat.com/openshift/install

Select your preferred method (Bare Metal) of cluster installation and then download secret (pull-secret.txt). Place this file into ocp-inst directory.

Guardium Insights 2.0.2 supports version 4.3 of OCP, which is why we must download the installation files for this release. At the time of writing the latest stable version of the cluster is 4.3.8. By default the Red Hat page will offer installation files for newer distributions, therefore it is necessary to download the files from the archive directories.

From the directory https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.3.8/ we download the OCP client and installation utilities specific to your bastion operating system: openshift-client-linux-4.3.8.tar.gz and openshift-install-linux-4.3.8.tar.gz.
Place both files in your ocp-inst directory.

For bare metal installations two more files are required from https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.3/4.3.8/: rhcos-4.3.8-x86_64-metal.x86_64.raw.gz and rhcos-4.3.8-x86_64-installer.x86_64.iso.

Copy the first file to the bastion ocp-inst directory. The second one will be used later to boot our cluster nodes during the operating system installation.

Ignition files generation

Ignition is the utility used by RHCOS to manipulate disks during initial configuration. It completes common disk tasks, including partitioning disks, formatting partitions, writing files, and configuring users. In the case of OpenShift it is also responsible for orchestrating the early stages of the cluster setup.

Extract installation and oc utilities from gzip archives

tar xvf ocp-inst/openshift-install-linux-4.3.8.tar.gz openshift-install
mv openshift-install ocp-inst
tar xvf ocp-inst/openshift-client-linux-4.3.8.tar.gz oc kubectl
mv oc kubectl /usr/local/bin
oc version
ocp-inst/openshift-install version

Create access ssh key for OCP cluster

ssh-keygen -t rsa -b 4096 -N '' -f  ~/.ssh/id_rsa

This key will be shared across your cluster. It should be properly protected against loss and theft.
The .ssh directory contains both private (id_rsa) and public (id_rsa.pub) parts.

Create ocp4/install-config.yaml with your cluster architecture description (yaml files must be correctly indented)

mkdir ocp4
domain=<your_main_domain>
cluster_subdomain=<subdomain_for_your_cluster>
private_network=<cluster_private_broadcast_network>
private_network_subnet=<subnet_for_broadcast_network>
secret=\'`cat ocp-inst/pull-secret.txt`\'
sshkey=\'`cat .ssh/id_rsa.pub`\'
echo -e "apiVersion: v1 \nbaseDomain: $domain\ncompute: \n- hyperthreading: Enabled\n  name: worker\n  replicas: 0\ncontrolPlane:\n  hyperthreading: Enabled \n  name: master \n  replicas: 3\nmetadata: \n  name: $cluster_subdomain \nnetworking: \n  clusterNetwork: \n  - cidr: $private_network \n    hostPrefix: 23 \n  networkType: OpenShiftSDN \n  serviceNetwork: \n  - 172.30.0.0/16 \nplatform: \n none: {} \nfips: false \npullSecret: $secret \nsshKey: $sshkey" > ocp4/install-config.yaml

where:
domain – global DNS domain (guardium.notes)
cluster_subdomain – subdomain created in the DNS setup section (ocp4)
private_network – private network for cluster nodes (this network cannot overlap network routes existing in your environment; be aware that all inter-cluster communication uses DNS masquerading and cluster nodes cannot communicate over NAT) (10.128.0.0/16)
private_network_subnet – host reservation mask per cluster node from private_network (23)
secret – the secret of your Red Hat account
sshkey – public key of the ssh key pair created to manage cluster nodes
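For the environment described in the DNS setup section the assignments would look like this:

domain=guardium.notes
cluster_subdomain=ocp4
private_network=10.128.0.0/16
private_network_subnet=23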

Create manifest files

ocp-inst/openshift-install create manifests --dir=ocp4

Because the workers will be installed later using Ansible, we must declare that the masters will not be schedulable as worker nodes:

cat ocp4/manifests/cluster-scheduler-02-config.yml | grep mastersSchedulable
sed -i 's/mastersSchedulable: true/mastersSchedulable: false/' ocp4/manifests/cluster-scheduler-02-config.yml
cat ocp4/manifests/cluster-scheduler-02-config.yml | grep mastersSchedulable

Create ignition files

ocp-inst/openshift-install create ignition-configs --dir=ocp4

The just created ignition files contain a certificate valid for only 24 hours. You must finish the OCP cluster setup within this time. If you are not able to, remove the contents of the ocp4 directory and repeat the OCP installation procedure.

Setup web server on bastion server

In a bare metal installation the RHCOS operating system is booted from ISO and then packages are installed from the OS image downloaded from the Red Hat site (rhcos-4.3.8-x86_64-metal.x86_64.raw.gz). This file and the ignition files are shared through an anonymous web server, which we will configure on the bastion. Of course you can use a service already available in the infrastructure (the http protocol is used).

Install and configure Apache web server (port 8080):

cp /etc/httpd/conf/httpd.conf /etc/httpd/conf/httpd.conf.orig
sed -i 's/Listen 80/Listen 8080/' /etc/httpd/conf/httpd.conf 
cat /etc/httpd/conf/httpd.conf | grep Listen
setsebool -P httpd_can_network_connect 1
systemctl enable httpd
systemctl start httpd
firewall-cmd --permanent --add-port=8080/tcp
firewall-cmd --reload

In production and security-sensitive environments, the http server providing the installation files should be configured so that only the bootstrap and master nodes can access its contents (the files contain cluster certificates, your RH account secret and the ssh access key).
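A minimal sketch of such a restriction on the bastion’s Apache (the 192.168.50.0/24 range matches the bootstrap and master addresses from my DNS table; adjust it to your address plan):

cat > /etc/httpd/conf.d/ocp-restrict.conf <<'EOF'
<Directory "/var/www/html/ocp">
    Require ip 192.168.50.0/24
</Directory>
EOF
systemctl reload httpd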

Then put the required content into the ocp subfolder of your web server

mkdir /var/www/html/ocp
cp ocp-inst/rhcos-4.3.8-x86_64-metal.x86_64.raw.gz /var/www/html/ocp/m.raw.gz
cp ocp4/*ign /var/www/html/ocp
chmod -R 755 /var/www/html/ocp

Check that the files are available (http://your_bastion_ip:8080/ocp)

Open one of the ignition files to confirm that they are readable
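You can also check them from the bastion command line (jq was installed earlier; the URL assumes the port and subfolder configured above):

curl -s http://localhost:8080/ocp/bootstrap.ign | jq .ignition.version
curl -sI http://localhost:8080/ocp/m.raw.gz | head -1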

Install bootstrap server

Attach the ISO image downloaded from the Red Hat site (rhcos-4.3.8-x86_64-installer.x86_64.iso) to the bootstrap template.

Start the machine and open the console to insert the boot parameters. Press <TAB> to display the boot parameters for the RHCOS installation. We must provide:

  • coreos.inst.install_dev=<device_name> – it must be your raw disk device; in case of VMware it is normally sda
  • coreos.inst.image_url=<image_location> – points to the RHCOS raw image downloaded from the Red Hat site, in my case http://192.168.50.11:8080/ocp/m.raw.gz
  • coreos.inst.ignition_url=<ignition_file_location> – points to the bootstrap.ign file created earlier (http://192.168.50.11:8080/ocp/bootstrap.ign)
  • ip=<bootstrap_host_ip>::<default_gateway>:<mask>:<hostname>:<network_interface>:none – network settings of the bootstrap node, where
    • bootstrap_host_ip – the static IP of the bootstrap node being installed (192.168.50.200)
    • default_gateway – default gateway for the bootstrap network (192.168.50.254)
    • mask – in 4-octet notation (255.255.255.0)
    • hostname – FQDN of the bootstrap node (boot.ocp4.guardium.notes)
    • network_interface – specifies the network interface with access to the network; in case of VMware the default for RHCOS is ens192.
  • nameserver=<DNS_server_IP> – points to your DNS server where all OCP cluster names have been defined.
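Put together, the appended boot line for my bootstrap node looks roughly like this (a single line; the values come from the tables above and must be adjusted to your environment):

coreos.inst.install_dev=sda coreos.inst.image_url=http://192.168.50.11:8080/ocp/m.raw.gz coreos.inst.ignition_url=http://192.168.50.11:8080/ocp/bootstrap.ign ip=192.168.50.200::192.168.50.254:255.255.255.0:boot.ocp4.guardium.notes:ens192:none nameserver=192.168.100.10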

If you make an error in the boot parameters, you must restart the machine and enter all the values again, which can be painful during troubleshooting.

After a while RHCOS will be installed and login prompt will appear in machine console.

To confirm successful installation of bootstrap server login to boot node

ssh -l core boot.ocp4.guardium.notes

and watch the bootkube service

journalctl -b -f -u bootkube.service

The output should periodically display the current status of the master nodes setup:

Leave this window open to have real-time cluster setup review.

Setup master nodes

Now we can start installation of all three master nodes in any order. Each must be booted from RHCOS ISO and correctly configured using booting parameters.

We use exactly the same set of parameters described in the bootstrap section. The only values that change are the IP address, the hostname and the ignition file name, which in this case is master.ign.

The bootkube log will produce output all the time and after a while we can observe that the master nodes are committed by the bootstrap.

Finally bootkube log should display message about successful setup of master nodes.

Interrupt bootkube log watch (CTRL+C) and exit from bootstrap node.

To log in to the cluster we have to configure the oc tool. To do this, set the KUBECONFIG variable (which makes sense if you manage multiple clusters) or copy the cluster configuration file to the hidden .kube directory in your user’s home directory.

mkdir -p .kube
cp ocp4/auth/kubeconfig .kube/config

Now on bastion execute command to get cluster node list.

oc get nodes

We can also confirm that bootstrap phase finished correctly.

ocp-inst/openshift-install --dir=ocp4 wait-for bootstrap-complete

We no longer need our bootstrap server because all master servers have been configured and are operational, so we can stop the bootstrap machine

ssh -l core boot.ocp4.guardium.notes "sudo shutdown now"

We should also reconfigure the load balancer to stop redirecting traffic to the bootstrap server. Modify the HAProxy configuration on the proxy machine:

ssh -i ~/.ssh/aux/id_rsa proxy 'cat /etc/haproxy/haproxy.cfg | grep m0'
ssh -i ~/.ssh/aux/id_rsa proxy "sed -i 's/server m0/#server m0/' /etc/haproxy/haproxy.cfg"
ssh -i ~/.ssh/aux/id_rsa proxy 'cat /etc/haproxy/haproxy.cfg | grep m0'
ssh -i ~/.ssh/aux/id_rsa proxy systemctl restart haproxy

Since the workers will be based on RHEL 7, we can also stop the Apache server on the bastion. It will not be needed anymore. It is also worth deleting the shared files because they contain critical environment information.

systemctl stop httpd
systemctl disable httpd
rm -rf /var/www/html/ocp
Setup worker nodes

Because this installation will use RHEL 7 for the worker nodes, we must configure them for this role.

We copy the ssh public key to the workers to be able to connect to them without a password (we use the same key that was transferred to the master nodes with the ignition files)

workers="<worker1_FQDN> <worker2_FQDN> <worker3_FQDN>"
for h in $workers; do ssh-copy-id -i ~/.ssh/id_rsa.pub $h; done

Register your RHEL machines

for h in $workers; do ssh $h subscription-manager register --username <user>  --password <password>; done

where: <user>, <password> is your RH account and its password.

Refresh subscriptions

for h in $workers; do ssh $h subscription-manager refresh; done

Subscribe your nodes to a pool with OpenShift software access (similar to the one attached to the bastion, but it should cover your RH support and scalability expectations).

for h in $workers; do ssh $h subscription-manager attach --pool=<pool_id>; done

Remove all default repositories and add required ones

for h in $workers; do ssh $h subscription-manager repos --disable="*"; done
for h in $workers; do ssh $h subscription-manager repos --enable="rhel-7-server-rpms" --enable="rhel-7-server-extras-rpms" --enable="rhel-7-server-ose-4.3-rpms"; done

Firewall must be switched off on workers

for h in $workers; do ssh $h systemctl disable --now firewalld.service; done

Finally update your systems and reboot them

for h in $workers; do ssh $h yum -y update; done
for h in $workers; do ssh $h shutdown -r now; done

Now we can scale up the cluster by adding our 3 worker nodes to it.

We must create a new_hosts file to define the nodes for the scale-up process

workers="<worker1_FQDN> <worker2_FQDN> <worker3_FQDN>"
echo -e "[all:vars]\nansible_user=root\nopenshift_kubeconfig=\"~/.kube/config\"\n[workers]\n\n[new_workers]" > ocp-inst/new_hosts
for h in $workers; do echo "$h" >> ocp-inst/new_hosts; done
for h in $workers; do ssh $h exit;done
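The generated ocp-inst/new_hosts should look similar to this (with my worker names):

[all:vars]
ansible_user=root
openshift_kubeconfig="~/.kube/config"
[workers]

[new_workers]
w1.ocp4.guardium.notes
w2.ocp4.guardium.notes
w3.ocp4.guardium.notes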

The scale-up process uses Ansible to manage the configuration change. Finally we can execute:

ansible-playbook -i ocp-inst/new_hosts /usr/share/ansible/openshift-ansible/playbooks/scaleup.yml

which joins our machines to the OCP cluster as workers.

Monitor cluster status.

oc get csr
oc get clusteroperator

Our cluster will be installed correctly when all operators are available. The process of setting up all services can take many minutes and no additional administrative activities should be performed until then.

The list of nodes will be extended and contains new workers.

oc get nodes

At the end of the installation process an administrative account of the cluster is automatically created – kubeadmin. The following command allows you to confirm the fully functional state of the cluster and displays the console URL and authentication data.

ocp-inst/openshift-install --dir=ocp4 wait-for install-complete

Then log in to the console using the URL and credentials provided by the previous command.

Identity provider setup

OCP provides many ways to authenticate users. Production systems often use external LDAP directories or a federated identity (e.g. OpenID). To simplify the process, we will use a local user repository in htpasswd format.

To create htpasswd file

htpasswd -c ocp4/ocp.pwd <username>

To add more users to the existing ocp.pwd file execute

htpasswd ocp4/ocp.pwd <username>

After logging in to the OCP console as kubeadmin, a message is displayed that additional identity providers should be configured. We can go to their configuration via the indicated URL or in the menu: Administration -> Cluster Settings -> Global Configuration -> OAuth

From the list, we select HTPasswd

Then enter the name of the new identity provider (Name), indicate the user definition file (HTPasswd File) created a moment ago and confirm everything with the Add button.
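If you prefer the command line, the same provider can be defined with oc; a minimal sketch assuming the ocp4/ocp.pwd file created above and an identity provider name of local (the secret and provider names here are my own choices):

oc create secret generic htpasswd-secret --from-file=htpasswd=ocp4/ocp.pwd -n openshift-config
oc patch oauth cluster --type=merge -p '{"spec":{"identityProviders":[{"name":"local","mappingMethod":"claim","type":"HTPasswd","htpasswd":{"fileData":{"name":"htpasswd-secret"}}}]}}'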

Registration of a new identity in OCP takes place during the first login. Therefore we log in to the system as the new user, with credentials matching those entered in the htpasswd file.

Then, again as the kubeadmin user, we go to user management. In the list User Management -> Users the new user account should appear.

If we create a highly privileged administrative account, we have to assign appropriate permissions to it. To do this, select the account and then, on the Role Bindings tab, press the Create Binding button.

We define the scope of the role as the whole cluster (Cluster-wide), give a name describing this assignment and from the list of existing roles we choose cluster-admin. Finally we indicate our user as the subject inheriting the administrative role.
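The same binding can also be created from the command line:

oc adm policy add-cluster-role-to-user cluster-admin <username>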

From now on we can manage the cluster using the just-added personal account.

oc login -u <user>
oc whoami
oc whoami -t

7. OpenShift Container Storage setup

OpenShift is almost ready, but we still need to configure the local disk resources. Guardium Insights recommends using the Ceph distributed file system for this.
A fully supported version of Ceph on an OCP cluster is available within OpenShift Container Storage (OCS), which we will configure now.

Preparation

Ensure the SELinux subsystem for containers is available on the worker nodes

workers="<worker1_FQDN> <worker2_FQDN> <worker3_FQDN>"
for h in $workers; do ssh $h yum -y install policycoreutils container-selinux; done

Persistently enable container use of the Ceph file system in SELinux

for h in $workers; do ssh $h setsebool -P container_use_cephfs on; done

Our RHEL 7 worker templates have an additional, second 300-400 GB disk drive (it is important that they are the same size). These drives will be used to build the Ceph file system.
Nodes with attached raw devices (the workers) must be labelled for containerized storage.

for h in $workers; do oc label nodes $h cluster.ocs.openshift.io/openshift-storage=""; done

The OCS operator must be located in the openshift-storage namespace, which we create using these commands:

mkdir ocs
echo -e "kind: Namespace\napiVersion: v1\nmetadata:\n name: openshift-storage\n labels:\n  openshift.io/cluster-monitoring: \"true\"\n annotations:\n  openshift.io/node-selector: \"\"\nspec: {}" > ocs/ocs-namespace.yaml
oc create -f ocs/ocs-namespace.yaml
oc get namespace openshift-storage

Local storage operator services are responsible for managing directly attached storage, and Ceph will use this layer to access our disks. The local storage operator will be installed in a separate namespace (local-storage)

echo -e "kind: Namespace\napiVersion: v1\nmetadata:\n name: local-storage\nspec: {}" > ocs/ls-namespace.yaml
oc create -f ocs/ls-namespace.yaml
oc get namespace local-storage
OCS operators installation

In the OCP UI go to the Operators -> OperatorHub menu. Search for operators whose name contains the storage substring. Select the Local Storage operator

Select Install button

and select local-storage namespace for it. Subscribe service.

After a while, the operator status should change to Succeeded. We can also monitor the status of the pods in the local-storage project

oc get pods -n local-storage

The second operator we have to install is OCS. As in the previous case, we select the Install button and indicate the namespace – this time openshift-storage – and subscribe.

One of the prerequisites for OCS is an additional lib-bucket operator, which is installed automatically. We wait for the status of both to change to confirm the task completed successfully.

oc get pods -n openshift-storage
DAS setup with local-storage

We start by initializing the local-storage services for the additional drives connected to the workers, creating a volume (LocalVolume).
We must indicate the names of the block devices to be searched for on the labelled nodes. For this purpose we can use the same commands we used on the NFS server. In my environment all disks have the same name, /dev/sdb.
We create the volume using a yaml configuration file.

devices="list of block devices used for OCS"
echo -e "apiVersion: local.storage.openshift.io/v1\nkind: LocalVolume\nmetadata:\n name: local-block\n namespace: local-storage\n labels:\n  app: ocs-storagecluster\nspec:\n nodeSelector:\n  nodeSelectorTerms:\n  - matchExpressions:\n    - key: cluster.ocs.openshift.io/openshift-storage\n      operator: In\n      values:\n      - \"\"\n storageClassDevices:\n  - devicePaths:" > ocs/local-storage.yaml
for h in $devices; do echo "    - $h" >> ocs/local-storage.yaml; done
echo -e "    storageClassName: localblock\n    volumeMode: Block" >> ocs/local-storage.yaml
oc create -f ocs/local-storage.yaml

where devices is the list of block devices to join into the LocalVolume.
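In my environment every worker exposes the additional drive under the same path, so a single entry is enough:

devices="/dev/sdb"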

The operation may take a few minutes. We wait for all local-storage project pods to be fully deployed.
In the end a new localblock storage class and 3 or more persistent volumes (PV) – depending on how many devices have been provided – will appear in the cluster.

oc get pods -n local-storage
oc get sc
oc get pv

A detailed PV definition indicates that the correct drives are connected.

oc describe pv <local_block_pv>
OCS cluster configuration

The drives configured using local-storage will now allow us to install an OCS cluster with its multiple management services. For this purpose we create a StorageCluster definition in the form of a yaml file.

size=<local-storage_pv_size>
echo -e "apiVersion: ocs.openshift.io/v1\nkind: StorageCluster\nmetadata:\n name: ocs-storagecluster\n namespace: openshift-storage\nspec:\n manageNodes: false\n monDataDirHostPath: /var/lib/rook\n storageDeviceSets:\n - count: 1\n   dataPVCTemplate:\n    spec:\n     accessModes:\n     - ReadWriteOnce\n     resources:\n      requests:\n       storage: $size\n     storageClassName: localblock\n     volumeMode: Block\n   name: ocs-deviceset\n   placement: {}\n   portable: false\n   replica: 3\n   resources: {}" > ocs/cluster.yaml
oc create -f ocs/cluster.yaml

where size refers to the size of the PVs created in the LocalVolume (if the PVs have different sizes, indicate the smallest one as the size of a single volume in the OCS cluster)

The process can take several minutes and is best monitored through OCS operator status.

watch "oc get pods -n openshift-storage | grep ocs-operator*"

A correctly completed process will create 3 new storage classes. The PVs created through local-storage will be allocated to the OCS cluster (as PVCs) and in addition the noobaa storage server will create a PVC allocation for its services using Ceph.

oc get sc
oc get pv
oc get pvc -n openshift-storage

The OCS cluster has been installed with Ceph support and we only have to mark the Ceph block storage class as the default in our cluster.

oc patch storageclass ocs-storagecluster-ceph-rbd -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
oc get sc
OCS registry image storage configuration

In the case of a bare metal installation of the OCP cluster, the system in the initial phase (before installing OCS) does not have any storage resources for its own purposes, therefore the image registry has not been configured.
To be able to install Guardium Insights it is necessary to reconfigure the registry, pointing it to dedicated disk space on the OCS cluster.
For this purpose we create a new volume (PVC) with a minimum size of 100 GB.

size=100Gi
echo -e "kind: PersistentVolumeClaim\napiVersion: v1\nmetadata:\n name: registry\n namespace: openshift-image-registry\nspec:\n accessModes:\n  - ReadWriteMany\n resources:\n  requests:\n   storage: "$size"\n storageClassName: ocs-storagecluster-cephfs" > ocp4/iregistry_pvc.yaml
oc create -f ocp4/iregistry_pvc.yaml
oc get pvc -n openshift-image-registry

We can now change the image registry configuration by modifying its operator. The change triggers a few minutes of cluster reconfiguration, which is best monitored through the operator’s status.

oc patch configs.imageregistry.operator.openshift.io/cluster --type=merge --patch '{"spec":{"storage":{"pvc":{"claim": "registry"}}}}'
oc patch configs.imageregistry.operator.openshift.io/cluster --type=merge --patch '{"spec": {"managementState": "Managed"}}'
watch oc get clusteroperator image-registry

We still have to make the configured registry available for external calls. We will use it when installing the next elements of the solution. Monitor the change status the same way.

oc patch configs.imageregistry.operator.openshift.io/cluster --patch '{"spec":{"defaultRoute":true}}' --type=merge
watch oc get clusteroperator image-registry

8. IBM Cloud Pak Common Services installation

Guardium Insights requires services provided by IBM Cloud Pak Common Services (ICS) to work. This section presents two possible installation paths: using a remote repository and using a local image registry.

Docker on bastion configuration

Loading images requires a docker instance installed on the bastion. To simplify the installation we configure it to accept registries whose certificates are issued by certificate authorities (CA) that are not registered locally, by modifying the /etc/docker/daemon.json file.

cluster_domain=<cluster_domain_FQDN>
cat /etc/docker/daemon.json
registry="default-route-openshift-image-registry.apps.$cluster_domain"
if ! jq -e --arg registry "$registry" '."insecure-registries" // [] | index($registry)' /etc/docker/daemon.json > /dev/null; then jq --arg registry "$registry" '."insecure-registries" = ((."insecure-registries" // []) + [$registry])' /etc/docker/daemon.json > /etc/docker/daemon.json.new && mv -f /etc/docker/daemon.json.new /etc/docker/daemon.json; fi
cat /etc/docker/daemon.json

where cluster_domain is full OCP cluster domain name (ocp4.guardium.notes in my case)
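After the change /etc/docker/daemon.json should contain an entry similar to the one below (other keys shipped in the distribution default file may also be present):

{
    "insecure-registries": ["default-route-openshift-image-registry.apps.ocp4.guardium.notes"]
}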

Restart docker on bastion and create two subdirectories.

systemctl restart docker
mkdir ics insights
ICS installation from IBM Cloud repository

To install ICS without having to load all images into the local repository, log into your IBM Cloud account and go to the URL below.

https://myibm.ibm.com/products-services/containerlibrary

Then copy the key that will allow you to connect to the remote repository.

We add to the OCP cluster the authentication data for the remote repository in IBM Cloud.

oc create secret docker-registry entitled-registry --docker-server=cp.icr.io --docker-username=cp --docker-password=<IBM_Cloud_key> --docker-email=<IBM_Cloud_account>

Load the installation image to the local docker on bastion.

cd ics
docker login cp.icr.io -u cp -p <IBM_Cloud_key>
docker pull cp.icr.io/cp/icp-foundation/icp-inception:3.2.4
docker images

Copy ICS configuration files to bastion directory.

docker run --rm -v $(pwd):/data:z -e LICENSE=accept --security-opt label:disable cp.icr.io/cp/icp-foundation/icp-inception:3.2.4 cp -r cluster /data

We create the ICS installation configuration file. Enter the IBM Cloud account key and indicate which Common Services functions will be installed and on which workers. We can distribute them among all the nodes or skip the node on which we will install DB2 DWH (in my machine templates this is w3.ocp4.guardium.notes).
Also define the ICS admin user password – ics_passwd.

ics_passwd=<ICS_admin_password>
ibm_key=<IBM_Cloud_key>
mngt_node=<ICP_management_node>
master_node=<ICP_master_node>
proxy_node=<ICP_proxy_node>
echo '{}'|jq --arg ics_passwd "$ics_passwd" --arg ibm_key "$ibm_key" '. += {"image_repo": "cp.icr.io/cp/icp-foundation","default_admin_password": $ics_passwd,"cluster_nodes": {},"docker_password": $ibm_key,"management_services": {"multitenancy-enforcement": "disabled", "catalog-ui": "disabled", "monitoring": "disabled", "mcm-kui": "disabled", "metering": "disabled", "system-healthcheck-service": "disabled", "licensing": "disabled", "logging": "disabled", "audit-logging": "disabled", "iam-policy-controller": "enabled", "nginx-ingress": "enabled", "common-web-ui": "enabled"},"docker_username": "cp","storage_class": "ocs-storagecluster-ceph-rbd","private_registry_enabled": true, "password_rules": ["(.*)"]}'|jq --arg mngt_node "$mngt_node" --arg master_node "$master_node" --arg proxy_node "$proxy_node" '."cluster_nodes" += {"management": [$mngt_node], "master": [$master_node], "proxy": [$proxy_node]}'|python -c 'import sys, yaml, json; yaml.safe_dump(json.load(sys.stdin), sys.stdout, default_flow_style=False)' > cluster/config.yaml

Copy OCP cluster configuration file to ICS cluster installation directory

cp ../ocp4/auth/kubeconfig cluster/kubeconfig

We can start an installation that is controlled by Ansible and may take several minutes.

docker run -t --net=host -e LICENSE=accept -v $(pwd)/cluster:/installer/cluster:z -v /var/run:/var/run:z -v /etc/docker:/etc/docker:z --security-opt label:disable cp.icr.io/cp/icp-foundation/icp-inception:3.2.4 addon -v

The ICS installation process creates a new admin account in the kube:system identity provider. I suggest assigning the cluster-admin role to it.

oc adm policy add-cluster-role-to-user cluster-admin admin

Now we can log in to the ICS console using the URL displayed in the final message of the installation

Go to root home directory.

cd ..
ICS installation from local repository

Download Guardium Insights installer from Passport Advantage. The software package archive should contain installation gzip file – SCRTY_GUARDIUM_INSIGHTS_V_2.0.2.tar.gz. Copy it to bastion insights directory and unpack.

tar xvf insights/SCRTY_GUARDIUM_INSIGHTS_V_2.0.2.tar.gz -C insights

Remove the gzip archive to save disk space. Three subdirectories will appear in the insights directory.

rm -f insights/SCRTY_GUARDIUM_INSIGHTS_V_2.0.2.tar.gz

Log in to the local docker on the bastion and load the ICS images stored in the archive in the ics_images subdirectory (this operation takes several minutes).

docker login
tar xf insights/ics_images/common-services-2002-x86_64.tar.gz -O | docker load

Copy installation files from icp-inception image to ics subdirectory

cd ics
docker run --rm -v $(pwd):/data:z -e LICENSE=accept --security-opt label:disable ibmcom/icp-inception-amd64:3.2.4 cp -r cluster /data

Create configuration file

ics_passwd=<ICS_admin_password>
mngt_node=<ICP_management_node>
master_node=<ICP_master_node>
proxy_node=<ICP_proxy_node>
echo '{}'|jq --arg ics_passwd "$ics_passwd" '. += {"default_admin_password": $ics_passwd,"cluster_nodes": {}, "management_services": {"multitenancy-enforcement": "disabled", "catalog-ui": "disabled", "monitoring": "disabled", "mcm-kui": "disabled", "metering": "disabled", "system-healthcheck-service": "disabled", "licensing": "disabled", "logging": "disabled", "audit-logging": "disabled", "iam-policy-controller": "enabled", "nginx-ingress": "enabled", "common-web-ui": "enabled"}, "storage_class": "ocs-storagecluster-ceph-rbd","private_registry_enabled": false, "password_rules": ["(.*)"]}'|jq --arg mngt_node "$mngt_node" --arg master_node "$master_node" --arg proxy_node "$proxy_node" '."cluster_nodes" += {"management": [$mngt_node], "master": [$master_node], "proxy": [$proxy_node]}'|python -c 'import sys, yaml, json; yaml.safe_dump(json.load(sys.stdin), sys.stdout, default_flow_style=True)' > cluster/config.yaml

Copy OCP configuration file to the cluster directory

cp ../ocp4/auth/kubeconfig cluster/kubeconfig

Initialize ICS installation

docker run -t --net=host -e LICENSE=accept -v $(pwd)/cluster:/installer/cluster:z -v /var/run:/var/run:z -v /etc/docker:/etc/docker:z --security-opt label:disable ibmcom/icp-inception-amd64:3.2.4 addon -v

It takes a lot of time because all ICS images are transferred from bastion docker to OCP image repository.

Finally you will get information that ICS is installed

The ICS installation process creates a new admin account in the kube:system identity provider. I suggest assigning the cluster-admin role to it.

oc adm policy add-cluster-role-to-user cluster-admin admin

Now we can log in to the ICS console using the URL displayed in the final message of the installation

Go to root home directory

cd ..
ICS tools installation

Copy the tools from the ICP console (use your ICS Dashboard URL) into the root home directory:

curl -kLo cloudctl https://<ICS Dashboard URL>/api/cli/cloudctl-linux-amd64
curl -kLo helm.tar.gz https://<ICS Dashboard URL>/api/cli/helm-linux-amd64.tar.gz

Move binaries to /usr/local/bin

chmod +x cloudctl; mv cloudctl /usr/local/bin
tar xvf helm.tar.gz *helm; mv linux-amd64/helm /usr/local/bin;rm -rf linux-amd64/ helm.tar.gz

9. Guardium Insights installation

We can start installing Guardium Insights.

NFS client setup

The collected and analyzed data should be located in a repository outside the control of the cluster, thus giving us ease of transfer and archiving in case of an update or reconstruction of the cluster. For this purpose, we configure the NFS client service referring to the previously prepared resource on the NFS server.

export NFS_SERVER_IP=<your NFS server IP>
export NFS_PATH=<full path to your NFS share>
oc create namespace nfs
cloudctl login -a https://<ICS Dashboard URL> --skip-ssl-validation -u admin
helm init --client-only; helm repo update

where:
NFS_SERVER_IP – IP address of NFS server
NFS_PATH – full path to shared storage

Then NFS client is installed using helm

helm install --set podSecurityPolicy.enabled=true --set 'nfs.mountOptions[0]="nfsvers=4"' --set 'nfs.mountOptions[1]="context=system_u:object_r:container_file_t:s0"' --set nfs.server=$NFS_SERVER_IP --set nfs.path=$NFS_PATH stable/nfs-client-provisioner -n=nfs --tls
oc get pods -n nfs
Preparing GI installation files

Download Guardium Insights installer from Passport Advantage. The software package archive should contain installation gzip file – SCRTY_GUARDIUM_INSIGHTS_V_2.0.2.tar.gz. Copy it to bastion insights directory and unpack.

If you installed ICS in offline mode, the Guardium Insights archive has already been uploaded and unpacked.

tar xvf insights/SCRTY_GUARDIUM_INSIGHTS_V_2.0.2.tar.gz -C insights

Create two namespaces – one for IBM Streams (gi-addon) and a second one for the GI (gi) services.

Helm release and namespace names should have a maximum of 10 characters to avoid installation problems.

oc create namespace <GI namespace>
oc create namespace <IBM Streams namespace>
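For example, in my environment I used the names that appear later in the installer call:

oc create namespace gi
oc create namespace gi-addon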

Load GI images to bastion repository (it takes several minutes).

docker login
cd insights/insights_images;./dockerLoad.sh;cd ../..

Push the GI images to the OpenShift registry (this also takes a few minutes).

docker login default-route-openshift-image-registry.apps.ocp4.guardium.notes -u admin -p `oc whoami -t`
cd insights/insights_images;./dockerPush.sh default-route-openshift-image-registry.apps.ocp4.guardium.notes/<GI namespace>;cd ../..

Start GI installation process by invoking installer.sh script

cd insights/insights_images; ./installer.sh -s <DB worker FQDN> -n <GI namespace> -h <GI URL> -u 192.168.50.10 -l true -a admin -e https://icp-console.apps.ocp4.guardium.notes -i image-registry.openshift-image-registry.svc:5000/gi -d gi-addon -r gi-addon -o values-small.yaml

where:
-s – DB worker specification (for a large GI installation you can point to 2 nodes, but then you should create the OCP cluster with a minimum of 4 workers)
-n – Guardium Insights namespace (defined few steps earlier)
-h – Guardium Insights URL – specify any name in apps subdomain of your cluster domain (i.e. insights.ocp4.guardium.notes)
-u – load balancer IP address
-l – license agreement acceptance
-a – ICP admin user
-e – URL of ICP Dashboard (including https:// prefix)
-i – local project registry – put here private cluster docker URL (docker-registry.default.svc:5000/<GI namespace>)
-d – IBM Streams namespace (defined a few steps earlier)
-r – IBM Streams helm release name (I suggest using the same name as the namespace)
-o – installation size (small.yaml, medium.yaml, large.yaml)
-p – ICP admin password

Confirm successful installation by execution of this command:

oc get cm | grep tenant-postdeploy-ready

The output should list a config map whose name contains tenant-postdeploy-ready, which confirms that the tenant post-deployment step has completed on our cluster.

Then we can log in to the GI portal using the URL defined as the -h option of the installer.sh script. Use the ICS admin account credentials to log in.

GI uninstallation cleanup

In case of problems with the GI installation (due to insufficient resources, for example) it may be necessary to restart the installation. However, this requires a thorough uninstallation of all GI elements in the following way:

Remove GI helm

helm delete <GI_NAMESPACE> --purge --tls
oc delete namespace <GI_NAMESPACE>

Remove IBM Streams helm

helm delete <IBM_STREAMS_HELM> --purge --tls
oc delete namespace <IBM_STREAMS_NAMESPACE>

Remove LDAP definition in ICS (if exists)

Log in to the ICS Dashboard and go to the LDAP configuration (Administer -> Identity and Access). Then expand the options for the GI LDAP configuration and select the Delete option. Finally, confirm the deletion using the Remove LDAP button.

Remove Service ID (if exists)

Log in to the ICS Dashboard and go to Administer -> Identity and Access. Then switch to the Service IDs tab and expand the options for the service with a name starting with eventstream. Select the Delete option and confirm the operation with the Remove Service ID button.

Remove GI labels on workers

Execute this command to remove GI label assignments.

oc label node `oc get node -l icp4data --no-headers=true -o name | sed -e 's#node/##'` icp4data-

Remove content on NFS server

If you do not want to delete data already stored in DB2 DWH, use the correct data migration procedure.

This command cleans up your shared folder

ssh -i ~/.ssh/aux/id_rsa ginfs rm -rf /storage/gi/*

10. Post installation tasks

We must perform some additional post-installation steps to configure the system for our environment.

SCP traffic configuration

If we plan to transfer data from the GDP system to GI, we have to modify the load balancer configuration. Data in the form of datamarts are sent from the collectors via SCP to the ssh service on our cluster. During the GI installation, a port in the range 30000-32767 is randomly selected for this purpose.
We can identify it with the command:

oc get services <GI namespace>-ssh-service -ojsonpath='{.spec.ports[0].nodePort}' && echo

To update HA Proxy configuration execute these commands:

port=`oc get services <GI namespace>-ssh-service -ojsonpath='{.spec.ports[0].nodePort}' && echo`
workers="<worker1_ip> <worker2_ip> <worker3_ip>"
ssh -i ~/.ssh/aux/id_rsa proxy 'echo -e "frontend scp\n bind *:'$port'\n default_backend scp\n mode tcp\n option tcplog\nbackend scp\n balance source\n mode tcp" >> /etc/haproxy/haproxy.cfg'
i=1;for w in $workers; do record=`echo -e "\ server w$i $w:$port check"`; ssh -i ~/.ssh/aux/id_rsa proxy "echo $record >> /etc/haproxy/haproxy.cfg";let i=i+1; done
ssh -i ~/.ssh/aux/id_rsa proxy cat /etc/haproxy/haproxy.cfg
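The appended section should look similar to this (with <port> replaced by the nodePort discovered above):

frontend scp
 bind *:<port>
 default_backend scp
 mode tcp
 option tcplog
backend scp
 balance source
 mode tcp
 server w1 192.168.100.101:<port> check
 server w2 192.168.100.102:<port> check
 server w3 192.168.100.103:<port> check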

Restart HA Proxy with new settings

ssh -i ~/.ssh/aux/id_rsa proxy systemctl restart haproxy
ssh -i ~/.ssh/aux/id_rsa proxy systemctl status haproxy

Finally we must open GI SCP port on proxy machine

ssh -i ~/.ssh/aux/id_rsa proxy firewall-cmd --permanent --add-port=$port/tcp
ssh -i ~/.ssh/aux/id_rsa proxy firewall-cmd --reload
LDAP setup for GI

Log in to GI as the admin user and select Connect an LDAP Server (GI supports most commercial and free LDAP implementations)

Select your LDAP server type and define all required parameters (I am using Active Directory as an identity source). Save configuration.

Confirm LDAP setup configuration correctness.

GI admins role assignment

GI implements a separation of duties for access to audited data, so the role of access management is separate from that of access to data. The default user available after installation has access control privileges, therefore it is necessary to add at least one named user who will have the rights to configure the data collection services.
For this purpose we will use the configured access to the LDAP server and import a new user.

In GI portal expand Settings and select User management option.

In User Management click Add user button.

Use the filter to find the new user in the LDAP directory, select it and use the Next button

Select the required roles and Save the selection.

New user has been created

Log in again with the new account and notice that the new administrator has the ability to configure feeds

Many thanks to Devan Shah, Joshua Ho and Ryan Ramphal. I would not have been able to write this article without them.

Here is the installation cookbook for GI 2.0.0 and 2.0.1 based on OCP 3.11 – cookbook

Here is an article about feed configuration – Guardium Insights feeding

Guardium DBaaS monitoring using AWS Data Activity Streams

When it comes to monitoring activity on databases as a service (DBaaS), it is not possible to install the traditional Guardium S-TAP at the operating system level.

There are different options for monitoring these DBaaS:

  • External S-TAP, that intercepts database traffic and sends it to the Collector;
  • Data Stream, where the Collector consumes events from public clouds such as Amazon AWS or Microsoft Azure;
  • Native Logs, leveraging database native logging.

In this article I will introduce the steps to have Guardium Data Protection consuming AWS Database Activity Streams (DAS) data from the AWS Kinesis service. AWS DAS provides a near real-time data stream of the database activity.

The target database is an AWS Aurora PostgreSQL (compatible with PostgreSQL 10.7 engine). The first step is creating the database cluster:

From a networking perspective, I made the database publicly accessible and I also created two inbound rules to allow access from two different IP addresses:

  • IP address of the Guardium Collector, so this will be able to pull data from AWS Kinesis;
  • IP address of my workstation, so I will be able to generate some test activity on the database.

The next step is enabling the activity stream for the target database. More information about DAS with Aurora PostgreSQL can be found here: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/DBActivityStreams.html

The DAS enablement requires the use of AWS KMS (Key Management Service) because the activity streams are always encrypted. I created a Master Key named ‘as-stream-key’ – this Master Key will be used to encrypt the Encryption Key that will be effectively encrypting the logged database activity. The Encryption Key will be rotated automatically by the system.

The activity streams are enabled at the cluster level – this means that every DB instance in the cluster is monitored. If I add a new instance to the cluster, this will be automatically monitored.

For enabling DAS:

  • Click on the DB cluster
  • Click on ‘Actions’
  • Click on ‘Start activity stream’
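
If you prefer the command line, the same operation can be performed with the AWS CLI rds start-activity-stream command (a sketch; the cluster ARN, KMS key alias and region below are placeholders for your own values):

aws rds start-activity-stream \
  --resource-arn arn:aws:rds:us-west-2:111122223333:cluster:my-aurora-cluster \
  --mode async \
  --kms-key-id alias/as-stream-key \
  --region us-west-2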

After a couple of minutes, the Kinesis stream is enabled. You can check the name of the stream in the configuration tab of the DB cluster.

At this point we can switch to the Guardium Collector we want to be the consumer of this stream. Navigate to ‘Cloud DB Service Protection’ tab and add a new account definition:

  • Name: choose a name for this account
  • Provider: Amazon
  • Audit type: Data Streams
  • Authentication type: Security Credentials
  • AWS access key ID: access key ID for your AWS account
  • AWS secret access key: secret access key for your AWS account

You can discover now the available AWS Kinesis streams for your account. You can configure the discovery to happen only in the regions you select. In this case I selected ‘us-west-2’ because the DB cluster was created in this region.

In my case two different streams appear. Select the one related to your DB cluster (the stream name is visible in the DB cluster configuration tab as previously mentioned) and click on ‘Enable Monitoring’.

Provide the requested information:

  • DB DNS endpoint: visible in the ‘Connectivity & Security’ tab of the DB instance
  • Port: visible in the same tab above
  • Cluster resource ID: the cluster resource ID for the AWS RDS cluster associated with the stream. If you enter an invalid or unknown cluster resource ID, an error is reported in the status for the stream. This ID is visible in the ‘Configuration’ tab of the DB cluster
  • Consumer group name: determines whether multiple consumers have a shared or separate view of this data stream. The consumer group name can be any name. To share the data stream view, use the same consumer group name

Click ‘OK’; after a few seconds refresh the page and the status of the stream will be updated.

The Guardium Collector is now consuming the streams from AWS Kinesis and the sniffer is applying the installed policies on this activity.

Let’s do some test activity so we will see new entries in the ‘Connection Profiling List’ and in the ad-hoc created report ‘as-AWS-stream’.
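
A simple way to generate such activity from a workstation is the psql client (a sketch; the endpoint placeholder and the table names match my test setup):

psql -h <cluster_endpoint> -U postgres -d postgres \
  -c "SELECT * FROM sensitive_table;" \
  -c "SELECT * FROM random_table;"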

As an additional test, let’s modify the policy in order to log in full details only activity for the object ‘sensitive_table’. As this is a ‘Selective Audit Trail’ policy, SQL statements regarding any object that is not ‘sensitive_table’ will not be logged in the Collector.

When running SELECT statements on both table ‘sensitive_table’ and ‘random_table’, the activity on the latter one is dropped by the sniffer and only the activity related to ‘sensitive_table’ is stored in the Collector database.

As shown in this post, this kind of monitoring is easy to use with Guardium and the configuration is straightforward.

On the other hand, for AWS the support is currently limited to Aurora PostgreSQL (this is a cloud provider dependency). Also, there are some limitations for rules and policies:

  • Analysis of returned data and extrusion rules are not available;
  • SQL errors are not available;
  • Policies that interact with S-TAP are not available (Blocking or Ignore).

This monitoring approach is available for Microsoft Azure DBaaS as well. In this case Guardium leverages ‘Azure Event Hub’ service and the supported databases are Azure SQL Server/DW and Cosmos DB.

Guardium Insights (GI) can also consume AWS Kinesis data and therefore ingest Database Activity Streams for Aurora PostgreSQL. The big difference with Guardium Data Protection (GDP) is that GDP offers the possibility of filtering incoming data with traditional Guardium policies, as mentioned before. This is possible thanks to the sniffer process, which is able to analyze traffic and take actions based on what it observes.

Currently, in GI there isn’t any component that can filter incoming AWS Kinesis data. All data sent from the AWS service will then be stored in GI.

However, due to the different storage mechanism that Guardium Insights provides, a huge number of ingested events is not an issue in this case. Furthermore, the number of events streamed from AWS Kinesis will be exactly the same for both solutions – GDP can filter traffic only after it is received and not at the source.

When both GDP and GI are in place, the best solution would be to stream data directly from AWS Kinesis to Guardium Insights, removing the intermediate step of Guardium Collectors.

Guardium Insights feeding

Guardium Insights 2.0.0.1 provides two types of data feed. This article describes them in more detail.

Guardium Data Protection

Connections configuration and overview can be found in the Connections to Insights pane in the Overview dashboard

2020-03-03_09-24-11

or directly from menu

2020-03-03_09-30-50

The Guardium [1] section is responsible for configuring the transfer of data to the GI from the Guardium Data Protection infrastructure. We select Connect Guardium Source + [2] to add an environment.

2020-03-03_09-34-23

In version 2.0.0.1, configuration is only possible by indicating the Central Manager. Probably in the future a more granular approach will be possible by indicating a single collector or a list of them. This will be particularly useful for distributing data between tenants.
We give the name of the configuration [1], indicate the Central Manager [2] (DNS name or IP address) and save [3]. The message [4] should confirm the correctness of the data and close the window at the end [5].

2020-03-03_09-35-47

The just-added GDP environment should be on the list. To enable integration select the Central Manager name [1]

2020-03-03_09-37-02

GI shows list of collectors in GDP environment (only one in my case). To enable them to send data to Guardium Insights select Enable Streaming button [1]. Message [2] informs that process has been initiated.

2020-03-03_09-38-25

It can take some time in environments with dozens or hundreds of collectors. The final effect should be an Enabled label next to each of the collectors.

2020-03-03_09-38-57

Data from the collectors are sent to the GI using job-controlled datamarts.
The list of configured jobs can be displayed using the command

grdapi list_scheduler_jobs

Among the existing jobs, we are interested in those whose JOB_ASSOCIATED_OBJECT parameter starts with the prefix Export:Insights – for instance:

####### EXTENDED SCHEDULER JOB INFO35 #######
TRIGGER_NAME=DataMartExtractionJobTrigger_66
TRIGGER_GROUP=DataMartExtractionJobGroup
JOB_NAME=DataMartExtractionJob_66
GROUP_NAME=DataMartExtractionJobGroup
JOB_CATEGORY=dataMartExtraction
JOB_STATE=SCHEDULED
JOB_NEXT_FIRE_TIME=Tue Mar 03 10:15:00 CET 2020
JOB_PREV_FIRE_TIME=null
JOB_ASSOCIATED_OBJECT_TYPE=job.type.dataMartExtraction
JOB_ASSOCIATED_OBJECT=Export:Insights:Full SQL
JOB_ASSOCIATED_OBJECT_ID=66
JOB_LAST_EXECUTION_STATE=UNKNOWN

where:
JOB_NAME=DataMartExtractionJob_66 – job name
JOB_ASSOCIATED_OBJECT=Export:Insights:Full SQL – datamart name

To check the schedule of this job, execute:

grdapi list_schedules jobName=DataMartExtractionJob_66

In this case extraction takes place every hour, 15 minutes after the full hour.

job name = DataMartExtractionJob_66
job group = DataMartExtractionJobGroup
job description = Export:Insights:Full SQL
trigger name = DataMartExtractionJobTrigger_66
trigger group = DataMartExtractionJobGroup
previous fire time = 2020-03-03 11:15:00.0000
next fire time = 2020-03-03 12:15:00.0000
start time = 2020-03-03 09:38:04.0
status = WAITING
cron string = 0 15 0/1 ? * 1,2,3,4,5,6,7

Finally, we can review the datamart settings:

grdapi get_datamart_info datamart_name="Export:Insights:Full SQL"

There is a lot of information, including:

=========================================
Data Mart Name: Export:Insights:Full SQL
=========================================
Description: 
Based on Report: Export: Full SQL
Based on Query: Export:Full SQL
Extract result to: File
Initial Start: 
Creation Date: 2020-01-14 09:57:40
Time Granularity: 1 HOUR
Active: true
---------------------
Customized Date Format: 
File Name: EXP_INSIGHTS_FULL_SQL
Lines per File: 500000
File Header: "UTC Offset","Access Rule Description","Full Sql","Instance ID","Records Affected","Response Time","Session Id","Succeeded","Timestamp"
Include File Header: true
-----------------------------------------
Copy File Info
-----------------------------------------
Host Name: 192.168.200.151
User Name: scpuser
Directory: /service/datamart/TNT_O75RGRRXPKWNK89XKUVCZL
Password: ******
Transfer Method: SCP
Bundle Name: 
Bundle Main Datamart: false
Send COMPLETE File: true
-----------------------------------------
Last Extraction Info
-----------------------------------------
State:1
---------------------
Timestamp: 2020-03-03 09:15:00
Next Period: 2020-03-03 09:00:00
Last Extracted ID: 630142000000391028
---------------------
Extraction Log
---------------------
Timestamp: 2020-03-03 09:15:08
Extract Status: OK
Start Time: 2020-03-03 09:15:00
End Time: 2020-03-03 09:15:00
Period Start: 2020-03-03 08:00:00
Period End: 2020-03-03 09:00:00
Records Extracted: 137
Details: SCP to: 192.168.200.151, User: scpuser, Path: /service/datamart/TNT_O75RGRRXPKWNK89XKUVCZL, File: 7382399729111195280_coll1_EXP_INSIGHTS_FULL_SQL_20200303070000.gz
Last for Period: true
File Name: /opt/IBM/Guardium/data/dump/DATAMART/EXP_INSIGHTS_FULL_SQL_20200303070000.1.csv
Bundle Name: 
File Transfer Status: Done

Guardium Insights IP address: 192.168.200.151
Tenant: /service/datamart/TNT_O75RGRRXPKWNK89XKUVCZL
SCP user: scpuser

The status of transferred data can be easily monitored using the predefined GDP report Datamart Extraction Log (sort by Run Id)

2020-03-03_12-25-11

In summary, GDP data are extracted every hour and will be available in the GI at this frequency

2020-03-03_12-50-16

Aurora Postgres with Kinesis – direct streaming

This integration requires the collection of some important information from the AWS environment.
At the moment, GI allows you to download activity logs from Aurora database in Postgres mode (streaming for a native Postgres instance is not supported by AWS).
Let’s look at the step-by-step configuration process.

I created a new Aurora Postgres database cluster located in the us-east-2 {1} region (check current AWS requirements for data streaming)

2020-03-03_13-28-55

{integer} indicates information referred to later

Write down the following information: master username {2} and its password {3}

2020-03-03_13-45-01

database name {4} and port {5}

2020-03-03_13-47-21

The database must be directly reachable from GI (set the correct inbound rule); of course GI must have access to the Internet to connect to the AWS cloud

2020-03-03_13-45-39

We activate streaming for our cluster

2020-03-03_13-34-37

Asynchronous mode will produce the feed without buffering

2020-03-03_13-36-22

This configuration change takes a few minutes

2020-03-03_13-36-50

and finally the new Kinesis stream should be listed {6}

2020-03-03_14-14-45

Additionally, collect the cluster resource id {7},

2020-03-04_10-09-59

an existing or new access key {8} and its secret {9} for an account that has access to your Aurora database:

2020-03-03_14-35-09

and endpoint name {10} of your RDS instance:

2020-03-04_13-14-29
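
Most of these values can also be read with the AWS CLI instead of the console (a sketch, assuming your credentials are already configured; the cluster identifier is a placeholder):

aws rds describe-db-clusters --region us-east-2 --db-cluster-identifier <cluster_id> \
  --query 'DBClusters[0].[DbClusterResourceId,Endpoint,Port,ActivityStreamKinesisStreamName]'
aws kinesis list-streams --region us-east-2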

Having collected the necessary information, we can configure the stream connection in GI. In the Connections window choose Amazon Kinesis [1] and press Automatically discover streams [2]

2020-03-03_14-39-35

Then name the AWS account [1] and insert the access key [2] {8} and its secret [3] {9}. The Test connection [4] button allows you to validate the credentials [5]. Go Next [6]

2020-03-03_14-42-13

The screen displays the list of available AWS regions; select the correct one [1] {1} where your Aurora RDS is located and go Next [2]

2020-03-03_14-50-49

GI displays all available, not yet configured Aurora/Postgres streams – select the correct one {6}

2020-03-03_14-52-05

Now insert the DB information: DB name [1] {4}, DB host (endpoint) [2] {10}, super user [3] {2} and its password [4] {3}. Values are dynamically evaluated [5]. Finally, save the configuration [6].

2020-03-03_15-05-47

The new Kinesis connection will appear on the list in a disabled state. To enable it, expand the options list [1] and select Enable [2]

2020-03-03_15-06-24

Insert additional information: port [1] {5}, cluster id [2] {7} and consumer group (a DynamoDB table created to store supporting information about shards; insert any name) [3]. Finally enable streaming [4]

2020-03-04_10-41-54

After a while the stream status will change to enabled

2020-03-03_15-08-56

Here is the DynamoDB tables view with the table just created during stream activation:

2020-03-04_13-30-04
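
The same tables can also be listed from the AWS CLI (assuming your credentials allow DynamoDB access):

aws dynamodb list-tables --region us-east-2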

To test the integration, perform SQL operations on the Postgres database

2020-03-04_10-16-47

They should be immediately visible in Guardium Insights

2020-03-04_10-18-02

Direct streaming allows us to directly stream events to Guardium Insights and will probably be extended to support Google and Microsoft cloud databases.

GDE – policies with multiple keys

The practical implementation of data encryption requires balancing security against the appropriateness of the chosen protection method.
File system level encryption gives us the possibility of high granularity of access even to the level of a single file, but at the same time increases the complexity of applied policies and can lead to their unreadability and accidental errors causing data loss or opening of unauthorized access.

To meet this challenge, GDE allows for the selection of keys and access rules within subdirectories, enabling effective description of data access vectors without breaking them down into separate policies assigned independently to many guardpoints.

Using multiple encryption keys in one policy at the same time gives us the possibility of:

  • consolidating rules into one policy and reducing the number of protection points
  • separating protection for different file types in the same directory
  • using a separate cryptoperiod for each data type
  • shortening the data re-keying time when a single key is disclosed

Below I present the process of creating a policy to protect the OrangeHRM application, where in one directory structure there are files for which access should be distributed among administrators of the operating system and the application.

OrangeHRM data protection guidelines:

  • access for application processes
  • access to database files limited to backup tools in encrypted form only
  • configuration files available only to system administrators
  • logs available only to application administrators
  • all other accesses prohibited

All application files are located in the /opt/orangehrm-4.3.4-2 directory and we would like to protect them with a single guardpoint. We start at the point where the GDE agent is already installed on the managed machine.

2020-02-19_12-42-36

Application data are stored in MySQL databases in the directory <orange_home>/mysql/data (subdirectories represent databases)

2020-02-19_12-47-04

Configuration (conf, cnf, ini) and log files are spread across subdirectories

2020-02-19_12-51-16

This example is based on a standard policy – in the case of LDT it should be modified accordingly. We start by creating 3 new keys to encrypt the previously mentioned file groups

2020-02-19_13-03-25

Key selection is done through Resource Sets. Please note that write operations must always be associated with the correct key, so that no data can be modified with the wrong key.
Resource Sets must uniquely identify a subset of files that cannot overlap with others within a single policy.

The first Resource Set (Orange_Logs) points to all log files (files with the .log extension or ending with _log) in the guarded directory and all its subdirectories

2020-02-20_13-36-33

The Orange_Configs set has a few file pattern specifications, including certificate files (crt, csr, key, pem) used by OrangeHRM

2020-02-19_14-27-22

The Orange_Data resource set has a more complex structure. MySQL stores table data in separate files with different extensions depending on how they are stored.
Each database structure is stored in a separate directory (for example the default database mysql in <orange_home>/mysql/data/mysql)

2020-02-19_14-46-53

I defined table extensions (CSM, CSV, MYD, MYI, frm, ibd), narrowing the scope to the /mysql/data path. A path definition starting with / does not refer to the root directory; the path specification is relative and refers to the guardpoint. In other words, if a policy is assigned to /opt/orangehrm, the path in the Resource Set will point to /opt/orangehrm/mysql/data.

2020-02-19_14-29-30

Additional definitions (ib_buffer_pool*, ibdata*, ibtmp*, ib_logfile*) without subdirectory scope point to MySQL-related files where the system stores temporary data and redo logs.
These three Resource Sets explicitly indicate the areas of data that I intend to encrypt using the previously created keys.
Because I use standard keys and data already exist, it is necessary to do initial encryption. For this purpose we must create a transformation policy.
The main difference from a single key policy is the need to indicate key_op for each subset of files.
Of course, in the Selection in Transformation sections we also indicate Resource Set and the key to be used for encryption (this assignment must be later duplicated in the target access policy).

2020-02-19_15-09-15

The result of the transformation should be as follows:

  • configuration files encrypted with orange-config key
  • logs encrypted with orange-logs key
  • data in MySQL encrypted with orange-data key
  • other files unencrypted

However, before we perform data transformation, we should have an access policy in place to minimize system downtime.
We do not know about the processes and users that use the files we intend to protect. For this purpose, we can use Learning Mode, which only alerts on blocking actions instead of enforcing them. Personally, I do not use this feature and instead simulate the process of policy customization as follows.
I start with a blind policy which aims to identify operations on encrypted resources.
The Key Selection section must properly map the Resource Sets to the keys according to the settings in the transformation policy.
The first rule is purely technical and allows everyone to view the file structure (metadata access).
The next three rules indicate all encrypted files and allow any operation on them with simultaneous audit.
The last rule allows any other operations on the remaining unencrypted files.
There is no preventive action in this policy and it will be used to identify existing file access vectors.

2020-02-19_16-06-24

To easily analyze I/O operations, you need to configure agent alert options – Policy Evaluation settings.
All policy processing related events will be audited in detail and sent to DSM. For large production systems, it is recommended to send events and analyse them on an external syslog server.

2020-02-19_16-29-06

Now we can apply our transformation policy to the OrangeHRM directory. You should have a backup of the data and shut down all application services.

2020-02-19_17-08-30

When the transformation policy is active (green circle icon), apply the initial Access Policy. It will be visible in disabled mode.

2020-02-19_17-09-35

Now we can transform the data. As root, execute the command:

dataxform --rekey --gp /opt/orangehrm-4.3.4-2

After some time, depending on the amount of encrypted data, the computing power of the machine and the number of files, the task should succeed. A significant number of files will be skipped because they are not specified in Resource Sets.

2020-02-17_13-07-13

Now we can switch to the Access policy – disable the transformation policy and enable the access one

2020-02-19_17-54-46

The blind policy should be completely transparent and not affect existing accesses. We can start application services and focus on audit data analysis.

2020-02-19_18-06-00

We will quickly diagnose the list of processes and users that refer to encrypted files. Because the audit covers only the area of data we are interested in, we are not flooded with a mass of insignificant events.
After preliminary analysis we conclude that protected files are mainly used by two application processes:

  • <orange_home>/apache/bin/httpd.bin
  • <orange_home>/mysql/bin/mysql.bin

2020-02-17_13-10-07

Both processes are part of the OrangeHRM application and their access to protected files is fully authorized. We can therefore include them in independent access rules and stop auditing.
In this case, process identification is based on file signatures.
Since we use separate keys to encrypt individual data, we add three access rules for application processes.
We place them above the existing ones and remove the audit option, which will cause the modified policy to stop alerting events related to them.

2020-02-19_18-38-59

The policy can be more detailed and we can accurately identify the files used by each application process. Probably only mysqld.bin process refers to database files. However, usually it is not necessary to have such a granularity as we assume that the access from the application is obvious and unchangeable.
The updated policy should now inform us about the remaining references to encrypted files.

2020-02-20_08-58-57

Here the zszmigiero user browses and modifies the configuration files.
Each new, identified access should be analyzed and, upon acceptance of the access vector, added to our policy based on the user name.
In my example the following access rules were identified:

  • group of application administrators (zszmigiero, pnowak) – should have full access to configuration files and access to logs in read mode
  • group of system administrators (mkowalski, mlucas) – should have access to database files but only in encrypted form (backup); other encrypted files should not be accessible to them in plaintext form

2020-02-20_09-04-40

The updated policy will generate a large number of events as the last rule audits all other application process accesses to unencrypted files.

2020-02-20_09-23-44
Thanks to an additional rule, we can disable this noise (all I/O operations from OrangeHRM processes are accepted without audit).

2020-02-20_09-27-00

This policy separates access into authorized application and administrative access. Updating the list of processes and users is usually an iterative process; the duration (learning time) depends on the number of administrative procedures and takes a few days or weeks.

It’s time to change over to a prevention policy by blocking all unauthorized access to the protected directory.
The change proposed here has additional consequences and I assume that you build the policy on a test environment :).

2020-02-20_09-45-54

Blocking other I/O operations also automatically controls access to unencrypted files, which in the case of my application is an additional value, but it requires identifying additional permissions necessary for proper administration of the application (temporary files, for example) by analyzing the generated events.

2020-02-20_10-37-13

Finally, examples of policies that have been developed.
application administrator (pnowak) [1]:

  • can get content of configuration files [2]
  • can modify them [3]
  • can display log files [4]
  • cannot modify log files [5]
  • cannot access plain files [6]
  • access to mysql data is prohibited [7]

2020-02-20_11-28-14

system administrator (mkowalski) [1]:

  • can see the content of protected files in encrypted form [2], [4], [8]
  • cannot modify files [3], [5], [7], [9]
  • can see the content of plain files [6]

2020-02-20_11-35-48

Other users, including root [1], cannot execute any action on the directory content:

2020-02-20_11-44-28

Using the iterative method of building policies we are able to describe virtually any application without interrupting it. With Resource Sets it is possible to use separate keys for different types of data and simplify the management of encryption to a single policy for a protected service.

 

Guardium Insights – installation cookbook (updated to version 2.0.1)

The central reporting of activity in DAM systems is measured against the problem of huge amounts of data and the necessity of long data retention enforced by regulations, as well as correct identification of anomalies in user behavior through quantitative analysis.
Guardium Insights (GI) is a response to these needs and is designed as a container service implementable in both private and public clouds.
The following procedure brings together the various installation and configuration steps required to install Guardium Insights. The whole is divided into 5 main tasks:

I. Infrastructure Setup

Set SSH Keep-Alive to 60 seconds on your client side to avoid automatic session termination. Some installation tasks take minutes without any information displayed on screen.

1. Architecture

The GI installer has 3 predefined configurations: small, medium and large. This guide assumes the smallest installation, where there is no DB2 database engine redundancy. Therefore the standard OpenShift requirement of three master nodes is considered optional here.
Of course, for a production installation, I recommend that you comply with all requirements.

Guardium Insights small setup requires:

  • 4 node OCP cluster (1 master, 3 workers)
  • 1 NFS Server
  • DNS, NTP services

In case of a multi-master implementation a load balancer is also required.

GI should be installed on a high-performance hardware platform that provides efficient access to the memory and storage. Due to the nature of the Kubernetes cluster and the exchange of information between services, it is recommended that all machines operate within one data centre.
My installation is VMWare ESX based (with 100 cores, 240 GB RAM and 40 TB HDD).

2. VM requirements

Here are the VM machine specifications:

  • Master node (gimaster.guardium.notes):
    – 4 cores, 16 GB RAM, 300 GB HDD
  • 3 worker nodes (gidb.guardium.notes, giaux1.guardium.notes, giaux2.guardium.notes):
    – 32 cores, 64 GB RAM, 300+300 GB HDD (OS+GlusterFS)
  • 1 NFS server (ginfs.guardium.notes):
    – 4 cores, 16 GB RAM, 50 GB HDD, 1-4 TB HDD for GI events

From my point of view huge root partitions on worker nodes are not important and a standard 20-40 GB root filesystem should be enough. The Docker repository will be located on the root file system of the master.
Additional 300 GB disks on workers will be utilized by gluster based services.
In this release all audited data will be stored outside the cluster nodes on a remote file system (NFS)

Software requirements:

  • RedHat 7 with access to OpenShift repository
  • IBM Cloud Private 3.2.1 (latest patch preferred)
  • Security Guardium Insights 2.0.1
  • NFS Server 4.2 (no OS requirements)

3. RedHat setup for OpenShift cluster

OS installation
2019-12-30_13-53-47

I suggest Minimal Install with some standard administration tools. Access to infrastructure servers GUI is not required.

Disk setup

Install the OS on the first 300 GB disk and use the manual partitioning described below to expand the root partition to the whole available space. Other disks will be configured later.
2019-12-30_14-17-55

Switch to suggested configuration
2019-12-30_13-55-38

Remove the /home partition from suggested configuration
2019-12-30_13-57-21

Set root partition size to maximum available space
2019-12-30_13-58-31

In my example the root partition will be used as docker storage (overlayFS) hence its size.
You can also install the docker before installing OpenShift and configure the storage differently, for example as a thin pool and then the root partition can be smaller – 50 GB.

Networking

I suggest static IP configuration of your GI infrastructure. All names must be resolvable using DNS.
2019-12-30_14-00-29

2019-12-30_15-19-10

My nodes are located in the same VLAN so there is no need to set up firewall rules. However you should check connectivity between them in case of a multi-VLAN architecture. The nature of the container networking setup in OpenShift requires that communication between nodes be generally open.

Important: Additionally create a DNS GI subdomain – for example insights – and add the redirection to the master node using an A or CNAME record

*.<subdomain>.<domain> <master_ip_address>

2019-12-30_15-27-23

Set DNS to resolve correctly all your node names.

Other settings:

Set correct local timezone and synchronize all your nodes with NTP server.

4. NFS Server setup

Set up an additional machine (ginfs.guardium.notes) with support for NFS Server 4.2. I prefer Linux distributions (CentOS, Ubuntu or RedHat).
Use a static IP or DHCP with automatic IP registration in your DNS (in the second case always refer to the NFS server using a resolvable name).
There are no exact demands for the primary storage.
The system should have an additional disk or volume attached to store GI events (for a small configuration the default value is 4 TB)

Set up the disk and format the additional storage with an efficient file system (like xfs or ext4). For example:

fdisk /dev/sdb
mkfs.xfs /dev/sdb1
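
If you prefer a non-interactive variant of this step, parted can be used instead of fdisk (a sketch; adjust the device name to your system):

parted -s /dev/sdb mklabel gpt mkpart primary xfs 0% 100%
mkfs.xfs /dev/sdb1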

Create a mount point and set it up in /etc/fstab (in my case I decided to mount the additional disk to the /gi directory)

mkdir /gi
echo "/dev/sdb1 /gi xfs defaults 0 0" >> /etc/fstab
cat /etc/fstab
mount -a
#
# /etc/fstab
# Created by anaconda on Fri Mar 27 14:35:40 2020
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/rhel-root / xfs defaults 0 0
UUID=9f294fee-9568-4bf6-90c5-e8933dd19e36 /boot xfs defaults 0 0
/dev/mapper/rhel-swap swap swap defaults 0 0
/dev/sdb1 /gi xfs defaults 0 0

Install NFS utils package

yum install -y nfs-utils

Setup correct access to NFS share

chgrp nfsnobody /gi
chmod 777 /gi/

If your operating system has SELinux activated you should set correct security context for NFS shares:

semanage fcontext -a -t nfs_t "/gi(/.*)?"
restorecon -Rv /gi

Create access entries for the NFS share in /etc/exports. These settings provide access to the shared storage only for the cluster nodes.

for h in gimaster.guardium.notes gidb.guardium.notes giaux1.guardium.notes giaux2.guardium.notes; do echo "/gi $h(rw,sync,no_subtree_check,no_root_squash)" >> /etc/exports; done

Re-read shares and start NFS server

exportfs -r
systemctl start nfs-server
systemctl enable nfs-server

Check exported shares

[root@ginfs ~]# exportfs
/gi gimaster.guardium.notes
/gi gidb.guardium.notes
/gi giaux1.guardium.notes
/gi giaux2.guardium.notes

Setup correct firewall rules

firewall-cmd --permanent --add-service={mountd,nfs,rpc-bind}
firewall-cmd --reload
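
Before moving on, it is worth verifying the export from one of the cluster nodes, for example the master (a quick check, assuming nfs-utils is installed on that node; the mount is removed right away):

mount -t nfs -o nfsvers=4 ginfs.guardium.notes:/gi /mnt
touch /mnt/nfs-test && rm /mnt/nfs-test
umount /mnt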

5. RedHat subscription and rpm’s setup (execute on master node)

Generate ssh key without passphrase

ssh-keygen

2019-12-30_16-40-41

Then copy the key to the other nodes to eliminate password authentication between nodes (you must accept the key and provide the password for each node)

for h in gimaster.guardium.notes gidb.guardium.notes giaux1.guardium.notes giaux2.guardium.notes; do ssh-copy-id -i ~/.ssh/id_rsa.pub $h; done

Register nodes in RedHat (replace with your RHN account credentials)

for h in gimaster.guardium.notes gidb.guardium.notes giaux1.guardium.notes giaux2.guardium.notes; do ssh $h subscription-manager register --username   --password ; done

Attach a subscription <pool> with access to the RedHat and OpenShift repos.

for h in gimaster.guardium.notes gidb.guardium.notes giaux1.guardium.notes giaux2.guardium.notes; do ssh $h subscription-manager attach --pool=; done

Check attached subscriptions on master

subscription-manager list --consumed

You should have access to the marked functionalities
2019-12-30_16-02-31

Add required installation repo’s

for h in gimaster.guardium.notes gidb.guardium.notes giaux1.guardium.notes giaux2.guardium.notes; do ssh $h subscription-manager repos --enable="rhel-7-server-rpms" --enable="rhel-7-server-extras-rpms" --enable="rhel-7-server-ose-3.11-rpms" --enable="rhel-7-server-ansible-2.6-rpms" --enable="rh-gluster-3-client-for-rhel-7-server-rpms"; done

Update systems

for h in gimaster.guardium.notes gidb.guardium.notes giaux1.guardium.notes giaux2.guardium.notes; do ssh $h yum -y update; done

Install the glusterfs-fuse package

for h in gimaster gidb giaux1 giaux2; do ssh $h yum -y install glusterfs-fuse; done

Finally restart systems

for h in gidb.guardium.notes giaux1.guardium.notes giaux2.guardium.notes gimaster.guardium.notes; do ssh $h shutdown -r now; done

II. OpenShift installation (all steps executed on master node)

My installation assumes that the root account is the OpenShift installation owner.
I refer to node names and IP addresses from my lab – they should be modified accordingly.
GI 2.0.1 supports OpenShift 3.11 only.

1. Openshift Ansible installation

Install openshift-ansible package on master

yum -y install openshift-ansible

2. Inventory file review

OpenShift configuration must meet IBM Cloud Private and Guardium Insights requirements for glusterfs.

Here my inventory.ini file with some important remarks:

[masters]
gimaster.guardium.notes
[etcd]
gimaster.guardium.notes
[nodes]
gimaster.guardium.notes openshift_node_group_name="node-config-master-infra"
gidb.guardium.notes openshift_node_group_name="node-config-compute"
giaux1.guardium.notes openshift_node_group_name="node-config-compute"
giaux2.guardium.notes openshift_node_group_name="node-config-compute"
[glusterfs]
gidb.guardium.notes glusterfs_ip=192.168.250.160 glusterfs_devices='["/dev/sdb"]'
giaux1.guardium.notes glusterfs_ip=192.168.250.170 glusterfs_devices='["/dev/sdb"]'
giaux2.guardium.notes glusterfs_ip=192.168.250.180 glusterfs_devices='["/dev/sdb"]'
[glusterfs_registry]
gidb.guardium.notes glusterfs_ip=192.168.250.160 glusterfs_devices='["/dev/sdb"]'
giaux1.guardium.notes glusterfs_ip=192.168.250.170 glusterfs_devices='["/dev/sdb"]'
giaux2.guardium.notes glusterfs_ip=192.168.250.180 glusterfs_devices='["/dev/sdb"]'
[OSEv3:children]
masters
nodes
etcd
glusterfs
glusterfs_registry
[OSEv3:vars]
ansible_user=root
ansible_become=false
ansible_ssh_user=root
host_key_checking=False
ansible_ssh_common_args='-o StrictHostKeyChecking=no'
openshift_deployment_type=openshift-enterprise
openshift_release=v3.11
openshift_master_default_subdomain=insights.guardium.notes
openshift_disable_check=docker_storage,docker_image_availability,package_version
openshift_storage_glusterfs_image=registry.access.redhat.com/rhgs3/rhgs-server-rhel7:v3.11
openshift_storage_glusterfs_block_image=registry.access.redhat.com/rhgs3/rhgs-gluster-block-prov-rhel7:v3.11
openshift_storage_glusterfs_heketi_image=registry.access.redhat.com/rhgs3/rhgs-volmanager-rhel7:v3.11
openshift_storage_glusterfs_timeout=900
openshift_storage_glusterfs_namespace=glusterfs
openshift_storage_glusterfs_storageclass=true
openshift_storage_glusterfs_storageclass_default=true
openshift_storage_glusterfs_block_deploy=true
openshift_storage_glusterfs_block_host_vol_size=50
openshift_storage_glusterfs_block_storageclass=true
openshift_storage_glusterfs_block_storageclass_default=false
debug_level=2
openshift_docker_selinux_enabled=true
openshift_docker_options="--selinux-enabled --signature-verification=false --insecure-registry=['192.168.250/24','172.30.0.0/16'] --log-opt max-size=1M --log-opt max-file=3 --disable-legacy-registry=true"
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
openshift_master_htpasswd_users={'ocadmin': '$apr1$e4i8sseH$zN5tmrKq3WjCNnIVUPmKM.'}
openshift_master_api_port=7443
openshift_master_console_port=7443
openshift_hostname_check=false
os_sdn_network_plugin_name=redhat/openshift-ovs-networkpolicy
oreg_url=registry.access.redhat.com/openshift3/ose-${component}:${version}
oreg_auth_user="XXX"
oreg_auth_password="XXX"
openshift_examples_modify_imagestreams=true
openshift_hosted_registry_cert_expire_days=3650
openshift_ca_cert_expire_days=3650
openshift_node_cert_expire_days=3650
openshift_master_cert_expire_days=3650
etcd_ca_default_days=3650

[OSEv3:children] – our OpenShift deployment contains glusterfs and glusterfs_registry support.
glusterfs_devices – this parameter points to the block device of the additional disk available on all nodes except the master; check the correct device name using fdisk -l.
Your block device must be uninitialized – use appropriate system tools to clean all pv’s and volumes on it before installation.
openshift_master_default_subdomain – subdomain reference for cluster services, should correspond to DNS name defined before (insights.guardium.notes in my case)
openshift_docker_options – provide the correct network subnet definition for your nodes in the --insecure-registry parameter; the value 172.30.0.0/16 is the default virtual IP addressing of the OpenShift cluster (in ['192.168.250/24','172.30.0.0/16'] the value 192.168.250/24 points to my nodes' public addresses)
openshift_master_htpasswd_users – defines the user and password for the OpenShift default admin; the password must be provided in encrypted form (use htpasswd), my example above encodes user "ocadmin" with password "guardium"
openshift_master_api_port and openshift_master_console_port – OpenShift public services ports
oreg_auth_user and oreg_auth_password – user and password (plain text) for RHN account

The htpasswd command to get the encrypted user password for the openshift_master_htpasswd_users parameter:

htpasswd -nb <user> <password>
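
For example, to generate the hash for the ocadmin user with the password guardium used above (the output will differ on each run because of the random salt):

htpasswd -nb ocadmin guardium
ocadmin:$apr1$e4i8sseH$zN5tmrKq3WjCNnIVUPmKM.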

3. Inventory file creation

Create the inventory.ini file (based on the one proposed above) in the root home directory on the master and adapt it to your settings.

4. Prerequisites playbook execution

Execute prerequisites.yml playbook

ansible-playbook -i inventory.ini /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml

After a few minutes, operations on all nodes should be completed.

Sometimes the playbook can hang during the installation of the docker. If the installation progress output does not change the status on this task for a few minutes, restart the playbook.

2019-12-30_20-20-57

5. Cluster deploy playbook execution

Execute deploy_cluster.yml playbook

ansible-playbook -i inventory.ini /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml

2019-12-30_22-08-34

After several minutes, the installation process should complete and display a summary

The command below displays information about node designation and its status

oc get nodes

2019-12-31_16-41-55

I suggest executing these commands to assign the cluster-admin role to the ocadmin user

oc login -u system:admin
oc adm policy add-cluster-role-to-user cluster-admin ocadmin

Finally we can log in to the OpenShift portal https://gimaster.guardium.notes:7443 or https://console.insights.guardium.notes (ingress route)
2019-12-31_16-48-00
Switch to Console
2019-12-31_16-49-21
and review the cluster status
2020-01-02_19-44-14

III. IBM Cloud Private installation (all steps executed on master node)

1. Latest version of ICP download

Guardium Insights 2.0.1 requires IBM Cloud Private 3.2.1 installed on OpenShift. The latest ICP version is available on IBM Fix Central

2020-05-04_20-14-50

In my case the latest fix pack has subversion 2003. Please download the version applicable for an OpenShift installation – the ibm-cloud-private-rhos prefix.

Then upload archive to your master node.

2. Installation preparation

To upload ICP docker images to OpenShift repository execute:

tar xf ibm-cloud-private-rhos-3.2.1.2003.tar.gz -O | docker load

Your gzip archive name can differ from mine.
2020-01-02_12-21-07

This process takes several minutes.

The ICP archive can be removed from the file system to save storage space.

Then create ICP installation directory and switch to it:

mkdir /opt/icp-install; cd /opt/icp-install

Installation files are located in one of the just uploaded containers – ibmcom/icp-inception

To copy installation files to /opt/icp-install execute command:

docker run --rm -v $(pwd):/data:z -e LICENSE=accept --security-opt label:disable `docker images| grep ibmcom/icp-inception | grep -v grep | awk '{print($1":"$2)}'` cp -r cluster /data

Move the CLI session to the cluster directory and copy the OpenShift configuration file to it:

cd cluster; cp /etc/origin/master/admin.kubeconfig kubeconfig

We must also update the /opt/icp-install/cluster/hosts file to these settings:

[master]
gimaster.guardium.notes

[etcd]
gimaster.guardium.notes

[nodes]
gidb.guardium.notes
giaux1.guardium.notes
giaux2.guardium.notes

This is the node function assignment of our OpenShift cluster (use your node names).

Then we must update the /opt/icp-install/cluster/config.yaml file (use your node and domain names).
Place the indentation properly in the YAML document.

storage_class: glusterfs-storage

openshift:
 console:
  host: gimaster.guardium.notes
  port: 7443
 router:
  cluster_host: icp-console.insights.guardium.notes
  proxy_host: icp-proxy.insights.guardium.notes

ingress_http_port: 3080
ingress_https_port: 3443

password_rules:
- '(.*)'

management_services:
 vulnerability-advisor: disabled
 storage-glusterfs: disabled
 storage-minio: disabled
 istio: disabled
 monitoring: enabled
 metering: enabled
 logging: enabled
 custom-metrics-adapter: disabled
 platform-pod-security: enabled
 multicluster-hub: disabled

cluster_nodes:
 master:
 - giaux1.guardium.notes
 proxy:
 - giaux1.guardium.notes
 management:
 - giaux1.guardium.notes

default_admin_user: icpadmin
default_admin_password: XXXX

storage_class: glusterfs-storage
preferred cluster storage class for Guardium Insights
console:
 host: gimaster.guardium.notes
 port: 7443
OpenShift console URL (not routed one)
router:
 cluster_host: icp-console.insights.guardium.notes
 proxy_host: icp-proxy.insights.guardium.notes
ICP console and router URLs – use the openshift_master_default_subdomain defined during OpenShift installation
ingress_http_port: 3080
ingress_https_port: 3443
OpenShift ingress ports (80, 443) cannot overlap with ICP's, so define different ones
cluster_nodes:
 master:
 - giaux1.guardium.notes
 proxy:
 - giaux1.guardium.notes
 management:
 - giaux1.guardium.notes
You can spread ICP functions between non-DB workers or put everything on one of them.
default_admin_user: icpadmin
default_admin_password:
set the ICP portal user name and credential in plain text (I suggest not using the same username for OpenShift and ICP because these two are different identities in our infrastructure and use separate authentication mechanisms; it is possible to configure OCP later to use the ICP authentication mechanisms).
monitoring: disabled
metering: disabled
logging: disabled
for non-production environments you can disable ICP infrastructure services – ELK, Prometheus and Grafana

3. ICP installation

ICP installation command:

docker run -t --net=host -e LICENSE=accept -v $(pwd):/installer/cluster:z -v /var/run:/var/run:z -v /etc/docker:/etc/docker:z --security-opt label:disable `docker images| grep ibmcom/icp-inception | grep -v grep | awk '{print($1":"$2)}'` install-with-openshift

It takes several minutes

2020-01-02_14-12-49

4. Post installation tasks

Check the node assignment and correct the security context for ICP:

kubectl --kubeconfig /etc/origin/master/admin.kubeconfig get nodes
kubectl --kubeconfig /etc/origin/master/admin.kubeconfig patch scc icp-scc -p '{"allowPrivilegedContainer": true}' --type=merge
NAME                    STATUS  ROLES                                       AGE   VERSION
gidb.guardium.notes     Ready   compute                                     23d   v1.11.0+d4cacc0
gimaster.guardium.notes Ready   infra,master                                23d   v1.11.0+d4cacc0
giaux1.guardium.notes   Ready   compute,icp-management,icp-master,icp-proxy 23d   v1.11.0+d4cacc0
giaux2.guardium.notes   Ready   compute                                     23d   v1.11.0+d4cacc0

Now we can log in to the ICP console (https://icp-console.insights.guardium.notes defined in config.yaml)
2020-01-02_14-44-00
Select Overview from the menu and confirm that all deployments are healthy
2020-01-02_14-58-35

5. CLI tools installation

The GI installation script requires access to the cloudctl and helm command line tools, which are also useful for cluster administration tasks.

As the root user, copy the tools from the ICP console (use your ICP name) into the root home directory:

curl -kLo cloudctl https://icp-console.insights.guardium.notes/api/cli/cloudctl-linux-amd64
curl -kLo helm.tar.gz https://icp-console.insights.guardium.notes/api/cli/helm-linux-amd64.tar.gz

Move binaries to /usr/local/bin

chmod +x cloudctl; mv cloudctl /usr/local/bin
tar xvf helm.tar.gz *helm; mv linux-amd64/helm /usr/local/bin;rm -rf linux-amd64/ helm.tar.gz

2020-01-02_15-34-28
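
A quick sanity check that both binaries work and can reach the cluster:

cloudctl version
helm version --tls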

IV. Guardium Insights installation (all steps executed on master node)

1. Cluster configuration to use NFS server

Now we must install and configure the NFS client pod on the cluster to get access to the remote storage.

Set the environment variables (NFS_SERVER_IP and NFS_PATH) to point to the IP or server name of the NFS server and the path to the share for audited events

export NFS_SERVER_IP=192.168.100.231
export NFS_PATH=/gi

Create namespace for NFS client (for instance: nfs)

oc create namespace nfs; oc project nfs

Log in to the cluster with helm initialization in the nfs namespace

cloudctl login -a https://icp-console.insights.guardium.notes --skip-ssl-validation -u icpadmin

2020-05-04_20-48-36

Update helm repository

helm init --client-only; helm repo update

Install NFS client pod

helm install --set podSecurityPolicy.enabled=true --set 'nfs.mountOptions[0]="nfsvers=4"' --set 'nfs.mountOptions[1]="context=system_u:object_r:container_file_t:s0"' --set nfs.server=$NFS_SERVER_IP --set nfs.path=$NFS_PATH stable/nfs-client-provisioner -n=nfs --tls

Check the installation status; one pod – nfs-client-provisioner – should be in running state after a while

oc get pods -n nfs

2020-05-04_20-55-33
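
You can also confirm that the NFS-backed storage class has been registered (the class name depends on the chart defaults, typically nfs-client):

oc get storageclass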

2. GI software package

Download the Guardium Insights installer from Passport Advantage. The software package archive should contain the installation gzip file – SCRTY_GUARDIUM_INSIGHTS_V_2.0.1.tar.gz. Copy it to the master node and unpack it

cd; tar zxf SCRTY_GUARDIUM_INSIGHTS_V_2.0.1.tar.gz; rm SCRTY_GUARDIUM_INSIGHTS_V_2.0.1.tar.gz

Two GI installation subdirectories – insights_images and insights_charts – should appear in your root home directory:
2020-01-02_16-08-00

3. values-small.yaml modifications

You can adapt the GI settings before installation. In particular, you should provide the correct size of the storage reserved on the NFS server for audited events.

In the case of GI 2.0.1 the NFS share is also used by some other services (IBM Event Streams and MongoDB), and we must reserve approx. 450 GB for them. So if you prepared an NFS share with a size of 2 TB, the correct DB2 space should be 1550Gi

sed -i 's/size: 4000Gi/size: 1550Gi/' insights_charts/values-small.yaml

For a GI test or demo installation only, you can also decrease the DB2 memory and CPU demands:

sed -i 's/memory: 50Gi/memory: 32Gi/' insights_charts/values-small.yaml
sed -i 's/cpu: 10000m/cpu: 8000m/' insights_charts/values-small.yaml
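
A quick way to confirm that the substitutions took effect:

grep -nE 'size:|memory:|cpu:' insights_charts/values-small.yaml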

4. GI installation preparation

Create two namespaces for IBM Event Streams and GI (both names must be cluster unique and maximum 10 characters)

oc create namespace gi-stream
oc create namespace gi

Login to ICP and setup helm

cloudctl login -a https://icp-console.insights.guardium.notes -u icpadmin

2020-01-02_16-28-42

Log in to the local OpenShift docker repository. Docker login requires token authentication, which can be extracted with the oc whoami command:

docker login -u ocadmin -p `oc whoami -t` docker-registry-default.insights.guardium.notes

Load GI images to OpenShift repository (it takes a while)

cd insights_images/; ./dockerLoad.sh

2020-01-03_09-59-07

Push the images to the project; this takes a few more minutes (in my case it is the gi namespace):

./dockerPush.sh docker-registry-default.insights.guardium.notes/gi

2020-01-02_17-15-56

5. GI installation script execution

Execute installer.sh with some options

./installer.sh -s gidb.guardium.notes -d gi-stream -n gi -h gi-console.insights.guardium.notes -l true -a icpadmin -e https://icp-console.insights.guardium.notes -i docker-registry.default.svc:5000/gi -o values-small.yaml -r gi-stream -u 192.168.250.150

where:
-s – DB node specification
-n – GI namespace name
-r – IBM Event Stream helm release name
-d – IBM Event Stream namespace name (usually the same as provided with option -r)
-h – virtual name of Guardium Insights portal
-l – license agreement acceptance
-a – ICP admin user
-e – URL of ICP portal
-i – local project registry – put here private cluster docker URL (docker-registry.default.svc:5000/)
-u – master node ip (datamart store service IP for Guardium Data Protection integration)
-o – GI architecture ( values-{small,med,large}.yaml )

Script will additionally ask for ICP password:

Missing global.insights.icp.authPassword. ICP admin users password. Enter the global.insights.icp.authPassword:

and requests confirmation of helm command execution:

Do you wish to execute the above helm command with the shown values ?(y/n)

The process takes several minutes. You can monitor it using the OpenShift console or cluster tools, or wait for the deployment of the tenant-postdeploy-ready ConfigMap.

oc get cm | grep tenant-postdeploy-ready
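
If you prefer to wait on the command line, a simple loop does the job (a sketch, assuming gi is your GI namespace):

until oc get cm -n gi | grep -q tenant-postdeploy-ready; do echo "waiting for GI deployment..."; sleep 60; done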

2020-01-02_17-38-31
Finally we can log in to Guardium Insights using the ICP admin credentials (https://gi-console.insights.guardium.notes):
2020-01-02_17-49-32

2020-05-05_14-55-17

and configure GI feeds.

You can monitor GI data flows in IBM Event Streams – the correct URL is displayed by the command

kubectl get route -n gi-stream "gi-stream-ibm-es-ui-route" -o 'jsonpath={.spec.host}'

Then log in using ICP admin credentials

2020-05-04_22-46-55

V. Update GI to 2.0.1

There is no update procedure from previous versions (2.0 and 2.0.0.1) to the current one.

New version comes with updated configuration of:

  • OCP – the master is no longer a worker and ICP is installed outside of it
  • ICP – it is now installed across the workers and the default storage class is glusterfs-storage
  • GI – it no longer uses the local filesystem on the DB node to store audited events; open source Kafka is replaced by IBM Event Streams and an additional non-clustered NFS server is used to store events

These changes lead to the conclusion that a production installation should be a completely fresh one.

However, for test purposes it is possible to reinstall GI on an already installed OCP and ICP.

Short guideline for this procedure:

1. Remove GI helm and namespace

Execute command (it takes a while)

helm delete [gi_namespace] --purge --tls

Then cleanup GI namespace related objects

oc delete namespace [gi_namespace]

2. Update memory and CPU requirements for workers to those presented in I.2

Be careful and do not stop the master node. You can remove the data partition from the DB node – it is not used in version 2.0.1

3. Setup NFS server according to I.4

4. Optionally upgrade your ICP to the latest version (my steps show an upgrade to version 3.2.1.2003)

a. Download the latest version of ICP from Fix Central

b. Upload containers to your docker repository

tar xf ibm-cloud-private-rhos-3.2.1.2003.tar.gz -O | docker load

c. Create the update directory in the root home

mkdir icp-install-2003; cd icp-install-2003

d. Copy the ICP configuration files from the latest image. You will have a minimum of two icp-inception versions in the repository so you must refer directly to the latest one:

docker run --rm -v $(pwd):/data:z -e LICENSE=accept --security-opt label:disable ibmcom/icp-inception-amd64:3.2.1.2003-rhel-ee cp -r cluster /data

e. Copy configuration files from OCP and the previous ICP installation directory

cd cluster
cp /etc/origin/master/admin.kubeconfig kubeconfig
cp ../../[previous_ICP_version]/cluster/hosts .
cp ../../[previous_ICP_version]/cluster/config.yaml .

f. Start the upgrade process

docker run -t --net=host -e LICENSE=accept -v $(pwd):/installer/cluster:z -v /var/run:/var/run:z -v /etc/docker:/etc/docker:z --security-opt label:disable ibmcom/icp-inception-amd64:3.2.1.2003-rhel-ee upgrade

2020-05-05_12-22-18

g. Log in to the ICP web console and check the version

2020-05-05_12-32-30

h. Update the cloudctl and helm CLI tools according to III.5 if needed (there are no changes between 2001 and 2003)

5. Install GI using procedure from chapter IV


Please share with me your problems and findings related to this cookbook – zibi mail

Thanks a lot to Devan Shah, I wouldn’t have moved on this article without him.

This is the third version of Guardium Insights and still not all options have been documented, so this cookbook should be treated as a supporting resource.
For my part, I will update it in case of important changes or inaccuracies.

The older version for GI installation 2.0 and 2.0.0.1 is available on IBM Security Learning Academy

Here article about feed configuration – Guardium Insights feeding

6 methods of data access monitoring

Time and again I am asked what can be monitored by Guardium and how. I will try to present the possible methods in a concise way and indicate where they are usually used, along with the pros and cons of each of them.

Let’s start with the basic question of what is behind the monitoring?
The purpose of DAM solutions is to identify and analyze every possible vector of access to the data engine in order to:

  • comply with the audit requirements set by the regulations
  • detect unwanted data access activity related to security breaches, attempted theft, destruction or fraudulent actions
  • provide information for threat and risk analysis through the context of access to data
  • prevent undesirable and harmful effects of unintentional and intentional activity
  • ensure accountability of access to data for investigation and corrective actions

These requirements force the DAM solution to access all sessions initiated both remotely and locally using all data engine supported protocols and communication encryption methods.

policy1-1

Access vectors for self-owned and IaaS environments

The diversity of data engines technologies and IT environment setup presents us with a very difficult task and a uniform approach is currently impossible. Therefore, in Guardium it is possible to approach the problem of data monitoring in many ways, which I describe in more detail below.

1. Agent based (S-TAP, Z-TAP)

policy1-2

Agent-based monitoring is the most popular method of collecting information about data activity on the monitored system.

This method requires installation at the operating system level of an agent that has the ability to intercept sessions. In the case of Unix and Linux this is done at the kernel level. Drivers are used on the Windows platform. Monitoring of z/OS uses a combination of Query Monitor and analysis of memory areas in DB2 or IMS space. The i5 platform (AS/400) uses a unique method of capturing communication between processes, with no need to use journals.
Additionally, an agent on Unix, Linux and Windows platforms allows tracking of I/O operations in order to monitor file access. Similar functionality is also available on z/OS in the area of monitoring access to various types of data sets.

For most supported data engines it is possible to monitor communication outside the TCP stack: pipe, ipc, shared memory protocols.
Agent implementation does not force termination of encrypted connections in order to capture the context of the session, although not all databases allow this approach (for example MySQL, MariaDB).

All intercepted traffic is sent to the collectors and further processed there (except z/OS and i5 where preprocessing is managed at the monitored system level).
The agent allows for selective monitoring by excluding connections (at the IP level) or sessions (dynamically).
Additionally, working at a low level of the operating system allows to block the sessions inline.

Pros

  • Local session visibility – only this method allows us to see local sessions with SoD, one of the most important DAM use cases.
  • Segregation of Duties (SoD) – the data engine administrator has no influence on monitoring.
  • Very difficult to avoid being monitored – only a database reconfiguration that changes the available access vectors can lead to missed activity interception, although this kind of situation can be identified and alerted on; in some cases Guardium can even reconfigure itself automatically (inspection engine management).
  • Inline blocking – blocking access or preventing specific behavior can be implemented easily, but should not be used in the application stream.
  • Easily implemented interception of encrypted sessions – visibility of encrypted sessions does not require access to certificates or connection termination

Cons

  • Software installation on production systems – can be used only on systems where access to the OS is possible (self-owned platform or IaaS); IT departments can sometimes block this approach (especially for outsourced systems), and the data engine vendor or the partner responsible for implementation may try to avoid monitoring this way.
  • Influence on the monitored system performance – Guardium agents are very reliable and efficient, and CPU load usually does not exceed 2 percent even at peak activity levels. However, this aspect can sometimes be a stopper on critical systems or systems where the customer does not have enough skills (legacy ones) and is afraid of any change.
  • Strictly defined support of OS platforms – Guardium supports the vast majority of commercial operating systems (Windows, AIX, Solaris, HP-UX, Linux (RedHat/CentOS, Suse, Ubuntu/Debian) – in the case of Linux also for various processor platforms). Support for a new OS version usually appears no later than a quarter after its release. However, we may encounter niche operating systems or long unsupported OS versions, which may imply a different method of monitoring.
  • OS and data engine administration procedures modification – standard non-encrypted traffic monitoring can be completely transparent for OS and data engine administrators. However, in the case of Linux, where kernel changes are frequent, it is necessary to develop a proper system update process to ensure continuous monitoring. Guardium provides a few scenarios for that, which are described here and here.
    Access to encrypted traffic is usually associated with ATAP configuration, which requires an additional procedure to update the agent and database. Both of these aspects can sometimes block implementation from the IT side.

Monitoring with S-TAP can be done in several ways:
– KTAP – session capture at the kernel subsystems level
– ATAP – access to decrypted traffic and some protocols through interaction with the data engine
– PCAP – collecting only network packets (outdated)
– EXIT – consumption of the existing data engine API interface for DAM (available for DB2, Informix and Teradata)

The preferred method is EXIT but keep in mind that the data engine administrator has control over its configuration and in this case you need additional configuration change control, for example Guardium CAS.

2. Agentless – SPAN

In this method of monitoring, all sessions are copied at the network level using the port mirroring capability of network switches to the broadcast subnet in which the collector is listening.
Only unencrypted traffic can be analyzed this way.
The configuration of network devices must ensure correct redirection of every possible path to the monitored data engine in order to duplicate session packets.
A single collector provides up to two network interfaces to support this type of event collection.

Technically it is possible to combine the agent and agentless methods, where remote sessions are monitored by SPAN and local ones are intercepted by S-TAP, but it doesn’t make much sense given the very good performance of the agent.

In any case, the choice of a method other than the S-TAP agent should be based on technical capabilities, expected benefits, implementation complexity and limitations.

Pros

  • Segregation of Duties – data engine administrator has no influence on monitoring
  • No influence on monitored system – monitoring is completely transparent
  • OS platform independence – OS base of monitored engine is not relevant

Cons

  • No local sessions visibility – monitoring at the network level excludes the possibility of monitoring of local sessions, so the method is used only where access to the operating system is impossible.
  • Additional costs and security controls – ensuring non-repudiation and audit quality requires an additional control layer. In large networks with complex physical and logical infrastructure, redirecting and duplicating traffic involves a large investment in additional network equipment, ensuring high availability and configuration change monitoring. In the case of a public cloud, a more effective solution is to use External STAP, as shown below.
  • Unsupported encrypted communication – in this method it is not possible to decrypt traffic, which limits its use (especially in the cloud).
  • No prevention features – the method is based on passive monitoring, so active session blocking is not possible.

3. External S-TAP (E-TAP)

The adoption of cloud solutions is inevitable. PaaS and SaaS models are becoming a standard not only for most startups but also for larger and larger organizations.
Migrating critical data beyond the bounds of direct control of its owner or administrator is a challenge that must also be met by Guardium.

External S-TAP allows you to capture sessions inline by redirecting them to it using a load balancer. The intercepted traffic is then sent to the collector.
E-STAP is a Docker-based container service. Therefore, it can be quickly implemented, scaled and administered using Kubernetes or other tools available to manage containers on the cloud side. The solution itself is not limited to public cloud monitoring only. It can be used effectively in a private cloud or for standard data engines.

Pros

  • Segregation of Duties – data engine administrator has no influence on monitoring
  • OS platform independence – OS base of monitored engine is not relevant
  • Simple implementation on cloud – containerization allows fast and easy management of cloud monitoring.
  • Encrypted traffic visibility and prevention capabilities – despite the illusory similarity to port mirror monitoring, we are dealing with active tracking that implements traffic decryption and preventive actions (probably available from version 11.1)
  • Real alternative to native cloud streams – the critical aspect of the loss of control over the infrastructure is also reflected in the control layer, which is the lack of independent audit of cloud vendor security services. E-STAP allows us to build an independent control mechanism and estimate the quality of services offered by the cloud provider. It is also obvious that monitoring needs to be unified when multiple cloud providers are selected (diversification), where this method can be more convenient than simply streaming events through a cloud data engine.

Cons

  • No local sessions visibility – monitoring at the network level excludes the possibility of monitoring of local sessions but in PaaS and SaaS cases it is not relevant (similar to cloud data engine streaming).
  • Additional costs and security controls – the additional use of a load balancer and a containerized microservice has an impact on the total cost of the monitoring solution. The network configuration control aspects are similar to SPAN monitoring.

4. Direct Streaming


The simplest, agentless method of monitoring, consisting in sending data engine activity logs directly to Guardium.
This is the approach preferred by cloud providers, but in many cases it is not satisfactory due to the lack of segregation of the monitored entity from the monitoring mechanism.
It seems interesting to combine this method with ETAP to verify the quality of the native log :).

Pros

  • Simple and cheap implementation on cloud – usually it is enough to enable the option and specify the recipient of the data, in some solutions free of charge if the events are not sent outside the internal network.
  • All traffic visible (customer access) – a properly structured activity audit should deliver all activity on demand (how to prove it?). Another issue is the possibility of filtering it at this stage, which is usually not possible and forces the analysis of the entire traffic at the Guardium level even when it is not necessary.

Cons

  • Lack of segregation of duties – choosing a cloud solution is always associated with limitations of control and their transfer to the provider. However, wherever possible, we should apply independent security monitoring mechanisms. A good example is the analysis of leakage of personal data protected by law (GDPR or CCPA) and the issue of the responsibility of the controller and data operator. Under most contracts, the cloud provider does not automatically become their operator and it is still necessary to assess the risks and appropriate control measures – is non-repudiation ensured, is the information adequate, and is all of it available to us? I leave this question for you to think about 🙂
  • Log absorption – the format of the information provided usually changes much more often than the native engine language, which may affect the quality of the data or its importability to Guardium (however, a close cooperation between IBM and supported cloud vendors should be assumed).
  • No prevention – logging means passive monitoring without the possibility of preventive actions.

5. Logging with STAP


In this approach, the native data engine logs are consumed by the S-TAP and not sent directly to the collector as was the case with the direct stream.
These setups are usually used as replacements for standard S-TAP capture where event interception is complex or preventive actions are not required.
An example is a method of monitoring Oracle engine through communication with Oracle Connection Manager or Cloudera where logs are taken from Kafka node instead of installing an agent on each data node.

Usually the S-TAP does not have to be installed directly on the environment where the logs are stored, but only in a place with network access to the logs.

Pros

  • Simple implementation – eliminates technical and administrative problems related to monitoring through the use of native audits.
  • All traffic visible – as in the case of streaming, the log should contain all activities but also enable its filtration.

Cons

  • Lack of segregation of duties – the same problems as with streaming.
  • Additional costs and security controls – the event logging function may be associated with additional licenses, architecture or infrastructure changes.
  • Influence on performance – native logging is very often associated with a large impact on the performance of the data engine, which forces filtering events down to the necessary minimum.
  • No prevention – logging means passive monitoring without the possibility of preventive actions.

6. Universal feed, custom tables


A forgotten and potentially very useful method of supplying Guardium with events from unsupported data engines.
Guardium provides several data absorption methods, including the native S-TAP (protobuf) agent communication protocol.
Unfortunately, poor documentation in this area blocks the development of these techniques.
Here is a link to a more detailed description.

Pros

  • Support of unsupported – the ability to integrate analysis and reporting of data access for niche and industry-specific solutions is a value that cannot be overestimated.

Cons

  • Difficult implementation – the aforementioned lack of full API documentation blocks the development of this functionality, but a change in this area is expected, about which a few words at the end of this article.
  • No prevention – Universal Feed implements one-way communication so we are talking about simple log streaming or consumption.

Looking to the future

Will the consolidation of events from the collectors in GBDI or in the just announced Guardium Insights bring significant changes in the techniques of event interception?
Probably not because the “data lake” is only a place where standardized information is processed.
At most, we can expect the possibility of direct transmission of events without the use of collectors rather than new mysterious methods of collecting and delivering them.
From a different perspective, data standardization and ease of extending their schemes in JSON structures, REST-based communication and containerization may facilitate the extension of supported platforms through communities, business partners or Guardium’s clients themselves.

The mechanisms presented are, in my opinion, proof of the good preparation of Guardium for the revolution that is taking place, known as the ‘journey to the cloud’*.

* 😉

 

Monitoring AWS Oracle RDS with Guardium External S-TAP

Database Activity Monitoring with Guardium Data Protection is traditionally delivered with a lightweight software agent installed on the database server at the OS layer. The agent, called S-TAP, is responsible for collecting the traffic that is the basis of Guardium reports, alerts and dashboards.

But what happens for those systems where we can’t or don’t want to install the S-TAP? Most typical examples are containerized databases or DBaaS where we don’t have the possibility to manage the operating system.

The answer is External S-TAP – a Guardium component that intercepts traffic for cloud and on-premises databases without the need of installing the agent on the database server. The External S-TAP acts as a proxy that forwards the traffic to the database server but also sends a copy of this traffic to a Guardium collector for traditional analysis, parsing and security policy application.

External S-TAP software is based on the S-TAP code, but it is packaged in a Docker image. Due to the nature of the component, two or more containers are required in the same External S-TAP deployment in order to assure the availability of the database connection. This is also the reason for the presence of a load balancer, which is going to balance the traffic to the containers. Two deployment choices exist now:

  • With Kubernetes (Figure 1) – that will take care of issues like load balancing and Docker containers orchestration. Another big advantage of using Kubernetes is the possibility to deploy the External S-TAP directly from the Guardium user interface.
  • Without Kubernetes (Figure 2) – in this case the deployment is managed with scripts for both load balancer and External S-TAP itself. 
Figure 1
Figure 2

In this post we are going to use the first approach to describe how to set up an External S-TAP with Amazon Elastic Kubernetes Service for an Oracle RDS.

Prerequisites

Create an Amazon EKS Cluster

The EKS Cluster is a set of nodes where we will deploy the External S-TAP docker containers.

Once all the prerequisites are completed, configure the AWS credentials on the host where the AWS CLI is installed and start creating the Kubernetes cluster.

Run the aws configure command and provide the AWS Access Key ID and AWS Secret Access Key for your AWS account. Also choose the default region.
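A minimal sketch of the interaction follows – the key values and the region are placeholders, not real credentials:

aws configure
AWS Access Key ID [None]: <your access key id>
AWS Secret Access Key [None]: <your secret access key>
Default region name [None]: <aws region, e.g. us-east-1>
Default output format [None]: json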

Figure 3

The CLI command to create a Kubernetes cluster in AWS is eksctl create cluster. Run eksctl create cluster --help to see all the possible flags for this command.

Use the following format in order to create a cluster with 2 t3.medium nodes.

eksctl create cluster \
--name <your cluster name> \
--region <aws region> \
--nodegroup-name <your nodegroup name> \
--node-type t3.medium \
--nodes 2 \
--nodes-min 1 \
--nodes-max 4

The cluster creation takes several minutes to complete. You can see the status directly from the command line (Figure 4) or from the AWS management console – Cloud Formation tab (Figure 5).

Figure 4
Figure 5

When completed, two new nodes will appear in the EC2 tab of the AWS console. Those are the nodes where the External S-TAP will be deployed.

Run kubectl get svc to verify the cluster creation.

Run kubectl get nodes to list the cluster nodes.

By default, “eksctl create cluster” will build a dedicated VPC. For our configuration, this means the External S-TAP will be in a different VPC from the Guardium Collector and the database. Our components will then use public IPs for communication.

If we don’t want to use Public IPs we can use VPC Peering service of AWS or directly create the Kubernetes cluster in the same VPC where the Guardium Collector is deployed (and also the database if applicable).

In order to use an existing VPC for the Kubernetes cluster, supply private and/or public subnets using the --vpc-private-subnets and --vpc-public-subnets flags in the eksctl create cluster command.

eksctl create cluster \
--name <your cluster name> \
--region <aws region> \
--nodegroup-name <your nodegroup name> \
--node-type t3.medium \
--nodes 2 \
--nodes-min 1 \
--nodes-max 4 \
--vpc-private-subnets <subnet_id1>,<subnet_id2>,<subnet_id3> \
--vpc-public-subnets <subnet_id1>,<subnet_id2>,<subnet_id3>

Run kubectl get svc to verify the cluster creation.

Run kubectl get nodes to list the cluster nodes.

Security Groups

An AWS Security Group acts as a virtual firewall to control inbound and outbound traffic.

We need to add new inbound rules in order to allow communication between our components.

From External S-TAP to database:

  • an Inbound Rule on the database security group in order to allow traffic from the External S-TAP IP addresses on port 1521 (and port 2484 if SSL is used)
  • add all the IP addresses of all the External S-TAP pods

From External S-TAP to collector:

  • an Inbound Rule on the collector security group in order to allow traffic from the External S-TAP IP addresses on port 16018
  • add all the IP addresses of all the External S-TAP pods
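If you prefer the AWS CLI over the console, inbound rules of this shape can also be added from the command line – the security group IDs and the External S-TAP address are placeholders, and the command should be repeated for each pod IP:

aws ec2 authorize-security-group-ingress --group-id <database_sg_id> --protocol tcp --port 1521 --cidr <external_stap_ip>/32
aws ec2 authorize-security-group-ingress --group-id <collector_sg_id> --protocol tcp --port 16018 --cidr <external_stap_ip>/32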

Regarding both cases above:

If using public IP addresses, the External S-TAP public IPs are listed on the EC2 tab of the AWS console under the IPv4 Public IP column. There is one IP for each node.

If using private IP addresses, after the External S-TAP deployment (next paragraph), on the “External S-TAP instances” page click on the External S-TAP, “Actions”, “View Details”. An IP address for each of the pods will be displayed.

Another option is getting the IP address from the command line:

  • run kubectl get pod to get the list of pods with names
  • run kubectl describe pod <pod name> to get the information about a single pod. The private IP address is under the IP parameter.
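As a shortcut, the pod IPs can also be listed with a single command, since the wide output format includes an IP column:

kubectl get pod -o wide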

Deploy External S-TAP from Guardium UI

One of the big advantages of using Kubernetes is the easy External S-TAP deployment directly from the Guardium GUI (External S-TAP Control window).

The procedure to add a new External S-TAP is based on 5 different tabs where the Guardium administrator has to provide the requested input:

  • Kubernetes – information about Kubernetes deployment
  • Docker tab – Docker account and Docker image information
  • Database tab – database connection information
  • Guardium tab – Collector IP and how to balance/split traffic to different collectors
  • Advanced tab – if SSL is used for the database connection, some further actions with certificates are needed. This is described in the section “Configuring External S-TAP with SSL” at the end of this post.
Figure 6

The next tables summarize how and where to gather the requested information.

Kubernetes Tab
  • Cloud Provider – Amazon
  • Master URL – Run kubectl cluster-info. Copy and paste the Kubernetes Master URL.
  • Token – Create an EKS-admin account following step 3 of the guide at this link: https://docs.aws.amazon.com/eks/latest/userguide/dashboard-tutorial.html#w243aac39b9b5b3
    Run kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep eks-admin | awk '{print $1}')
    Copy and paste the output token.
  • Deployment Name – Choose a name for the Kubernetes deployment.
  • Namespace – Choose a namespace.

Docker Tab
  • Registry Key – Create an Image Secret using a Docker Hub user account. For example: kubectl create secret docker-registry secret --docker-server=docker.io --docker-username=<your_user> --docker-password=<your_password> --docker-email=<your@email>
    Run kubectl get secret
    Copy and paste the name of the defined secret.
  • Location – store/ibmcorp/guardium_external_s-tap
  • Tag – Provide the tag of the image. Currently available Docker tags are: v11.2.0, v11.1.0, v11.0.0, v10.6.0

Database Tab
  • Database Type – Oracle
  • Database Port – 1521 (2484 if SSL is used)
  • Database Host – The hostname is on the “Connectivity & Security” tab of the Oracle RDS to monitor.

Guardium Tab
  • Collector IP – IP of the collector where the External S-TAP will forward the traffic. Depending on which VPC was used for the External S-TAP, this can be the public IP or the private IP.
  • Load Balancing – 0 –> No Load Balancing (default)
    1 –> Split sessions between collectors
    2 –> Duplicate traffic to all collectors
    4 –> Split sessions between collectors (Multi-threading)

More info on the External S-TAP window here: https://www.ibm.com/support/knowledgecenter/en/SSMPHH_11.2.0/com.ibm.guardium.doc.stap/proxy/tab_deploy.html

Once all the information is provided, click Apply and the External S-TAP will be created. After a few seconds, refresh the page and the new instance will appear on the list.

Figure 7

To verify the correct creation of the deployment, run kubectl get deployment.

To list the pods, run kubectl get pod.

To see detailed information about one single pod, run kubectl describe pod <pod_name>. This last command will show the configuration setting of the External S-TAP and also the logs if something is not working (for example lack of communication with the database).

Verify External S-TAP

When the External S-TAP is successfully deployed, database clients must connect to the load balancer and not to the database server anymore. The load balancer will balance the traffic between the External S-TAP containers that will perform two actions:

  • proxy action forwarding the traffic to the database;
  • S-TAP action capturing the traffic and forwarding it to the Guardium collector.

The database connection must be modified in order to use the load balancer external address – run the kubectl get svc command to get this information.
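If you prefer to script it, the external address can also be extracted with a jsonpath query – here the service name is assumed to match the External S-TAP deployment name chosen earlier:

kubectl get svc <deployment_name> -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'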

Figure 8

Modify the connection and connect to the load balancer.

Verify that the client successfully connects.

Run some test traffic.
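As an illustration – assuming an Oracle client with SQL*Plus is available and the non-SSL port 1521 is used; the user, service name and load balancer address are placeholders:

sqlplus <db_user>/<password>@//<load_balancer_external_address>:1521/<service_name>
SQL> select * from <some_table>;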

Figure 9

Verify that the traffic is reported on the Collector.

Figure 10
Figure 11

Configuring External S-TAP with SSL

When Secure Sockets Layer (SSL) is used for the database connection, additional steps are required to configure the External S-TAP.

First of all, we need a certificate signed by a CA.

Create a certificate signing request (CSR) for each External S-TAP using the Guardium CLI account.

create csr external_stap

Enter an alias for the certificate (it can be anything).

Enter the CN for the certificate. If CN match is required for the database, enter the hostname of the database. CN match is not required for Oracle database.

Provide all the information requested for the CSR.

When CSR is generated, take note of alias and token. These values will be used for the configuration.

Also, copy and paste the CSR to a file (for example request.csr), starting at the “-----BEGIN NEW CERTIFICATE REQUEST-----” tag and ending at the “-----END NEW CERTIFICATE REQUEST-----” tag.

Send the CSR file to a Certificate Authority (CA) of your choice in order to obtain a valid certificate. To import the certificate into the Guardium appliance, it must be in PEM format.
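If the CA returns the certificate in DER (binary) format, one way to convert it to PEM – assuming OpenSSL is available and using example file names – is:

openssl x509 -inform DER -in external_stap.cer -outform PEM -out external_stap.pem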

Figure 12

When the CSR is signed and the certificate in .pem format is available, store the certificate of the CA first:

store certificate keystore_external_stap

Provide the alias and the content of the certificate (starting from “-----BEGIN CERTIFICATE-----” and ending with “-----END CERTIFICATE-----”).

Store the signed External-STAP certificate:

store certificate external_stap

Insert the alias defined during the CSR creation.

Insert the content of the signed certificate (starting from “-----BEGIN CERTIFICATE-----” and ending with “-----END CERTIFICATE-----”).

Once the certificate is successfully imported in the collector, follow the “Deploy External S-TAP from Guardium UI” paragraph to deploy the External S-TAP.

Only one additional step is required – in the “Advanced” tab insert the token generated during the CSR creation. If the token wasn’t saved, it can still be retrieved by running show certificate external_stap from the CLI.

Figure 13

Verify the correct deployment with the commands already reported above in other paragraphs:

kubectl get deployment

kubectl get pod

kubectl describe pod <pod_name>

Modify the client database connections in order to connect to the load balancer external IP on the SSL port (2484).

Verify that the connection is working and the traffic is correctly reported to the Guardium collector.

Figure 14

Conclusions

DBaaS offerings are becoming more and more popular nowadays, but the “old-fashioned” requirement of monitoring privileged users and reporting on database activities is still there. External S-TAP is a component that provides this capability with a consolidated tool like Guardium Data Protection while maintaining the traditional Guardium architecture. In fact, the same Guardium deployment can be used to monitor activities in a hybrid environment composed of both on-premises and cloud databases.

External S-TAP deployment with Kubernetes is a straightforward task that can be addressed directly from the Guardium UI. This is one of the main differences from the manual Docker deployment.

Kubernetes will take care of container orchestration and will also manage the load balancer, which is the component the applications/clients connect to.

External S-TAP vs Native Logging/DAS for DBaaS

It’s important to remark that Guardium is able to monitor some DBaaS with other options like Native Logging (for Oracle) or DAS (Data Activity Streams) services provided by the cloud vendors (here a post on how to setup DAS on AWS for Guardium: https://guardiumnotes.wordpress.com/2020/03/).

The big advantage of External S-TAP is that it acts as a traditional S-TAP agent and so it is able to better capture and inspect database traffic. Things like records affected by a SQL statement, extrusion rules, failed logins or SQL exceptions are not captured with Native Logging or DAS.

External S-TAP provides a complete and mature approach to database activity monitoring. Furthermore, the External S-TAP supports many more DBaaS offerings than the other monitoring mechanisms, which are strictly dependent on what cloud and database vendors provide. The IBM Guardium lab has added new databases to the list in the last releases – https://www.ibm.com/support/pages/node/5736879#external_s-tap. Note that the External S-TAP approach is also available for on-premises databases and not only DBaaS.

Although External S-TAP is much more similar to the Guardium traditional monitoring approach, it is also more invasive than Native Logging or DAS. In fact, a proxy is introduced, so application and DB client connections must be reconfigured in order to connect to the load balancer.

With the other options, DAS or Native Logging, no changes to the connections are needed – we just need to enable these features and have Guardium as a consumer of the database activity monitoring information.

How to deploy a Guardium Data Protection collector on AWS

IBM Security Guardium offers data activity monitoring and other capabilities on several public cloud platforms. An important feature is the possibility to monitor DBaaS where the STAP agent installation is not possible – the monitoring is achieved with an External STAP or with the native streaming provided by the cloud vendor. Other cloud-related capabilities are the discovery/classification and vulnerability assessment scans for databases on cloud.

Also, the whole Guardium Data Protection (GDP) infrastructure can be built on cloud – it is possible to deploy collectors and aggregators directly on cloud instead of having them in traditional on-premises datacenters.  

This approach is particularly interesting when the target databases to monitor are deployed on cloud as well (for example DBaaS or databases in IaaS) as we can see in the example of the picture below.

This deployment is available for the major public cloud platforms:

  • AWS
  • Azure
  • Google Cloud Platform
  • IBM Cloud
  • Oracle Cloud Infrastructure

The deployment process is not the same for all platforms – the following link provides a guide for each vendor:

https://www.ibm.com/support/pages/node/608329

This post describes the step-by-step process to build a Guardium collector on Amazon AWS. The procedure is based on the detailed reference guide that you can find at the following link:

https://www.ibm.com/support/pages/sites/default/files/inline-files/$FILE/IBM%20Security%20Guardium%20Cloud%20Deployment%20-%20AWS_0.pdf

Create the instance on AWS

Guardium instances can be deployed on AWS in one of two ways. You can deploy either from the marketplace or from Guardium specific Amazon Machine Images (AMIs). We will focus on the second approach based on AMIs.

The Guardium AMIs are publicly listed and accessible with an AWS account. Once logged in with the AWS account, browse to EC2 service menu. On the left column click on “AMIs” under “Images”.

Select “Public images” from the search bar and search for “guardium”. A list of the available images will appear for different versions. For the purpose of this article, we will choose the v11.1 collector.

Click “Launch” to start with the installation wizard.

The first thing to choose is the instance type with the number of CPUs and RAM needed for the appliance. Here is a link to the technical requirements for Guardium: https://www.ibm.com/support/pages/node/5736891.

For a collector, a minimum of 4 CPUs and 24GB RAM is required.

For this example, the m4.2xlarge instance has been selected (8 CPUs and 32 GB RAM).

Click “Next: Configuration details”.

On this page we can configure the network details for the appliance. Select the VPC and the subnetwork where you want to build your appliance. It is also possible to assign a Public IP to the appliance in order to access it from the internet. In this example the Public IP is enabled because the appliance will be managed from a remote workstation.

Click “Next: Add Storage”.

On this page we are going to configure the storage for the appliance. A minimum of 500 GB is required; if less than 500 GB is selected, an error message will appear at the end of the wizard.

Click “Next: Add Tags”.

A tag can be optionally defined. This step is skipped in this guide.

Click “Next: Configure Security Group”.

A security group is a set of firewall rules that control the traffic for your instance. Assign a name and description to the group or use an existing one.

We will add two rules in the new group:

  • Allow SSH traffic on port 22 from “My IP” (aka the workstation from where the instance was created) for the CLI access;
  • Allow TCP traffic on port 8443 from “My IP” for GUI access.

The source from where we can access the appliance can be any other IP address or also all IP addresses (0.0.0.0/0). It’s strongly suggested to allow access only from known IP addresses.

If other ports have to be opened, here is a list of port requirements for Guardium Data Protection:

https://www.ibm.com/support/pages/guardium-v100101101210131014105106-and-v909195-open-ports

Click “Review and Launch”.

Review all the configurations and click “Launch”.

A key pair is needed for SSH access. A key pair consists of a Public Key (stored by AWS) and a Private Key (stored by the AWS account owner). The Private Key will be downloaded and then used for SSH to the appliance.

Choose an existing key pair or create a new one. If creating a new one, assign a name and download the private key.

Click “Launch Instances”.

Click on the instance ID to be redirected to the instances list.

Take note of the instance ID because it is the default password for all the GUI default users:

  • admin
  • guardium
  • accessmgr

After a while the Instance State and the Status Check will be both green. The collector is now up and running.

Check the connection to the GUI. In a browser, go to https://INSTANCE_PUBLIC_IP:8443

Check the SSH connection using the private key previously created. Use the “cli” account; no password will be asked.
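A minimal sketch of the connection, assuming a Linux or macOS workstation – the key file name and IP are placeholders:

chmod 400 <path_to_private_key.pem>
ssh -i <path_to_private_key.pem> cli@<INSTANCE_PUBLIC_IP>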

License and network settings

As for any Guardium appliance, we need to add valid licenses in order to use the solution.

Login to the GUI (at the first login you are asked to change the default password) and from the left menu go to “Setup” – “Tools and view” – “License”.

Add at least one base license (that identifies the type of appliance) and one append license (that identifies the functionalities to enable).

A last important step is the network configuration with the traditional CLI network commands. This configuration is about the private network settings automatically created by AWS and not about the Public IP (that is optional). Private network settings are configured with the following CLI commands:

store net interface ip <instance-ip>
store net interface mask <netmask>
store network route defaultroute <default-router-ip>
store network resolver <resolver-ip>
store system hostname <instance-hostname>
store system domain <instance-domain>
restart network

The private instance IP can be found on AWS under the instance description.

The private network mask can be found on AWS under the instance description by clicking on the subnet ID and looking for the IPv4 CIDR. For example, a CIDR of /24 corresponds to a netmask of 255.255.255.0.

The default gateway IP can be found by running the following command with cli:

show network routes operational 

The gateway IP is displayed in the first row of the output under the column “Gateway”.

Hostname and domain can be found on AWS under the instance description.

Run “restart network” with cli.

After the restart you are ready to start working with your new collector deployed directly on AWS!

GDE installation guide

Guardium Data Encryption installation cookbook.

– Software Access – 00:12
– Appliance setup – 04:30
– Failover setup – 10:29
– User management – 14:58
– Agent installation – 20:54
– Web certificates management – 28:57
– Backup & Restore – 32:42
– Host assignment – 38:38
– DSM upgrade – 42:42
– Agent update – 47:56
– Agent deinstallation – 50:21

Published on GuardiumNotes Youtube channel – GDE installation guide

 

Guardium enhancements review 10.1.3-10.6

In addition to simplifying the process of defining report content and its appearance, we can immediately build a datamart and place it in the indicated dashboard.
The new Report Builder is also available to manage reports based on Custom domains.

The rule setup screen scared every new user. Identifying the correct field and understanding what was defined required a lot of experience.
Simplifying the process of rule creation is the main value of changes in the new version of Policy Builder. The division of criteria into three categories (session, SQL and other) and displaying only active ones allows you to understand the logic of the rule in the blink of an eye.

Edit Rule screen

The management of the order of rules and their interaction has also been simplified. In one window we can manipulate, copy and import rules in the policy and quickly determine whether it is necessary to analyze a given situation with several rules.

Rules in policy

Group Builder (10.5)

Managing groups is one of the basic administrative tasks that allow you to properly monitor and report activity on protected resources. In the new Group Builder, we can find a specific group in a transparent way and identify its meaning in our installation (where the group is used, source of collected data, how many records it contains).

Group list

The most anticipated improvement in group management is the ability to easily filter and modify content. Especially after the Policy Builder version 10.6 upgrade, we get one coherent interface in which managing reference data is a simple task.

Group content

Agent Management by GIM (10.1.4)

The new GIM management interface has been described in my K-TAP video and significantly increases the agent management effectiveness.

In version 10.6 we will find some improvements, like a preview of installed modules, identification of a lack of communication with the agent during module configuration, and updated progress information.


GIM is now a great tool for managing hundreds of agents without the need for direct access to the managed systems.

Simple Agent Deployment (10.1.3)

The ability to pre-install the GIM agent in large environments (Listener mode) is extremely valuable. The new “Deploy Monitor Agents” feature also reduces the time it takes to install agents.

Simple Agent Deployment

It is a useful tool; however, it has a limitation related to the inability to indicate SSL certificates if we decide to use our own. 😦

I assume that we will soon have the possibility to install other modules (CAS, FAM) and create configuration templates.

Simple Compliance Monitoring (10.1.3)

This dashboard provides the possibility to set up and present, in a unified view, the status of our datasets against compliance requirements (like GDPR, PCI, SOX).

Compliance Monitoring

It sounds good, but especially for compliance the policies, reports and audits have to be customized case by case.

The configuration requires customization of the classification process, monitoring policy and reports to be a real compliance dashboard. I like it, but improvements are still needed to have the possibility to:

  • assign a custom policy instead of modifying the permanently assigned one
  • assign a custom classification process instead of modifying the assigned one
  • define the list of reports assigned to the compliance dashboard

This dashboard assumes that compliance settings have a global character for all customer data sources, which may not be true in many cases.

Guardium Application Ecosystem

The ability to create additional functionalities for the system by users or business partners, without the need to integrate with the vendor’s development process, is a genius idea. The success of the App Exchange platform in QRadar (IBM SIEM solution) resulted in its implementation in other products, including Guardium.

App Exchange Portal

The developed application is isolated from the system itself within a container (Docker). The development platform is Flask – a Python-based web microframework.
Communication with the system takes place through a rich Guardium REST API that implements most of the functions available in the standard API accessible through the cli.

Application management in Guardium itself amounts to installation (file import), determining access for roles, and launching the developed application.
The entire ecosystem can be fully controlled from the Central Manager level.

Application Management on CM

Creating an application does not require a lot of programming experience. Technologies used in the solution allow you to quickly and efficiently implement “missing” system functions.
However, it should be remembered that each application operates in an independent container, so using them imposes more requirements on the appliance – a minimum of 32 GB of RAM (if Quick Search is switched on).

Here is my first Guardium application screen (I am working on a separate article focused on this only).

GN Tools Application

Cloud support

DAM solutions appeared on the market when the idea of data storage in the cloud was only the subject of scientific dissertations. Within a dozen or so years the situation has changed dramatically, and support for processing outside local data centers has become a requirement rather than an addition to the solution.
Each subsequent version of Guardium brought something new in this matter.
The Guardium architecture allows implementing it inside IaaS services, where the user has access to the operating system and it is possible to install S-TAP. However, data transmission from the cloud to a local appliance was ineffective, so now we have access to pre-installed appliances in the Amazon, Azure and Softlayer clouds (10.1.3).
However, the real challenge for monitoring solutions is the SaaS infrastructure, where the service provider administers the data silo without the possibility of installing additional services.
The solution can be the consumption by the DAM system of the native audit logs provided by the service or engine, which was introduced for AWS RDS Oracle 11 and 12 in Guardium 10.1.4. However, it raises another problem: supporting a completely new format of session information for something which works perfectly by parsing SQL syntax. The log format and scope of information can be changed by the provider without any possibility of controlling this stream, which creates a huge problem for any DAM vendor.

So in the latest Guardium release we have got the External S-TAP – a proxy solution which assumes that sessions to unmanaged data nodes will be redirected to a new Guardium service by a load balancer. This approach simplifies implementation because we do not need to analyze logs and can rely on standard session interception.

The External S-TAP is distributed as a Docker solution and can be downloaded from Docker Hub.
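As an illustration, a pull along these lines should work, assuming an entitled account is logged in to the Docker Store/Hub; the image location and tag are taken from the External S-TAP deployment table earlier in this document:

docker pull store/ibmcorp/guardium_external_s-tap:v10.6.0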


The customer can decide how it will be implemented:

  • External S-TAP on premises, rerouting all requests to the cloud service
  • External S-TAP as a cloud service, rerouting requests inside the cloud infrastructure
  • External S-TAP on premises, intercepting traffic to local databases without the necessity of installing S-TAP on the monitored system

This kind of approach opens up the possibility to use it in different situations, not only for cloud services but also as a platform to extend functionality to new data silos based on self-developed parsers.

This initial release supports MSSQL and Oracle on premises, and AWS Oracle and Azure MSSQL for the cloud. I believe that very soon this technology will be opened up to support more platforms and accompanied by an SDK.

Platform support enhancement

  • New releases brought support for new versions of Oracle, MSSQL, Teradata, MongoDB, Cassandra, MariaDB, MemSQL
  • Simplification of Cloudera Navigator monitoring with auditing events directly from Kafka node
  • Vulnerability Assessment supports Cloudera now
  • FAM is now MS Office document aware and reduces the noise related to I/O operations on these documents
  • SharePoint support – 3rd party agent for SharePoint monitoring and data classification
  • NAS support – 3rd party agent to classify and monitor access to data on Hitachi, NetApp, EMC, Dell EMC devices
  • Simplification of ATAP management (including possibility to avoid service restart in case of STAP upgrade)
  • Teradata monitoring based on EXIT

Other enhancements

VA multi-threading (10.6)

Now many instances of classification and vulnerability assessment tasks can be executed at the same time – however, be aware that this requires a lot of resources, so I suggest setting up a separate aggregator dedicated to this job.

CLI and public key authentication (10.5)

Finally we can log in to the cli using certificates instead of simple password-based authentication.

Session-level policies (10.6)

Standard policy evaluation requires SQL body identification (commands, objects) to be available for the conditions inside a rule, even when the decision process does not require this kind of analysis because session information alone is sufficient (user, IP address, network protocol).

To speed up policy evaluation in these situations we now have a completely new type of policy – session-level.

Session Level Policy

We can create the policy in the GUI or cli using the special “Session Level Rule Language Grammar”.

The policy focuses only on session information

Session-Level policy rule

and uses a completely new set of actions, including information transformation (which can be very tricky).

Session-level rule actions

The session-level policy is still evaluated on the collector (by the sniffer), but the decision about logging or switching from closed to open mode can be returned much faster.

It is possible to install a standard and a session-level policy together.

New Firewall  mode (10.6)

The new FIREWALL_DEFAULT_STATE=2 value integrates the decision flow with the WATCH and UNWATCH flags from session-level policies, so the decision about excluding application sessions from the blocking analysis can be made in a faster and smarter way.

Enterprise Load Balancer

The ELB update allows creating failover groups so that an S-TAP is switched only within a defined scope of appliances. It provides a solution for large environments with geographically scattered Guardium domains.

Summary:


The last four Guardium releases introduce a lot of new functionality to make the system simpler to use, prepare it for customers’ transition to the cloud, and keep it open to extension in the directions stressed by its users.