DCAP Central Architecture

DCAP Central is a system for storing, managing, and providing access to data produced by the IBM InfoSphere Guardium Database Activity Monitoring (DAM) system (referred to as Guardium in the documents that follow).

DCAP Central is a Big Data system that uses the SonarW NoSQL Data Warehouse to store data extracted from Guardium collectors. DCAP Central lets you keep large amounts of Guardium data in one place, eliminating the need for complex aggregation processes and allowing you to centralize data from hundreds of collectors over long periods of time. Because the data is stored in a best-of-breed data warehouse, reports and analytics run fast and the data can be used for multiple purposes.

DCAP Central includes the following components:

  • The SonarW NoSQL Data Warehouse.
  • The SonarCollector ETL layer and specific Guardium ETL algorithms.
  • The DCAP Central GUI.
  • The SonarK discovery GUI (based on Kibana).
  • SonarSQL, providing SQL access to Guardium data stored within SonarW.
  • JSON Studio, providing a GUI for advanced analytic query building and visualization.
[Figure: DCAP Central architecture (_images/sonarg_arch.png)]

DCAP Central is a software package installed on a Red Hat Enterprise Linux (RHEL) server. It can run on a physical server or as a virtual machine, and can be installed as the only application on the server or co-located with other applications. However, due to the nature of its Big Data workloads, DCAP Central is a resource-intensive application that consumes all resources available to it - compute, memory and I/O. It is therefore recommended to run DCAP Central on a dedicated server.

DCAP Central receives data from Guardium collectors via SCP transfer of compressed extraction files. These files are produced by the collectors, and the mechanism is supported for Guardium versions 9.x and 10.x. If you are running version 9.5 collectors, you need to install the IBM data extraction patch 609 (or a later cumulative patch); consult your DCAP Central account manager for the precise IBM patch required. Guardium 10 has built-in support for producing these extract files.
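The transfer itself is a plain file copy over SSH. The following minimal sketch, written from the collector's side, shows the general shape of the process; the directory names and host are hypothetical, and the actual extract file naming is determined by Guardium:

    import subprocess
    from pathlib import Path

    # Hypothetical locations; actual names and paths are site-specific.
    EXTRACT_DIR = Path("/var/guardium/extracts")   # where hourly extracts are written
    DCAP_HOST = "dcap-central.example.com"         # DCAP Central (or staging) host
    REMOTE_DIR = "/var/dcap/incoming"              # landing directory for extract files

    def push_extracts():
        """Copy any compressed extract files to the DCAP Central host via scp."""
        for extract in sorted(EXTRACT_DIR.glob("*.gz")):
            subprocess.run(
                ["scp", str(extract), f"{DCAP_HOST}:{REMOTE_DIR}/"],
                check=True,
            )
            extract.unlink()  # remove the local copy once it has been shipped

    if __name__ == "__main__":
        push_extracts()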

Data coming from Guardium collectors is copied to the DCAP Central server, where it is processed by a Guardium-specific ETL process before being inserted into SonarW. When you configure data extraction on the Guardium collectors, you specify the hostname to which the extract files should be copied. This host can be the DCAP Central host itself or a separate host that serves as a staging area for the extract files (from which the DCAP Central ETL copies them). It is recommended that the collectors copy the files directly to the DCAP Central server, avoiding an additional and unnecessary copy.
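When a separate staging host is used, the files must be pulled from it before ingestion. A minimal sketch of that extra hop, with hypothetical host and path names, illustrates why the direct-copy configuration is preferred:

    import subprocess

    # Hypothetical names; in a direct-copy setup this step disappears entirely.
    STAGING_HOST = "staging.example.com"
    STAGING_DIR = "/var/guardium/staging"
    LOCAL_INGEST_DIR = "/var/dcap/incoming"

    def pull_from_staging():
        """Copy extract files from the staging area to the local ingest directory.

        This is the additional copy that a dedicated staging host introduces;
        pointing the collectors directly at the DCAP Central host avoids it.
        """
        subprocess.run(
            ["rsync", "-a", "--remove-source-files",
             f"{STAGING_HOST}:{STAGING_DIR}/", f"{LOCAL_INGEST_DIR}/"],
            check=True,
        )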

Collectors produce and copy files on an hourly basis, and the DCAP Central ETL process runs continuously, ingesting extract files as they arrive. An event can therefore wait up to an hour before it is written to an extract file, plus transfer and ETL time, so data is available in DCAP Central with a lag of at most roughly 60-75 minutes.
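This expected lag makes a simple freshness check possible: if nothing new has landed for well over the normal window, something upstream is likely broken. A minimal sketch, using a hypothetical ingest directory:

    import time
    from pathlib import Path

    INGEST_DIR = Path("/var/dcap/incoming")   # hypothetical landing directory
    MAX_LAG_SECONDS = 75 * 60                 # expected worst-case lag (~75 minutes)

    def newest_extract_age(directory: Path) -> float:
        """Return the age in seconds of the most recently received extract file."""
        mtimes = [f.stat().st_mtime for f in directory.glob("*.gz")]
        if not mtimes:
            raise RuntimeError("no extract files found")
        return time.time() - max(mtimes)

    age = newest_extract_age(INGEST_DIR)
    if age > MAX_LAG_SECONDS:
        print(f"WARNING: newest extract is {age / 60:.0f} minutes old")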

Once the data is in SonarW, various tools provide access to the Guardium data: a custom-built DCAP Central reporting layer, JSON Studio for building queries, reports, and visualizations directly over the Guardium data, a Web Services layer, and a SQL layer. All of these are installed on the DCAP Central server by the DCAP Central installer.
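For programmatic access, SonarW exposes a MongoDB-compatible interface, so standard MongoDB drivers can query the Guardium data directly. The sketch below assumes that interface; the port, database, collection, and field names are assumptions for illustration, as the actual schema is defined by the Guardium ETL:

    from pymongo import MongoClient

    # Hypothetical connection details and schema; adjust to your deployment.
    client = MongoClient("mongodb://localhost:27117/")  # SonarW's port may differ
    sessions = client["sonargd"]["session"]

    # Example report: top 10 database users by session count over the stored data.
    pipeline = [
        {"$group": {"_id": "$DB User Name", "sessions": {"$sum": 1}}},
        {"$sort": {"sessions": -1}},
        {"$limit": 10},
    ]
    for row in sessions.aggregate(pipeline):
        print(row["_id"], row["sessions"])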

System Sizing

A single DCAP Central node is usually used for up to 30TB of compressed Guardium data. You can store more than 30TB on a single node and reporting times may still be reasonable, but you can also cluster multiple DCAP Central nodes to provide faster response times. Consult your DCAP Central account manager for additional sizing guidelines.
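As a rough planning aid, the 30TB-per-node guideline can be turned into a back-of-the-envelope calculation. The sketch below is illustrative only; confirm actual sizing with your DCAP Central account manager:

    import math

    NODE_CAPACITY_TB = 30  # comfortable compressed capacity per DCAP Central node

    def nodes_needed(daily_compressed_gb: float, retention_days: int) -> int:
        """Estimate the number of DCAP Central nodes for a given load and retention."""
        total_tb = daily_compressed_gb * retention_days / 1024
        return max(1, math.ceil(total_tb / NODE_CAPACITY_TB))

    # Example: 40GB/day of compressed extracts kept for 3 years (~42.8TB total).
    print(nodes_needed(40, 3 * 365))  # -> 2 nodes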

Each DCAP Central node should have the following specifications (a verification sketch follows the list):

  • Two Intel Xeon processors, with at least 6 cores per socket, running at 2.4GHz or faster.
  • At least 64GB of memory.
  • Either HDD or SSD drives. In both cases, and especially when using HDDs, the drives should be striped using RAID0 or RAID10. For example, if you choose SATA drives, create a single RAID array from at least four disks so that the array can sustain read rates approaching 500MB/s. The system has been optimized to leverage low-cost SATA drives, achieving a cost-effective large data store using inexpensive disks.
  • At least one SSD of ~400GB used as temp storage for SonarW.
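A host can be checked against these minimums programmatically. The following sketch uses the third-party psutil package (an assumption; it is not part of DCAP Central) to compare a host against the CPU and memory figures above:

    import psutil  # assumed available: pip install psutil

    MIN_CORES = 12     # two sockets x at least 6 cores each
    MIN_RAM_GB = 64

    cores = psutil.cpu_count(logical=False) or 0
    ram_gb = psutil.virtual_memory().total / 2**30

    print(f"physical cores: {cores} (need >= {MIN_CORES})")
    print(f"RAM: {ram_gb:.0f}GB (need >= {MIN_RAM_GB})")
    if cores < MIN_CORES or ram_gb < MIN_RAM_GB:
        print("WARNING: host is below the recommended DCAP Central specification")
    # Disk throughput (~500MB/s sequential reads) is best measured separately,
    # e.g. with a benchmarking tool such as fio.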

If you are deploying DCAP Central on an Amazon AWS EC2 instance, an m4.4xlarge instance is recommended, or an m4.10xlarge instance if workloads are expected to be very large. An io2 EBS volume with at least 10K-12K provisioned IOPS (PIOPS) is recommended, since it allows you to grow the volume as your data size grows with no changes to the DCAP Central application or to RHEL. If you choose general-purpose EBS instead, use a RAID0 configuration.
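On AWS, the provisioned-IOPS recommendation translates into an io2 EBS volume created with an explicit Iops value. A minimal boto3 sketch, with the IOPS figure taken from the guidance above; the region, availability zone, and volume size are assumptions:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

    # io2 volumes can later be grown (size and IOPS) without touching the
    # DCAP Central application or RHEL, which is why they are recommended here.
    volume = ec2.create_volume(
        AvailabilityZone="us-east-1a",    # assumption; match your instance's AZ
        VolumeType="io2",
        Size=2048,                        # GiB; size to your expected data volume
        Iops=12000,                       # upper end of the 10K-12K guidance
        TagSpecifications=[{
            "ResourceType": "volume",
            "Tags": [{"Key": "Name", "Value": "dcap-central-data"}],
        }],
    )
    print(volume["VolumeId"])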

When using virtual machines (VMs), the recommended minimum production configuration is 8 vCPUs and 64GB RAM. Work with your VM administrator to provision enough IOPS, dependent on your loads and data volumes. A VM with 32GB RAM and 6 vCPUs can be used as a POC host, with the understanding that performance will not be optimal.

If you are deploying DCAP Central on a machine that has at least 96GB but less than 128GB of RAM, set the parameter block_allocation_size_percentage to 33 to take advantage of the available memory. If you are deploying on a machine that has 128GB of RAM or more, set block_allocation_size_percentage to 50.
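The rule is simple enough to express directly. A minimal helper capturing the guidance above; the function itself is illustrative, and only the parameter name and the 33/50 thresholds come from this document:

    def block_allocation_size_percentage(ram_gb: float) -> int | None:
        """Return the recommended block_allocation_size_percentage for a host.

        None means: leave the parameter at its installed default.
        """
        if ram_gb >= 128:
            return 50
        if ram_gb >= 96:
            return 33
        return None  # below 96GB, no override is suggested in this document

    print(block_allocation_size_percentage(112))  # -> 33
    print(block_allocation_size_percentage(256))  # -> 50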

Limits

A single DCAP Central system maintains up to 10 trillion distinct sessions per collector. If a Guardium system feeds more than 10 trillion sessions, the oldest sessions are deleted and the newest sessions are retained.
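In effect this is a rolling window over session age. The sketch below models the delete-oldest/keep-newest semantics with an in-memory list; it illustrates the behavior described above, not the mechanism DCAP Central itself uses:

    MAX_SESSIONS = 10_000_000_000_000  # 10 trillion distinct sessions per collector

    def enforce_session_limit(sessions: list) -> list:
        """Keep only the newest MAX_SESSIONS entries of an oldest-first session list.

        Illustrative model only; DCAP Central enforces this internally.
        """
        overflow = len(sessions) - MAX_SESSIONS
        if overflow > 0:
            del sessions[:overflow]  # drop the oldest sessions first
        return sessions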