Hadoop Activity Monitoring

DCAP Central is built to receive activity information from the three main Hadoop distributions: Cloudera, Hortonworks and MapR. Unlike DAM solutions that try to intercept communications, and thus generate noisy data cluttered with internal messages that do not map well to user activity, DCAP Central uses the built-in activity monitoring and auditing mechanisms provided by each distribution.

For all three distributions, the collected data becomes part of the standard DAM collections, such as instance and session, so all standard reports work with Hadoop activity. Additionally, tools like SonarK can be used to review this data.

Integrating Cloudera with DCAP Central

DCAP Central consumes audit data from Cloudera Navigator over syslog. To integrate the systems you need to:

  1. Configure your Cloudera cluster to turn on auditing for the desired services, as described in https://www.cloudera.com/documentation/enterprise/5-9-x/topics/cn_iu_audit_log.html#xd_583c10bfdbd326ba–6eed2fb8-14349d04bee–7d6f__section_ypn_5tc_pr or the equivalent document for your version of Cloudera.
  2. Configure auditing to publish to syslog, as described in https://www.cloudera.com/documentation/enterprise/5-5-x/topics/datamgmt_audit_publish.html#concept_bpk_rfc_dt__section_gnx_sf3_m4 or the equivalent document for your version of Cloudera. Port 10519 is only a default used for the integration; you can use any port configured for the Cloudera stream on your DCAP Central machine (the appender example below uses 10514).

The appender config should look like:

log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.SyslogHost=<your SonarG hostname>:10514
log4j.appender.SYSLOG.Facility=Local2
log4j.appender.SYSLOG.FacilityPrinting=true
log4j.logger.auditStream=TRACE,SYSLOG
log4j.additivity.auditStream=false
  3. Update your firewall settings to allow traffic over port 10514 from the Cloudera cluster to your DCAP Central node.
  4. On DCAP Central, remove the comment from the cloudera line in /etc/rsyslog.d/sonargateway.conf and restart the rsyslog service.
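The last step, uncommenting the cloudera line, amounts to a one-line edit. A minimal sketch, demonstrated on a scratch copy because the exact contents of the commented-out line are an assumption (on a real system, point conf at /etc/rsyslog.d/sonargateway.conf instead):

```shell
# Demonstration on a scratch file; the commented-out rsyslog directive
# below is an assumption about what the cloudera line looks like.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# cloudera
#$InputTCPServerRun 10514
EOF
# Strip the leading "#" from the directive (this is what "remove the
# comment from the cloudera line" means in practice):
sed -i 's/^#\(\$InputTCPServerRun\)/\1/' "$conf"
grep '^\$InputTCPServerRun' "$conf"
```

On the real machine, follow the edit with a restart of the rsyslog service (e.g. sudo systemctl restart rsyslog).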

If the DCAP Central machine is external to your corporate network, please see Creating Secure SonarGateway Connections in the SonarGateway documentation.

Integrating Hortonworks with DCAP Central

Install the Ranger component and configure Ranger to monitor HDFS, Hive and HBase:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_security/content/installing_ranger_using_ambari.html

Once Ranger is configured, configure the HDFS, Hive and HBase components to also send audit events to log4j and configure the log4j target to send events to syslog.

For each service, make the following changes in Ambari by selecting the service, clicking “Configs” and then the “Advanced” tab. Replace <sonargateway-host-address> with the address of the machine running SonarGateway. After making and saving the changes, restart the service.

Update your firewall settings to allow UDP traffic from the Hortonworks cluster to your DCAP Central node on port 514.

If the DCAP Central machine is external to your corporate network, please see Creating Secure SonarGateway Connections in the SonarGateway documentation.

  • HDFS
  1. In section “Custom ranger-hdfs-audit”:
xasecure.audit.destination.log4j=true
xasecure.audit.destination.log4j.logger=xaaudit
xasecure.audit.log4j.is.enabled=true
  2. In section “Advanced hdfs-log4j” add:
log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.SyslogHost=<sonargateway-host-address>
log4j.appender.SYSLOG.Facility=Local2
log4j.appender.SYSLOG.FacilityPrinting=true
log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
log4j.appender.SYSLOG.layout.ConversionPattern=%m%n
log4j.logger.xaaudit=INFO,SYSLOG
  • Hive
  1. In section “Custom ranger-hive-audit”:
xasecure.audit.destination.log4j=true
xasecure.audit.destination.log4j.logger=xaaudit
xasecure.audit.log4j.is.enabled=true
  2. In section “Advanced hive-log4j” add:
log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.SyslogHost=<sonargateway-host-address>
log4j.appender.SYSLOG.Facility=Local2
log4j.appender.SYSLOG.FacilityPrinting=true
log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
log4j.appender.SYSLOG.layout.ConversionPattern=%m%n
log4j.logger.xaaudit=INFO,SYSLOG
  • HBase
  1. In section “Custom ranger-hbase-audit”:
xasecure.audit.destination.log4j=true
xasecure.audit.destination.log4j.logger=xaaudit
xasecure.audit.log4j.is.enabled=true
  2. In section “Advanced hbase-log4j” add:
log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.SyslogHost=<sonargateway-host-address>
log4j.appender.SYSLOG.Facility=Local2
log4j.appender.SYSLOG.FacilityPrinting=true
log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
log4j.appender.SYSLOG.layout.ConversionPattern=%m%n
log4j.logger.xaaudit=INFO,SYSLOG
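The same appender block is pasted into all three services, so it can help to fill in the placeholder once up front. A minimal sketch, assuming the block has been saved to a local properties file (the file handling and example address are illustrative only):

```shell
# Substitute the <sonargateway-host-address> placeholder before pasting
# the block into Ambari. The address below is an example only.
host="10.0.0.5"
props=$(mktemp)
cat > "$props" <<'EOF'
log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.SyslogHost=<sonargateway-host-address>
EOF
sed -i "s/<sonargateway-host-address>/$host/" "$props"
grep SyslogHost "$props"
```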

Integrating MapR with DCAP Central

DCAP Central consumes audit files generated by MapR. To integrate the systems you need to:

  1. Enable auditing for volumes, tables, etc. within MapR; refer to the MapR documentation on how to enable the various levels of auditing on your cluster and its services (e.g. http://maprdocs.mapr.com/home/SecurityGuide/EnablingAuditing.html). You can do this either from the MapR administration GUI or through the CLI.

For example, to enable logging for cluster management operations:

maprcli audit cluster -enabled true

To enable filesystem and table operations logging:

maprcli audit data -enabled true

To enable logging per volume:

maprcli volume audit -name <volume> -enabled true <options>

For example, to enable full auditing for all volumes:

# List all volume names (tail -n +2 skips the header line) and enable
# full data auditing on each one:
for volume in $(maprcli volume list \
                -columns volumename | tail -n +2); do
  maprcli volume audit -name "$volume" -enabled true -coalesce 1 -dataauditops +all
done

To enable logging per directory/file/table:

hadoop mfs -setaudit on <directory|file|table>
  2. Use MapR’s NFS gateway and mount the MapR file system over NFS. Install this on the node that will send data to DCAP Central. Consult the MapR documentation for details (e.g. http://maprdocs.mapr.com/home/AdministratorGuide/c_POSIX_loopbacknfs_client.html).
  3. Install sonar-mapr-agent on one of the nodes inside the cluster.
  4. On the machine used in step 3:

4.1 Install sonar-mapr-gateway.

4.2 Run these commands from a terminal, replacing 1.2.3.4 with the IP address of the DCAP Central instance:

$ sudo bash

$ find /etc/rsyslog.d/sonargatewayforwarder -name "*.conf" | xargs sed -i -e "s:SONARG_IP:1.2.3.4:g"

4.3 Make sure that you have network access to the DCAP Central machine.

  5. On DCAP Central, remove the comment from the mapr line (typically port 10530) in /etc/rsyslog.d/sonargateway.conf and restart the rsyslog service.
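A quick way to confirm the substitution from step 4.2 took effect is to check that no SONARG_IP placeholders remain. A sketch, demonstrated on a scratch directory (on the gateway node, set confdir to /etc/rsyslog.d/sonargatewayforwarder instead):

```shell
# Scratch directory standing in for /etc/rsyslog.d/sonargatewayforwarder;
# the forwarding rule below is an assumption about the file contents.
confdir=$(mktemp -d)
printf '*.* @@1.2.3.4:10530\n' > "$confdir/forward.conf"
if grep -rq "SONARG_IP" "$confdir"; then
  echo "placeholder still present - rerun the sed from step 4.2"
else
  echo "all placeholders replaced"
fi
```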

The data that comes from MapR is inserted into the session, instance and exception collections with a Server Type of “mapr” and is displayed as native data in all reports and tools. For example:

{
    "SonarG Source": "ip-10-0-0-211",
    "VolumeName": "mapr.var",
    "dstFid": "2053.35.131206",
    "dstPath": "/var/mapr/local/ip-10-0-0-83.us-west-1.compute.internal",
    "Analyzed Client IP": "10.0.0.83",
    "Service Name": "mapr.jsonar.com",
    "mapr_log_file": "/var/log/mapr/mapr.jsonar.com/var/mapr/local/X/Y/Z.json",
    "Server Host Name": "ip-10-0-0-83.us-west-1.compute.internal",
    "operation": "LOOKUP",
    "srcFid": "2053.32.131200",
    "srcName": "ip-10-0-0-83.us-west-1.compute.internal",
    "srcPath": "/var/mapr/local",
    "status": 0,
    "Period Start": "2017-09-27T15:43:25.000+0000",
    "uid": 5000,
    "DB User Name": "mapr",
    "volumeId": 133173978,
    "Client Host Name": "ip-10-0-0-83.ec2.internal",
    "Database Name": "mapr.jsonar.com",
    "Failed Sqls": 0,
    "Instance Id": null,
    "OS User": "mapr",
    "Objects And Verbs": "mapr.var//var/mapr/local/X.internal LOOKUP",
    "Server IP": "10.0.0.83",
    "Server Type": "mapr",
    "Successful Sqls": 1
}
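Because documents like the one above are plain JSON when exported, they are easy to post-process outside of the standard reports. A minimal sketch (the trimmed-down record and file handling are illustrative only) pulling a few fields out with Python's json module:

```shell
# Write a trimmed-down copy of the session document shown above, then
# extract a few fields from it.
doc=$(mktemp)
cat > "$doc" <<'EOF'
{"DB User Name": "mapr", "operation": "LOOKUP",
 "dstPath": "/var/mapr/local", "Server Type": "mapr", "status": 0}
EOF
python3 -c '
import json, sys
rec = json.load(open(sys.argv[1]))
print(rec["DB User Name"], rec["operation"], rec["dstPath"])
' "$doc"
# prints: mapr LOOKUP /var/mapr/local
```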