Quickstart Guide

borischen edited this page Jan 15, 2013 · 22 revisions

Overview

This guide shows how to set up a single-node, combined Splunk and Hadoop instance using Shuttl. You should be up and running in less than 15 minutes.

Reference

System Requirements and Environment

This guide works on Linux. For other platforms, you are on your own.

For full system requirements see: https://github.com/splunk/splunk-shuttl/wiki/System-Requirements

Be sure that JAVA_HOME and SPLUNK_HOME are set appropriately before starting up Splunk.
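For example (the paths below are placeholders; substitute your actual install locations):

```shell
# Placeholder install locations -- adjust to your environment.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export SPLUNK_HOME=/opt/splunk
# Put both bin directories on the PATH for convenience.
export PATH="$JAVA_HOME/bin:$SPLUNK_HOME/bin:$PATH"
```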

You can run against an existing Hadoop instance, as long as it is wire-level compatible with Apache Hadoop 1.1.1. HDP1 will work; CDH3 will not work without additional steps (see https://github.com/splunk/splunk-shuttl/wiki/System-Requirements). Anything based on Hadoop 0.23 or 2.0 has not been tested.

Downloads

Splunk 5.0.x: http://www.splunk.com/download

Hadoop 1.1.1: https://www.apache.org/dyn/closer.cgi/hadoop/common/

Shuttl:

Install Splunk and Hadoop

To install Hadoop: http://hadoop.apache.org/common/docs/r1.0.4/single_node_setup.html (run in Pseudo-Distributed mode)

To install Splunk: http://docs.splunk.com/Documentation/Splunk/latest/Installation/InstallonLinux (easiest path is untar, and do "./splunk start")

Install Shuttl

  1. untar shuttl.tgz in $SPLUNK_HOME/etc/apps
  2. SPLUNK: go to Manager » System settings » General settings
  3. Note your servername
  4. cd $SPLUNK_HOME/etc/apps/shuttl/conf
  5. Edit archiver.xml to have your Splunk servername (replace the string myserver in the serverName block)
  6. Notice the path for localArchiverDir in archiver.xml and ensure you have enough space at that location to handle the Splunk buckets that will be transferred
  7. Edit splunk.xml with the correct admin credentials, and make sure you have the right admin port (usually 8089 by default)
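Step 5 is a one-line sed edit. The sketch below runs against a throwaway file so it can be tried safely; "myhost" is a placeholder for your actual servername, and the single-line archiver.xml created here is a stand-in, not the real file's contents (the real file to edit is $SPLUNK_HOME/etc/apps/shuttl/conf/archiver.xml):

```shell
# Create a throwaway stand-in for archiver.xml to demo the edit.
conf=$(mktemp -d)
echo '<serverName>myserver</serverName>' > "$conf/archiver.xml"
# Replace the placeholder servername (step 5 above):
sed -i 's/myserver/myhost/' "$conf/archiver.xml"
cat "$conf/archiver.xml"
```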

For a full description of configuration values, see the README.md file

Configure server.xml

httpHost: The host name of the machine. (usually localhost)

httpPort: The port for the shuttl server. (usually 9090)
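A minimal sketch of those two values with their usual defaults (the element names are assumed from the value names above; check the shipped server.xml for the exact layout):

```xml
<httpHost>localhost</httpHost>
<httpPort>9090</httpPort>
```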

Configure Splunk Index

We'll configure an index mytestdb to be managed by Shuttl.

First, let's create an indexes.conf to edit:

./splunk stop -f
mkdir $SPLUNK_HOME/etc/apps/shuttl/local
cd $SPLUNK_HOME/etc/apps/shuttl/local
cp $SPLUNK_HOME/etc/apps/shuttl/default/indexes.conf $SPLUNK_HOME/etc/apps/shuttl/local

Next, configure local/indexes.conf by appending the following:

# data retained for 2 minutes
# approx 32MB sized buckets
# max index size 64MB
[mytestdb]
homePath = $SPLUNK_DB/mytestdb/db
coldPath = $SPLUNK_DB/mytestdb/colddb
thawedPath = $SPLUNK_DB/mytestdb/thaweddb
rotatePeriodInSecs = 10
frozenTimePeriodInSecs = 120
maxWarmDBCount = 1
maxDataSize = 32
maxTotalDataSizeMB = 64
warmToColdScript = $SPLUNK_HOME/etc/apps/shuttl/bin/warmToColdScript.sh
coldToFrozenScript = $SPLUNK_HOME/etc/apps/shuttl/bin/coldToFrozenScript.sh

NOTE: in previous versions the index name was passed to coldToFrozenScript.sh; it is now ignored. Note also the new script, warmToColdScript.sh, which moves data to the backend earlier than the frozen event: buckets in cold can now be archived and remain searchable via Splunk at the same time.

With the above settings, data will appear in HDFS once you either:

  • have more than 64 MB of data, or
  • have data older than 2 minutes

In production, better settings would be (24-hour retention and a 10 GB maximum size):

frozenTimePeriodInSecs = 86400
maxWarmDBCount = 3
maxDataSize = 1024
maxTotalDataSizeMB = 10240
rotatePeriodInSecs = 60

See: http://docs.splunk.com/Documentation/Splunk/latest/admin/Indexesconf

Configure an Input

  1. ./splunk start
  2. SPLUNK: Go to manager, Manager » Data inputs » Files & directories » Add new
  3. Create a file input to monitor a directory, with mytestdb as the index. Be sure to set the index correctly (the setting is under "More Settings"); otherwise the data will go to the default index!
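If you prefer configuration files to the UI, the same input can be sketched as a standard monitor stanza in an inputs.conf (the monitored path below is a placeholder; use your own directory):

```ini
[monitor:///var/log/mytestdata]
index = mytestdb
disabled = false
```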

Enable Shuttl

  1. SPLUNK: Manager » Apps
  2. Click Enable to enable the app
  3. SPLUNK: Manager » Server controls
  4. Click Restart to restart server (this may not be necessary, but just in case)

Test Shuttl Connectivity

  1. cd to $SPLUNK_HOME/etc/apps/shuttl/bin
  2. ./testArchivingBucket.sh

There should be no errors.

Add Data

  1. Toss data into the monitored area
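For example, to generate some sample events (the directory below is a placeholder; point it at the directory you configured as the file input):

```shell
# Write 100 timestamped sample events into the monitored directory.
# /tmp/mytestdata is a placeholder -- use your actual monitored path.
MONITOR_DIR=${MONITOR_DIR:-/tmp/mytestdata}
mkdir -p "$MONITOR_DIR"
for i in $(seq 1 100); do
  echo "$(date '+%Y-%m-%d %H:%M:%S') sample event $i"
done >> "$MONITOR_DIR/sample.log"
wc -l "$MONITOR_DIR/sample.log"
```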

Verify

Splunk: From search bar, for All Time:

index=mytestdb *

You should see data in the index. If you don't, you didn't configure the input or index correctly.

After 5 minutes, check Hadoop.

Hadoop:

./hadoop fs -ls /archive_root/archive_data/cluster_name/myserver/mytestdb

Where "myserver" is your Splunk servername.

You should see buckets listed. If not, HDFS may not be accessible at hdfs://localhost:9000