Quickstart Guide
This guide shows how to set up a single-node, combined Splunk and Hadoop instance using Shuttl. You should be up and running in less than 15 minutes.
This works for Linux. For other platforms, you are on your own.
For full system requirements see: https://github.com/splunk/splunk-shuttl/wiki/System-Requirements
Be sure that JAVA_HOME and SPLUNK_HOME are set appropriately before starting up Splunk.
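For example (these paths are placeholders; adjust them to your actual install locations):
```
# example locations only; point these at your real JDK and Splunk installs
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
export SPLUNK_HOME=/opt/splunk
```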
You can run against an existing Hadoop instance, as long as it is wire-level compatible with Apache Hadoop 1.1.1. HDP1 will work. (CDH3 will not work without additional steps; see https://github.com/splunk/splunk-shuttl/wiki/System-Requirements.) Distributions based on Hadoop 0.23 or 2.0 have not been tested.
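If you plan to use an existing cluster, you can confirm what it is running with the hadoop CLI:
```
# prints the Hadoop version of this install
./hadoop version
```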
Splunk 5.0.x: http://www.splunk.com/download
Hadoop 1.1.1: https://www.apache.org/dyn/closer.cgi/hadoop/common/
Shuttl:
- VIA SPLUNKBASE: http://splunk-base.splunk.com/apps/58003/shuttl
- VIA BUILD: after building Shuttl from source (https://github.com/splunk/splunk-shuttl), you will find shuttl.tgz in the build directory
To install Hadoop: http://hadoop.apache.org/common/docs/r1.0.4/single_node_setup.html (run in Pseudo-Distributed mode)
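Once Hadoop is configured per that guide, a typical pseudo-distributed startup looks like this, run from the Hadoop install directory:
```
# format HDFS (first run only), then start all daemons
bin/hadoop namenode -format
bin/start-all.sh
# confirm HDFS is answering
bin/hadoop fs -ls /
```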
To install Splunk: http://docs.splunk.com/Documentation/Splunk/latest/Installation/InstallonLinux (the easiest path is to untar the package and run "./splunk start")
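For example (the tarball name and the /opt location are assumptions; yours will vary by version and architecture):
```
tar -xzf splunk-5.0*-Linux-x86_64.tgz -C /opt
/opt/splunk/bin/splunk start --accept-license
```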
- untar shuttl.tgz in $SPLUNK_HOME/etc/apps
- SPLUNK: go to Manager » System settings » General settings
- Note your servername
- cd $SPLUNK_HOME/etc/apps/shuttl/conf
- Edit archiver.xml to use your Splunk servername (replace the string myserver in the serverName block; a sed one-liner for this follows this list)
- Note the path for localArchiverDir in archiver.xml and make sure that location has enough space to hold the Splunk buckets that will be transferred
- Edit splunk.xml with the correct admin credentials, and make sure the admin port is right (8089 by default)
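For the servername edit, a sed one-liner works ("yourservername" is a placeholder for the servername you noted above):
```
cd $SPLUNK_HOME/etc/apps/shuttl/conf
# replace the placeholder in the serverName block in place
sed -i 's/myserver/yourservername/' archiver.xml
```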
For a full description of configuration values, see the README.md file
httpHost: the host name of the machine (usually localhost)
httpPort: the port for the Shuttl server (usually 9090)
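Once everything is up, you can sanity-check that the Shuttl server is listening (assuming the defaults above):
```
# prints an HTTP status code; anything other than 000 means the port answered
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9090/
```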
We'll configure an index mytestdb to be managed by Shuttl.
First, let's create an indexes.conf to edit:
```
./splunk stop -f
mkdir $SPLUNK_HOME/etc/apps/shuttl/local
cd $SPLUNK_HOME/etc/apps/shuttl/local
cp $SPLUNK_HOME/etc/apps/shuttl/default/indexes.conf $SPLUNK_HOME/etc/apps/shuttl/local
```
Next, configure local/indexes.conf by appending the following:
```
# data frozen after 2 minutes
# approx 32MB buckets
# max index size 64MB
[mytestdb]
homePath = $SPLUNK_DB/mytestdb/db
coldPath = $SPLUNK_DB/mytestdb/colddb
thawedPath = $SPLUNK_DB/mytestdb/thaweddb
rotatePeriodInSecs = 10
frozenTimePeriodInSecs = 120
maxWarmDBCount = 1
maxDataSize = 32
maxTotalDataSizeMB = 64
warmToColdScript = $SPLUNK_HOME/etc/apps/shuttl/bin/warmToColdScript.sh
coldToFrozenScript = $SPLUNK_HOME/etc/apps/shuttl/bin/coldToFrozenScript.sh
```
NOTE: in previous versions the index name was passed to coldToFrozenScript.sh; this is now ignored. Note also the new warmToColdScript, which moves data to the backend earlier than the freeze event. Buckets in the cold state can now be archived and simultaneously remain available for search in Splunk.
With the above settings, for data to appear in HDFS you need either:
- more than 64MB of data, or
- data older than 2 minutes
In production, better settings would be (24-hour retention and a 10GB maximum size):
```
frozenTimePeriodInSecs = 86400
maxWarmDBCount = 3
maxDataSize = 1024
maxTotalDataSizeMB = 10240
rotatePeriodInSecs = 60
```
See: http://docs.splunk.com/Documentation/Splunk/latest/admin/Indexesconf
- ./splunk start
- SPLUNK: Go to Manager » Data inputs » Files & directories » Add new
- Create a file input that monitors a directory, with mytestdb as the index (be sure to set the index correctly, under "More settings", or the data will go to the default index!)
- SPLUNK: Manager » Apps
- Click Enable to enable the app
- SPLUNK: Manager » Server controls
- Click Restart to restart the server (this may not be necessary, but just in case; a CLI alternative for the enable and restart steps is shown below)
- cd to $SPLUNK_HOME/etc/apps/shuttl/bin
- ./testArchivingBucket.sh
There should be no errors.
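As an aside, the Enable and Restart steps above can also be done from the CLI (admin:changeme is a placeholder for your actual admin credentials):
```
$SPLUNK_HOME/bin/splunk enable app shuttl -auth admin:changeme
$SPLUNK_HOME/bin/splunk restart
```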
- Toss data into the monitored directory
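If you need sample data, a generator like this works (/tmp/shuttl-input is a placeholder for whatever directory you configured as the file input):
```
# write 1000 timestamped test events into the monitored directory
for i in $(seq 1 1000); do
  echo "$(date) test event $i"
done >> /tmp/shuttl-input/test.log
```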
Splunk: From the search bar, for All Time, run:
index=mytestdb *
You should see data in the index. If you don't, the input or the index was not configured correctly.
After about 5 minutes, check Hadoop.
Hadoop:
./hadoop fs -ls /archive_root/archive_data/cluster_name/myserver/mytestdb
where "myserver" is your Splunk servername.
You should see buckets listed. If not, HDFS may not be accessible at hdfs://localhost:9000.
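One quick check (localhost:9000 assumes the default pseudo-distributed setup from above):
```
# should list the HDFS root rather than error out
./hadoop fs -ls hdfs://localhost:9000/
```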