Knowing all of the applications and methods we can use to get data into Splunk, let's talk about the types of data inputs from data sources, and how they get to the indexer. There are six general types of data inputs in Splunk:
- API inputs
- Database inputs
- Monitoring inputs
- Scripted inputs
- Modular inputs
- Windows inputs
API inputs
There are two ways to get REST API data into Splunk:
- Download the REST API modular input, and install it into your Heavy Forwarder
- Write a REST API poller using cURL or some other method to query the API, and scrub the output for the data you need
If at all possible, use the REST API modular input from Splunk, as it is very easy to set up and use. Just figure out your URL, then set up the API input and the interval at which you want it polled.
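If you do end up writing your own poller, the second option above might look something like the following. This is only a minimal sketch: the function names, the field list, and the idea of passing in a fetch function are all assumptions made for illustration, and it uses Python's standard library rather than cURL. It queries a URL, keeps only the fields you care about, and prints them as key=value pairs.

```python
import json
import time
import urllib.request


def to_key_value(record, fields):
    """Keep only the wanted fields and render them as key="value" pairs."""
    pairs = []
    for name in fields:
        if name in record:
            pairs.append(f'{name}="{record[name]}"')
    return " ".join(pairs)


def poll_once(url, fields, fetch=None):
    """Query the API once and return one key=value line per JSON record."""
    if fetch is None:
        # Default fetcher: a plain HTTP GET with a timeout.
        def fetch(target):
            with urllib.request.urlopen(target, timeout=10) as response:
                return response.read().decode("utf-8")
    payload = json.loads(fetch(url))
    # Accept either a single JSON object or a list of objects.
    records = payload if isinstance(payload, list) else [payload]
    return [to_key_value(record, fields) for record in records]


def poll_forever(url, fields, interval_seconds):
    """Print fresh events every interval_seconds (hypothetical driver loop)."""
    while True:
        for line in poll_once(url, fields):
            print(line, flush=True)
        time.sleep(interval_seconds)
```

Keeping the fetch step replaceable means you can try the scrubbing logic against a canned response before pointing it at a real API.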
Database inputs
The different types of database inputs that DB Connect can leverage are as follows:
- Automated input: These are simply scheduled to run on a time interval, and ingest the whole table that you target.
- Rising column: These inputs can simply watch your unique key in your database and add only the newest rows of data.
- Dynamic queries (Search Head Forwarder only): These are queries (often SQL) that are written into your search queries. These DB queries can be modified by a DB admin to show only the data that a user would want to see instead of the entire table.
- DB lookups (Search Head Forwarder only): These are queries that you can set within DB Connect to look up things such as employee numbers or product IDs from a database. You can either use them to look up data on the fly, or you can write the results to a lookup file or your key-value store.
DB Connect formats any of this data into happy key=value pairs for you to easily parse and search through. DB Connect has compatibility with SQL, MySQL, DB2, Oracle, MongoDB, and more. Feel free to check your DB compatibility at http://www.splunk.com/ or http://docs.splunk.com/Documentation/DBX/latest/DeployDBX/Createandmanagedatabaseinputs
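To make the rising column idea concrete, here is a small sketch of the logic, not DB Connect's actual implementation. It assumes a hypothetical orders table with order_id as the unique key and uses an in-memory SQLite database so it can run anywhere. Each poll asks only for rows above the last checkpoint and then renders them as key=value pairs.

```python
import sqlite3


def fetch_new_rows(conn, table, rising_column, checkpoint):
    """Return rows whose rising column exceeds the checkpoint, plus the new checkpoint."""
    # Table and column names are interpolated, so they must come from trusted config.
    query = f"SELECT * FROM {table} WHERE {rising_column} > ? ORDER BY {rising_column}"
    cursor = conn.execute(query, (checkpoint,))
    columns = [description[0] for description in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    if rows:
        # Remember the highest value seen so the next poll skips these rows.
        checkpoint = rows[-1][rising_column]
    return rows, checkpoint


def to_key_value(row):
    """Render one row as key="value" pairs."""
    return " ".join(f'{key}="{value}"' for key, value in row.items())


def demo():
    """Poll twice: the second poll should return only the row added in between."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "widget"), (2, "gadget")])
    first, checkpoint = fetch_new_rows(conn, "orders", "order_id", 0)
    conn.execute("INSERT INTO orders VALUES (3, 'gizmo')")
    second, checkpoint = fetch_new_rows(conn, "orders", "order_id", checkpoint)
    return [to_key_value(r) for r in first], [to_key_value(r) for r in second], checkpoint
```

An automated input, by contrast, would simply run the query without the checkpoint condition and ingest the whole table on every interval.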
Monitoring inputs
This is literally just monitoring a file, be it text-based, XML, or JSON. The more structured the file and the better behaved the logging, the easier it is to get into Splunk. Generally, Splunk likes flat files; however, there are still frustrations that come along with this, depending on what you monitor.
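Conceptually, monitoring a file comes down to remembering how far into it you have already read and picking up only what was appended since. The sketch below illustrates that idea only; it is not how Splunk implements file monitoring, and the function name and rotation handling are assumptions.

```python
import os


def read_new_lines(path, offset):
    """Return complete lines appended since offset, plus the new offset."""
    if os.path.getsize(path) < offset:
        # The file shrank, so assume it was rotated or truncated and start over.
        offset = 0
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Only consume up to the last newline so half-written lines wait for the next pass.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

Well-behaved logs that write one complete event per line make this kind of bookkeeping trivial, which is part of why structured flat files are the easiest to ingest.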
Scripted inputs
A scripted input is literally a script that can be deployed by the deployment server, run, and collect data. The most popular languages for creating these scripts are bash, Python, PowerShell, and Perl.
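As a rough illustration, a scripted input can be as small as a script that prints one event to standard output each time it runs. The sketch below is an assumption-laden example rather than anything shipped by Splunk: the field names are made up, and os.getloadavg() is available on Unix-like systems only. You would point a script:// stanza in inputs.conf at a script like this and give it an interval setting.

```python
import os
import time


def build_event(now=None, loadavg=None):
    """Render one timestamped key=value event describing system load."""
    now = time.time() if now is None else now
    if loadavg is None:
        loadavg = os.getloadavg()  # Unix-like systems only
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now))
    one, five, fifteen = loadavg
    return f"{stamp} load_1m={one:.2f} load_5m={five:.2f} load_15m={fifteen:.2f}"


if __name__ == "__main__":
    # Whatever the script writes to standard output becomes the event data.
    print(build_event())
```

Leading each event with a timestamp and keeping the rest as key=value pairs leaves very little parsing work to do at search time.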
These are inputs that are written in a scripting language. Often in Linux, these are written in bash, but if we need to talk to Windows, we can use more versatile languages such as Python, Perl, or PowerShell as well. Each Forwarder comes with a set of libraries for languages such as Python or Perl, but for languages such as PowerShell or bash, the Forwarder will use the libraries that are installed on the system itself. With PowerShell, it's advised to install the PowerShell add-on from Splunk in order to leverage that language appropriately. When you do, it's simply a stanza within your inputs.conf that you can set to collect the data at an interval of time.
Modular inputs
These inputs are usually installed on a Heavy Forwarder, and collect data from multiple sources. Let's use the simple example of SNMP (Simple Network Management Protocol) polling, as well as an API modular input. Both of these packages can be installed on the same Heavy Forwarder, and while SNMP is polling your network devices, maybe the API modular input is polling a REST API of your Resource Manager device to draw information about Hadoop jobs for your YARN (Yet Another Resource Negotiator) system. For these you can use the UI of the Heavy Forwarder to set them up.
These inputs can be developed by you, or they often come packaged within apps from Splunk. A modular input is usually best recognized by the fact that you can configure it through the UI. Some apps that leverage modular inputs are the EMC app for XtremIO, the SNMP modular input, and the REST API input.
Often these inputs leverage backend code or scripting that reaches out to the target system, speaks to the system's API, pulls back a bunch of data, and formats the output into something meaningful.
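For a feel of what that backend code involves, here is a heavily simplified sketch of a modular input script written with only Python's standard library. It assumes the usual contract where Splunk runs the script with --scheme to learn which arguments to show in the UI, and otherwise passes the configured stanzas as XML on standard input. The input title, argument names, and output fields are all hypothetical, and a real input would add validation and do the actual API calls.

```python
import sys
import xml.etree.ElementTree as ET


def build_scheme():
    """Describe the input and its arguments so the UI can render a form."""
    scheme = ET.Element("scheme")
    ET.SubElement(scheme, "title").text = "Example REST poller"
    ET.SubElement(scheme, "description").text = "Hypothetical input that polls one URL."
    ET.SubElement(scheme, "use_external_validation").text = "false"
    ET.SubElement(scheme, "streaming_mode").text = "simple"
    args = ET.SubElement(ET.SubElement(scheme, "endpoint"), "args")
    for name, title in (("url", "URL to poll"), ("fields", "Comma-separated fields to keep")):
        arg = ET.SubElement(args, "arg", name=name)
        ET.SubElement(arg, "title").text = title
    return ET.tostring(scheme, encoding="unicode")


def read_config(xml_text):
    """Pull each configured stanza's parameters out of the XML passed on stdin."""
    root = ET.fromstring(xml_text)
    inputs = {}
    for stanza in root.iter("stanza"):
        inputs[stanza.get("name")] = {p.get("name"): p.text for p in stanza.findall("param")}
    return inputs


def main(argv, stdin, stdout):
    if len(argv) > 1 and argv[1] == "--scheme":
        stdout.write(build_scheme())
        return
    text = stdin.read()
    if not text.strip():
        return  # nothing configured yet
    for name, params in read_config(text).items():
        # A real input would call the target API here; this only echoes the config.
        stdout.write(f'input="{name}" url="{params.get("url")}" status="configured"\n')


if __name__ == "__main__":
    main(sys.argv, sys.stdin, sys.stdout)
```

The scheme is what makes the input show up as a form in the UI, which is why modular inputs are easier to use than to write.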
Many companies have partnered with Splunk in order to create an app that leverages a modular input. The EMC XtremIO app on Splunk is a great example of that, as is the ServiceNow app. These apps were developed (and are supported) by both Splunk and the company, so if you get stuck deploying an app like this, first call Splunk, and (if needed) they can reach out to the company.
Modular inputs can be difficult to develop; however, they are quite easy to use, which makes the effort in pre-deployment worth it. Often, Splunk admins simply don't have the time to make these, which is when Splunkbase is a wonderful thing.
We will just use an app from Splunkbase to show what a modular input looks like, and how to recognize one.
The EMC XtremIO add-on is a modular input. This is best installed on either a Search Head Forwarder or a Heavy Forwarder for the purpose of saving you time getting the data from the system to a Splunk index.
This is necessary in order to get data into Splunk from your XtremIO appliance, and is part of the installation bundle for the XtremIO app.
When you install this kind of add-on to your Heavy Forwarder, you generally won't get a standard app icon within the app selection menu. Since this is a data input, it will be a new selection under the Data inputs menu.
This modular input leverages a REST API for gathering its data. Many times this is the case, but not always. If we click on it while it is unconfigured, you will see that it's blank, and that there is no way to configure it.
Windows event logs / Perfmon
Windows is a tricky system to get system data from, though the people at Splunk have eased this process considerably. Splunk has been kind enough to write into their software a way to decode Windows event logs using an API. This helps greatly in the extraction of Windows event logs, as all Windows logs are written in binary and are quite unintelligible to look at in their raw state.
If you're using Windows, you'll need to use a Splunk Universal Forwarder or Heavy Forwarder to pull data in this fashion, although I assure you it's far easier than crafting your own PowerShell scripts or WMI queries to get the data.