Monday, June 18, 2018

Data Inputs

Now that we know the applications and methods we can use to get data into Splunk, let's talk about the types of data inputs, where their data comes from, and how it gets to the indexer. There are six general types of data inputs in Splunk:
  • API inputs
  • Database inputs
  • Monitoring inputs
  • Scripted inputs
  • Modular inputs
  • Windows inputs

API inputs

There are two ways to get REST API data into Splunk:
  • Download the REST API modular input, and install it on your Heavy Forwarder
  • Write a REST API poller using cURL or some other method to query the API, and scrub the output for the data you need (a minimal sketch of this approach follows below)
If at all possible, use the REST API modular input, as it is very easy to set up and use. Just figure out your URL, set up the API input, and choose the interval at which you want it polled.
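If you do end up writing your own poller, here is a minimal sketch of what that could look like, run as a scripted input. Everything in it is a placeholder: the endpoint, the token header, and the "results" key are hypothetical, not a real API. The script prints one JSON object per line to stdout so Splunk can index each record as an event.

#!/usr/bin/env python3
# Hypothetical REST poller sketch (Python 3). The endpoint, token, and
# "results" key are placeholders -- substitute the API you actually poll.
import json
import urllib.request

URL = "https://api.example.com/v1/events"          # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_TOKEN"}    # placeholder credential

def main():
    req = urllib.request.Request(URL, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    # One JSON object per line keeps event breaking simple on the Splunk side.
    for record in payload.get("results", []):
        print(json.dumps(record))

if __name__ == "__main__":
    main()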

Database inputs

The different types of database inputs that DB Connect can leverage are as follows:
  • Automated input: These are simply scheduled to run on a time interval, and ingest the whole table that you target.
  • Rising column: These inputs can simply watch your unique key in your database and add only the newest rows of data.
  • Dynamic queries (Search Head Forwarder only): These are queries (often SQL) written directly into your Splunk searches. A DB admin can modify them to show only the data a user would want to see instead of the entire table (see the example after this list).
  • DB lookups (Search Head Forwarder only): These are queries that you can set within DB Connect to look up things such as employee numbers or product IDs from a database. You can either use them to look up data on the fly, or write the results to a lookup file or your key-value store.
DB Connect formats any of this data into happy key=value pairs for you to easily parse and search through. DB Connect is compatible with SQL Server, MySQL, DB2, Oracle, MongoDB, and more. Feel free to check your DB compatibility at http://www.splunk.com/ .
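As a quick illustration of a dynamic query, the search below is only a sketch: it assumes DB Connect's dbxquery search command, a hypothetical connection named sales_db, and a hypothetical orders table.

| dbxquery connection=sales_db query="SELECT order_id, status, amount FROM orders WHERE status = 'FAILED'"

The results come back as rows that you can pipe into stats, table, or any other search command.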

http://docs.splunk.com/Documentation/DBX/latest/DeployDBX/Createandmanagedatabaseinputs

Monitoring inputs

This is literally just monitoring a file, be it text-based, XML, or JSON. The more structured the file, and the more well-behaved the logging, the easier it is to get into Splunk. Generally, Splunk likes flat files; however, there are still frustrations that come along with this, depending on what you monitor.
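A monitor input is just a stanza in inputs.conf. This is a minimal sketch; the path, sourcetype, and index are placeholders for your own environment.

[monitor:///var/log/myapp/app.log]
sourcetype = myapp:log
index = main
disabled = 0

Pointing the monitor at a directory instead of a single file also works, and Splunk will pick up new files as they appear.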

Scripted inputs

A scripted input is literally a script that can be deployed by the deployment server to run and collect data. The most popular languages for these scripts are bash, Python, PowerShell, and Perl. In Linux they are often written in bash, but if we need to talk to Windows we can use more versatile languages such as Python, Perl, or PowerShell. Each Forwarder comes with its own libraries for languages such as Python and Perl, but for languages such as PowerShell or bash, the Forwarder will use the libraries installed on the system itself. With PowerShell, it's advisable to install the PowerShell add-on from Splunk in order to leverage that language appropriately. Once the script is in place, it's simply a stanza within your inputs.conf, set to run at whatever interval you need to collect the data (see the sketch below).
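A minimal sketch of such a stanza follows; the script path, interval, sourcetype, and index are placeholders. The interval is in seconds (a cron expression also works).

[script://$SPLUNK_HOME/etc/apps/my_app/bin/poll_status.sh]
interval = 300
sourcetype = myapp:status
index = main
disabled = 0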

Modular inputs

These inputs are usually installed on a Heavy Forwarder, and collect data from multiple sources. Let's use the simple example of SNMP (Simple Network Management Protocol) polling, as well as an API modular input. Both of these packages can be installed on the same Heavy Forwarder, and while SNMP is polling your network devices, maybe the API modular input is polling a REST API of your Resource Manager device to draw information about Hadoop jobs for your YARN (Yet Another Resource Negotiator) system. For these you can use the UI of the Heavy Forwarder to set them up.
These inputs can be developed by you, or they often come packaged within many of the apps on Splunkbase. A modular input is usually best recognized by the fact that you can use the UI to configure it. Some apps that leverage modular inputs are the EMC app for XtremIO, the SNMP modular input, and the REST API input.
Often these inputs leverage backend code or scripting that reaches out to the target system, speaks to that system's API, pulls back a bunch of data, and formats the output into a meaningful structure.
Many companies have partnered with Splunk in order to create an app that leverages a modular input. The EMC XtremIO app is a great example of that, as is the ServiceNow app. These apps were developed (and are supported) by both Splunk and the partner company, so if you get stuck deploying an app like this, first call Splunk, and (if needed) they can reach out to the company.
Modular inputs can be difficult to develop, however they are quite easy to use, which makes the effort in pre-deployment worth it. Often, Splunk admins simply don't have the time to make these, which is when Splunkbase is a wonderful thing.
We will just use an app from Splunkbase to show what a modular input looks like, and how to recognize one.
The EMC XtremIO add-on is a modular input. It is best installed on either a Search Head Forwarder or a Heavy Forwarder, to save you time getting the data from the system into a Splunk index:
[Screenshot: the EMC XtremIO modular input add-on]
This add-on is necessary in order to get data into Splunk from your XtremIO appliance, and is part of the installation bundle for the XtremIO app.
When you install this kind of add-on on your Heavy Forwarder, you generally won't get a standard app icon within the app selection menu. Since this is a data input, it will show up as a new selection under the Data inputs menu:
[Screenshot: the modular input listed under the Data inputs menu]

Now you can see that this modular input leverages a REST API for gathering its data. That is often the case, but not always. If you click on it while it is still unconfigured, you will see that it's blank, and that there is no way to configure it.
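Under the covers, even a UI-configured modular input ends up as a stanza in inputs.conf, using the scheme name that the input defines. The sketch below is purely illustrative -- the scheme and the endpoint parameter are hypothetical, not the actual XtremIO or REST add-on settings.

[my_modular_input://production_cluster]
endpoint = https://appliance.example.com
interval = 120
index = storage
disabled = 0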

Windows event logs / Perfmon

Windows is a tricky system to get system data from, though the people at Splunk have eased this process considerably. Splunk has been kind enough to write into their software a way to decode Windows event logs using an API. This helps greatly in the extraction of Windows event logs, as all Windows logs are written in binary and are quite unintelligible to look at in their raw state.
If you're using Windows, you'll need to use a Splunk Universal Forwarder or Heavy Forwarder to pull data in this fashion, although I assure you it's far easier than crafting your own PowerShell scripts or WMI queries to get the data.
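On the Forwarder, these Windows inputs are again just inputs.conf stanzas. A sketch (the index names are placeholders; the Security event log and the Processor performance object are standard Windows sources):

[WinEventLog://Security]
disabled = 0
index = wineventlog

[perfmon://CPU]
object = Processor
counters = % Processor Time
instances = _Total
interval = 10
index = perfmon
disabled = 0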


Become a Splunk Ninja





Common Log Format


The Common Log Format, also known as the NCSA Common log format (after NCSA HTTPd), is a standardized text file format used by web servers when generating server log files. Because the format is standardized, the files can be readily analyzed by a variety of web analysis programs, for example Webalizer and Analog.


Each line in a file stored in the Common Log Format has the following syntax:


host ident authuser date request status bytes

Example

127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326


A "-" in a field indicates missing data.

·       127.0.0.1 is the IP address of the client (remote host) which made the request to the server.
·       user-identifier is the RFC 1413 identity of the client.
·       frank is the userid of the person requesting the document.
·       [10/Oct/2000:13:55:36 -0700] is the date, time, and time zone that the request was received, by default in strftime format %d/%b/%Y:%H:%M:%S %z.
·       "GET /apache_pb.gif HTTP/1.0" is the request line from the client: GET is the method, /apache_pb.gif is the resource requested, and HTTP/1.0 is the protocol version.
·       200 is the HTTP status code returned to the client. 2xx is a successful response, 3xx a redirection, 4xx a client error, and 5xx a server error.
·       2326 is the size of the object returned to the client, measured in bytes.
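Splunk extracts these fields automatically for its built-in web access sourcetypes (clientip, status, bytes, and so on), so a search over such logs can get straight to the analysis. A quick sketch, assuming the access_combined sourcetype:

sourcetype=access_combined status>=400
| stats count BY clientip, status
| sort - count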

Usage

Log files are a standard tool for computer systems developers and administrators. They record the "what happened when by whom" of the system. This information can record faults and help their diagnosis. It can identify security breaches and other computer misuse. It can be used for auditing. It can be used for accounting purposes.

The information stored is only available for later analysis if it is stored in a form that can be analysed. This data can be structured in many ways for analysis. For example, storing it in a relational database would force the data into a query-able format. However, it would also make it more difficult to retrieve if the computer crashed, and logging would not be available unless the database was available. A plain text format minimises dependencies on other system processes, and assists logging at all phases of computer operation, including start-up and shut-down, where such processes might be unavailable.

See also

·       Extended Log Format
·       Web counter
·       Data logging
·       Syslog

That being said, developers often write their logs in a way that they can understand, because ultimately it will be them doing the troubleshooting and code fixing when something severe breaks.

This method has been successful but time consuming, and the true value of any SME lies in reducing a system's MTTR and increasing uptime. The more transactions a system processes, the larger it scales, and beyond roughly 20 machines, troubleshooting with a manual process becomes noticeably more complex and time consuming.

The nicer we format the logs, the faster Splunk can reveal the information about our systems, saving everyone time and headaches.

·       Structured data: These are usually logs for Apache, IIS, Windows events, Cisco, and some other manufacturers.
·       Unstructured data: This type of logging usually comes from a proprietary application, where each message can be printed differently in different operations and the event itself can span multiple lines with no definitive event start, event end, or both. Often, this is the bulk of our data.
We can't affect the way frameworks like Java, .NET, Windows, or Cisco log their information, so let's focus on what we can potentially improve. For the rest, we will have to create some Splunk logic to do what's called data normalization.
Innately, Splunk will understand the structure of an IIS type of log event. (IIS, Internet Information Services, is a web server provided by Microsoft. A web server is used for storing, processing, and serving web pages to a requesting client. When you set up your website on IIS and logging is enabled, the server writes a statement to a log file whenever an HTTP transaction occurs.) However, in some cases it's up to the Splunk engineer to tell Splunk the field names and the order of the fields in each of these events. This is basically event and field extraction, and it's also how we start organizing and managing the value of a dataset.

Note
Splunk only extracts fields automatically for a handful of log types, such as IIS and Apache logs, and those built-in extractions cannot be leveraged on other datasets. Many other datasets are extracted using either an app from Splunkbase or the manual method. For a full list of datasets Splunk has programmed for automatic field extraction, please visit http://www.splunk.com/ .
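The manual method is often just a search-time extraction in props.conf. A minimal sketch, assuming a hypothetical sourcetype and log layout:

[myapp:log]
EXTRACT-user = user=(?<user>\S+)
EXTRACT-duration = took\s(?<duration_ms>\d+)ms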

First we need to understand our three most popular actions within an application:
·       Publication: This is when an application answers a query and publishes data to a user one time. A user clicks a button, the application serves the user its data for that request, and the transaction is complete.
·       Subscription: This is a transaction that begins with the click of a button, though the data streams to the user until something stops it. Think of this kind of transaction as a YouTube video that you click on. You click to start the video, and then it just streams to your device. The stream is a subscription type of transaction. While the application is serving up this data, it is also often writing to the application logs. This can get noisy, as subscriptions can sometimes last hours or even days.
·       Database call: These are simply calls to a database to either retrieve or insert data. These actions are usually pretty easy to capture; it's what people want to see from this data that becomes a challenge (see the search sketch after this list).
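To stitch one of these actions back together from its individual log lines, the transaction command is a common starting point. This is only a sketch: the sourcetype, the session_id field, and the SUBSCRIBE/UNSUBSCRIBE markers are hypothetical placeholders for whatever your application actually logs.

sourcetype=myapp:log
| transaction session_id startswith="SUBSCRIBE" endswith="UNSUBSCRIBE" maxspan=1h
| table session_id, duration, eventcount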

Unstructured data

The following screenshot is an example of what unstructured data looks like:
[Screenshot: a sample of unstructured log data]

These kinds of logs are much more complicated to bring value to, as all of the knowledge must be manually extracted by a Splunk engineer or admin. Splunk will look at your data and attempt to extract things that it believes are fields. However, what it finds is often nothing like what you or your users actually want on their dashboards.
That being the case, this is where one would need to speak to the developer/vendor of that specific software, and start asking some pointed questions.
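While you're working out those answers, the rex command lets you carve fields out of unstructured events at search time. A sketch, assuming a hypothetical message layout:

index=main sourcetype=myapp:log
| rex "user=(?<user>\S+)\s+action=(?<action>\w+)\s+duration=(?<duration_ms>\d+)"
| stats avg(duration_ms) BY action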
There are many ways to break an event in Splunk (see http://docs.splunk.com/Documentation/Splunk/6.4.1/Admin/Propsconf and search for "break").
Event breaking - best practice
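A common best-practice pattern is to turn line merging off and break on the timestamp at the start of each event in props.conf. The sourcetype name, the regex, and the timestamp format below are placeholders for whatever your logs actually look like.

[myapp:log]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
TRUNCATE = 10000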

Sunday, June 17, 2018

Splunk Search Reference

Command quick reference

Search command quick reference table

The table below lists all of the search commands in alphabetical order. There is a short description of the command and links to related commands. For the complete syntax, usage, and detailed examples, click the command name to display the specific topic for that command.
Some of these commands share functions. For a list of the functions with descriptions and examples, see Evaluation functions and Statistical and charting functions.
Command Description Related commands
abstract Produces a summary of each search result. highlight
accum Keeps a running total of the specified numeric field. autoregress, delta, trendline, streamstats
addcoltotals Computes an event that contains sum of all numeric fields for previous events. addtotals, stats
addinfo Add fields that contain common information about the current search. search
addtotals Computes the sum of all numeric fields for each result. addcoltotals, stats
analyzefields Analyze numerical fields for their ability to predict another discrete field. anomalousvalue
anomalies Computes an "unexpectedness" score for an event. anomalousvalue, cluster, kmeans, outlier
anomalousvalue Finds and summarizes irregular, or uncommon, search results. analyzefields, anomalies, cluster, kmeans, outlier
anomalydetection Identifies anomalous events by computing a probability for each event and then detecting unusually small probabilities. analyzefields, anomalies, anomalousvalue, cluster, kmeans, outlier
append Appends subsearch results to current results. appendcols, appendcsv, appendlookup, join, set
appendcols Appends the fields of the subsearch results to current results, first results to first result, second to second, etc. append, appendcsv, join, set
appendpipe Appends the result of the subpipeline applied to the current result set to results. append, appendcols, join, set
arules Finds association rules between field values. associate, correlate
associate Identifies correlations between fields. correlate, contingency
audit Returns audit trail information that is stored in the local audit index.
autoregress Sets up data for calculating the moving average. accum, autoregress, delta, trendline, streamstats
bin (bucket) Puts continuous numerical values into discrete sets. chart, timechart
bucketdir Replaces a field value with higher-level grouping, such as replacing filenames with directories. cluster, dedup
chart Returns results in a tabular output for charting. See also, Statistical and charting functions. bin,sichart, timechart
cluster Clusters similar events together. anomalies, anomalousvalue, cluster, kmeans, outlier
cofilter Finds how many times field1 and field2 values occurred together. associate, correlate
collect Puts search results into a summary index. overlap
concurrency Uses a duration field to find the number of "concurrent" events for each event. timechart
contingency Builds a contingency table for two fields. associate, correlate
convert Converts field values into numerical values. eval
correlate Calculates the correlation between different fields. associate, contingency
datamodel Examine data model or data model dataset and search a data model dataset. pivot
dbinspect Returns information about the specified index.
dedup Removes subsequent results that match a specified criteria. uniq
delete Delete specific events or search results.
delta Computes the difference in field value between nearby results. accum, autoregress, trendline, streamstats
diff Returns the difference between two search results.
erex Allows you to specify example or counter example values to automatically extract fields that have similar values. extract, kvform, multikv, regex, rex, xmlkv
eval Calculates an expression and puts the value into a field. See also, Evaluation functions. where
eventcount Returns the number of events in an index. dbinspect
eventstats Adds summary statistics to all search results. stats
extract (kv) Extracts field-value pairs from search results. kvform, multikv, xmlkv, rex
fieldformat Expresses how to render a field at output time without changing the underlying value. eval, where
fields Removes fields from search results.
fieldsummary Generates summary information for all or a subset of the fields. analyzefields, anomalies, anomalousvalue, stats
filldown Replaces NULL values with the last non-NULL value. fillnull
fillnull Replaces null values with a specified value.
findtypes Generates a list of suggested event types. typer
folderize Creates a higher-level grouping, such as replacing filenames with directories.
foreach Run a templatized streaming subsearch for each field in a wildcarded field list. eval
format Takes the results of a subsearch and formats them into a single result.
from Retrieves data from a dataset, such as a data model dataset, a CSV lookup, a KV Store lookup, a saved search, or a table dataset.
gauge Transforms results into a format suitable for display by the Gauge chart types.
gentimes Generates time-range results.
geom Adds a field, named "geom", to each event. This field contains geographic data structures for polygon geometry in JSON and is used for the choropleth map visualization. geomfilter
geomfilter Accepts two points that specify a bounding box for clipping a choropleth map. Points that fall outside of the bounding box are filtered out. geom
geostats Generate statistics which are clustered into geographical bins to be rendered on a world map. stats, xyseries
head Returns the first number n of specified results. reverse, tail
highlight Highlights the specified terms. iconify
history Returns a history of searches formatted as an events list or as a table. search
iconify Displays a unique icon for each different value in the list of fields that you specify. highlight
input Add or disable sources.
inputcsv Loads search results from the specified CSV file. loadjob, outputcsv
inputlookup Loads search results from a specified static lookup table. inputcsv, join, lookup, outputlookup
iplocation Extracts location information from IP addresses.
join Combine the results of a subsearch with the results of a main search. appendcols, lookup, selfjoin
kmeans Performs k-means clustering on selected fields. anomalies, anomalousvalue, cluster, outlier
kvform Extracts values from search results, using a form template. extract, kvform, multikv, xmlkv, rex
loadjob Loads events or results of a previously completed search job. inputcsv
localize Returns a list of the time ranges in which the search results were found. map, transaction
localop Run subsequent commands, that is all commands following this, locally and not on remote peers.
lookup Explicitly invokes field value lookups.
makecontinuous Makes a field that is supposed to be the x-axis continuous (invoked by chart/timechart) chart, timechart
makemv Change a specified field into a multivalued field during a search. mvcombine, mvexpand, nomv
makeresults Creates a specified number of empty search results.
map A looping operator, performs a search over each search result.
mcollect Converts search results into metric data and inserts the data into a metric index on the search head. collect, meventcollect
metadata Returns a list of source, sourcetypes, or hosts from a specified index or distributed search peer. dbinspect
metasearch Retrieves event metadata from indexes based on terms in the logical expression. metadata, search
meventcollect Converts search results into metric data and inserts the data into a metric index on the indexers. collect, mcollect
mstats Calculates statistics for the measurement, metric_name, and dimension fields in metric indexes. stats, tstats
multikv Extracts field-values from table-formatted events.
multisearch Run multiple streaming searches at the same time. append, join
mvcombine Combines events in search results that have a single differing field value into one result with a multivalue field of the differing field. mvexpand, makemv, nomv
mvexpand Expands the values of a multivalue field into separate events for each value of the multivalue field. mvcombine, makemv, nomv
nomv Changes a specified multivalued field into a single-value field at search time. makemv, mvcombine, mvexpand
outlier Removes outlying numerical values. anomalies, anomalousvalue, cluster, kmeans
outputcsv Outputs search results to a specified CSV file. inputcsv, outputtext
outputlookup Writes search results to the specified static lookup table. inputlookup, lookup, outputcsv, outputlookup
outputtext Outputs the raw text field (_raw) of results into the _xml field. outputtext
overlap Finds events in a summary index that overlap in time or have missed events. collect
pivot Run pivot searches against a particular data model dataset. datamodel
predict Enables you to use time series algorithms to predict future values of fields. x11
rangemap Sets RANGE field to the name of the ranges that match.
rare Displays the least common values of a field. sirare, stats, top
regex Removes results that do not match the specified regular expression. rex, search
relevancy Calculates how well the event matches the query.
reltime Converts the difference between 'now' and '_time' to a human-readable value and adds this value to the field, 'reltime', in your search results. convert
rename Renames a specified field; wildcards can be used to specify multiple fields.
replace Replaces values of specified fields with a specified new value.
rest Access a REST endpoint and display the returned entities as search results.
return Specify the values to return from a subsearch. format, search
reverse Reverses the order of the results. head, sort, tail
rex Specify a Perl regular expression named groups to extract fields while you search. extract, kvform, multikv, xmlkv, regex
rtorder Buffers events from real-time search to emit them in ascending time order when possible.
savedsearch Returns the search results of a saved search.
script (run) Runs an external Perl or Python script as part of your search.
scrub Anonymizes the search results.
search Searches indexes for matching events.
searchtxn Finds transaction events within specified search constraints. transaction
selfjoin Joins results with itself. join
sendemail Emails search results to a specified email address.
set Performs set operations (union, diff, intersect) on subsearches. append, appendcols, join, diff
setfields Sets the field values for all results to a common value. eval, fillnull, rename
sichart Summary indexing version of chart. chart, sitimechart, timechart
sirare Summary indexing version of rare. rare
sistats Summary indexing version of stats. stats
sitimechart Summary indexing version of timechart. chart, sichart, timechart
sitop Summary indexing version of top. top
sort Sorts search results by the specified fields. reverse
spath Provides a straightforward means for extracting fields from structured data formats, XML and JSON. xpath
stats Provides statistics, grouped optionally by fields. See also, Statistical and charting functions. eventstats, top, rare
strcat Concatenates string values.
streamstats Adds summary statistics to all search results in a streaming manner. eventstats, stats
table Creates a table using the specified fields. fields
tags Annotates specified fields in your search results with tags. eval
tail Returns the last number n of specified results. head, reverse
timechart Create a time series chart and corresponding table of statistics. See also, Statistical and charting functions. chart, bucket
timewrap Displays, or wraps, the output of the timechart command so that every timewrap-span range of time is a different series. timechart
top Displays the most common values of a field. rare, stats
transaction Groups search results into transactions.
transpose Reformats rows of search results as columns.
trendline Computes moving averages of fields. timechart
tscollect Writes results into tsidx file(s) for later use by tstats command. collect, stats, tstats
tstats Calculates statistics over tsidx files created with the tscollect command. stats, tscollect
typeahead Returns typeahead information on a specified prefix.
typelearner Generates suggested eventtypes. typer
typer Calculates the eventtypes for the search results. typelearner
union Merges the results from two or more datasets into one dataset.
uniq Removes any search that is an exact duplicate with a previous result. dedup
untable Converts results from a tabular format to a format similar to stats output. Inverse of xyseries and maketable.
where Performs arbitrary filtering on your data. See also, Evaluation functions. eval
x11 Enables you to determine the trend in your data by removing the seasonal pattern. predict
xmlkv Extracts XML key-value pairs. extract, kvform, multikv, rex
xmlunescape Unescapes XML.
xpath Redefines the XML path.
xyseries Converts results into a format suitable for graphing.
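Most day-to-day searches chain a handful of these commands together. A quick sketch, assuming web access data in the access_combined sourcetype:

sourcetype=access_combined status>=500 earliest=-7d
| timechart span=1d count BY host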

Splunk Interview Questions - Recommendation

Splunk Interview Question & Answers - Recommendations