Monday, June 18, 2018

Data Inputs

Now that we know the applications and methods we can use to get data into Splunk, let's talk about the types of data inputs, where their data comes from, and how it gets to the indexer. There are six general types of data inputs in Splunk:
  • API inputs
  • Database inputs
  • Monitoring inputs
  • Scripted inputs
  • Modular inputs
  • Windows inputs

API inputs

There are two ways to get REST API data into Splunk:
  • Download the REST API modular input, and install it on your Heavy Forwarder
  • Write a REST API poller using cURL or some other method to query the API, and scrub the output for the data you need (a minimal sketch of this approach follows below)
If at all possible, use the REST API modular input, as it is very easy to set up and use. Just figure out your URL, set up the API input, and choose the interval at which you want it polled.
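If you do end up writing your own poller, here is a minimal sketch of what that could look like, run as a scripted input. Everything in it is a placeholder: the endpoint, the token header, and the "results" key are hypothetical, not a real API. The script prints one JSON object per line to stdout so Splunk can index each record as an event.

#!/usr/bin/env python3
# Hypothetical REST poller sketch (Python 3). The endpoint, token, and
# "results" key are placeholders -- substitute the API you actually poll.
import json
import urllib.request

URL = "https://api.example.com/v1/events"          # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_TOKEN"}    # placeholder credential

def main():
    req = urllib.request.Request(URL, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    # One JSON object per line keeps event breaking simple on the Splunk side.
    for record in payload.get("results", []):
        print(json.dumps(record))

if __name__ == "__main__":
    main()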

Database inputs

The different types of database inputs that DB Connect can leverage are as follows:
  • Automated input: These are simply scheduled to run on a time interval, and ingest the whole table that you target.
  • Rising column: These inputs can simply watch your unique key in your database and add only the newest rows of data.
  • Dynamic queries (Search Head Forwarder only): These are queries (often SQL) written directly into your Splunk searches. A DB admin can modify them to show only the data a user would want to see instead of the entire table (see the example after this list).
  • DB lookups (Search Head Forwarder only): These are queries that you can set within DB Connect to look up things such as employee numbers or product IDs from a database. You can either use them to look up data on the fly, or write the results to a lookup file or your key-value store.
DB Connect formats any of this data into happy key=value pairs for you to easily parse and search through. DB Connect is compatible with SQL Server, MySQL, DB2, Oracle, MongoDB, and more. Feel free to check your DB compatibility at http://www.splunk.com/ .
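As a quick illustration of a dynamic query, the search below is only a sketch: it assumes DB Connect's dbxquery search command, a hypothetical connection named sales_db, and a hypothetical orders table.

| dbxquery connection=sales_db query="SELECT order_id, status, amount FROM orders WHERE status = 'FAILED'"

The results come back as rows that you can pipe into stats, table, or any other search command.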

http://docs.splunk.com/Documentation/DBX/latest/DeployDBX/Createandmanagedatabaseinputs

Monitoring inputs

This is literally just monitoring a file, be it text-based, XML, or JSON. The more structured the file, and the more well-behaved the logging, the easier it is to get into Splunk. Generally, Splunk likes flat files; however, there are still frustrations that come along with this, depending on what you monitor.
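A monitor input is just a stanza in inputs.conf. This is a minimal sketch; the path, sourcetype, and index are placeholders for your own environment.

[monitor:///var/log/myapp/app.log]
sourcetype = myapp:log
index = main
disabled = 0

Pointing the monitor at a directory instead of a single file also works, and Splunk will pick up new files as they appear.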

Scripted inputs

A scripted input is literally a script that can be deployed by the deployment server to run and collect data. The most popular languages for these scripts are bash, Python, PowerShell, and Perl. In Linux they are often written in bash, but if we need to talk to Windows we can use more versatile languages such as Python, Perl, or PowerShell. Each Forwarder comes with its own libraries for languages such as Python and Perl, but for languages such as PowerShell or bash, the Forwarder will use the libraries installed on the system itself. With PowerShell, it's advisable to install the PowerShell add-on from Splunk in order to leverage that language appropriately. Once the script is in place, it's simply a stanza within your inputs.conf, set to run at whatever interval you need to collect the data (see the sketch below).
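A minimal sketch of such a stanza follows; the script path, interval, sourcetype, and index are placeholders. The interval is in seconds (a cron expression also works).

[script://$SPLUNK_HOME/etc/apps/my_app/bin/poll_status.sh]
interval = 300
sourcetype = myapp:status
index = main
disabled = 0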

Modular inputs

These inputs are usually installed on a Heavy Forwarder, and collect data from multiple sources. Let's use the simple example of SNMP (Simple Network Management Protocol) polling, as well as an API modular input. Both of these packages can be installed on the same Heavy Forwarder, and while SNMP is polling your network devices, maybe the API modular input is polling a REST API of your Resource Manager device to draw information about Hadoop jobs for your YARN (Yet Another Resource Negotiator) system. For these you can use the UI of the Heavy Forwarder to set them up.
These inputs can be developed by you, or they often come packaged within many of the apps on Splunkbase. A modular input is usually best recognized by the fact that you can use the UI to configure it. Some apps that leverage modular inputs are the EMC app for XtremIO, the SNMP modular input, and the REST API input.
Often these inputs leverage backend code or scripting that reaches out to the target system, speaks to that system's API, pulls back a bunch of data, and formats the output into a meaningful structure.
Many companies have partnered with Splunk in order to create an app that leverages a modular input. The EMC XtremIO app is a great example of that, as is the ServiceNow app. These apps were developed (and are supported) by both Splunk and the partner company, so if you get stuck deploying an app like this, first call Splunk, and (if needed) they can reach out to the company.
Modular inputs can be difficult to develop, however they are quite easy to use, which makes the effort in pre-deployment worth it. Often, Splunk admins simply don't have the time to make these, which is when Splunkbase is a wonderful thing.
We will just use an app from Splunkbase to show what a modular input looks like, and how to recognize one.
The EMC XtremIO add-on is a modular input. It is best installed on either a Search Head Forwarder or a Heavy Forwarder, to save you time getting the data from the system into a Splunk index:
[Screenshot: the EMC XtremIO modular input add-on]
This add-on is necessary in order to get data into Splunk from your XtremIO appliance, and is part of the installation bundle for the XtremIO app.
When you install this kind of add-on on your Heavy Forwarder, you generally won't get a standard app icon within the app selection menu. Since this is a data input, it will show up as a new selection under the Data inputs menu:
[Screenshot: the modular input listed under the Data inputs menu]

Now you can see that this modular input leverages a REST API for gathering its data. That is often the case, but not always. If you click on it while it is still unconfigured, you will see that it's blank, and that there is no way to configure it.
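Under the covers, even a UI-configured modular input ends up as a stanza in inputs.conf, using the scheme name that the input defines. The sketch below is purely illustrative -- the scheme and the endpoint parameter are hypothetical, not the actual XtremIO or REST add-on settings.

[my_modular_input://production_cluster]
endpoint = https://appliance.example.com
interval = 120
index = storage
disabled = 0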

Windows event logs / Perfmon

Windows is a tricky system to get system data from, though the people at Splunk have eased this process considerably. Splunk has been kind enough to write into their software a way to decode Windows event logs using an API. This helps greatly in the extraction of Windows event logs, as all Windows logs are written in binary and are quite unintelligible to look at in their raw state.
If you're using Windows, you'll need to use a Splunk Universal Forwarder or Heavy Forwarder to pull data in this fashion, although I assure you it's far easier than crafting your own PowerShell scripts or WMI queries to get the data.
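On the Forwarder, these Windows inputs are again just inputs.conf stanzas. A sketch (the index names are placeholders; the Security event log and the Processor performance object are standard Windows sources):

[WinEventLog://Security]
disabled = 0
index = wineventlog

[perfmon://CPU]
object = Processor
counters = % Processor Time
instances = _Total
interval = 10
index = perfmon
disabled = 0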


Become a Splunk Ninja





Common Log Format


The Common Log Format, also known as the NCSA Common log format (after NCSA HTTPd), is a standardized text file format used by web servers when generating server log files. Because the format is standardized, the files can be readily analyzed by a variety of web analysis programs, for example Webalizer and Analog.


Each line in a file stored in the Common Log Format has the following syntax:


host ident authuser date request status bytes

Example

127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326


A "-" in a field indicates missing data.

·       127.0.0.1 is the IP address of the client (remote host) which made the request to the server.
·       user-identifier is the RFC 1413 identity of the client.
·       frank is the userid of the person requesting the document.
·       [10/Oct/2000:13:55:36 -0700] is the date, time, and time zone that the request was received, by default in strftime format %d/%b/%Y:%H:%M:%S %z.
·       "GET /apache_pb.gif HTTP/1.0" is the request line from the client: GET is the method, /apache_pb.gif is the resource requested, and HTTP/1.0 is the protocol version.
·       200 is the HTTP status code returned to the client. 2xx is a successful response, 3xx a redirection, 4xx a client error, and 5xx a server error.
·       2326 is the size of the object returned to the client, measured in bytes.
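Splunk extracts these fields automatically for its built-in web access sourcetypes (clientip, status, bytes, and so on), so a search over such logs can get straight to the analysis. A quick sketch, assuming the access_combined sourcetype:

sourcetype=access_combined status>=400
| stats count BY clientip, status
| sort - count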

Usage

Log files are a standard tool for computer systems developers and administrators. They record the "what happened when by whom" of the system. This information can record faults and help their diagnosis. It can identify security breaches and other computer misuse. It can be used for auditing. It can be used for accounting purposes.

The information stored is only available for later analysis if it is stored in a form that can be analysed. This data can be structured in many ways for analysis. For example, storing it in a relational database would force the data into a query-able format. However, it would also make it more difficult to retrieve if the computer crashed, and logging would not be available unless the database was available. A plain text format minimises dependencies on other system processes, and assists logging at all phases of computer operation, including start-up and shut-down, where such processes might be unavailable.

See also

·       Extended Log Format
·       Web counter
·       Data logging
·       Syslog

That being said, developers often write their logs in a way that they can understand, because ultimately it will be them doing the troubleshooting and code fixing when something severe breaks.

This method has been successful but time consuming, and the true value of any SME lies in reducing a system's MTTR and increasing uptime. The more transactions a system processes, the larger it scales, and beyond roughly 20 machines, troubleshooting with a manual process becomes noticeably more complex and time consuming.

The nicer we format the logs, the faster Splunk can reveal the information about our systems, saving everyone time and headaches.

·       Structured data: These are usually logs for Apache, IIS, Windows events, Cisco, and some other manufacturers.
·       Unstructured data: This type of logging usually comes from a proprietary application, where each message can be printed differently in different operations and the event itself can span multiple lines with no definitive event start, event end, or both. Often, this is the bulk of our data.
We can't affect the way frameworks like Java, .NET, Windows, or Cisco log their information, so let's focus on what we can potentially improve. For the rest, we will have to create some Splunk logic to do what's called data normalization.
Innately, Splunk will understand the structure of an IIS type of log event. (IIS, Internet Information Services, is a web server provided by Microsoft. A web server is used for storing, processing, and serving web pages to a requesting client. When you set up your website on IIS and logging is enabled, the server writes a statement to a log file whenever an HTTP transaction occurs.) However, in some cases it's up to the Splunk engineer to tell Splunk the field names and the order of the fields in each of these events. This is basically event and field extraction, and it's also how we start organizing and managing the value of a dataset.

Note
Splunk only extracts fields automatically for a handful of log types, such as IIS and Apache logs, and those built-in extractions cannot be leveraged on other datasets. Many other datasets are extracted using either an app from Splunkbase or the manual method. For a full list of datasets Splunk has programmed for automatic field extraction, please visit http://www.splunk.com/ .
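The manual method is often just a search-time extraction in props.conf. A minimal sketch, assuming a hypothetical sourcetype and log layout:

[myapp:log]
EXTRACT-user = user=(?<user>\S+)
EXTRACT-duration = took\s(?<duration_ms>\d+)ms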

First we need to understand our three most popular actions within an application:
·       Publication: This is when an application answers a query and publishes data to a user one time. A user clicks a button, the application serves the user its data for that request, and the transaction is complete.
·       Subscription: This is a transaction that begins with the click of a button, though the data streams to the user until something stops it. Think of this kind of transaction as a YouTube video that you click on. You click to start the video, and then it just streams to your device. The stream is a subscription type of transaction. While the application is serving up this data, it is also often writing to the application logs. This can get noisy, as subscriptions can sometimes last hours or even days.
·       Database call: These are simply calls to a database to either retrieve or insert data. These actions are usually pretty easy to capture; it's what people want to see from this data that becomes a challenge (see the search sketch after this list).
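To stitch one of these actions back together from its individual log lines, the transaction command is a common starting point. This is only a sketch: the sourcetype, the session_id field, and the SUBSCRIBE/UNSUBSCRIBE markers are hypothetical placeholders for whatever your application actually logs.

sourcetype=myapp:log
| transaction session_id startswith="SUBSCRIBE" endswith="UNSUBSCRIBE" maxspan=1h
| table session_id, duration, eventcount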

Unstructured data

The following screenshot is an example of what unstructured data looks like:
[Screenshot: a sample of unstructured log data]

These kinds of logs are much more complicated to bring value to, as all of the knowledge must be manually extracted by a Splunk engineer or admin. Splunk will look at your data and attempt to extract things that it believes are fields. However, what it finds is often nothing like what you or your users actually want on their dashboards.
That being the case, this is where one would need to speak to the developer/vendor of that specific software, and start asking some pointed questions.
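While you're working out those answers, the rex command lets you carve fields out of unstructured events at search time. A sketch, assuming a hypothetical message layout:

index=main sourcetype=myapp:log
| rex "user=(?<user>\S+)\s+action=(?<action>\w+)\s+duration=(?<duration_ms>\d+)"
| stats avg(duration_ms) BY action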
There are many ways to break an event in Splunk (see http://docs.splunk.com/Documentation/Splunk/6.4.1/Admin/Propsconf and search for "break").
Event breaking - best practice
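A common best-practice pattern is to turn line merging off and break on the timestamp at the start of each event in props.conf. The sourcetype name, the regex, and the timestamp format below are placeholders for whatever your logs actually look like.

[myapp:log]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
TRUNCATE = 10000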

Sunday, June 17, 2018

Splunk Search Reference

Command quick reference

Search command quick reference table

The table below lists all of the search commands in alphabetical order. There is a short description of the command and links to related commands. For the complete syntax, usage, and detailed examples, click the command name to display the specific topic for that command.
Some of these commands share functions. For a list of the functions with descriptions and examples, see Evaluation functions and Statistical and charting functions.
Command Description Related commands
abstract Produces a summary of each search result. highlight
accum Keeps a running total of the specified numeric field. autoregress, delta, trendline, streamstats
addcoltotals Computes an event that contains sum of all numeric fields for previous events. addtotals, stats
addinfo Add fields that contain common information about the current search. search
addtotals Computes the sum of all numeric fields for each result. addcoltotals, stats
analyzefields Analyze numerical fields for their ability to predict another discrete field. anomalousvalue
anomalies Computes an "unexpectedness" score for an event. anomalousvalue, cluster, kmeans, outlier
anomalousvalue Finds and summarizes irregular, or uncommon, search results. analyzefields, anomalies, cluster, kmeans, outlier
anomalydetection Identifies anomalous events by computing a probability for each event and then detecting unusually small probabilities. analyzefields, anomalies, anomalousvalue, cluster, kmeans, outlier
append Appends subsearch results to current results. appendcols, appendcsv, appendlookup, join, set
appendcols Appends the fields of the subsearch results to current results, first results to first result, second to second, etc. append, appendcsv, join, set
appendpipe Appends the result of the subpipeline applied to the current result set to results. append, appendcols, join, set
arules Finds association rules between field values. associate, correlate
associate Identifies correlations between fields. correlate, contingency
audit Returns audit trail information that is stored in the local audit index.
autoregress Sets up data for calculating the moving average. accum, autoregress, delta, trendline, streamstats
bin (bucket) Puts continuous numerical values into discrete sets. chart, timechart
bucketdir Replaces a field value with higher-level grouping, such as replacing filenames with directories. cluster, dedup
chart Returns results in a tabular output for charting. See also, Statistical and charting functions. bin,sichart, timechart
cluster Clusters similar events together. anomalies, anomalousvalue, cluster, kmeans, outlier
cofilter Finds how many times field1 and field2 values occurred together. associate, correlate
collect Puts search results into a summary index. overlap
concurrency Uses a duration field to find the number of "concurrent" events for each event. timechart
contingency Builds a contingency table for two fields. associate, correlate
convert Converts field values into numerical values. eval
correlate Calculates the correlation between different fields. associate, contingency
datamodel Examine data model or data model dataset and search a data model dataset. pivot
dbinspect Returns information about the specified index.
dedup Removes subsequent results that match a specified criteria. uniq
delete Delete specific events or search results.
delta Computes the difference in field value between nearby results. accum, autoregress, trendline, streamstats
diff Returns the difference between two search results.
erex Allows you to specify example or counter example values to automatically extract fields that have similar values. extract, kvform, multikv, regex, rex, xmlkv
eval Calculates an expression and puts the value into a field. See also, Evaluation functions. where
eventcount Returns the number of events in an index. dbinspect
eventstats Adds summary statistics to all search results. stats
extract (kv) Extracts field-value pairs from search results. kvform, multikv, xmlkv, rex
fieldformat Expresses how to render a field at output time without changing the underlying value. eval, where
fields Removes fields from search results.
fieldsummary Generates summary information for all or a subset of the fields. analyzefields, anomalies, anomalousvalue, stats
filldown Replaces NULL values with the last non-NULL value. fillnull
fillnull Replaces null values with a specified value.
findtypes Generates a list of suggested event types. typer
folderize Creates a higher-level grouping, such as replacing filenames with directories.
foreach Run a templatized streaming subsearch for each field in a wildcarded field list. eval
format Takes the results of a subsearch and formats them into a single result.
from Retrieves data from a dataset, such as a data model dataset, a CSV lookup, a KV Store lookup, a saved search, or a table dataset.
gauge Transforms results into a format suitable for display by the Gauge chart types.
gentimes Generates time-range results.
geom Adds a field, named "geom", to each event. This field contains geographic data structures for polygon geometry in JSON and is used for the choropleth map visualization. geomfilter
geomfilter Accepts two points that specify a bounding box for clipping a choropleth map. Points that fall outside of the bounding box are filtered out. geom
geostats Generate statistics which are clustered into geographical bins to be rendered on a world map. stats, xyseries
head Returns the first number n of specified results. reverse, tail
highlight Highlights the specified terms. iconify
history Returns a history of searches formatted as an events list or as a table. search
iconify Displays a unique icon for each different value in the list of fields that you specify. highlight
input Add or disable sources.
inputcsv Loads search results from the specified CSV file. loadjob, outputcsv
inputlookup Loads search results from a specified static lookup table. inputcsv, join, lookup, outputlookup
iplocation Extracts location information from IP addresses.
join Combine the results of a subsearch with the results of a main search. appendcols, lookup, selfjoin
kmeans Performs k-means clustering on selected fields. anomalies, anomalousvalue, cluster, outlier
kvform Extracts values from search results, using a form template. extract, kvform, multikv, xmlkv, rex
loadjob Loads events or results of a previously completed search job. inputcsv
localize Returns a list of the time ranges in which the search results were found. map, transaction
localop Run subsequent commands, that is all commands following this, locally and not on remote peers.
lookup Explicitly invokes field value lookups.
makecontinuous Makes a field that is supposed to be the x-axis continuous (invoked by chart/timechart) chart, timechart
makemv Change a specified field into a multivalued field during a search. mvcombine, mvexpand, nomv
makeresults Creates a specified number of empty search results.
map A looping operator, performs a search over each search result.
mcollect Converts search results into metric data and inserts the data into a metric index on the search head. collect, meventcollect
metadata Returns a list of source, sourcetypes, or hosts from a specified index or distributed search peer. dbinspect
metasearch Retrieves event metadata from indexes based on terms in the logical expression. metadata, search
meventcollect Converts search results into metric data and inserts the data into a metric index on the indexers. collect, mcollect
mstats Calculates statistics for the measurement, metric_name, and dimension fields in metric indexes. stats, tstats
multikv Extracts field-values from table-formatted events.
multisearch Run multiple streaming searches at the same time. append, join
mvcombine Combines events in search results that have a single differing field value into one result with a multivalue field of the differing field. mvexpand, makemv, nomv
mvexpand Expands the values of a multivalue field into separate events for each value of the multivalue field. mvcombine, makemv, nomv
nomv Changes a specified multivalued field into a single-value field at search time. makemv, mvcombine, mvexpand
outlier Removes outlying numerical values. anomalies, anomalousvalue, cluster, kmeans
outputcsv Outputs search results to a specified CSV file. inputcsv, outputtext
outputlookup Writes search results to the specified static lookup table. inputlookup, lookup, outputcsv, outputlookup
outputtext Outputs the raw text field (_raw) of results into the _xml field. outputtext
overlap Finds events in a summary index that overlap in time or have missed events. collect
pivot Run pivot searches against a particular data model dataset. datamodel
predict Enables you to use time series algorithms to predict future values of fields. x11
rangemap Sets RANGE field to the name of the ranges that match.
rare Displays the least common values of a field. sirare, stats, top
regex Removes results that do not match the specified regular expression. rex, search
relevancy Calculates how well the event matches the query.
reltime Converts the difference between 'now' and '_time' to a human-readable value and adds this value to the field, 'reltime', in your search results. convert
rename Renames a specified field; wildcards can be used to specify multiple fields.
replace Replaces values of specified fields with a specified new value.
rest Access a REST endpoint and display the returned entities as search results.
return Specify the values to return from a subsearch. format, search
reverse Reverses the order of the results. head, sort, tail
rex Specify a Perl regular expression named groups to extract fields while you search. extract, kvform, multikv, xmlkv, regex
rtorder Buffers events from real-time search to emit them in ascending time order when possible.
savedsearch Returns the search results of a saved search.
script (run) Runs an external Perl or Python script as part of your search.
scrub Anonymizes the search results.
search Searches indexes for matching events.
searchtxn Finds transaction events within specified search constraints. transaction
selfjoin Joins results with itself. join
sendemail Emails search results to a specified email address.
set Performs set operations (union, diff, intersect) on subsearches. append, appendcols, join, diff
setfields Sets the field values for all results to a common value. eval, fillnull, rename
sichart Summary indexing version of chart. chart, sitimechart, timechart
sirare Summary indexing version of rare. rare
sistats Summary indexing version of stats. stats
sitimechart Summary indexing version of timechart. chart, sichart, timechart
sitop Summary indexing version of top. top
sort Sorts search results by the specified fields. reverse
spath Provides a straightforward means for extracting fields from structured data formats, XML and JSON. xpath
stats Provides statistics, grouped optionally by fields. See also, Statistical and charting functions. eventstats, top, rare
strcat Concatenates string values.
streamstats Adds summary statistics to all search results in a streaming manner. eventstats, stats
table Creates a table using the specified fields. fields
tags Annotates specified fields in your search results with tags. eval
tail Returns the last number n of specified results. head, reverse
timechart Create a time series chart and corresponding table of statistics. See also, Statistical and charting functions. chart, bucket
timewrap Displays, or wraps, the output of the timechart command so that every timewrap-span range of time is a different series. timechart
top Displays the most common values of a field. rare, stats
transaction Groups search results into transactions.
transpose Reformats rows of search results as columns.
trendline Computes moving averages of fields. timechart
tscollect Writes results into tsidx file(s) for later use by tstats command. collect, stats, tstats
tstats Calculates statistics over tsidx files created with the tscollect command. stats, tscollect
typeahead Returns typeahead information on a specified prefix.
typelearner Generates suggested eventtypes. typer
typer Calculates the eventtypes for the search results. typelearner
union Merges the results from two or more datasets into one dataset.
uniq Removes any search that is an exact duplicate with a previous result. dedup
untable Converts results from a tabular format to a format similar to stats output. Inverse of xyseries and maketable.
where Performs arbitrary filtering on your data. See also, Evaluation functions. eval
x11 Enables you to determine the trend in your data by removing the seasonal pattern. predict
xmlkv Extracts XML key-value pairs. extract, kvform, multikv, rex
xmlunescape Unescapes XML.
xpath Redefines the XML path.
xyseries Converts results into a format suitable for graphing.
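Most day-to-day searches chain a handful of these commands together. A quick sketch, assuming web access data in the access_combined sourcetype:

sourcetype=access_combined status>=500 earliest=-7d
| timechart span=1d count BY host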

Splunk Interview Questions - Recommendation

Splunk Interview Question & Answers - Recommendations