Common Log Format
The Common Log Format,[1] also known as the NCSA Common log
format[2] (after NCSA HTTPd), is a standardized text file format
used by web servers when generating server log files. Because the format is
standardized, the files can be readily analyzed by a variety of web analysis
programs, such as Webalizer and Analog.
Each line in a file stored in the Common Log Format has the
following syntax:

host ident authuser date request status bytes

For example:

127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
A "-" in a field indicates missing data.
· 127.0.0.1 is the IP address of the client
(remote host) which made the request to the server.
· user-identifier is the RFC 1413 identity of the client, as reported by the ident service; in practice this is almost always "-".
· frank is the userid of the person requesting the document.
· [10/Oct/2000:13:55:36 -0700] is the date, time,
and time zone when the request was received, by default in strftime format %d/%b/%Y:%H:%M:%S %z.
· "GET /apache_pb.gif HTTP/1.0" is the request line from the client: GET is the method, /apache_pb.gif is the resource requested, and HTTP/1.0 is the protocol version.
· 200 is the HTTP status code
returned to the client: 2xx indicates a successful response, 3xx a redirection, 4xx a
client error, and 5xx a server error.
· 2326 is the size of the object returned to the client, measured in bytes.
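Because the format is so regular, a single regular expression is enough to pull these fields apart. The following Python sketch is illustrative (the parse_clf helper and its pattern are not part of any standard library):

```python
import re

# Regex for the NCSA Common Log Format described above.
# Named groups: host, ident, authuser, date, request, status, bytes.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clf(line):
    """Parse one Common Log Format line into a dict, or None if malformed."""
    m = CLF_PATTERN.match(line)
    if not m:
        return None
    entry = m.groupdict()
    # A "-" means the bytes field is missing; otherwise it is numeric.
    entry["bytes"] = None if entry["bytes"] == "-" else int(entry["bytes"])
    return entry

line = ('127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
print(parse_clf(line))
```

Running this on the example line above yields a dict with host 127.0.0.1, status 200, and 2326 bytes; a non-matching line simply returns None.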
Log files are a standard tool for computer systems developers
and administrators. They record the "what happened when by whom" of
the system. This information can record faults and help diagnose them, identify
security breaches and other computer misuse, and support auditing and
accounting.
The information stored is only available for later analysis if
it is stored in a form that can be analysed. The data can be structured in
many ways: storing it in a relational database, for example, would force it
into a queryable format, but would also make it harder to retrieve after a
crash, and logging would be unavailable whenever the database was. A plain
text format minimises dependencies on other system processes and supports
logging at all phases of computer operation, including start-up and
shut-down, when such processes might be unavailable.
That said, developers often write their logs in a way that they themselves can understand, because ultimately they will be the ones troubleshooting and fixing the code when something severe breaks.
This method works, but it is time consuming, and the true value of any SME
lies in reducing a system's MTTR and increasing uptime. As a system scales
with transaction volume, past roughly 20 machines, manual troubleshooting
becomes increasingly complex and slow.
The better we format the logs, the faster Splunk can reveal the
information about our systems, saving everyone time and headaches.
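As one illustration (a sketch, not an official Splunk recommendation), emitting events as space-separated key=value pairs makes them easy for Splunk's automatic field discovery to pick apart; the kv helper and the "payments" logger below are hypothetical:

```python
import logging

def kv(**fields):
    """Render fields as space-separated key=value pairs (illustrative helper)."""
    return " ".join(f"{k}={v}" for k, v in fields.items())

# Human-friendly timestamp and level up front, machine-friendly pairs after.
logging.basicConfig(format="%(asctime)s level=%(levelname)s %(message)s")
log = logging.getLogger("payments")
log.warning(kv(operation="refund", user="frank", amount=42, status="declined"))
```

A line like `operation=refund user=frank amount=42 status=declined` needs no custom extraction logic at search time.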
· Structured data: These are usually logs from Apache,
IIS, Windows events, Cisco devices, and some other vendors.
· Unstructured data: This type of logging usually comes
from a proprietary application where each message can be printed differently by
different operations, and the event itself can span multiple lines with no
definitive event start, or event end, or both. Often, this is the bulk of our
data.
We can't affect the way a framework
like Java, .NET, Windows, or Cisco logs its information, so let's focus on
what we can potentially improve. For the rest, we will have to create some
Splunk logic to do what's called data normalization.
Innately, Splunk will understand the structure of an IIS type of log event.
(IIS, Internet Information Services, is a web server provided by Microsoft. A
web server is used for storing, processing, and serving web pages to a
requesting client. When you set up your website on IIS with logging enabled,
the server writes a statement to a log file each time an HTTP transaction
occurs.) However, in some cases it's up to the Splunk engineer to tell Splunk
the field names and the order of fields within each of these events. This is
basically event and field extraction, and it's also how we start organizing
and managing the value of a dataset.
Note
By default, Splunk automatically extracts fields for only a handful of log
types, such as IIS and Apache logs; this cannot be leveraged on other
datasets. Many other datasets are extracted using either an app from
Splunkbase or the manual method. For a full list of datasets Splunk has
programmed for automatic field extraction, please visit http://www.splunk.com/ .
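As a sketch of the manual method, field extraction for a custom source type is typically configured in props.conf via an EXTRACT- setting with named capture groups; the source type name, sample event layout, and field names below are all hypothetical:

```ini
# props.conf -- hypothetical sourcetype for an in-house application whose
# events look like: 2000-10-10 13:55:36 PAYMENT ok user=frank amount=42
[my_custom_app]
EXTRACT-txn = ^\S+\s+\S+\s+(?P<operation>\w+)\s+(?P<result>\w+)\s+user=(?P<user>\S+)\s+amount=(?P<amount>\d+)
```

The same regex can be tried interactively with the rex command in a Splunk search before committing it to props.conf.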
First, we need to understand the three
most popular actions within an application:
· Publication: This is when an application answers
a query and publishes data to a user one time. A user clicks a button, the
application serves the user its data for that request, and the transaction is
complete.
· Subscription: This is a transaction that begins
with the click of a button, but the data then streams to the user until something
stops it. Think of this kind of transaction as a YouTube video: you click to
start the video, and then it just streams to your device. The stream
is a subscription type of transaction. While the application is serving up this
data, it is also often writing to the application logs. This can get noisy, as
subscriptions can sometimes last hours or even days.
· Database call: These are simply calls to a
database to either retrieve or insert data. These actions are
usually pretty easy to capture. It's what people want to see from this data
that becomes a challenge.
Unstructured data
The following screenshot is an example of what unstructured data looks like:
These kinds of logs are much more difficult to extract value from, as all of the knowledge must be manually extracted by a Splunk engineer or admin. Splunk will look at your data and attempt to extract things that it believes are fields; however, the result is often nothing like what you or your users want to use on their dashboards.
That being the case, this is where one would need to speak to the developer or vendor of that specific software and start asking some pointed questions.
There are lots of ways to break an event in Splunk (see
http://docs.splunk.com/Documentation/Splunk/6.4.1/Admin/Propsconf
and search for "break").
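As a hedged sketch of one common approach, a multiline source whose records each begin with a timestamp can be broken into events in props.conf like this (the source type name and timestamp layout are assumptions, not taken from the document):

```ini
# props.conf -- hypothetical multiline sourcetype; assumes every event
# starts on a line beginning with a 2000-10-10 13:55:36-style timestamp
[my_multiline_app]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}
TRUNCATE = 10000
```

Disabling line merging and letting LINE_BREAKER's capture group mark the boundary is generally faster than merging lines back together with BREAK_ONLY_BEFORE-style rules.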