Common Log Format
The Common Log Format,[1] also known as the NCSA Common log
format[2] (after NCSA HTTPd), is a standardized text file format
used by web servers when generating server log files. Because the format is
standardized, the files can be readily analyzed by a variety of web analysis
programs, such as Webalizer and Analog.
Each line in a file stored in the Common Log Format has the
following syntax:

host ident authuser date request status bytes

For example:

127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
A "-" in a field indicates missing data.
· 127.0.0.1 is the IP address of the client
(remote host) which made the request to the server.
· user-identifier is the RFC 1413 identity of the client, as reported by the ident service; in practice this is almost always "-".
· frank is the userid of the person requesting the document.
· [10/Oct/2000:13:55:36 -0700] is the date, time,
and time zone when the request was received, by default in strftime format %d/%b/%Y:%H:%M:%S %z.
· "GET /apache_pb.gif HTTP/1.0" is the request line from the client: GET is the method, /apache_pb.gif is the resource requested, and HTTP/1.0 is the protocol version.
· 200 is the HTTP status code
returned to the client: 2xx indicates a successful response, 3xx a redirection, 4xx a
client error, and 5xx a server error.
· 2326 is the size of the object returned to the client, measured in bytes.
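Because the format is so regular, a single regular expression is enough to pull these fields apart. The following Python sketch is illustrative (the parse_clf helper and its pattern are not part of any standard library):

```python
import re

# Regex for the NCSA Common Log Format described above.
# Named groups: host, ident, authuser, date, request, status, bytes.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clf(line):
    """Parse one Common Log Format line into a dict, or None if malformed."""
    m = CLF_PATTERN.match(line)
    if not m:
        return None
    entry = m.groupdict()
    # A "-" means the bytes field is missing; otherwise it is numeric.
    entry["bytes"] = None if entry["bytes"] == "-" else int(entry["bytes"])
    return entry

line = ('127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
print(parse_clf(line))
```

Running this on the example line above yields a dict with host 127.0.0.1, status 200, and 2326 bytes; a non-matching line simply returns None.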
Log files are a standard tool for computer systems developers
and administrators. They record the "what happened when by whom" of
the system. This information can record faults and help diagnose them, identify
security breaches and other computer misuse, and support auditing and
accounting.
The information stored is only available for later analysis if
it is stored in a form that can be analysed. The data can be structured in
many ways: storing it in a relational database, for example, would force it
into a queryable format, but would also make it harder to retrieve after a
crash, and logging would be unavailable whenever the database was. A plain
text format minimises dependencies on other system processes and supports
logging at all phases of computer operation, including start-up and
shut-down, when such processes might be unavailable.
That said, developers often write their logs in a way that they themselves can understand, because ultimately they will be the ones troubleshooting and fixing the code when something severe breaks.
This method works, but it is time consuming, and the true value of any SME
lies in reducing a system's MTTR and increasing uptime. As a system scales
with transaction volume, past roughly 20 machines, manual troubleshooting
becomes increasingly complex and slow.
The better we format the logs, the faster Splunk can reveal the
information about our systems, saving everyone time and headaches.
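As one illustration (a sketch, not an official Splunk recommendation), emitting events as space-separated key=value pairs makes them easy for Splunk's automatic field discovery to pick apart; the kv helper and the "payments" logger below are hypothetical:

```python
import logging

def kv(**fields):
    """Render fields as space-separated key=value pairs (illustrative helper)."""
    return " ".join(f"{k}={v}" for k, v in fields.items())

# Human-friendly timestamp and level up front, machine-friendly pairs after.
logging.basicConfig(format="%(asctime)s level=%(levelname)s %(message)s")
log = logging.getLogger("payments")
log.warning(kv(operation="refund", user="frank", amount=42, status="declined"))
```

A line like `operation=refund user=frank amount=42 status=declined` needs no custom extraction logic at search time.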
· Structured data: These are usually logs from Apache,
IIS, Windows events, Cisco devices, and some other vendors.
· Unstructured data: This type of logging usually comes
from a proprietary application where each message can be printed differently by
different operations, and the event itself can span multiple lines with no
definitive event start, or event end, or both. Often, this is the bulk of our
data.
We can't affect the way a framework
like Java, .NET, Windows, or Cisco logs its information, so let's focus on
what we can potentially improve. For the rest, we will have to create some
Splunk logic to do what's called data normalization.
Innately, Splunk will understand the structure of an IIS type of log event.
(IIS, Internet Information Services, is a web server provided by Microsoft. A
web server is used for storing, processing, and serving web pages to a
requesting client. When you set up your website on IIS with logging enabled,
the server writes a statement to a log file each time an HTTP transaction
occurs.) However, in some cases it's up to the Splunk engineer to tell Splunk
the field names and the order of fields within each of these events. This is
basically event and field extraction, and it's also how we start organizing
and managing the value of a dataset.
Note
By default, Splunk automatically extracts fields for only a handful of log
types, such as IIS and Apache logs; this cannot be leveraged on other
datasets. Many other datasets are extracted using either an app from
Splunkbase or the manual method. For a full list of datasets Splunk has
programmed for automatic field extraction, please visit http://www.splunk.com/ .
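As a sketch of the manual method, field extraction for a custom source type is typically configured in props.conf via an EXTRACT- setting with named capture groups; the source type name, sample event layout, and field names below are all hypothetical:

```ini
# props.conf -- hypothetical sourcetype for an in-house application whose
# events look like: 2000-10-10 13:55:36 PAYMENT ok user=frank amount=42
[my_custom_app]
EXTRACT-txn = ^\S+\s+\S+\s+(?P<operation>\w+)\s+(?P<result>\w+)\s+user=(?P<user>\S+)\s+amount=(?P<amount>\d+)
```

The same regex can be tried interactively with the rex command in a Splunk search before committing it to props.conf.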
First, we need to understand the three
most popular actions within an application:
· Publication: This is when an application answers
a query and publishes data to a user one time. A user clicks a button, the
application serves the user its data for that request, and the transaction is
complete.
· Subscription: This is a transaction that begins
with the click of a button, but the data then streams to the user until something
stops it. Think of this kind of transaction as a YouTube video: you click to
start the video, and then it just streams to your device. The stream
is a subscription type of transaction. While the application is serving up this
data, it is also often writing to the application logs. This can get noisy, as
subscriptions can sometimes last hours or even days.
· Database call: These are simply calls to a
database to either retrieve or insert data. These actions are
usually pretty easy to capture. It's what people want to see from this data
that becomes a challenge.
Unstructured data
The following screenshot is an example of what unstructured data looks like:
These kinds of logs are much more difficult to extract value from, as all of the knowledge must be manually extracted by a Splunk engineer or admin. Splunk will look at your data and attempt to extract things that it believes are fields; however, the result is often nothing like what you or your users want to use on their dashboards.
That being the case, this is where one would need to speak to the developer or vendor of that specific software and start asking some pointed questions.
There are lots of ways to break an event in Splunk (see
http://docs.splunk.com/Documentation/Splunk/6.4.1/Admin/Propsconf
and search for "break").
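As a hedged sketch of one common approach, a multiline source whose records each begin with a timestamp can be broken into events in props.conf like this (the source type name and timestamp layout are assumptions, not taken from the document):

```ini
# props.conf -- hypothetical multiline sourcetype; assumes every event
# starts on a line beginning with a 2000-10-10 13:55:36-style timestamp
[my_multiline_app]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}
TRUNCATE = 10000
```

Disabling line merging and letting LINE_BREAKER's capture group mark the boundary is generally faster than merging lines back together with BREAK_ONLY_BEFORE-style rules.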