Monitoring and logging application operations data is an increasingly challenging problem for systems owners, driven largely by highly distributed, scale-out architectures that combine clustering, load balancing, and cloud-computing elements with many moving parts. The range and diversity of logging formats makes it difficult to collect and act on this data stream in support of application reliability, troubleshooting, and performance testing and management. A fundamental issue with logging is the subsequent need to parse the data before it can be analyzed. The irony is that most applications generate log and error information for human consumption, when in reality that data must be consumed and acted upon by machines!
Here is a very informative blog post on structured logging with some practical tips; a snippet follows.
Logging for Humans or Machines
I would argue that every log message is intended to be interpreted by either a human or a machine.
If a log message is only going to be used to help a human debug a system or casually see what’s going on, it’s intended for humans. If a log message is going to be used for machine analysis later (e.g. monitoring, alerting, business analytics, security auditing, etc), it’s intended for machines.
This is an important distinction. If it isn’t clear, please re-read the prior paragraphs. Now might also be a good time for you to audit the logs in your life and assign human or machine labels to them.
The problem with logging as most of us know it is that it is optimized for human consumption, even if machines are the intended consumer.
At the heart of most logging systems we log a string, a char *, an array of characters that can easily be read by humans. For example, a simple web application might produce log files containing lines like:
2012:11:24T17:32:23.3435 INFO email@example.com successfully logged in
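A line like this might be produced with Python's standard logging module, which flattens distinct fields into a single formatted string (a minimal sketch; the logger name and format string are assumptions, not taken from the quoted post):

```python
import io
import logging

# Capture output in a string buffer so the formatted line can be inspected
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("webapp")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)

# Distinct fields (user, event) are merged into one free-form string
logger.info("%s successfully logged in", "email@example.com")
print(buf.getvalue().strip())
```

Note that the user and the event exist as separate values right up until the moment of logging, when the formatter destructures them into text.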
Human readability is a great feature to have. However, it comes at a price: more difficult machine consumption.
By optimizing for human consumption, we’ve introduced a new problem: log parsing.
In the above example log message, the program has combined a few distinct fields of data (an event - logged in, the username, and the time) into a single text string. This is great for humans glancing at a log file. But that decision now forces downstream machine consumers to parse/decode that text back into its original, distinct fields.
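The parsing burden looks something like the following sketch: a machine consumer must write a regex (or similar grammar) to recover the fields from the line above. The pattern shown is an assumption about the log format and is typical of how brittle such parsers are:

```python
import re

line = "2012:11:24T17:32:23.3435 INFO email@example.com successfully logged in"

# A format-specific regex is needed to undo the formatting and
# recover the original fields; any change to the message breaks it.
pattern = re.compile(
    r"(?P<timestamp>\S+)\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<user>\S+) successfully logged in"
)

match = pattern.match(line)
if match:
    print(match.groupdict())
```

If the application later changes the wording to "login succeeded", every downstream parser built on this pattern silently stops matching.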
I call this style of logging destructured or unstructured logging. We start with distinct fields of data and then destructure them into fully free-form or semi-formed text. By doing so, we lose part of the original data structure and create the need to reconstruct it (via parsing).
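By contrast, a structured approach keeps the fields distinct all the way to the consumer. A minimal sketch using only the standard library, emitting one JSON object per event (the field names here are illustrative assumptions, not part of the quoted post):

```python
import json
from datetime import datetime, timezone

def log_event(event, **fields):
    """Emit one log record as a JSON object, keeping each field distinct."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # machine-parseable time
        "event": event,
        **fields,
    }
    return json.dumps(record)

line = log_event("login_succeeded", user="email@example.com")
print(line)
```

Downstream consumers recover the fields with a single json.loads call; no format-specific parsing grammar is needed.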
Read more at the blog post below for the solution.
Here is another interesting article on Project Lumberjack that provides a universal and structured logging solution.