6 common mistakes and 4 common questions on logging
The examples in this blog post are in Java, but the general principles are independent of the programming language.
Mistake #1: Error message not adjusted to the audience
Who will read your log messages? Yourself? An operator? If you’re doing DevOps: a teammate? A future teammate who joins in a few years? Someone with on-call duty in the middle of the night?
Will they know how to deal with this error?
For all serious errors that cannot be resolved by restarting the service, you should provide a step-by-step solution in the service’s playbook for the operator. It should be super simple to execute.
The best example I’ve ever seen is Oracle databases. They provide error code numbers with the “ORA” prefix. Oracle gives good explanations of the causes and of things you can do, and the ORA codes are easy to google.
Oracle database error codes
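For illustration, an error message that the operator can act on could look like the following sketch; the error code “PAY-1042”, the impact description, and the playbook URL are invented for this example:
logger.error("PAY-1042: Payment provider unreachable after 5 retries. "
        + "Impact: new orders are not billed. "
        + "Action: follow the playbook at https://wiki.example.com/playbooks/PAY-1042");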
Mistake #2: Too many error messages
Problem: When something bad happens, you will not find its error log, because there are too many error messages.
How to prevent this: Alert on every ERROR log from day 1 of your software project. This forces you to think hard about whether a problem is really important enough to wake you up in the middle of the night.
Mistake #3: Too many messages
Sometimes I see software that continuously writes INFO logs that nobody reads. They just cost CPU, disk space, and money, especially when your workloads run in the cloud.
Often those log statements made sense at some point, while debugging an issue. But after the issue was resolved, they were forgotten in the code.
IMHO we should go through the software at least once a year and simply delete all code that logs on DEBUG, TRACE, and INFO level.
Mistake #4: Messed-up formatting
When starting your project, this does not seem to be a big problem. Often you can simply log to a file and use cat, grep, and less for viewing and searching.
But as your project evolves, you need better solutions, and you will probably use specialized software for searching your log files.
Log analysis software works best when it can assume that the log messages share a common layout; recently, JSON has become popular for log events.
What bad things can happen with mixed message formats?
In the worst case an ERROR log simply slips through. Sometimes you cannot see the full content of a log message, or you miss the stacktrace of an exception.
What are the reasons for mixed message formats?
Most of the time your software comes with a bunch of libraries, and these libraries do not necessarily share a common log layout.
In Java things have become much better since many library providers use slf4j-api nowadays, and you can provide the slf4j implementation and configuration centrally.
But some libraries still use their own logging, and sometimes you spend hours finding out how to configure their log layout.
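For java.util.logging in particular there is a well-known bridge. A minimal sketch, assuming the jul-to-slf4j bridge is on your classpath (the class name LoggingSetup is made up):
import org.slf4j.bridge.SLF4JBridgeHandler;

// Hypothetical helper; call routeJulThroughSlf4j() once at application startup.
public class LoggingSetup {

    public static void routeJulThroughSlf4j() {
        // Remove the default java.util.logging console handlers ...
        SLF4JBridgeHandler.removeHandlersForRootLogger();
        // ... and forward java.util.logging records to slf4j instead,
        // so they share the layout of the rest of your logs.
        SLF4JBridgeHandler.install();
    }
}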
Mistake #5: Logging code causing exceptions
Maybe you will find something like this in your Java code:
logger.info("processing {}", arg);
arg is some object. When the line is executed, it will call the toString() method of the arg object.
But what if that toString() method throws an exception? You don’t know…
- Maybe the exception will propagate up the call stack and stop the main method?
- Maybe the exception will propagate up to some place where another programmer’s code does something arbitrary with it?
- Maybe the logging implementation catches the exception and provides some workaround?
- Maybe somebody later changes the logging implementation and something completely different happens?
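Such a toString() can fail in mundane ways. Here is a made-up example of a class whose toString() throws:
import java.time.Instant;

// Hypothetical class, invented for illustration only.
public class Order {

    private Instant shippedAt; // stays null until the order has been shipped

    @Override
    public String toString() {
        // Throws a NullPointerException as long as shippedAt is null --
        // and the logging call is the place where it blows up.
        return "Order[shippedAt=" + shippedAt.toEpochMilli() + "]";
    }
}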
What can you do?
- Keep the implementations of toString() simple and easy. They should not cause exceptions. Resist the temptation to serialize the complete object into JSON or to do other fancy stuff (a short sketch follows at the end of this section).
- Check the implementation of the toString() method before passing the object as an argument to logging.
- If your logging library does not handle exceptions AND the implementation of toString() can throw exceptions AND you cannot change that implementation, you can maybe work around the problem:
logger.info("processing {}", new Object() {
public String toString() {
try {
return arg.toString();
}
catch (RuntimeException re) {
return "could not render arg";
}
}
});
This looks ugly. I would use it only as a last resort.
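For comparison, here is the first recommendation applied to the hypothetical Order class from above. Plain string concatenation renders a null field as “null” instead of throwing:
@Override
public String toString() {
    // Simple and null-safe: concatenation turns a null shippedAt into the text "null".
    return "Order[shippedAt=" + shippedAt + "]";
}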
Mistake #6: Ignoring the browser’s JavaScript errors
Modern web applications run a lot of code in the browser. Your frontend code should send errors from your JavaScript code to the server. You must not trust these logs completely, because any code running in the browser can be tampered with and is not trustworthy. But logs and metrics from the code executed in the browser can tell you more about the user’s actual experience than all your backend monitoring combined.
Question #1: Where should I put my logs? stdout, stderr, files?
It depends on how your program is used:
Unix style
Program output goes to stdout, logs go to stderr. https://stackoverflow.com/questions/4919093/should-i-log-messages-to-stderr-or-stdout
12factor app
In microservices that conform to the 12factor app, logs go to stdout. https://12factor.net/logs
Other contexts
Choose wisely.
Question #2: Log lines vs. log events?
Most of the time, one line in the log file is one log message. But this is not always true, especially not for stacktraces.
If your log analysis system does not handle this properly, you will not be able to relate an exception to its stacktrace.
In theory one log stream event can include many lines.
But can you make sure that every part of the log processing system handles this correctly? How much of your log processing system is under your control? Does your log file analyser or log stream analyser work line-based? Will it be able to match the exception to its stacktrace?
If you want to do log processing properly, you need a good DevOps culture.
IMHO the biggest advantage of JSON logging is that you can put everything, including the whole exception stacktrace, into one line. JSON logging might save you a lot of time debugging different parts of the log processing.
Question #3: When to use WARN logging?
“WARN is not as serious as an ERROR, but still a problem.”
So how do you want to deal with it?
- Do you need to fix it instantly? -> Don’t use WARN level, use ERROR level.
- You have no special policy on how to deal with it? -> Don’t use WARN level, use INFO level.
- Someone should look at them when they have time? -> Don’t use WARN level, use INFO level.
Anything else?
Maybe you want to set up an automated, scheduled e-mail that informs your team’s maintenance agent every morning or every week about the WARN messages (and you give them the time to fix the issues)? Then you can use WARN level.
But if you don’t have a special technical treatment for the level, then don’t use WARN level.
Question #4: Should I use slf4j’s parameterized logging?
See also https://www.slf4j.org/faq.html#logging_performance
logger.debug("processing "+ arg);
will execute arg’s toString() method every time it is executed. Assuming your standard log level is configured to INFO, calling toString() here is just a waste of time and money.
logger.debug("processing {}", arg);
will skip executing arg’s toString() method when the configured log level does not include DEBUG.
But it has some downsides:
- It’s harder to read.
- When processing calls like logger.error("processing {} failed", arg, exception); slf4j needs to guess which arguments are exceptions that should be shown with a stacktrace and which are not (see the slf4j FAQ). During an urgent incident you do not want to reason about: “Is the stacktrace invisible because slf4j’s guess was wrong?”
So my recommendation is:
- For INFO level logs: use parameterized logging. Some people run services with just WARN or ERROR level logging enabled.
- For ERROR level logs: don’t use parameterized logging. Errors should not occur often, and people do not usually switch off ERROR level logging, so the performance gain is not worth it (see the sketch below).
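A short sketch of this recommendation in code; the logger, arg, count, and exception variables are made up for illustration:
// INFO/DEBUG: parameterized logging -- arg.toString() is only called
// when the configured level actually includes the message.
logger.debug("processing {}", arg);
logger.info("processed {} records", count);

// ERROR: plain concatenation -- errors are rare, so the extra toString() cost
// does not matter, and the exception is unambiguously the last argument.
logger.error("processing " + arg + " failed", exception);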
Any comments or suggestions? Leave an issue or a pull request!