The response time of a web application is one of the most important factors in a pleasant user experience. Nobody spends much time exploring a site with a sluggish response time. To improve the response time of your web application, you first need to understand where the time is spent. This article is about the response time log of the Apache HTTP Server.
It seems to be largely unknown that the Apache HTTP Server can write the response time of each request to a logfile. All you need is a custom log with the appropriate format directive. Since Apache can write to multiple log files, there is no need to change the format of the common log: you can simply create an additional log for the response time analysis and continue to feed the common log to your other web analytics software.
I am using the following configuration to generate a logfile with the details I need to investigate the response times of my web server:
LogFormat "\"%{%Y-%m-%d %H:%M:%S}t\" %V %m \"%U\" \"%q\" %{Content-Type}o %s %B %O %D" responsetime
CustomLog "/var/log/apache2/responsetime.log" responsetime
The individual format strings are described in the Apache documentation of mod_log_config. With this configuration the logfile looks like the following excerpt:
"2011-12-05 04:27:50" www.dograworld.com GET "blog/apache-response-time-and-logging-best-practices" "?key=val" text/html 200 5443 5991 45686
"2011-12-05 04:27:51" www.dograworld.net GET "blog/apache-response-time-and-logging-best-practices" "?url=/test.css" text/css 200 3643 3946 132378
"2011-12-05 04:27:51" www.dograworld.net GET "/test/jquery.js" "" application/javascript 200 32062 32366 1637195
...
Every line of the logfile contains the following items:
- the timestamp formatted according to ISO 8601 for easy parsing by other programs like R
- the virtual host name
- the request method (usually GET or POST; HEAD is also common, and others appear if you use WebDAV)
- the URL path
- the query string (in my case mod_rewrite is used to map the URL path to a query string)
- the MIME type of the requested entity
- the HTTP status code
- the size of the entity itself (the response body without headers)
- the total size of the response message, including the entity and the HTTP response headers
- the response time
The final item of every log entry is the interesting part: the response time of the request in microseconds (1 second = 1,000,000 microseconds). This time covers not only the CPU time spent by the different software components (webserver, server-side scripting languages, databases, ...) but also the wait times involved (disk I/O waits, database lock waits, network latencies in n-tier environments, ...).
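To give an idea of how easy the logfile is to process, here is a minimal sketch in Python (the language choice is mine; the field indices are assumptions derived from the LogFormat above, so adjust them if your format differs) that prints the response time of every request in milliseconds:

import csv

LOGFILE = "/var/log/apache2/responsetime.log"   # path from the CustomLog directive above

with open(LOGFILE) as logfile:
    # Fields are separated by spaces; the csv module takes care of the
    # quoted timestamp, URL path and query string fields.
    for fields in csv.reader(logfile, delimiter=" "):
        timestamp, vhost, method, path = fields[0], fields[1], fields[2], fields[3]
        microseconds = int(fields[-1])          # %D is the last field
        print(f"{timestamp} {vhost} {method} {path} {microseconds / 1000:.1f} ms")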
A couple of things are worth mentioning, though:
- The time is measured on the webserver and reflects only the time spent generating a single response. The server obviously knows nothing about dependencies between content items, so the time the client needs to fetch everything and render the page is necessarily higher.
- Large documents (like digital images or videos) will probably show large response times when the client uses a low-bandwidth internet connection. This is caused by the way TCP/IP uses output buffers to hold data until it has been acknowledged: the throughput of the connection depends on the slowest link, and the sender has to wait until it knows that everything has arrived at the receiver.
At first this approach may look clumsier than the typical web analytics software (Analog, AWStats, Piwik, Webalizer, ...). But in my opinion the real power is revealed when this data is used for statistical analysis.
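As a starting point, the following sketch (again Python, assuming the logfile and field order shown above and Python 3.8 or newer for statistics.quantiles; grouping per URL path is just one possible view) aggregates the response times per URL path and prints the median and the 95th percentile:

import csv
from collections import defaultdict
from statistics import median, quantiles

times_by_path = defaultdict(list)   # response times in milliseconds, keyed by URL path

with open("/var/log/apache2/responsetime.log") as logfile:
    for fields in csv.reader(logfile, delimiter=" "):
        path = fields[3]                                      # %U, the URL path
        times_by_path[path].append(int(fields[-1]) / 1000)    # %D converted to ms

for path, times in sorted(times_by_path.items()):
    p95 = quantiles(times, n=20)[-1] if len(times) > 1 else times[0]
    print(f"{path}: requests={len(times)} median={median(times):.1f} ms 95th={p95:.1f} ms")

Slow pages then stand out immediately, and the same data can of course be loaded into R for a deeper look at the distribution.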