garden-caffe
Friday, April 30th, 2010
This guy is rocking the “coffee and bathrobe in the back yard” look. Something I can aspire to when the weather turns a bit warmer, LOL.
Yet another collection of random links and rantings of a greying unix geek with a photography bent. Pass the Guinness and Grecian Formula.
Let's pretend you have a stock ticker application that is getting hammered. A lot of people are interested in one particular stock for some reason.
The poor little app server is running some lumbering, hulking piece of code written in a legacy language (Java). It can’t keep up with all the requests for this same stock symbol over and over.
Furthermore, the developers haven’t been able to put their own caching code into the application just yet.
How do we fix this, from a system administrator's perspective?
Perhaps the better way would be to use mod_cache.
Unfortunately, this option did not exist on our web servers and we needed something PDQ to take the load off the app servers. We went with mod_rewrite instead and a small script to do the caching.
Cron runs a script every 5 minutes, which calls the app server and caches the result in a file. Then mod_rewrite rules tell Apache to serve that file instead of going to the app server (via the existing rewrite rules).
In the Apache virtual host entry :
RewriteCond %{REQUEST_URI} ^/ticker/s=AAPL$
RewriteCond /app/ticker/cache/AAPL -f
RewriteRule ^(.*) /app/ticker/cache/AAPL [L]
A script to maintain the cache :
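#!/bin/bash
# Refresh the cached copy of the AAPL ticker response.
export PATH=/bin:/usr/bin
CACHEFILE=/app/ticker/cache/AAPL
TMPFILE=${CACHEFILE}.$$
# Clean up the temp file on exit, HUP or TERM; -f keeps rm quiet once mv has moved it.
trap 'rm -f "${TMPFILE}"' 0 1 15
# Fetch headers and body from the app server; replace the cache only if curl succeeds.
curl -D - -H "host: www.example.com" \
"http://app01/ticker/s=AAPL" > "${TMPFILE}" \
&& mv "${TMPFILE}" "${CACHEFILE}"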
The cron entry to make it go :
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /app/ticker/bin/update-ticker-cache >/dev/null 2>&1
Now when the outside URL http://www.example.com/ticker/s=AAPL is hit, Apache looks for the file /app/ticker/cache/AAPL and delivers its contents instead of passing the request through to the app server (via other rewrite rules).
Cron and the script keep this cache updated every 5 minutes.
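If more than one symbol ever needs this treatment, the script above could be generalized along these lines. This is only a sketch: the function names atomic_write and refresh_symbol are made up for illustration, and the extra "don't replace the cache with an empty response" check is my own assumption, not something the original script did.

```shell
#!/bin/sh
# Hypothetical helper: copy stdin to DEST, but only if some data arrived.
atomic_write() {
    dest=$1
    tmp="${dest}.$$"
    # The temp file lives next to DEST, so mv is a rename on one filesystem.
    if cat > "$tmp" && [ -s "$tmp" ]; then
        mv "$tmp" "$dest"
    else
        # Empty or failed write: keep the old cache file and report failure.
        rm -f "$tmp"
        return 1
    fi
}

# Hypothetical wrapper: refresh_symbol SYMBOL CACHE_DIR COMMAND [ARGS...]
# COMMAND is whatever prints the response on stdout (for example, curl).
refresh_symbol() {
    symbol=$1
    cache_dir=$2
    shift 2
    "$@" | atomic_write "${cache_dir}/${symbol}"
}
```

Cron would then call something like: refresh_symbol AAPL /app/ticker/cache curl -D - -H "host: www.example.com" "http://app01/ticker/s=AAPL". If the app server is unreachable and curl prints nothing, the previous cache file stays in place.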
The graph speaks for itself.
This is based on a real world example. Details have been changed to protect the guilty parties.

Excel computed its brains out for 30+ seconds before giving me a partial graph and the complaint that I had too much data for it to continue. I guess you can’t really do anything *serious* with Excel for charting data, such as a month of 1-minute interval data points.
(60 minutes x 24 hours x 30 days = 43,200 points)
I had to resort to gnuplot, which is a bit funky but easily handles large datasets such as this.
What was I trying to graph anyways?
Web server hits per minute and average response time per minute, over a month.
I had custom apache log lines that looked like this :
63.214.229.120 - - [01/May/2009:00:00:00 +0000] "GET /s HTTP/1.1" 302 381 - - - "http://muy/url/here" "MOT-SPARK/00.62 UP.Browser/6.2.3.4.c.1.123 (GUI) MMP/2.0" "-" "-" 922us 0s -
I piped the logs through distillation code :
#!/bin/sh
bzcat logs/access_log_2009-05-??.bz2 |\
sed -e 's/\[//g' -e 's/\]//g' |\
awk ' {
day=substr($4,1,2)
month=substr($4,4,3)
year=substr($4,8,4)
hhmm=substr($4,13,5)
if ( month =="Jan" ) month="01"
if ( month =="Feb" ) month="02"
if ( month =="Mar" ) month="03"
if ( month =="Apr" ) month="04"
if ( month =="May" ) month="05"
if ( month =="Jun" ) month="06"
if ( month =="Jul" ) month="07"
if ( month =="Aug" ) month="08"
if ( month =="Sep" ) month="09"
if ( month =="Oct" ) month="10"
if ( month =="Nov" ) month="11"
if ( month =="Dec" ) month="12"
timestamp=year"-"month"-"day"-"hhmm
hits[timestamp]++
time[timestamp] += int($(NF-2)/1000)
timeavg[timestamp] = int(time[timestamp] / hits [timestamp])
}
END {
for (timestamp in hits) {
print timestamp " " hits[timestamp] " " timeavg[timestamp]
}
}
' | sort
Which after about a minute resulted in data looking like :
Time, Hits, Average Response (ms)
2009-05-01-00:00 357 508
2009-05-01-00:01 363 607
2009-05-01-00:02 357 589
2009-05-01-00:03 381 693
2009-05-01-00:04 405 576
2009-05-01-00:05 391 369
( That was the data set that Excel choked on… )
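For what it's worth, the distillation step could be tightened up a little. The sketch below is a variation, not the code I actually ran: summarise_log is a made-up name, the month names come from a lookup table instead of twelve if statements, and it sums microseconds and divides once at the end, so the averages can come out slightly different from the per-request truncation above. It assumes the same log layout, with the response time in the third-from-last field.

```shell
#!/bin/sh
# Hypothetical function: read access-log lines on stdin and print
# "timestamp hits avg_ms" rows, one per minute, sorted by time.
summarise_log() {
    awk '
    BEGIN {
        # Build a month-name to two-digit-number table once.
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= n; i++) monthnum[names[i]] = sprintf("%02d", i)
    }
    {
        stamp = $4
        sub(/^\[/, "", stamp)            # "[01/May/2009:00:00:00" loses its bracket
        day  = substr(stamp, 1, 2)
        mon  = substr(stamp, 4, 3)
        year = substr(stamp, 8, 4)
        hhmm = substr(stamp, 13, 5)
        key  = year "-" monthnum[mon] "-" day "-" hhmm
        hits[key]++
        usec[key] += $(NF-2) + 0         # "922us" converts to the number 922
    }
    END {
        for (key in hits)
            printf "%s %d %d\n", key, hits[key], int(usec[key] / hits[key] / 1000)
    }
    ' | sort
}
```

It would be fed the same way as before: bzcat logs/access_log_2009-05-??.bz2 | summarise_log > web01-05.dat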
Now, create a gnuplot script for hits per minute
#!/opt/local/bin/gnuplot
set terminal png enhanced size 1024,768
set xdata time
set timefmt "%Y-%m-%d-%H:%M"
set format x "%d-%m-%Y"
set xlabel "time"
set grid
set style data points
set xrange [ "2009-05-01-00:00" : "2009-05-31-23:59" ]
set output "web01-05-hits.png"
set ylabel "Hits per minute"
set title "Web01 - May - Hits per minute"
plot "web01-05.dat" using 1:2 title ""
which created this graph in under a second :

Finally, the response time averages per minute
#!/opt/local/bin/gnuplot
reset
set terminal png enhanced size 1024,768
set xdata time
set timefmt "%Y-%m-%d-%H:%M"
set format x "%d-%m-%Y"
set xlabel "time"
set grid
set style data points
set xrange [ "2009-05-01-00:00" : "2009-05-31-23:59" ]
set output "web01-05-response.png"
set ylabel "Average Response Time (ms)"
set title "Web01 - May - Average Response"
plot "web01-05.dat" using 1:3 title ""
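The two gnuplot scripts differ only in the column, the labels and the output file, so they could be generated from one template. A rough sketch, assuming a shell wrapper is acceptable; plot_script is a name I made up, and the gnuplot settings are copied from the scripts above.

```shell
#!/bin/sh
# Hypothetical function: print a gnuplot script for one column of web01-05.dat.
# Arguments: column number, y-axis label, plot title, output PNG file name.
plot_script() {
    cat <<EOF
set terminal png enhanced size 1024,768
set xdata time
set timefmt "%Y-%m-%d-%H:%M"
set format x "%d-%m-%Y"
set xlabel "time"
set grid
set style data points
set xrange [ "2009-05-01-00:00" : "2009-05-31-23:59" ]
set output "$4"
set ylabel "$2"
set title "$3"
plot "web01-05.dat" using 1:$1 title ""
EOF
}
```

Piping the result into gnuplot should produce the same graphs, for example: plot_script 2 "Hits per minute" "Web01 - May - Hits per minute" web01-05-hits.png | gnuplot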
So yeah. I tried to use Excel as these were to be one-off charts and it seemed like the easiest way to do it at first.
Seduced by the simplistic (and simpleton) Microsoft way at the start…
In the end, I learned a bit about gnuplot and how dead simple it really is to ask it to chew through a decent sample size and produce a nice enough looking graph in under a second.
Does The Linux Desktop Innovate Too Much?
<kosh>Yes.</kosh>
I’ve been waiting for this kind of article to come from inside the FOSS camp. I hope it is just one of many pebbles voting for the usability avalanche that leads to stability and greater traction.
In the meantime, I’ll stick with my proprietary locked-down GUI, which is based on a certified UNIX platform underneath. Neither gets in my way or in my face; both just help me get my daily work done.
Unix is essentially - and embarrassingly - dead on the desktop, so Unix operating system makers have the luxury of only having to worry about servers.
What you talkin’ ’bout, Willis?
Ever hear of OS X? 10.5 Leopard is UNIX certified and has a more respectable desktop share (approx. 9%) than all the other Unix and Unix-wannabe (Linux) variations out there.