Tailing rolling log files

All too often I am tailing log files that are being written by a rolling log file appender using a command like:

tail -f application.log

This works fine, but when the file rolls over, tail stops tailing. It’s not always clear whether the file has really rolled or the application is simply not producing any logs at the moment. This is even less obvious if the tailed output is being piped to grep to look for an uncommon search string.

Today I found that using tail with -F will keep following a file even after it has been rolled. That is, it follows the file by name, not by file descriptor. Pretty cool.

tail -F application.log
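
This also works nicely when piping through grep for an uncommon search string, as mentioned above (ERROR here is just a placeholder for whatever you are looking for):

tail -F application.log | grep "ERROR"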

Now if only I could get the follow mode built into less to do the same.

Broken Home button on iPhone fixed!

A few months ago I was half asleep, lying in bed one morning listening to my son playing, when I heard a plonk! Parental intuition immediately told me that plonk was my iPhone being thrown into the toilet. We all struggle with our phones at times and consider different ways we could punish them, but this was taking it a little too far. I rushed into my en-suite and there was my phone, under water 😦.

I took it out, turned it off and cursed Apple yet again – why the hell can’t I take the battery out of this stupid phone while it dries? Anyway – with luck, over a day it mostly dried out and worked quite well except for two things:

  • The built in speaker seemed dead
  • The home button was marginal at best

The speaker wasn’t such an issue as a headset worked. After a few more days of drying, the speaker suddenly started working again, which left only the button.

It’s strange: some days it works just fine, others I have to press it hard maybe 10 times to get it to register. It might be humidity related, but I’m not sure, and the issue seems to be getting worse.

Over the last few days it’s been terrible, to the point where I was considering burning the rest of the money left on my contract and buying a new phone (perhaps I could have let my son throw it back in the toilet), when I came across the AssistiveTouch setting (under Settings -> General -> Accessibility). This was apparently added with iOS 5. It adds an on-screen home button (among other controls) to the touch screen. Now when the real button is playing up I can hit this one instead. I’m so happy! I am back to my normal state, where Apple products make me happy despite my deep misgivings about the company. I think this ad pretty much sums it up.

Mars Bar Slice – Sooo good!

Off topic, I know, but… today was my nephew’s christening and I had to bring a slice. I only know how to make two slices (I had to learn for the Mother’s Day breakfast at my kids’ school), so I went with Mars Bar slice as it’s just so good. I have an old recipe which I was going to post, but I found this one, which is the same.

For some reason, every time I make it the melted chocolate changes consistency just as I’m about to take it off the heat. No matter – it still works, and it is way too nice.

So, if you are ever stuck for a slice and you have a sweet tooth – look no further. What’s more, it would be hard to imagine an easier slice to make 🙂

I would post pictures, but I was a bit too slow; besides, Google has lots!

Strange requests in my Apache access logs.

This morning logwatch on my web server showed a disturbing message:

A total of 1 possible successful probes were detected (the following URLs
contain strings that match one or more of a listing of strings that
indicate a possible exploit):
  null HTTP Response 200 

along with a number of strange requests in the access.log. Most requests look like this:

66.249.71.14 - - [10/Feb/2012:19:09:33 +1100] "GET /css/style.css HTTP/1.1" 200
 4164 "http://www.hypertechsolutions.com.au/" "Mozilla/5.0 (compatible;
 Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.14 - - [10/Feb/2012:19:09:38 +1100] "GET /images/header.png HTTP/1.1"
 200 27472 "http://www.hypertechsolutions.com.au/" "Mozilla/5.0 (compatible; 
Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.14 - - [10/Feb/2012:19:09:38 +1100] "GET /images/nav.png HTTP/1.1" 200
 1318 "http://www.hypertechsolutions.com.au/" "Mozilla/5.0 (compatible;
 Googlebot/2.1; +http://www.google.com/bot.html)"

yet I am seeing an increasing number of requests such as these:

77.108.40.90 - - [10/Feb/2012:20:32:10 +1100] "K\xa1V\\\x8b\x063\xc7\xc2Xt\x96
 \xf2s\xca\x18\xa0\xef\xbb\x0f;\"g\xbe\xcc\xd3\x9aL\xbe@\xca\t\xff\xd5\xc8\xf4
 \xb2\x0e\xbb>\xca\xab\xf9\x9d&4\xf6\xfa" 400 321 "-" "-"
84.101.111.88 - - [10/Feb/2012:20:32:39 +1100] "v\xfa\xfd\xe8\x06\xb6z2\xe2\xc
 9R\xd9) V\x8cuF\xb7J\xd9\x19\x18\x81\xbe3\xd8P`\xa9xY5:.\x0c\x05\xa6" 400 526
 "-" "-"

These don’t bother me overly, as they are being rejected with an error code 400. But in the last three days, two have been accepted with a code of 200!

166.147.118.185 - - [07/Feb/2012:19:57:58 +1100] "\xf4?\x84\xcd\x8a\x8f\xc7\xc
 5\xe3\x1doL\x9c6\xd2\x16_\xec\xa5cK\t.\xce\x7f\xcf\xf7s:" 200 3794 "-" "-"
113.28.20.236 - - [09/Feb/2012:13:52:17 +1100] "\xa0\x91n$\xd52\xb0{0AuF\xbd\x
 0f\xb3tK\v\x11\x9eB:" 200 3794 "-" "-"

The logwatch error was referring to one of these. Looking back two days, I saw the same message in my logwatch emails. Lesson: I really should read my logwatch emails carefully.
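
Rather than waiting for the next logwatch email, a quick grep pulls out any requests containing escaped bytes that were still answered with a 200 (the log path here is the Ubuntu default for Apache, so adjust to suit your setup):

grep '\\x' /var/log/apache2/access.log | grep '" 200 '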

After much Googling (is that even a word?), I found another person who noticed the same issue. Then I found this note on blocking such requests.

The article suggests a mod_security rule. Deploying mod_security is not so well documented on Ubuntu 11.10, so I will post about it once I clean up my notes and build a clean machine to test them on. As it turned out, the default mod_security rules already seem to block these attempts. Being obsessive, I have also created the suggested rule.

I have found a way to replay the old attacks using netcat:

echo -e "\xf4?\x84\xcd\x8a\x8f\xc7\xc5\xe3\x1doL\x9c6\xd2\x16_\xec\xa5cK\t.\xce\x7f\xcf\xf7s:" | nc localhost 80

You need the -e option of echo so that the escape sequences are translated into the raw bytes. I still have no idea what that request actually does, but at least now it’s being blocked. I will watch my logs carefully over the next few days to make sure all appears good, and post back if there are any issues.

Securing SSH with Fail2Ban on Ubuntu

As posted earlier, I am running SSHD on Ubuntu 11.10 to provide an SFTP server for my clients.

I get a reasonable number of unauthorised attempts to access this service, as shown below.

cat /var/log/auth.log* | grep 'NOUSER' | grep sshd | awk '{print $1,$2}' | sort -k 1,1M -k 2n | uniq -c 
    326 Jan 29
   1989 Jan 30
    366 Jan 31
    451 Feb 1
    384 Feb 2
   1902 Feb 3
    689 Feb 4
    204 Feb 5
    655 Feb 6
   3829 Feb 7
    181 Feb 8
     73 Feb 9

While none of these have got through, I am a little sensitive given my past experience with SSH, so I decided to bite the bullet and deploy Fail2Ban for SSH on that host. I had put this off as I figured it was a huge undertaking, but it actually turned out to be a 15-minute job from starting to google to being protected, so there is no excuse to put it off.

Fail2Ban monitors log files, matching suspect patterns that imply unauthorised access attempts are occurring. On seeing multiple attempts in a short period (10 minutes by default), Fail2Ban will reconfigure the firewall to block that host for a short period (also 10 minutes by default). This will limit brute-force SSH attacks from a single host.

Iptables was already deployed and running with no rules on my Ubuntu server. If you have any SSH rules in your iptables, you will want to remove them until you understand how Fail2Ban plays with the rules to do its magic.

Before installing Fail2Ban, my iptables rules were empty (I generally apply my firewall rules on my firewall, not on the host):

sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination      

Installing Fail2Ban was very simple:

sudo apt-get install fail2ban

Looking in the config files (/etc/fail2ban), SSH was enabled by default and was the only service enabled – exactly what I wanted. At this point it was running and I was protected.
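
If you do want to change the defaults, the usual approach is to override them in /etc/fail2ban/jail.local rather than editing jail.conf directly. A minimal sketch (the values shown are what I believe the shipped defaults to be, so check your own jail.conf rather than taking these as gospel):

[ssh]
enabled  = true
port     = ssh
filter   = sshd
logpath  = /var/log/auth.log
maxretry = 6
bantime  = 600

Then sudo service fail2ban restart to pick up the change.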

Disappointingly, the Ubuntu package is a release behind the official one at Fail2Ban.org. I have gone with the Ubuntu package as it will be updated automatically (eventually!) and, reading the release notes, I can live without the latest features.

Reviewing my iptables rules, I could see Fail2Ban had inserted a new, empty chain:

sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
fail2ban-ssh  tcp  --  anywhere             anywhere            multiport dports ssh 

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain fail2ban-ssh (1 references)
target     prot opt source               destination         
RETURN     all  --  anywhere             anywhere

If I force a ban by performing multiple failed SSH logins, then I see this get updated to

sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
fail2ban-ssh  tcp  --  anywhere             anywhere            multiport dports ssh 

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain fail2ban-ssh (1 references)
target     prot opt source               destination         
DROP       all  --  Helens-iMac          anywhere            
RETURN     all  --  anywhere             anywhere    

I will run with this configuration for a few days and see if I need to mess with the matching patterns or if this blocks the attempts well enough. I will also have to monitor it to make sure that I don’t block my real customers from accessing the server too🙂.
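
If a legitimate customer does get banned and I don’t want to wait out the ban period, the offending rule can be deleted from the fail2ban-ssh chain by hand (the address below is obviously just an example):

sudo iptables -D fail2ban-ssh -s 203.0.113.10 -j DROP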

My next task will be to deploy this to my web server.

It would be really neat to update the firewall rules on my firewall, as opposed to on the host, but that is a game for another day. Maybe it’s better to KISS.

Scripting Telnet or not… Netcat.

I found myself needing to script a telnet session to run a command against an application that only accepted commands via a TCP socket. The app came with a GUI; however, the GUI was kludgy at best and not easily scriptable. I was able to sniff out the protocol using Wireshark.

My first attempt was to pipe a series of echo commands to telnet, like this:

(echo "command number 1"; echo "command number 2"; echo "command number 3" ) | telnet somehost 9200

But I found that I needed to include a delay before I could start echoing, as the system would only accept commands once it had finished sending its banner.

(sleep 1; echo "command number 1"; echo "command number 2"; echo "command number 3" ) | telnet somehost 9200

I knew I really should be using expect, but after 10 minutes looking at it I was still no closer to convincing expect to work. Then I came across a really cool utility, nc (netcat), which was a drop-in replacement for telnet in the above script. It no longer needed the pause, which was a good thing as I had to run the same command across multiple instances of this app, and waiting a second on each was getting annoying.

(echo "command number 1"; echo "command number 2"; echo "command number 3" ) | nc somehost 9200

Note: netcat does not translate newlines to CR+LF the way telnet does, so if you are talking to a picky system you need to be careful that you are sending the expected end-of-line sequence.
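
If you do hit such a system, one workaround is to build the lines yourself with printf and an explicit CR+LF (a sketch using the same hypothetical commands as above):

(printf 'command number 1\r\n'; printf 'command number 2\r\n'; printf 'command number 3\r\n') | nc somehost 9200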

Netcat will do a whole lot more cool IP stuff (like doing the same as above over UDP, port scanning, port forwarding and more – see the Wikipedia page linked above) and is a really handy tool to have.

Tae Kwon Do

I’ve been taking my older boys to Tae Kwon Do for about a year and have been getting itchy to return (I had a misspent youth doing way too much TKD). So, after a break of close to 20 years and also completely ignoring my mate Johnny’s advice to at least watch a few Bruce Lee and Jean-Claude Van Damme movies beforehand, I went back for my first training session last night. Let’s just say that I think I know how a barbie doll feels after someone has tried to rip its legs off and it gets wiggly in the hips!

It was great fun; however, all the moments of magic reliving my youth were dashed as I walked past a mirror and thought, who is that old bloke walking up to me in a uniform? 🙂 It went a lot better without the greying hair. Oh yeah, and the white belt seemed almost a little silly.

Towards the end the instructor came to speak with me; apparently he enjoys having me around as I’m “so old style” – I remind him of the good old days! Oh well.

Ubuntu Server Setup

As posted earlier, I have started using a separate Ubuntu Server 11.10 instance for each of my external facing services. Having a number of services has led to needing a number of Ubuntu servers with a similar setup. I use a base Ubuntu 11.10 server VM, which I then customise for each service.

Here are my notes on the common customisations.

Copy the base Ubuntu server image.

Perform an update: sudo apt-get update && sudo apt-get upgrade

Update the hostname to include the domain name in /etc/hostname

Fix /etc/resolv.conf to only reference the name server
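
For example, mine ends up with just a single nameserver line (the address is an assumption – it happens to be the gateway used in the interfaces example below):

nameserver 192.168.2.1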

Update /etc/network/interfaces to use a static address, e.g.

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static 
address 192.168.2.4
netmask 255.255.255.0
network 192.168.2.0
broadcast 192.168.2.255
gateway 192.168.2.1

Install VMware Tools:

sudo apt-get install build-essential linux-headers-`uname -r` psmisc
Select "Install VMware Tools" from VMware Fusion
sudo mkdir /mnt/cdrom
sudo mount /dev/cdrom /mnt/cdrom
cp /mnt/cdrom/VMwareTools*.tar.gz /tmp
cd /tmp
tar -zxf VMwareTools*.tar.gz
cd vmware-tools-distrib
sudo ./vmware-install.pl
sudo reboot

Set up sshd to only accept key-based logins.

(If you need to create keys, use ssh-keygen -t rsa -b 4096, which will write them to ~/.ssh.)
copy keys into ~/.ssh/authorized_keys
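
(If your key already exists on your workstation, ssh-copy-id will append it to authorized_keys for you – the user and address here are placeholders. Do this before disabling password authentication below, or you will lock yourself out.)

ssh-copy-id -i ~/.ssh/id_rsa.pub user@192.168.2.4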

edit /etc/ssh/sshd_config
PasswordAuthentication no
UsePAM no

sudo service ssh restart

Do a netstat -a and port scan to make sure you are not listening on any unexpected ports!
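
Something along these lines does the job (nmap is run from another machine, and 192.168.2.4 is just the example address used above):

sudo netstat -tlnp            # listening TCP sockets and the processes that own them
nmap -sT 192.168.2.4          # from another host, confirm only the expected ports answer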

Install and configure logwatch to get emailed a summary of the previous day's activities.

sudo apt-get install logwatch

sudo vi /usr/share/logwatch/default.conf/logwatch.conf
Output = mail
Format = html
MailTo = you@somewhere.com
sudo /usr/sbin/logwatch --mailto you@somewhere.com
sudo vi /etc/cron.daily/00logwatch
/usr/sbin/logwatch --mailto you@somewhere.com

Set up unattended upgrades.
For me unattended-upgrades was already installed due to an initial install option (if not: sudo apt-get install unattended-upgrades), but the configuration needs to be expanded a little.

sudo vi /etc/apt/apt.conf.d/50unattended-upgrades

// Automatically upgrade packages from these (origin, archive) pairs
Unattended-Upgrade::Allowed-Origins {
        "${distro_id} ${distro_codename}-security";
        "${distro_id} ${distro_codename}-updates";
//      "${distro_id} ${distro_codename}-proposed";
//      "${distro_id} ${distro_codename}-backports";
};

Unattended-Upgrade::MinimalSteps "true";
Unattended-Upgrade::Mail "you@somewhere.com";
Unattended-Upgrade::Automatic-Reboot "true";

Add the /etc/apt/apt.conf.d/20auto-upgrades file:
sudo bash -c 'echo -e "APT::Periodic::Update-Package-Lists \"1\";\nAPT::Periodic::Unattended-Upgrade \"1\";" >> /etc/apt/apt.conf.d/20auto-upgrades'

Install bsd-mailx as a bridge between postfix and /usr/bin/sendmail:
sudo apt-get install bsd-mailx

Force a downgrade of (say) binutils to test:
sudo apt-get install binutils=2.21.53.20110810-0ubuntu3

Now run a manual upgrade:
sudo unattended-upgrade -d

All should be good; make sure the email really arrived.

Now re-downgrade binutils and ensure that the scheduled job runs sometime after 6 am tomorrow (or later today if you are crazy like me and tend to be doing such things after midnight).

Finding slow database operations from application logs

A good number of the apps I write involve a database. In some cases, these applications don’t run as well as I had expected. Were I a database whiz kid, I would work my database magic and find which queries are slowing down my apps, but I’m not. Besides, I don’t generally have access to my customers’ databases after installation to run any such magic.

I have recently been investigating the performance of a reporting application. It produces a daily report which runs a set of queries per user, per login session. When the application was created there was an expectation that the number of users would be up to around 500. Roll forward a few years and there is a site with 5,000 users and the daily reports are not arriving for the start of the business day. Even worse (or better, depending on your point of view :)), a customer with 24,000 users is preparing to deploy the application. While I could rewrite the application to query the database more efficiently, I would prefer not to, as:

  • It currently operates correctly, though just inefficiently
  • The application logic is very clear and simple with the current queries
  • There are costs of rewriting the application in terms of time, money and risk

So instead I will try to improve the performance of the queries.

You can generally get a good idea of the very slow-running queries by eyeballing your application’s log files. With decent logging around your database operations, it’s obvious if a query is taking a long time to run. But what about more subtly slow, yet frequently called, queries? While you could go nuts and index every field you can think of, this hurts performance at the other end when you are inserting data (a real concern in a real-time system such as the one under investigation) and wastes database resources. In my limited experience, you are better off spotting the slower queries from your application logs (which a customer can ship without engaging their DBA team).

What you need to look for is what is running at the end of each (say) second. By tracking this you get a granular breakdown of the queries (or other processing) that are delaying your application. You have to consider how often each query is running over the second boundary – after all, SOMETHING has to be running as the second ticks over – but if it is consistently one of a small number of queries (or other activities), then these are good candidates for indexing or rewriting.

In the case of this app, each log line starts with a timestamp eight characters long. In a text editor I could read through the file, looking at which query was running at the end of each second. Take the example below (note that the lines have been truncated here):

12:19:12 Perf1_1_60 01  115045  235959       0       0       0       0       0  
12:19:12 db_get_agent_productivity_aht: [select count(*), CAST(AVG(TALK_TIME) as
12:19:12 db_get_agent_productivity_aht: [select sum(IDLE_LENGTH) from IDLE where 
12:19:13 Perf1_1_60 01  115045  235959       0       0       0       0       0  
12:19:13 loaded Perf1_1_60 (2267)

The following query [select sum(IDLE_LENGTH) from IDLE where END_TIME >= 1327539045 and END_TIME <= 1327582799 and AGENT_DBID = '2266';] was running over the second boundary. Given I would need to execute this query some 24,000 times, it needs to execute quickly. What I need to do is go through the whole log file (in this case some 40MB of logs) and see what is running at the end of each second.

Now to the point of this post: doing this manually is laborious to say the least. Thankfully awk comes to the rescue.

awk "// {time = $1;  if (time != oldtime) { oldtime = time; print last_line} else {last_line = substr($0,10,60)}}" example.log

In my case this will output

db_get_agent_productivity_aht: [select count(*), CAST(AVG(TA
db_get_initial_login_state: [SELECT STATUS FROM STAFFED WHER
db_get_agent_productivity_aht: [select count(*), CAST(AVG(TA

SNIP

db_get_login_times: [SELECT STATUS, TIME FROM STAFFED WHERE
db_get_login_times: [SELECT STATUS, TIME FROM STAFFED WHERE
db_get_agent_productivity_aht: [select count(*), CAST(AVG(TA
db_get_agent_productivity_aht: [select count(*), CAST(AVG(TA

I have chosen the substring parameters to cut off the timestamp and give a line long enough that it is clear which query is being run, yet short enough that each instance looks the same (there is a WHERE clause with differing values towards the end of each line).

I can then pass that through sort and uniq to get a count for each:

awk "// {time = $1;  if (time != oldtime) { oldtime = time; print last_line} else {last_line = substr($0,10,60)}}" example.log | sort | uniq -c

giving

      1
      3 db_get_agent_productivity_aht: [select CALL_TYPE, count(*),
      6 db_get_agent_productivity_aht: [select count(*), CAST(AVG(TA
     17 db_get_initial_login_state: [SELECT STATUS FROM STAFFED WHER
    334 db_get_login_times: [SELECT STATUS, TIME FROM STAFFED WHERE
    724 db_get_agent_productivity_aht: [select sum(IDLE_LENGTH) from

In this case I went back to my logs, found the full queries that corresponded to the last two entries and applied some indexing to the relevant tables. On my data set, this brought the time down from 20 minutes to a little under 4 minutes. After adding the extra indexes, a stress test confirmed that the real-time (data loading) components of the solution were not impacted to any measurable degree. I think the problem is solved and it’s time to deploy to a real customer and measure the real-world performance increase. Best of all, it did not involve a change to the application.
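
For example, the IDLE query above filters on AGENT_DBID and END_TIME, so an index covering those columns is the obvious candidate (the index name and column order here are illustrative only, not the exact indexes I deployed):

CREATE INDEX idx_idle_agent_end ON IDLE (AGENT_DBID, END_TIME);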

Happy Australia Day, and happy birthday to my Sister-In-Law too!

Julie and Julia

I’m not sure if I am just being a little sensitive, but my wife is watching Julie and Julia. Julie and Eric just had a fight about Julie being “a little narcissistic” in writing a blog and how it’s taking over her life. I’m sure it’s just a coincidence 🙂.

Despite it being a chick flick and all, I actually really enjoyed the movie.
