Worst Professional Blunders…

Here’s a little story of how things shouldn’t be (first posted at Codeguru).

I was once asked to fix this totally non-critical feature in a very critical application. The argument was “we really don’t need it, but since it’s documented you may as well make it work”.

Bug #1 (my fault):
I added:

i = -1;

where I should have added:

i -= 1;
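
To make the difference concrete, here is a minimal sketch in C. The original code isn’t shown in this post, so the countdown loop, the function names and the max_steps safety cap are all assumptions made up for illustration; this is not the actual code from the application. It only shows how the two statements can behave when a loop waits for i to reach zero: decrementing gets there, while assigning -1 never does.

```c
/* Hypothetical illustration only: the loop, the names and the
 * max_steps cap are assumptions, not the original application code. */

/* Intended behaviour: i -= 1 walks i down towards zero. */
int count_steps_decrement(int start, int max_steps) {
    int steps = 0;
    int i = start;
    while (i != 0 && steps < max_steps) {
        i -= 1;          /* subtract one from i */
        steps++;
    }
    return steps;
}

/* The typo: i = -1 assigns minus one, so i never reaches zero.
 * Without the max_steps cap this loop would spin forever. */
int count_steps_assign(int start, int max_steps) {
    int steps = 0;
    int i = start;
    while (i != 0 && steps < max_steps) {
        i = -1;          /* overwrite i with -1 on every pass */
        steps++;
    }
    return steps;
}
```

With a start value of 3 and a cap of 1000, the first function finishes after 3 steps, while the second only stops because it hits the cap. Most C compilers accept both statements without complaint, which is part of why a typo like this is so easy to miss.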

Bug #2 (my boss’s fault):
Due to “too much to do” at the QA department I was told to test the application myself. My answer was “I’m not sure that’s a clever decision. There is a good reason why we have a QA department, and there is probably a good reason why they’ve got too much to do…” Anyway, I was told to “think” like they did down at QA, and to do my best… I didn’t find any bugs.

Bug #3 (the techie’s fault):
When I was done testing the application I went to the guy responsible for the software releases. He had the final word on any software that was released into the production environment. They of course had strict rules on all software releases: planning, scheduling, avoiding peak hours, etc. But apparently those didn’t apply that day. The software went straight into production (at peak hour).

Bug #4 (the operations department’s fault):
It took about 2.4 seconds before the bug manifested itself and the CPU went straight to 100%. The first thing to be knocked out was of course the software-based KVM. Now we were blinded and in an early state of panic. The server then stopped responding to network requests, with one exception. It was still sending ‘OK’ heartbeats to its $100,000 passive failover server, so the failover server did nothing. In a higher state of panic they reasoned that “we need to pull the plug to terminate those heartbeats”. So they did. All went silent.

After a couple of minutes they found the real reason why the active/passive failover solution didn’t work. After a hardware test a couple of months earlier, they had never performed the manual failback routine, so we had been running on the failover server all that time while the main server was totally shut down.

A lot of phones went dead that day… but luckily I wasn’t blamed ;)
