Happy Friday everyone!
I’d like to thank my dear friend
Brilliantly presented in this YouTube video:
So, today I ask you, what was your most memorable onosecond?
First off...that’s quite the collection he has sitting in the background of this video.
Back to topic, I’ve had several onoseconds and they still occur, though more often than not they are false alarms: I think I did something wrong, go back and look...no, I’m good...I did it right. Possibly the worst, though, was when I helped manage a website for my former employer.
Sit down kids, it’s story time…
I worked for a real estate company here, and the website contained not only the back-office utilities that our real estate agents used for managing and maintaining listings, submitting listings to local newspapers and such, but also the public website that everyone used to view houses for sale, rent, etc. We had a system in place where an agent could create a shortened name for a listing that would redirect to the actual listing. For instance, a typical listing URL would be something like website.com/listings/listing.cfm?id=20201202091305 where the long, randomish number was the MLS (Multiple Listing Service) ID. Obviously, that’s not very user friendly, so we created a system where users could create an aliased URL like website.com/listings/123parkwaydrive that would redirect to the long URL. Unfortunately, we had not developed a way for users to delete this alias, nor to have it automatically deleted when the listing ID was removed from the SQL database.
So I built one. And like all good systems, I did most of my editing in production. I wasn’t really a ColdFusion developer, but was learning. In fact, none of us were, but we managed. Early on I did my editing in Vim, not yet having learned the way of Notepad++. I eventually got Dreamweaver for this, but regardless, I’m not sure I would have avoided the mistake either way. My cleanup utility would perform a cfdirectory (that is, give me a list of all of the files and folders in a given directory) and then check each folder against the SQL database. If there was a matching folder path in the database that corresponded with an active listing, leave it alone. If there was no matching path, assume that the folder was for an old listing that no longer exists, and issue a cfdelete command to delete the contents of the folder and the folder itself.
I wrote my utility, and then ran it. Except….it was taking a long time to respond. Several long seconds later, in confusion, I reloaded the page to run it a second time. After a longer period of waiting, I F5’d again, but this time was met with a “The page cannot be displayed” message. “Oh No!” I tried to view the website as a whole….same result, “The page cannot be displayed”.
Perhaps you might see where this is going. As you may or may not know, mistakes seem to happen in two possible ways. Either they fail really fast, where you’re blindsided by the result and left wondering what happened, or they fail really slowly, where the system grinds away for a while, not necessarily letting you know what is happening, making you wait and wonder, wonder and wait, and giving you ample time to worry.
Mine….the latter, which in itself was a rather spectacular failure. I made a critical mistake in my cfdirectory command. Instead of telling it to go to the /listings directory, I accidentally went up one level too high, that is, the root of the website. My query ran really fast actually...it looked at the root directory, didn’t see a corresponding entry in the database table, and dutifully executed the cfdelete command across all contents at that level, that is, delete the entirety of the root folder. All of the files. Web pages. Folders. Images. Everything.
Well, not entirely true, the SQL database remained. But that’s of no help without corresponding pages to reference it. And of course, when did this happen? During the lunch hour. You know, one of the peak times when people sitting at their computers are likely browsing real estate listings, looking for their next home while consuming a sandwich. Except they were going to have to go look somewhere else, because there was no website to browse.
As an upside, when we took over the website from the contractor we had, we realized it was very poorly put together with zero redundancy. We added some redundancy. That is, this was running on Windows Server, and we decided that creating a DFS replication link from the datacenter where the website ran to our own server room over the VPN was a good idea. I’m not sure if it was so we could take backups in our server room, or something else.
Also, this was a really large website. Thousands of real estate listings. Upwards of 20 or 30 pictures per listing, and we kept three copies of each image - a small thumbnail, a regular sized image, and a full scale image to zoom into. This meant we could have anywhere from 60 to 90 images per listing, making for a lot of files. And suddenly deleting the entire website, that’s a lot of data to sync via DFS. We were able to quickly disable DFS replication and copy the website from our main office back to the datacenter. Well, copy the folder structure and files, sans images, back to the datacenter quickly.
Then we reran our download script. See, the MLS hosts all of the listings. It’s rather cryptic, but the point is that every few hours, we’d run a script to check for new listings from the MLS. Any new listings it found would be downloaded and imported into our database. Any images would be downloaded, copied, resized, copied, and resized for the three copies for each listing. Restoring the site didn’t take a long time, so at least there was something responding to folks on the web. And then we just had to wait for the script to run, download, and update each listing with the proper images.
Total website downtime? Well, about 45 minutes I suppose. Remaining downtime for the images to show back up? Probably a couple hours.
Now certainly this would be considered an RGE - Resume Generating Event. Better go look for a new job, right? Actually, one of the many things I learned from my IT Director, the other guy maintaining the website, is that mistakes happen. The bigger picture is to learn from your mistakes. And I certainly did learn. And this is something that I’ve kept in mind for my more junior engineers and administrators. Mistakes happen, let’s fix it, and learn from it.
Oh...and make sure you have usable backups of course.
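For anyone building a similar cleanup job, the logic in the story (list the folders under /listings, keep the ones that match an active listing, delete the rest) might look something like the Python sketch below. This is not the original ColdFusion utility. The function name, the `site_root` guard, and the dry-run default are all assumptions added for illustration. The guard refuses to run against the site root itself, or against anything outside it, which is the kind of mistake described above.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def cleanup_aliases(target_dir: Path, site_root: Path,
                    active_aliases: Iterable[str],
                    dry_run: bool = True) -> List[Path]:
    """Delete alias folders in target_dir that have no active listing.

    Hypothetical sketch: active_aliases stands in for the folder names
    that a database query would report as belonging to active listings.
    """
    target = Path(target_dir).resolve()
    root = Path(site_root).resolve()

    # Guard: the target must be strictly below the site root,
    # never the root itself and never something outside it.
    if target == root or root not in target.parents:
        raise ValueError(f"refusing to clean {target}: not a subdirectory of {root}")

    active = set(active_aliases)
    orphans = sorted(
        entry for entry in target.iterdir()
        if entry.is_dir() and entry.name not in active
    )

    for folder in orphans:
        if dry_run:
            # Default: only report what would be removed.
            print(f"would delete {folder}")
        else:
            shutil.rmtree(folder)
    return orphans
```

Running it once with the default `dry_run=True` and reading the list before passing `dry_run=False` gives you a chance to notice that the list is suspiciously long.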
I typed a very long post….found a typo, edited said typo, and resubmitted. Post sent to moderation for approval. No sign of said post…. onosecond passes, anxiety rises. Someone please approve….
Mine was when I first started in IT. We all know disk drives start at zero, not one, but when a drive at a law firm went bad, guess what? I pulled the wrong drive and then spent the weekend in the office to restore it.
It taught me a valuable lesson that I carried through my career.
I hate this….it depends on what you’re looking at. The drive might be labeled 1 while the disk is labeled 0. Or in VMware, when trying to correlate a virtual disk at the VM level to the volume inside of the OS, it’s not always the case that vmname.vmdk is disk 0 and vmname_1.vmdk is disk 1. You have to make sure you’re correlating with SCSI IDs….
Absolutely, but this was a physical server, and I learned from it.
That refreshing feeling when you issue a recursive delete…. And then remember that you are in the wrong directory - preferably the root directory…
This happened to me at the very beginning of my career. I was very glad that my colleagues reacted with a lot of understanding.
But this lesson was learned forever - double-check where you are in the filesystem BEFORE issuing the command.
This is more or less what I did in my missing post. But it was for a website instead.
Mine was when I first started in the field. We had this application that was used by all the branch offices, with the database it used hosted at the main office.
Being green, I didn’t realise that if the application is upgraded, it upgrades the database too. You can see where this is going.
Anyway, late on a Friday one user was having trouble with said app, and App Support recommended installing an update, which in turn updates the database.
I went ahead and updated the app. A message popped up and said, ‘Upgrading database’. I then had the onosecond moment. The user was happy the issue was solved, but the Helpdesk got flooded with calls as everyone else got locked out of the app and needed to be upgraded too before they could access it.
This was back in the day with MPLS connections and bonded ADSL. Mad dash with my colleague to get everyone upgraded.
From that day on, I learnt to check and double check and ask if you are not sure about anything.
It’s a very common experience….
I typed a very long post….found a typo, edited said typo, and resubmitted. Post sent to moderation for approval. No sign of said post…. onosecond passes, anxiety rises. Someone please approve….
should be there now, Derek!
Thanks
I realise that in the past I also made a mistake that I will never forget.
You all know this situation: an RDP session inside an RDP session…
Well, you can see where this is going…
I was planning to perform a sysprep on a VM template, but I was not connected to the template. I was on one of the Hyper-V cluster hosts...
Damn, I cursed a lot at that time and I think that my face was as red as a tomato.
The good news: there was only a small downtime for the VMs running on that host, they all started automatically on other hosts in the cluster, and there was no data loss.
The bad news: I needed some time to reconfigure the host, join it to the domain again, add it to the cluster again, test, …
A lesson: double-check that you are connected to the right server and run hostname at the command prompt before performing such drastic changes...
Thanks
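The "run hostname first" lesson above can also be built into a script, so that a destructive step simply refuses to start on the wrong machine. Here is a minimal Python sketch of that idea. The function names and the example host names in the comments are made up for illustration, and comparing only the short name (the part before the first dot) is an assumption that may not suit every environment.

```python
import socket
from typing import Optional


def short_name(hostname: str) -> str:
    """Lower-cased host name without any domain suffix."""
    return hostname.strip().split(".", 1)[0].lower()


def is_expected_host(expected: str, actual: Optional[str] = None) -> bool:
    """True if this machine (or the given name) matches the expected host."""
    if actual is None:
        # Ask the operating system which machine we are really on.
        actual = socket.gethostname()
    return short_name(actual) == short_name(expected)


def require_host(expected: str, actual: Optional[str] = None) -> None:
    """Raise before anything drastic happens on the wrong machine."""
    if not is_expected_host(expected, actual):
        seen = actual if actual is not None else socket.gethostname()
        raise RuntimeError(
            f"expected to be on {expected!r} but this is {seen!r}; aborting"
        )


# Hypothetical usage at the top of a maintenance script:
# require_host("vm-template01")
```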
I think my own personal one would be when I had to restore a server without NICs alongside the running virtual server. I, basically, needed some data from an Exchange server that was running on Small Business Server. We had recently performed a cloud migration and the only way to get the data we needed was to restore the server and use Outlook.
When I went to restore the server, I was focusing so hard on making sure the NICs weren’t connected, that I accidentally restored the server in place. The live server was instantly removed and the company lost all work since the last backup, which was about 17 hours earlier.
I said some unrepeatable words and considered a change of pants! The lesson, as always, is: double check things before clicking and get a second pair of eyes if you need to.
Some brilliant stories here!
I’ll add mine to this.
It was towards the beginning of my career, and I noticed some issues with the security software: it wasn’t pushing updates. I worked on it and got this resolved, and it then needed to pull new binaries & update files down. It was getting late and the download was slow (hello ADSL), so I drove home and decided I’d remote in to finish it.
I got home and connected remotely using an agent-based help desk solution (think TeamViewer/LogMeIn style). The binaries had downloaded, and I approved the updates to install.
Then the agents all started going offline…
I thought, huh, must’ve done an auto reboot, that’s weird…
But they didn’t come back.
Then the boss called to tell me that they couldn’t access their machine remotely anymore.
Turns out that when I pushed out the updates, the newer version included its own firewall, which I hadn’t created any policy for, so it default denied all the non-generic Windows applications, including the agent-based help desk solution.
I then had to return to site and disable this firewall until I could create a proper policy, and force Windows Firewall to be enabled again!
I 100% used Norton Ghost to image a drive on a professor’s PC once. He was yelling at me before I even got there, and he continued to rush me and was quite distracting.
Fast forward a few hours. When you ghost an empty drive full of zeros over a drive with 0’s and 1’s, you’re going to have a bad time.
A valuable lesson learnt that day. Not even on the technical side: if an angry customer wants to vent, stop working and let them vent all day. Then tell them that the sooner I can get back to work, the faster things can be fixed. They usually bugger off pretty quick.
Now that I’m a SAN, backup, and server guy, though, most of those days are long over :)
Wow...Norton Ghost. That brings back memories from when I was in college and we imaged our workstations for lab work…and yeah, make sure you have the source and destination correct. Robocopy with the /MIR switch is another way to have a really bad day.
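One way to catch a swapped source and destination before a mirror-style copy is to preview what the mirror would remove. The Python sketch below only approximates that idea: it compares file paths, ignores empty directories, attributes, and exclusions, and is not a reimplementation of Robocopy or Ghost. The function names and the "empty source" check are assumptions for illustration.

```python
from pathlib import Path
from typing import List, Set


def relative_files(base: Path) -> Set[Path]:
    """All file paths under base, expressed relative to base."""
    return {p.relative_to(base) for p in base.rglob("*") if p.is_file()}


def preview_mirror(source: Path, dest: Path) -> List[Path]:
    """List files a mirror from source to dest would delete from dest."""
    source, dest = Path(source), Path(dest)
    if not source.is_dir():
        raise ValueError(f"source {source} is not a directory")

    source_files = relative_files(source)
    dest_files = relative_files(dest) if dest.is_dir() else set()

    # An empty source mirrored over a populated destination wipes the
    # destination, which usually means the two arguments are swapped.
    if not source_files and dest_files:
        raise ValueError("source is empty but destination is not; are they swapped?")

    # Anything present only in the destination would be removed.
    return sorted(dest_files - source_files)
```

If the returned list is much longer than expected, that is the moment to stop and re-read the command.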