Pull up a chair and let me tell you a story from my early days as a professional programmer. It’s about how I screwed up, and what I’ve done since then to make sure that mistake is never repeated.
I’m going to ramble for a bit but I promise that I’ll get to the point eventually.
One of my first big tasks as a programmer was to update, in preparation for an upgrade, some iMan and Unigraphics code I inherited. For you younger kids out there, iMan was the predecessor of Teamcenter Engineering and Unigraphics later bought IDEAS and became NX.
The code was a mess. Of course, every programmer always thinks that code done by someone else is a mess. But this really was. There were single functions that would have taken two dozen sheets of paper to print out — double sided. My favorite
was a family of functions that instead of returning values or modifying a parameter via a pointer updated fields in a global array. One function would update element 0, another would update element 1, etc. And then other functions would know which element to read. But that’s besides the point.
This code was prone to unrepeatable memory crashes. Memory errors are like that. One function would allocate memory, typically for a string or arrays, and then pass the pointers back to their callers. The callers would be responsible for freeing the memory — unless they passed the allocated pointers to their callers who would then have the responsibility, and so on.
If you’re familiar with this type of code you know that it is error prone. Freed memory is read, allocated memory is allocated again, etc. These types of bugs can be hard to track down. Sometimes there’s a problem in an execution path that’s rarely taken. Or sometimes the pointer will still point to valid data so long as the OS hasn’t seen fit to reuse that space already. Nine times out of ten the code will seem to work fine, and then on the tenth try the OS will actually use that address for something else and the program crashes.
I didn’t care much for this type of code, but it would have taken a massive overhaul to make any substantial change. Being new on the job and new to this code base I was reluctant to make too many changes to it. So we tested the code until we found a problem and then I’d hunt it down and try to fix it, and then we’d repeat the process.
Over and over and over and over. Think, whack-a-mole
.
Eventually we couldn’t produce any more errors so we decided it was finally ready to release.
So we went ahead with the upgrade. And then… (cue dramatic music) …nothing much happened. The upgrade went about as well as upgrades ever do. There were some snafus here and there, but no show stoppers.
So a week later I left for my first PLM World users group conference.
And then all hell broke loose.