Wednesday, December 21, 2011

What do we actually know about software development?

I recently saw this video on Vimeo; to say it blew my mind would be an understatement:

In it, Greg Wilson makes a compelling argument that we have a responsibility to seriously question the efficacy of the software engineering practices we employ. For example, do we know code reviews will really lead to better code, or are we just trusting our gut? Is pair programming effective? Does refactoring really matter at all? Is design paradigm "blah" effective, or is it really an untested, unproven method with no empirical basis whatsoever? How much of what we do is dogma, and how much has been scientifically shown to deliver results?

Painful admission: I'd never seriously considered any of this.  Many of these fads--such as unit testing or code reviews or agile or INSERT FAD HERE--are so beaten into developers that we never stop to question the basic foundation.  So as a personal exercise, I decided to seriously dig into one of them to see what I learned: test driven development. I picked this just as some arbitrary starting point, really, based on some reddit comments.

Test driven development is "...a software development process that relies on the repetition of a very short development cycle: first the developer writes a failing automated test case that defines a desired improvement or new function, then produces code to pass that test and finally refactors the new code to acceptable standards."  In other words, I start developing by writing tests that fail.  I produce code that passes these tests.  This idea seems like a good hypothesis, so I found two heavily cited scholarly articles that supported TDD and scrutinized their findings.
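To make the cycle concrete, here is a minimal sketch in C. The clamp() function and its test are hypothetical and exist only for illustration; neither comes from the papers discussed below. The test is written first, then just enough code to pass it.

```c
#include <assert.h>

/* Hypothetical function under test: clamps value into [low, high]. */
int clamp(int value, int low, int high);

/* Step 1: write the test first. Until clamp() is defined below, the
 * program does not even link, which is the "failing test" of the cycle. */
void test_clamp(void)
{
    assert(clamp(5, 0, 10) == 5);    /* already in range */
    assert(clamp(-3, 0, 10) == 0);   /* below the range */
    assert(clamp(42, 0, 10) == 10);  /* above the range */
}

/* Step 2: write just enough code to make the test pass. */
int clamp(int value, int low, int high)
{
    if (value < low)
        return low;
    if (value > high)
        return high;
    return value;
}

/* Step 3: refactor, re-running test_clamp() after every change. */
```

Calling test_clamp() from a test runner after every edit is the whole loop; the open question in the rest of this post is whether working this way measurably improves the result.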

The first, Realizing quality improvements through test driven development: results and experiences of four industrial teams, is an empirical study cited by 40 authors according to Google Scholar. The study tracked four different teams at two different companies and compared "defect density" with a "comparable team in the organization not using TDD." Obviously "comparable team" is ambiguous, so in their "Threats to Validity" section, they mention:
The projects developed using TDD might have been easier to develop, as there can never be an accurate equal comparison between two projects except in a controlled case study. In our case studies, we alleviated this concern to some degree by the fact that these systems were compared within the same organization (with the same higher-level manager and sub-culture). Therefore, the complexity of the TDD and non-TDD projects are comparable.
First, I do not buy that "the complexity of the TDD and non-TDD projects are comparable" because "these systems were compared within the same organization (with the same higher-level manager and sub-culture)." I know complexity can vary wildly within a single organization, because it does in mine; I work on what I'd describe as a very complicated product (computer vision, lots of networking, embedded development, lots of compression, low-latency streaming, etc.), while I have coworkers in the same organization (with the same higher-level manager) who write software with a small subset of the requirements of our group; the difference in complexity is gigantic. Without specifics on the comparable projects, it's hard to take the findings seriously. It's also clear--from the total lines of code and team size comparison--that these are unlikely to be "comparable projects." They also don't list the experience breakdown of team members for the comparable projects.

Second, none of these results are statistically significant, and the authors acknowledge that. It's an observational case study--not a controlled experiment--so at best they can only infer correlations between TDD and "defect density."

Third, they include figures on "Increase in time taken to code the feature due to TDD," and use management estimates as a source of data.  Seriously?  Given what we know about estimating things, is this really a valid method?

Lastly, how do they conclude that TDD was solely responsible for the improvement?  Did the teams have other differences, such as doing code reviews?  Were programmers working together, or were they geographically dispersed?  How did management affect the various projects?  What other factors could have influenced their results?  They allude to some of this obliquely in their "threats" section;  none of it stops them from recommending best practices for TDD in their discussion section.

The second paper, published in 2008--Assessing Test-Driven Development at IBM--has ~134 citations in Google Scholar. I found this to be a more interesting comparison: two software teams implementing the same specification, one team using a "Legacy" method for software development and the other (admittedly later) team using TDD. It's still not an experiment since it wasn't controlled, but the final deliverable is more comparable: the same "point of sale" specification by IBM.

Their findings are summarized in two graphs; the top is legacy, and the bottom is the newer project using TDD:

At first glance, these look promising; we see that the TDD project had a much lower "defect rate per thousand lines of code (KLOC)" than the legacy project.  We also see the TDD project had better forecasting of their defect rates.

But on closer inspection, I have to seriously question these results. In terms of absolute line count, the Legacy solution appears to be about 11.4 KLOC (80 total defects / (7.0 defects/KLOC)) versus about 67 KLOC for the TDD version (247 total defects / (3.7 defects/KLOC)). In other words, from an absolute standpoint the legacy system had one-third as many defects and one-sixth the total line count. So I'm struggling to understand how the TDD team ended up with a massive pile of code, and what that cost them in terms of schedule/productivity/maintainability/performance, and how they justify "six times as much code and three times the defect count compared to legacy which purportedly does the same thing!" as being a positive result. I'm open to the possibility I'm misinterpreting these graphs, but if I'm not, the authors deserve a scientific keel-hauling.
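The back-of-the-envelope arithmetic above can be written down as a tiny C helper. The function name implied_kloc is mine, and the inputs are simply the numbers I read off the two graphs, so treat this as a sketch of my reasoning rather than anything from the paper.

```c
/* Implied code size in KLOC, recovered from a total defect count and a
 * defect density expressed in defects per KLOC. */
double implied_kloc(double total_defects, double defects_per_kloc)
{
    return total_defects / defects_per_kloc;
}
```

With the graph values, implied_kloc(80, 7.0) comes to about 11.4 KLOC for legacy and implied_kloc(247, 3.7) to about 66.8 KLOC for TDD. That is roughly 5.8 times the code, alongside 247 / 80, or roughly 3.1 times the defects.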

There's no comparison of development time. No mention of how successful either product actually was in production. No comparison of productivity between the two teams, only a reference to the productivity of the TDD team. No acknowledgement that other factors besides TDD might have been responsible for their findings, since again this is not a controlled experiment. And none of this prevents them from presenting best-practices for TDD in their results section, as though two misleading graphs are a good proxy for a well-executed experiment and statistical significance. Frankly, this source was enormously disappointing to me. The discrepancies in KLOC and absolute defect counts are significant enough that I'm struggling to understand how this paper was A) published and B) cited by 134 other authors.

In my opinion, neither of these papers establishes that TDD is an effective practice. Of course, neither of them precludes it from being effective either, so the jury's still out. And it's entirely possible I've made an error interpreting these findings; I gladly welcome your peer review, dear reader. In any event, I think Greg Wilson's point stands strong: what are our standards of truth? Why do we do what we think works as opposed to what we've empirically shown to work?

Thursday, July 14, 2011

Who Really Wants a Digital Home?

Let me start by saying I'm not a technophobe. I work in video surveillance, I write code in multiple languages, I cross-compile C++ code, I browse various open source projects in my spare time, I like Python's structured use of white space, etc. I'm a nerd, and an unabashed one at that.

But let me be the first to say: I do not want a "digital home." Yep, that's right. I don't want a digital home.  

Let me back up and put some context behind this. In maybe 2005ish, I was asked to head up to Seattle to interface our surveillance product with the products of a home automation/digital home company. Their office was in a drab commercial area of Seattle that presumably fed off the table scraps of larger tech companies in the area; it was one of the dirtiest offices I'd ever seen. Their key product was a little wall-mounted device with an LCD touch screen that would control lighting, HVAC, audio, etc., and they wanted me to make it work with our surveillance system.

I was given a tour by a stocky fellow with a thick eastern European (?) accent who oversaw their engineering. He proudly demonstrated the usage of the wall-mounted device. The whole demo went something like so:

HIM: "So, if you want to turn on the lights, you just do this..."

(turns towards the device, stares directly at it, flicks through a menu or two, hits a button, and a bunch of lights come on)

ME: "Huh....neat."

HIM: "If you want music, you can do this..."

(turns back towards the device, stares directly at it, punching keys, and after some completely boggling transitions through menus we hear music)

ME: "hrrrm..."

HIM: "If you want to control HVAC..."

(turns back towards the device, stares at device, browses unintelligible menus...) You get the idea.

While I was watching this, it occurred to me how completely wrong this product was on so many basic levels.  I imagined myself installing this system in my home, trying to explain it to my wife, having her be frustrated learning how to use it, having her be irritated that turning on the lights was suddenly a pain in the ass, me doing tech support when it's busted so my wife can turn on lights, her making me put the wall switch back, and so on.  It's a cool idea...if you're a nerd.

For the rest of the non-nerd world?  Here's how they want to turn on the lights:

...the simplicity is stunning.  The reliability?  Undeniable.  The lack of training required?  Breathtaking.  The familiarity of the interface, the ability to "feel" around a corner and flip on a light (instead of having to stare at a tiny screen on the wall), the speed at which you can turn on the lights, the lovely tactile response of a solid mechanical switch?  All very refined.

I think much of this nuance is often lost on my fellow techies--after all, we don't fear new technology.  To the contrary, we often bask in it, marveling at the possibilities it may bring.  But the truth of the matter is these companies are attempting to replace technology that has been used and refined over the last 100 years.  When my wife wants to turn on a light, she doesn't want to deal with weird technology or navigate menus or stare at a tiny screen on the wall--she wants to turn on the lights.  When a guest is using my restroom, I don't want to explain how to turn on the light by the toilet.  Lord knows I don't want to do IT work at my own home--as it is, I hate servicing people's computers.  Replacing a light switch with a touch screen sounds like a great idea, but in reality it's as misguided and ridiculous as replacing the steering wheel of a car with an iPhone app.

All of this comes full circle to the idea of a "digital home." What does that even mean, a digital home? Does it mean replacing technology willy-nilly with stuff we think is cool, or actually attempting to improve the ergonomics, efficiency and usability of our homes?

Companies hawking these products invariably see their role as purveyors of a better solution to "antiquated" technology that's been in use for the last few decades.  But until their products complement--not replace--the ergonomics and efficiency of our home in a reliable manner, they'll ultimately fail to have significant market traction.  This is why the term "digital home" itself is hopelessly misguided; the term itself really implies something nobody wants.  We're pretty happy with our fuddy-duddy analog homes, thank you very much.

I believe companies that truly leave their mark in this market won't replace conventional technology and interfaces like the light switch.  They'll be known for complementing the technology with smart solutions that solve problems people actually have.  For example,
  • Can we find ways of remotely controlling lighting?
  • Can we find ways to turn off lights when they're not necessary, thus saving energy?
  • Can we find ways to reduce the amount of copper wiring a home needs, thus driving down construction costs?  Or make it easier to arbitrarily add outlets and lighting to a home or business without tearing down walls or hiring electricians?
  • Can we find ways to delight people, such as automatically turning on porch or back lights when they come home, or automatically turning lights on and off as we go room to room?  Or letting me turn out lights I forgot while I was on vacation?  Or turn out lights from my iPhone because I'm in bed and I don't want to pass through the house one last time?
The key difference is: the goal is to complement our homes--not replace them with silly digital interfaces.

Tuesday, May 10, 2011

Unroll that loop, homeboy

A month or so ago I was wrestling with a pretty thorny algorithmic problem at work.  Eventually I worked out a solution that worked great on our Windows emulator, so I figured my job was done...was I ever wrong.

Once I got it running on our embedded processor (a lowly ARM926EJ-S at ~270 MHz), I quickly found my algorithm had some serious performance issues. So, I fired up OProfile to see where I was eating cycles. There were a couple hotspots that I went after, but one was a loop that looked like so:

UINT64 currentRow;

for (int row = 0; row < NUMBER_OF_ROWS; row++) {
  currentRow = rows[row];
  for (int column = 0; column < NUMBER_OF_COLUMNS; column++) {
    if ((currentRow >> column) & 1) {  // Check if this bit is set
      // ...
    }
  }
}

Nested loops...yuck. This code ran for every column and row, shifting the current row over to check and see if the bit was set. To make matters worse, the row was represented with a 64 bit value, which costs extra to shift on a 32 bit processor. Even worse, I couldn't figure out a way to avoid this work entirely--enumerating over all the values was really a core part of the algorithm.

So, I decided to see if I could at least speed up the routine by optimizing. First, I unrolled the loop. As Wikipedia states, the goal of loop unrolling is to increase a program's speed by reducing (or eliminating) instructions that control the loop, such as pointer arithmetic and "end of loop" tests on each iteration.

So instead of the inner loop iterating through all of the columns, it gets replaced with a gigantic block of if-statements:

UINT64 currentRow;


if( currentRow & 0x1 ) //first column
if( currentRow & 0x2 ) // second column


if( currentRow & 0x8000000000000000 ) // 64th column!

This removes all of the overhead of the for loop--no more incrementing the counter, and no more testing to see if we're at the end of the loop.  It also avoids an expensive shift operation to test each column value.
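Nobody wants to type 64 if-statements by hand, so here is a rough sketch of how the unrolled block could be generated with macros. Everything in it is an assumption for illustration: I use the standard uint64_t in place of our UINT64 typedef, and since the real body of each if isn't shown here, the sketch just records the index of each set column into an output array.

```c
#include <stdint.h>

/* Test one column against a constant mask. The shift is applied to
 * constants only, so the compiler can fold it into a literal mask; no
 * loop counter and no run-time shift of the row remain.
 * Relies on local variables named row, out and count. */
#define CHECK_COLUMN(n) \
    do { if (row & (UINT64_C(1) << (n))) out[count++] = (n); } while (0)

/* Eight consecutive columns starting at column b. */
#define CHECK_8_COLUMNS(b) \
    do { \
        CHECK_COLUMN((b) + 0); CHECK_COLUMN((b) + 1); \
        CHECK_COLUMN((b) + 2); CHECK_COLUMN((b) + 3); \
        CHECK_COLUMN((b) + 4); CHECK_COLUMN((b) + 5); \
        CHECK_COLUMN((b) + 6); CHECK_COLUMN((b) + 7); \
    } while (0)

/* Stand-in for the per-bit work: writes the index of every set column
 * in row to out (capacity 64) and returns how many were written. */
int collect_set_columns_unrolled(uint64_t row, int out[64])
{
    int count = 0;
    CHECK_8_COLUMNS(0);
    CHECK_8_COLUMNS(8);
    CHECK_8_COLUMNS(16);
    CHECK_8_COLUMNS(24);
    CHECK_8_COLUMNS(32);
    CHECK_8_COLUMNS(40);
    CHECK_8_COLUMNS(48);
    CHECK_8_COLUMNS(56);
    return count;
}
```

The preprocessor expands this into the same 64 straight-line tests as the hand-written version, so the generated code should look the same; only the source is shorter.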

But we can do even better!  One issue is currentRow is a 64 bit value, but we're on a 32 bit processor.  Each 64 bit AND takes multiple instructions, whereas a normal 32 bit AND is a single instruction on most platforms.

So, to speed it up even more, you can break the 64 bit value into its respective halves and test them individually--so instead of 64 64-bit AND operations, it now performs one mask, one shift, and 64 32-bit AND operations. As a bonus, you only need to code 32 constants into your code now:

UINT64 currentRow;

UINT32 val = (UINT32) (currentRow & 0x00000000FFFFFFFF);
if( val & 0x1 ) //first column
if( val & 0x2 ) // second column


val = (UINT32) (currentRow >> 32);
if( val & 0x1 ) // 33rd column
if( val & 0x2 ) // 34th column


I'm sure I could have written the code in assembler for further improvement, but my code would no longer be portable.  Furthermore, in C these improvements will apply to almost every target platform.  Admittedly my solution will be slightly slower than it could be on a 64-bit platform where a 64-bit shift operation would be single-instruction, but not enough to matter.
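For the curious, the split-halves version can be sketched in portable C along these lines. As before, the names are hypothetical, uint32_t and uint64_t stand in for our UINT32 and UINT64 typedefs, and recording column indices is only a placeholder for the real per-bit work.

```c
#include <stdint.h>

/* Test one bit of a 32-bit half against a constant mask.
 * Relies on local variables named half, base, out and count. */
#define CHECK_BIT(n) \
    do { if (half & (UINT32_C(1) << (n))) out[count++] = base + (n); } while (0)

/* Eight consecutive bits starting at bit b. */
#define CHECK_8_BITS(b) \
    do { \
        CHECK_BIT((b) + 0); CHECK_BIT((b) + 1); \
        CHECK_BIT((b) + 2); CHECK_BIT((b) + 3); \
        CHECK_BIT((b) + 4); CHECK_BIT((b) + 5); \
        CHECK_BIT((b) + 6); CHECK_BIT((b) + 7); \
    } while (0)

/* 32 unrolled 32-bit tests; base is the column number of bit 0. */
static int collect_from_half(uint32_t half, int base, int *out)
{
    int count = 0;
    CHECK_8_BITS(0);
    CHECK_8_BITS(8);
    CHECK_8_BITS(16);
    CHECK_8_BITS(24);
    return count;
}

/* Splits the 64-bit row once, then reuses the same 32 masks for both
 * halves. Writes set column indices to out (capacity 64), returns the
 * number written. */
int collect_set_columns_split(uint64_t row, int out[64])
{
    uint32_t low  = (uint32_t)(row & 0xFFFFFFFFu);  /* columns 0..31  */
    uint32_t high = (uint32_t)(row >> 32);          /* columns 32..63 */
    int count = collect_from_half(low, 0, out);
    count += collect_from_half(high, 32, out + count);
    return count;
}
```

In this sketch the 32 tests live in a helper that is called twice, which keeps the source small; pasting the block twice inline, as in my snippet above, avoids the call overhead if the compiler doesn't inline the helper.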

The real question, though, is what sort of performance gains did I reap? After measuring again in OProfile, the function took one-fourth the time it previously did, and the work definitely helped me get the algorithm to fit on a lowly 270 MHz processor.