Crash and Burn: What Investment Management Can Learn from the Aerospace Industry

On my way to work a few months back I drove past a software development firm I had never heard of. I was curious, so I found them online and looked at what they did. The firm developed QA automation software specifically for aerospace airborne systems. What struck me as interesting is that this QA automation tool not only helps to automate, manage, and document testing, but also helps their customers become compliant, certified, and safe so that they can receive FAA approval. It makes sense; with today’s aircraft so heavily dependent on avionic hardware and software, a failure could mean the difference between life and death. For those who might be interested, the standard behind this certification is DO-178C, Software Considerations in Airborne Systems and Equipment Certification.

As a flight instructor in my spare time, I find all things related to aviation interesting, so I kept digging into this certification process to see if I could relate it to the investment management technology arena. As I read on, I could not help but think to myself, “why does the investment management industry not have similar types of specific software certification when it comes to trading and compliance applications?” I have worked for two software vendors, and at neither did the terms “certified” or “compliant” ever come up in the design and testing phases. Yet these systems manage and transact trillions of dollars a year for investment managers.

The interesting piece of this certification is that the testing standards are based on the risk impact of the software, called the Design Assurance Level (DAL), which is determined by examining the effect of a system failure on the aircraft (which in our world would be the portfolio), crew (the investment management firm), and passengers (the investors or clients). The higher the level, the more rigorous the proof of testing needed for certification.

To see what DAL would look like in our world, let's substitute a few key words:

A.  Catastrophic - Failure may cause multiple fatalities (financial losses), usually with loss of the airplane (investor's capital).

B.  Hazardous - Failure has a large impact on safety risk or performance, or reduces the ability of the crew (investment manager) to operate the aircraft (portfolio) due to physical (financial) distress or a higher workload, or causes serious or fatal injuries among the passengers (losses among the investors).

C.  Major - Failure significantly reduces the safety margin or significantly increases crew (the investment manager's) workload. May result in passenger (investor) discomfort or even minor injuries (losses).

D.  Minor - Failure slightly reduces the safety margin or slightly increases crew (investment manager's) workload. Examples might include causing passenger (investor) inconvenience or a routine flight plan (portfolio investment) change. 

E.  No Effect - Failure has no impact on safety, aircraft (portfolio, trading, and compliance operations), or crew (investment manager) workload. 

After I completed my editing, the list looked very familiar to me, as our firm uses a similar approach in our IMP Testing Methodology. For example, on our CLEAR compliance projects, we use a DAL-type framework to forecast the impact if compliance rules were to fail due to improper coding or unforeseen bad data. If you look at the $21 million fine imposed on WAMCO recently, there is still some controversy over whether the problem was purely a data issue, the order management system, or the process and controls around it. It raises the question: would having a certification and a DAL approach in the compliance rule testing process have helped to prevent this? I believe so.

That is just one case, but what other cases could have been prevented?

  • Could Knight Capital's 2012 loss of $440 million in about 45 minutes have been avoided by applying the DAL concept to the code they released into production?
  • Could it have prevented the “Flash Crash” in 2010 caused by trading algorithms that brought all major indexes down severely in just 30 minutes?
  • Could it have prevented the "technical glitch" that happened just last Wednesday, when the NYSE closed until the afternoon? (We still don’t know why as of this writing.)

We can learn a lot from other industries and their approach to making technology “safe”. As we have seen over the last decade when our financial systems have collapsed, more than just money can be lost – jobs, businesses, and lives can be ruined.

The next time you’re testing a piece of software functionality, think about the most catastrophic event that could happen if that software fails. Sometimes that means bringing many people from IT and the front office into the discussion. Once you have figured that out, focus on a test plan aligned with the DAL categories above. It just might prevent your aircraft from crashing.
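For readers who prefer to see the idea in code, here is a minimal sketch of how a DAL-style classification could drive a test plan. It is only an illustration: the names (Level, Component, plan_for) and the evidence required at each level are hypothetical, and are taken neither from DO-178C nor from our own methodology. The one rule it encodes is the one described above: rate the failure impact on the portfolio, the investment manager, and the investors, take the worst of the three, and let that level set how much proof of testing is required.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Failure-impact levels adapted from the A-E list (A is the most severe)."""
    NO_EFFECT = 0      # E
    MINOR = 1          # D
    MAJOR = 2          # C
    HAZARDOUS = 3      # B
    CATASTROPHIC = 4   # A


# Hypothetical evidence required per level; an assumption for illustration only.
REQUIRED_EVIDENCE = {
    Level.NO_EFFECT: [],
    Level.MINOR: ["smoke test"],
    Level.MAJOR: ["smoke test", "regression suite"],
    Level.HAZARDOUS: ["smoke test", "regression suite", "bad-data scenarios"],
    Level.CATASTROPHIC: [
        "smoke test",
        "regression suite",
        "bad-data scenarios",
        "independent review",
    ],
}


@dataclass
class Component:
    """A piece of functionality under test, rated on three impact dimensions."""
    name: str
    portfolio_impact: Level   # the "aircraft"
    manager_impact: Level     # the "crew"
    investor_impact: Level    # the "passengers"

    def level(self) -> Level:
        # The overall level is the worst case across the three dimensions.
        return max(self.portfolio_impact, self.manager_impact, self.investor_impact)


def plan_for(component: Component) -> list:
    """Return the evidence a component needs before release."""
    return REQUIRED_EVIDENCE[component.level()]


# Example: a hypothetical compliance rule whose failure would hurt investors most.
rule = Component(
    name="issuer concentration limit rule",
    portfolio_impact=Level.HAZARDOUS,
    manager_impact=Level.MAJOR,
    investor_impact=Level.CATASTROPHIC,
)
print(rule.name, "->", rule.level().name, plan_for(rule))
```

Taking the worst case rather than an average is deliberate in this sketch: a rule that barely affects the manager's workload can still put investor capital at risk, and the test plan should follow the more severe rating.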