The Archaeology of Software

I often look at where major grant funding is being spent as a way to anticipate technology trends five to ten years out. After all, when NSF or DARPA puts millions of dollars of taxpayers' money to work, it's a bit like watching the Vegas line for college basketball: once I see where the bets are being placed, I can pretty much predict who will make the Final Four. Recently, there has been an increase in funded activity aimed at improving the way we build software. DARPA, for example, recently funded an $11 million effort, led by a team of software experts at Rice University, to create a development tool that will "autocomplete" code for developers.

Most software is still developed using the same processes that were being taught in the 1990s. Sure, management practices have changed and new languages have been developed, but almost all individual programmers follow the same basic method. For example, we are a small company with an aggressive software development team, and we can generate new features and products far faster, with fewer people, than the likes of SAP. Even so, each of our developers still captures requirements, sketches out an architectural design with workable objects or modules and the interfaces between them, implements it, and then tests it. But what if, during the design and implementation phases, you didn't have to go it alone? What if you could draw upon the entirety of the open source code that has already been written?

This is both a huge opportunity and a huge problem. There are over 4.7 million open source repositories on GitHub and SourceForge (services that host open source code so it is publicly accessible). Each of these projects typically contains thousands of lines of code, resulting in billions of lines that could be used to inform and guide the development of new software. Doing this requires an automated way to search and catalog that code, and to separate the valuable snippets from code that is buggy or irrelevant.

This is the focus of a project called PLINY (I suspect it is named after Pliny the Elder, the Roman author and natural philosopher whose Natural History is often regarded as the first major encyclopedia). Vivek Sarkar and his colleagues will spend four years building a system that mines the internet for code modules that can be automatically combined with an existing body of code to "autocomplete" a program. Need a module to compare patient records in a large database? Start writing your database access module and PLINY will suggest ways to complete it, saving time and money in the development process. Dr. Sarkar points out that the system should be able to find bugs as well as complete code: "Imagine the power of having all the code that has ever been written in the past available to programmers at their fingertips as they write new code or fix old code."

This vision has the potential to transform how we think about developers: from standalone engineers to code archaeologists who can quickly retrieve pieces of code from the past and assemble them in new ways to create new features. I can imagine this eventually leading to specialization among developers, with some who are experts at writing new code and others who are skilled at combining existing code into complete systems.

It's a great vision, and one that could push software development cycles well beyond where they are today. Here at Mersive, we ship about three to four releases a year, each introducing significant new features. Imagine being able to double that. I know our customers would be happy.