The phrase “data-driven” has many definitions. At its simplest, it describes an approach in which data is emphasized over other elements. Today “data-driven” is a buzzword: you will often hear “data-driven marketing”, “data-driven research”, “data-driven organization”, and the list goes on. With that in mind, let me offer a fresh perspective on how software can be designed in a data-driven way.

When we start working on a new application, we usually follow a familiar sequence. With the requirements clear and the specifications ready for all features, we start with a basic architecture of the application. Once the architecture is finalized, we design each of its components. We then move on to low-level design, programming, different kinds of testing, and release. In most of these cases, our focus is on fulfilling the business requirements in a robust, efficient and simple way.

Definition

Data-driven software design builds on one premise: the data that drives the logic is kept separate from the software logic itself. It is a principle that focuses on how the data related to the business requirements can be organized, and on building generic business logic, or an engine, in the application that works from that dataset. The business logic interacts with the data in such a way that it stays as generic and as independent of any particular dataset as possible. The trade-off is that some of the business logic may become considerably more complex, because it has to handle the wide variety of use cases that the dataset can describe.

Design Approach

In the data-driven software design approach, we first focus on building a generic data schema around the business requirements. The schema should take into account all aspects of those requirements, so bring as much data as possible into it. Let’s take the example of a software application that interacts with hardware. We will build as many data templates as we can for the software-hardware interaction.

An example of a 3 tier data-driven Software Architecture

Here are some questions we can ask during this phase:

  • What are the available physical communication channels? E.g. TCP/IP, USB, Serial
  • What communication parameters are specific to each available channel? E.g. IP address and port
  • What are the available communication protocols? E.g. Modbus, CAN
  • What is the data definition of the parameters supported by the hardware? E.g. label, description, register address, data conversion

The above is just one example of how a generic data schema can be built from the business requirements.
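One way to capture the answers to these questions is a small set of typed records. The following is only a minimal sketch in Python, not the schema of any real product: every class and field name (ChannelSpec, ParameterSpec, DeviceSpec, scale and so on) is hypothetical, and the linear scale factor is an assumed stand-in for "data conversion".

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChannelSpec:
    """A physical communication channel and its connection settings."""
    kind: str                                      # e.g. "tcp", "usb", "serial"
    settings: dict = field(default_factory=dict)   # e.g. {"ip": ..., "port": ...}


@dataclass(frozen=True)
class ParameterSpec:
    """Definition of one value exposed by the hardware."""
    label: str
    description: str
    register_address: int
    scale: float = 1.0    # assumed linear conversion: value = raw * scale
    unit: str = ""


@dataclass(frozen=True)
class DeviceSpec:
    """Everything the generic logic needs to know about one device type."""
    device_id: str
    protocol: str         # e.g. "modbus", "can"
    channels: tuple
    parameters: tuple


# A single illustrative entry; all values are made up for the example.
example = DeviceSpec(
    device_id="demo-device",
    protocol="modbus",
    channels=(ChannelSpec("tcp", {"ip": "192.0.2.10", "port": 502}),),
    parameters=(
        ParameterSpec("Output voltage", "Voltage at the output", 100, 0.1, "V"),
    ),
)
```

The records are frozen so that the rest of the code can read a device definition but cannot reassign its fields by accident.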

The second step is to build the complete dataset based on the generic schema from the previous step. It’s OK if some data is missing during the initial development stage, because the design approach gives us the flexibility to add data later without changing the logic.

The third key step is to build the business logic around reading and parsing the data. Every feature is implemented generically, fulfilling the business requirements based on the parsed data.
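As a rough illustration of this step, the sketch below parses a dataset and indexes it by device. I am assuming a JSON dataset purely for the example; the field names, the REQUIRED_KEYS set and the load_dataset function are all hypothetical.

```python
import json

# Hypothetical dataset; in practice it would live in a file shipped
# alongside the application rather than in the source code.
DATASET_JSON = """
[
  {
    "device_id": "demo-device",
    "protocol": "modbus",
    "channels": [{"kind": "tcp", "port": 502}],
    "parameters": [
      {"label": "Output voltage", "register_address": 100, "scale": 0.1, "unit": "V"}
    ]
  }
]
"""

# Keys every dataset entry must provide (an assumption of this sketch).
REQUIRED_KEYS = {"device_id", "protocol", "channels", "parameters"}


def load_dataset(text):
    """Parse the dataset and index it by device_id, rejecting incomplete entries."""
    index = {}
    for entry in json.loads(text):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"dataset entry is missing keys: {sorted(missing)}")
        index[entry["device_id"]] = entry
    return index


devices = load_dataset(DATASET_JSON)
print(devices["demo-device"]["protocol"])
```

Validating entries while loading keeps the generic logic free of per-device checks: anything that reaches the engine is already known to match the schema.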

In the final stage, the data schema may need to be fine-tuned and optimized to best suit the business logic. Defining data in a certain way can lead to unoptimized logic in the application, so the schema may have to be changed or certain redundant data removed.

Taking the example from above, when the software tries to establish communication with the hardware, it searches the dataset to identify the device.

The dataset will have enough information to tell the application:

  • The communication channels supported
  • The communication parameters to be used
  • The communication protocol to use
  • The data parameters supported
  • Read/write data registers supported
  • Data conversion in a read/write operation

There could be much more meta-information stored in the dataset. All of this data is what drives the logic of the application and lets it produce meaningful output.
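To make this concrete, here is a minimal sketch of a generic engine that looks a device up in the dataset and then reads and writes its registers. It is an illustration under stated assumptions rather than a real driver: the dataset layout, the DeviceEngine and FakeTransport classes, the device identifier "ups-a" and the linear scale conversion are all hypothetical, and the transport is an in-memory stand-in for a real communication channel.

```python
# Hypothetical dataset: one entry per device type, keyed by an identifier
# the device is assumed to report when it is first contacted.
DATASET = {
    "ups-a": {
        "protocol": "modbus",
        "parameters": {
            "output_voltage": {"register": 100, "scale": 0.1, "unit": "V", "writable": False},
            "beeper": {"register": 200, "scale": 1, "unit": "", "writable": True},
        },
    },
}


class FakeTransport:
    """Stand-in for a real channel; it just stores raw register values."""

    def __init__(self, registers):
        self.registers = dict(registers)

    def read(self, address):
        return self.registers[address]

    def write(self, address, raw):
        self.registers[address] = raw


class DeviceEngine:
    """Generic logic: everything device-specific comes from the dataset entry."""

    def __init__(self, device_id, transport, dataset=DATASET):
        self.spec = dataset[device_id]   # a KeyError means the device is unknown
        self.transport = transport

    def read(self, name):
        param = self.spec["parameters"][name]
        # Convert the raw register value using the conversion from the dataset.
        return self.transport.read(param["register"]) * param["scale"]

    def write(self, name, value):
        param = self.spec["parameters"][name]
        if not param["writable"]:
            raise PermissionError(f"{name} is read-only")
        # Apply the inverse conversion before writing the raw value.
        self.transport.write(param["register"], round(value / param["scale"]))


engine = DeviceEngine("ups-a", FakeTransport({100: 2301, 200: 0}))
voltage = engine.read("output_voltage")   # raw 2301 scaled by 0.1
engine.write("beeper", 1)
```

Note that DeviceEngine contains no register addresses, units or device names. Supporting another device would mean adding an entry to DATASET, not editing the class.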

Data-Driven Dataset vs Database of the Software

At this point, we have to make a clear distinction between the data that drives the software logic (the dataset of data-driven software) and ordinary software data, which can be a wide variety of data stored in a database or file. The dataset in data-driven software is what drives the core logic of the application. The application never modifies this data once it is deployed.

On the other hand, the software database used in a typical application can store any kind of information coming from the application. The application might use it to populate its user interface and to store the results of a user’s operations.

Case Study

I have successfully used the data-driven design approach in many applications. One example is a mimic diagram, an application feature designed entirely on a data-driven model.

Mimic diagram

In the design of this feature, the UI elements are split into two kinds: component and flow. Each key component of the UPS (uninterruptible power supply) is represented by a component (Cn), and the current flowing between components is represented by a flow. For example, in the diagram above, a UPS in normal operation has the components MAINS, PFC, INVERTER and OUTPUT, and the current follows the flow MAINS-PFC-INVERTER-OUTPUT.

In a data-driven approach, we can come up with datasets for:

  • Different combinations of flow in a particular UPS mode
  • Different conditions for a UPS mode
  • Different data to be displayed for each component
  • Error conditions for a component

All of these were stored in a specific data schema and parsed in the application, and business logic was built on top of them that fully controls the mimic diagram’s user interface. The business intelligence on how the mimic diagram should behave comes from the data, not from the code. The business layer acts as a mimic diagram engine driven by the dataset. At the user interface level, a traditional OOP design is followed for the component and flow classes.
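A heavily simplified sketch of such an engine might look like the following. Only the normal-operation flow MAINS-PFC-INVERTER-OUTPUT comes from the example above; the "battery" mode, the BATTERY component, the "when" condition format, the display fields and the resolve_mimic function are assumptions made for illustration.

```python
# Dataset for one UPS model. Only the "normal" flow is taken from the
# article; every other name and value here is hypothetical.
MIMIC_DATASET = {
    "components": {
        "MAINS": {"display": ["input_voltage"]},
        "PFC": {"display": []},
        "BATTERY": {"display": ["battery_level"]},
        "INVERTER": {"display": []},
        "OUTPUT": {"display": ["output_voltage"]},
    },
    "modes": [
        {"name": "normal", "when": {"mains_ok": True},
         "flow": ["MAINS", "PFC", "INVERTER", "OUTPUT"]},
        {"name": "battery", "when": {"mains_ok": False},
         "flow": ["BATTERY", "INVERTER", "OUTPUT"]},
    ],
}


def resolve_mimic(dataset, status):
    """Return the diagram state for the first mode whose conditions all match."""
    for mode in dataset["modes"]:
        if all(status.get(key) == expected for key, expected in mode["when"].items()):
            flow = mode["flow"]
            return {
                "mode": mode["name"],
                "active_components": flow,
                # Each flow segment links two consecutive components.
                "active_flows": list(zip(flow, flow[1:])),
                "inactive_components": [
                    name for name in dataset["components"] if name not in flow
                ],
            }
    raise LookupError("no mode in the dataset matches the reported status")


state = resolve_mimic(MIMIC_DATASET, {"mains_ok": True})
print(state["mode"], state["active_flows"])
```

The function knows nothing about mains, batteries or inverters. It only matches conditions and walks a flow, so which components and flows light up in the user interface is decided entirely by the dataset.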

In a traditional OOP approach, we would have a base class with the common operations and features, and derived classes with a specific implementation for each product supported by the application. The fundamental problem with this approach is scalability. Imagine we need to support 10 different kinds of products, each with 5 different kinds of mimic operations. In a traditional approach, we would need to write 10 derived classes, each implementing the 5 operations, with all of the behaviour sitting directly in the code. Of course, with a good OO design we can reuse some of that code. But with a data-driven approach, only minimal code changes are required to support a new product with different operations: all we need is to create a new dataset for the new product. This level of efficiency is possible only if a proper mimic diagram engine has been designed using a data-driven approach.
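The difference can be shown in a few lines. In this sketch, which assumes a hypothetical catalogue of products and operations (the product names "ups-a" and "ups-b", the "bypass" operation and the flow_for function are all made up), supporting a second product is a change to the data only.

```python
# Hypothetical product catalogue: product -> operation -> component flow.
PRODUCTS = {
    "ups-a": {"normal": ["MAINS", "PFC", "INVERTER", "OUTPUT"]},
}


def flow_for(products, product, operation):
    """One generic function serves every product and operation in the data."""
    try:
        return "-".join(products[product][operation])
    except KeyError:
        raise LookupError(
            f"{product!r} does not define operation {operation!r}"
        ) from None


# Supporting a new product is a data change; flow_for is left untouched.
PRODUCTS["ups-b"] = {
    "normal": ["MAINS", "INVERTER", "OUTPUT"],
    "bypass": ["MAINS", "OUTPUT"],
}

print(flow_for(PRODUCTS, "ups-a", "normal"))
print(flow_for(PRODUCTS, "ups-b", "bypass"))
```

In the class-based design, the same change would have meant writing a new derived class and overriding each operation in it.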

Advantages

Data-driven software design has several key advantages over traditional logic-driven software design:

  • Flexibility – This approach provides great flexibility: data and logic can be managed and changed independently without affecting each other, unless the schema itself is modified.
  • Scalability – The application can be scaled easily to support many kinds of similar datasets with minimal changes in business logic.
  • Maintainability – Since data and logic are separated, the code is cleaner and easier to maintain and debug, provided good design and programming principles are followed. A bug fix made to a feature applies to every dataset, so a single fix has a wider reach.
  • Deployability – The application can be deployed easily. Support for a new dataset can often be delivered as a patch containing only the data, without changing the application.
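The deployability point can be sketched as follows, assuming (purely for illustration) that each dataset ships as its own JSON file in a directory the application scans at start-up. The file names and the load_datasets function are hypothetical, and a temporary directory stands in for the installation folder.

```python
import json
import tempfile
from pathlib import Path


def load_datasets(directory):
    """Load every *.json file in the directory, keyed by file name without extension."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(directory).glob("*.json"))
    }


with tempfile.TemporaryDirectory() as tmp:
    # The initial release ships one dataset file.
    Path(tmp, "ups-a.json").write_text(
        json.dumps({"protocol": "modbus"}), encoding="utf-8"
    )
    before = sorted(load_datasets(tmp))

    # A later patch only drops in a new file; the loader code is unchanged.
    Path(tmp, "ups-b.json").write_text(
        json.dumps({"protocol": "can"}), encoding="utf-8"
    )
    after = sorted(load_datasets(tmp))

print(before, after)
```

Because the loader discovers datasets instead of listing them in code, a data-only patch is enough for the application to pick up the new product the next time it starts.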

Data-driven software design is a great way of looking at building software. It can yield far greater benefits in the long term and a higher return on investment.