Understanding the 6 Dimensions of Data

In my previous post, Securing Water for the Future, I discussed some of the challenges water companies are facing. One of these challenges was related to Asset Management. Of course, it was easy to point out that better Asset Management processes will help any water utility achieve goals such as reducing Non-Revenue Water (NRW), optimizing asset usage and reducing expenditures. But there’s a deeper story there. After all, you can’t achieve those goals without data.

Which brings us straight to the heart of this article: the importance of Data Registration, Data Quantity, Data Quality, Data Integrity, Data Management and Data Integration, also known as the 6 dimensions of data.

Now, it is no secret that the old adage “garbage in – garbage out” still rings true today. Before we go any further, though, we need to define the playing field and pin down what data actually is. There are many definitions out there, obviously, but this one in particular stands out:

“In computing, data is information that has been translated into a form that is efficient for movement or processing. Relative to today’s computers and transmission media, data is information converted into binary digital form. It is acceptable for data to be used as a singular subject or a plural subject. Raw data is a term used to describe data in its most basic digital format.”

In essence, without data, there would be nothing to move or process, which means visualization, analysis and reporting would have no value. A product might make a wonderful picture, but that picture would carry no data-related meaning. That means we have to take on the responsibility of ensuring that we have a viable dataset to work with before software tools can be functional and useful.

01. Data Registration

Data Registration is critical to the success of any software product, from Point-of-Sale solutions straight through to Spatial Asset Management solutions. It requires that a proper data acquisition process be defined and then applied to a data model based on industry standards. The database should be specifically designed to use industry standard data models defined by regulatory bodies. Those standards are available from national and international organizations and provide a framework to properly organize your data. From a Spatial Asset Management perspective, that means having all of the appropriate configurations per asset, from metadata to proper topology to industry standard geometries and so on. In short, you must define the asset as it actually is.
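To make "define the asset as it actually is" a little more concrete, here is a minimal sketch of an asset record that keeps metadata, geometry and topology together. The class name `PipeSegment`, its fields and its checks are my own assumptions for illustration; they are not taken from any particular industry standard or product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class PipeSegment:
    """Hypothetical asset record: attributes, geometry and topology in one place."""
    asset_id: str
    material: str
    diameter_mm: int
    install_year: int
    geometry: List[Point]   # polyline vertices in a projected coordinate system
    from_node: str          # topology: id of the node where the pipe starts
    to_node: str            # topology: id of the node where the pipe ends
    metadata: Dict[str, str] = field(default_factory=dict)

    def length(self) -> float:
        """Sum the straight-line distances between consecutive vertices."""
        return sum(
            ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
            for (x1, y1), (x2, y2) in zip(self.geometry, self.geometry[1:])
        )

    def registration_errors(self) -> List[str]:
        """Return a list of problems; an empty list means the record looks usable."""
        errors = []
        if len(self.geometry) < 2:
            errors.append("geometry needs at least two vertices")
        if self.diameter_mm <= 0:
            errors.append("diameter must be positive")
        if self.from_node == self.to_node:
            errors.append("pipe must connect two different nodes")
        return errors


pipe = PipeSegment(
    asset_id="P-001",
    material="PVC",
    diameter_mm=110,
    install_year=1998,
    geometry=[(0.0, 0.0), (3.0, 4.0)],
    from_node="N-1",
    to_node="N-2",
    metadata={"source": "field survey"},
)
print(pipe.length(), pipe.registration_errors())
```

The point of the sketch is only that the checks travel with the record itself rather than living in one particular application.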

Now, as data technology evolves and the databases themselves become more robust, it is becoming increasingly evident that perspectives are changing. Relational database management systems (RDBMS), once deemed the industry standard, are less suited to their tasks in today’s world. In an object-oriented environment (as we will discuss it here, at least), you define all the water and spatial asset management characteristics at the data model level, allowing for multiple geometric representations. Functionality on that data model is easy to enhance without compromising performance. Other solutions require you to manage this at the application level. This means that as these older technologies evolve, a customer’s migration path is much more extensive, because those applications will need to be updated as well. The result is that your total cost of ownership is not contained.

Lastly, many utilities are faced with having to perform data acquisition processes. In many cases, existing data is incomplete and must be supplemented, with utilities looking at mobile solutions to assist in this process. Making sure this newly acquired data is validated properly before being disseminated within the organization requires both a strong financial commitment and a strong data management process. There are many mobile tools available, but a significant number of them are not readily configured to support that process.
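As a rough illustration of validating field-collected data before it is released to the rest of the organization, the sketch below splits incoming records into accepted and rejected sets. The required field names and the `triage` helper are assumptions made for this example, not the behaviour of any specific mobile tool.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]

# Assumed minimum set of fields a field crew must capture for each asset.
REQUIRED_FIELDS = ("asset_id", "asset_type", "x", "y")


def validate_record(record: Record) -> List[str]:
    """List the required fields that are missing or empty in one record."""
    return [
        f"missing {name}"
        for name in REQUIRED_FIELDS
        if record.get(name) in (None, "")
    ]


def triage(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Separate records that can be released from those that need rework."""
    accepted, rejected = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected


incoming = [
    {"asset_id": "H-12", "asset_type": "hydrant", "x": 106.8, "y": -6.6},
    {"asset_id": "H-13", "asset_type": "", "x": 106.9, "y": None},
]
accepted, rejected = triage(incoming)
print(len(accepted), "accepted;", len(rejected), "sent back for correction")
```

Only the accepted records would move on; the rejected ones go back to the crew together with the reasons.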

02. Data Quantity

Data Quantity cannot drive success in and of itself. Although achieving data quantity is quite easy, simply having a lot of data does not help – especially if that data is not qualified. In fact, it clouds the decision-making process, increasing costs and reducing effectiveness and efficiency.

Once your data model is defined, the process of data migration – if any of that data is available – or data conversion must take place. In either case, you must establish a solid plan beforehand in order to fully implement the object-oriented data model you have defined.

In doing so, however, it is important to recognize that your utility may face data volume issues. The more data there is, the more you must consider how it can degrade performance, especially when you add more users and factor in availability – both in terms of location and continuity (i.e. 24/7 access). In properly designed environments, that does not have to be the case. Good architecture will allow you to handle any quantity of data with ease.

When considering data quantity, you must also consider the impact and implications big data will have for your business. Big data, be it related to machine-generated data or real-time data, can be analyzed in terms of predictive analytics and human behavioral patterns. While the amount of data derived from these environments is obviously quite substantial, it is never organized in a way that provides insights on its own. It must always be processed.

Still, from the perspective of Spatial Asset Management, this data can be used to gain valuable insights into asset failure, performance metrics and asset optimization. Big data can also be used in an object-oriented environment. It can be analyzed to create a clearer picture of performance issues and user behavior, allowing improvements to be achieved by optimizing business processes.

03. Data Quality

Achieving a high standard of Data Quality is expensive. We could simply use the database endlessly, allowing it to grow without restriction. But to look at the cost in terms of Data Quality, one needs to weigh waste against value.

  • We can define waste as any activity that takes longer (or costs more) because of a low-quality data set. Waste can be defined in different ways: wasted storage media, space or time, faulty decision-making or broken business processes.
  • Meanwhile, value is the importance assigned to a particular dataset – its place within the organisation. How valuable is a Spatial Asset Management system that has been populated with data that will not validate or save? If the asset team are working with antiquated data, how valuable is that information when planning improvements?

When we look at the way data is handled, the ‘cheapest’ option is always the most wasteful. It consumes a large portion of the IT budget and results in the lowest value. Worse still, any bad data within a database will spread to all integrated systems, derailing attempts to integrate those systems to improve efficiency. That is why supplying a data quality module to support data validation is so important. While it does consume more time in terms of registration, you will at least know the data is correct.
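To give a feel for what such a data quality module might report, here is a small sketch that measures, per rule, the share of records that pass. The rule names and thresholds are hypothetical and chosen only for the example; a real module would take its rules from the utility's own data model.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]

# Hypothetical quality rules; each returns True when a record passes.
RULES: Dict[str, Callable[[Record], bool]] = {
    "has_material": lambda r: bool(r.get("material")),
    "plausible_install_year": lambda r: (
        isinstance(r.get("install_year"), int) and 1850 <= r["install_year"] <= 2100
    ),
    "positive_diameter": lambda r: (
        isinstance(r.get("diameter_mm"), (int, float)) and r["diameter_mm"] > 0
    ),
}


def quality_report(records: List[Record]) -> Dict[str, float]:
    """Return, for each rule, the fraction of records that pass (0.0 to 1.0)."""
    if not records:
        return {name: 0.0 for name in RULES}
    return {
        name: sum(1 for r in records if rule(r)) / len(records)
        for name, rule in RULES.items()
    }


sample = [
    {"material": "PVC", "install_year": 1998, "diameter_mm": 110},
    {"material": "", "install_year": 1998, "diameter_mm": 110},
    {"material": "Cast iron", "install_year": 0, "diameter_mm": 150},
    {"material": "PE", "install_year": 2015, "diameter_mm": -1},
]
report = quality_report(sample)
for rule_name, share in report.items():
    print(f"{rule_name}: {share:.0%} of records pass")
```

A report like this makes the "waste" visible before the data reaches any integrated system.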

Within every business process, data should inform a decision, and a correct one at that. If it cannot immediately do that, you have a data quality problem.

04. Data Integrity

Data Integrity is, essentially, the maintenance and assurance of data accuracy and consistency over the entire data life cycle. It is therefore a critical aspect of the design, implementation and usage of any system that is intended to provide a consistent means of storing, processing and retrieving data.
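One narrow facet of integrity, detecting that a stored record has changed unexpectedly somewhere along its life cycle, can be sketched with a content fingerprint. This is only an illustration using Python's standard `hashlib` and `json` modules; the helper names `fingerprint` and `is_unchanged` are my own, and a real system would combine this with constraints enforced in the data model.

```python
import hashlib
import json
from typing import Dict


def fingerprint(record: Dict[str, object]) -> str:
    """Hash a record in a canonical form so that key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_unchanged(record: Dict[str, object], expected: str) -> bool:
    """Compare a record against the fingerprint taken when it was stored."""
    return fingerprint(record) == expected


stored = {"asset_id": "V-17", "type": "valve", "status": "open"}
digest = fingerprint(stored)

# Same content in a different key order still matches.
reordered = {"status": "open", "type": "valve", "asset_id": "V-17"}
# A silent edit somewhere downstream does not.
tampered = dict(stored, status="closed")

print(is_unchanged(reordered, digest), is_unchanged(tampered, digest))
```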

In an object-oriented database, data is managed in a consistent fashion because all of its characteristics are defined at the data model level – unlike many other solutions, which manage that behaviour at an application level.

05. Data Management

Data Management is the development and implementation of architectures, policies, practices and processes that properly manage the full data life cycle needs for a given entity, while also supporting the business processes of that entity. Without data validation protocols in place, one cannot support the business processes properly. All too often, becoming enamoured with what is being visualized hides deficiencies in how the data is managed.

Data Management should also facilitate the storage of time series data. An object-oriented database easily manages time series data because it is scalable. With large amounts of data in a relational format, for example, the potential for performance degradation is practically a given.

One of the main advantages of an object-oriented system is its ability to manage different versions of that data. This facilitates phased completion, providing hundreds of thousands of checkpoints, as well as scalability and multiple designs. It also allows simple rollback functions to view historical situations.
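The idea of checkpoints and rollback can be sketched in a few lines. The `VersionedDataset` class below is a deliberately naive, hypothetical illustration that copies the whole dataset at each checkpoint; it says nothing about how an object-oriented database actually implements version management at scale.

```python
import copy
from typing import Dict, List

Attributes = Dict[str, object]


class VersionedDataset:
    """Toy dataset that keeps a full snapshot for every checkpoint."""

    def __init__(self) -> None:
        self._current: Dict[str, Attributes] = {}
        self._checkpoints: List[Dict[str, Attributes]] = []

    def put(self, asset_id: str, attributes: Attributes) -> None:
        """Create or overwrite an asset in the working state."""
        self._current[asset_id] = dict(attributes)

    def get(self, asset_id: str) -> Attributes:
        """Read an asset from the working state."""
        return dict(self._current[asset_id])

    def checkpoint(self) -> int:
        """Freeze the working state and return its version number."""
        self._checkpoints.append(copy.deepcopy(self._current))
        return len(self._checkpoints) - 1

    def view(self, version: int) -> Dict[str, Attributes]:
        """Return a read-only style copy of a historical situation."""
        return copy.deepcopy(self._checkpoints[version])

    def rollback(self, version: int) -> None:
        """Reset the working state to an earlier checkpoint."""
        self._current = copy.deepcopy(self._checkpoints[version])


dataset = VersionedDataset()
dataset.put("P-001", {"status": "planned"})
design_phase = dataset.checkpoint()

dataset.put("P-001", {"status": "in service"})
operational_phase = dataset.checkpoint()

print(dataset.view(design_phase)["P-001"]["status"])
dataset.rollback(design_phase)
print(dataset.get("P-001")["status"])
```

Each checkpoint here corresponds to a completed phase, and viewing an older one shows the network as it was recorded at that moment.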

06. Data Integration

Data Integration combines data residing in different sources, thereby providing users with a unified view of them – the ultimate single source of truth. For the purpose of this article, however, we should take this one step further.

In the ideal scenario, which is attainable in an object-oriented system, integration goes beyond simply combining data. Instead, it leverages this unified view to improve business processes by integrating more qualitative data and making this data readily available at all appropriate levels of your organization.

This prevents data silos, ensures coordination between departments and helps establish a shared vision for the organization.
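At its simplest, the "unified view" described above means combining records about the same asset from different systems. The sketch below merges two hypothetical sources, a GIS export and a billing export, on a shared `asset_id`. The source names, field names and the rule that the later source wins on conflicting fields are all assumptions made for the example.

```python
from typing import Dict, List

Record = Dict[str, object]


def unify(gis_records: List[Record], billing_records: List[Record]) -> Dict[str, Record]:
    """Merge records per asset_id and remember which sources contributed."""
    unified: Dict[str, Record] = {}
    for source_name, records in (("gis", gis_records), ("billing", billing_records)):
        for record in records:
            merged = unified.setdefault(record["asset_id"], {"sources": []})
            # Later sources overwrite earlier ones when field names collide.
            merged.update(record)
            merged["sources"].append(source_name)
    return unified


gis = [
    {"asset_id": "M-100", "x": 106.8, "y": -6.6},
    {"asset_id": "M-101", "x": 106.9, "y": -6.7},
]
billing = [
    {"asset_id": "M-100", "customer": "C-5521", "last_reading_m3": 412},
]

view = unify(gis, billing)
print(view["M-100"])
print(view["M-101"]["sources"])
```

Assets that appear in only one source stand out immediately, which is often the first practical benefit of bringing the systems together.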

The value of data mastery

To become future-proof, modern businesses must assess their existing datasets and determine how best to organize that data. During this process, the scalability of both graphic and textual information, the number of users, integration possibilities with other systems and version management capabilities must all be carefully investigated to contain the total cost of ownership.

By implementing commercial off-the-shelf solutions that take care of all these aspects from the outset, utility companies will have a far greater degree of control when it comes to managing their data, processes and total cost of ownership, all while serving all of their users and constituents. In an increasingly data-driven world, that is the key to success.

John Leeuwenburg

Global Product Manager
