The growing need for data processing frameworks and data governance
Imagine a world where data collected from numerous disparate sources could be processed without intervention, with no impact on downstream activities, and made available to users in the right format at the right time, allowing your business to unlock the intrinsic value it holds.
The shift from offline to digital data sources only compounds the issues that many businesses are facing when it comes to processing and storing data. Combining any new dataset with existing systems should be an easy, replicable process, allowing new feeds to be integrated with minimal effort. But, in reality, it’s not that simple.
To reach this goal, businesses need to write and implement a set of rules (a data processing framework) that reduces the time and effort needed to prepare and correct data before attempting to load it into a database.
These rules fall under a wider umbrella (data governance) which, when implemented, assists in the collection and loading of data. Data governance defines the framework or processes that ensure data is effectively managed throughout your business and, when put in place, reduces costs and increases revenue.
There are three main components of data governance:
- Rules or controls for all data flows in and out of your business
- Implementation of these rules or controls
- Data management to ensure your company is compliant
We’ll focus on the first of these three elements here.
When it comes to data collection, whether data is input directly into a system by a user, captured via a web form asking for customer/supporter information, or gathered by a third party on your behalf, it must be accurate and meet the minimum standards defined in the data governance policy.
But this isn’t always the case: attributes are often captured sporadically, contain values different from those expected, and arrive with inconsistent naming conventions and varying file formats. When feeds are designed and implemented to adhere to defined guidelines, minimal effort is required to set up ETL (Extract, Transform, Load) processes, and new feeds may even match the format of feeds already in place.
Let’s look at some simple rules that can benefit your business:
- Naming conventions – define what each field should be called. ‘Forename’, for example, will be captured by different suppliers as ‘Firstname’, ‘First name’ or ‘First_name’. Ask for fields to be supplied with names that conform with your policy. Where suppliers are unable to conform, document the differences and ensure everyone concerned is aware of the divergence. Alternatively, your ETL product may allow for aliases e.g., define the field as ‘First_name’ but with an alias of ‘Forename’. Then, when the field is supplied as ‘Forename’, it’ll automatically map to the ‘First_name’ field.
- Minimum data requirements – missing or incomplete data attributes cause issues for downstream processes across a broad range of business areas, such as call handlers, digital teams, data management and point of sale. Entity management software (single customer/supporter view) relies on many attributes to identify matching criteria across records and, when these are present and correct, the result is an accurate, compliant and trusted dataset. Here are a couple of examples. Firstly, capture full names rather than just initials. Secondly, when capturing email addresses, also capture name attributes, not just the email address itself – the classic email matching conundrum!
- Permitted values – a simple example: if it’s a postcode field, then stipulate that it contains a postcode. Wherever possible, validation of the value should be configured at the source. Of course, international data must be considered too. Another common permitted-values scenario relates to consent and ensuring it’s collected and stored correctly. Historically, one feed may have collected data as an opt-in and another as an opt-out, both with yes and no options, which conflict when put together. Standardising values is as important as validating them. Ideally, all data capture points would implement the same standards and capture consistent values, but these aren’t always within the control of the receiving organisation. The pain point is, therefore, often accepted to be at the point of standardisation. In some cases, we’ve seen recoding of historical data, although this needs to be done with care and full awareness of all data users.
- Change control/management – an obvious item to point out. Amendments to existing feeds must be developed and tested before being allowed anywhere near your live environment.
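The first three rules above can be sketched in code. This is a minimal illustration, not a reference implementation: the field names, alias map and consent values are assumptions chosen for the example, not taken from any particular ETL product or policy.

```python
# Illustrative only: field names, aliases and consent values are assumptions.
FIELD_ALIASES = {
    "Forename": "First_name",
    "Firstname": "First_name",
    "First name": "First_name",
}
REQUIRED_FIELDS = {"First_name", "Last_name", "Email"}

# Standardise historic opt-in/opt-out flags onto one set of consent values.
CONSENT_MAP = {
    ("opt_in", "yes"): "consented",
    ("opt_in", "no"): "not_consented",
    ("opt_out", "yes"): "not_consented",
    ("opt_out", "no"): "consented",
}

def normalise_record(record, consent_style="opt_in"):
    """Apply naming aliases, enforce minimum data, standardise consent."""
    # 1. Naming conventions: map supplier field names onto policy names.
    clean = {FIELD_ALIASES.get(key, key): value for key, value in record.items()}
    # 2. Minimum data requirements: reject records missing key attributes.
    missing = REQUIRED_FIELDS - {k for k, v in clean.items() if v}
    if missing:
        raise ValueError(f"Record missing required fields: {sorted(missing)}")
    # 3. Permitted values: standardise consent across opt-in/opt-out feeds.
    raw = str(clean.get("Consent", "")).strip().lower()
    clean["Consent"] = CONSENT_MAP.get((consent_style, raw), "unknown")
    return clean
```

A record supplied with ‘Forename’ is silently mapped to ‘First_name’, while a record missing a required attribute is rejected before it reaches the database, which is exactly where you want the failure to surface.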
If the above rules are scoped out, implemented and adhered to, the need for human intervention to fix data will be massively reduced, if not (in some cases) removed entirely. The benefits will be felt right across your business, from the data team constantly fixing issues to the teams responsible for selections, analysis and reporting. They’ll spend less time QA-ing and validating results and build greater trust in the dataset available.