The growing need for data processing frameworks and data governance
4 August 2021 | Blog
Imagine a world where data collected from numerous disparate data sources could be processed without intervention, with no impact on downstream activities, and made available to users in the right format, at the right time, allowing your business to gain the intrinsic value held within it.
Sounds good, doesn’t it?
However, the shift from offline to digital data sources only compounds the issues that many businesses are facing when it comes to processing and storing data. Combining any new dataset with existing systems should be an easy, replicable process, allowing new feeds to be integrated with minimal effort.
But, in reality, it’s not that simple.
To reach this goal, businesses need to write and implement a set of rules (data processing framework), which reduces the time and effort needed to prepare and correct data before attempting to load it into a database.
These rules fall under a wider umbrella (data governance) which, when implemented, assists in the collection and loading of data and the creation of data insights reports.
Data governance explained
Data governance defines the framework or processes that ensure data is effectively managed throughout your business and, when put in place, reduces costs and increases revenue.
There are three main components of data governance:
- Rules or controls for all data flows in and out of your business
- Implementation of these rules or controls
- Data management to ensure your company is compliant
We’ll focus on the first of these three elements here.
When it comes to data collection – whether data is input directly into a system by a user, gathered via a web form asking for customer/supporter information, or collected by a third party on your behalf – it must be accurate and meet the minimum standards defined in the data governance policy.
But this isn’t always the case: attributes are often captured sporadically, contain values other than those expected, and arrive under different naming conventions and in varying file formats. When feeds are designed and implemented to adhere to defined guidelines, minimal effort is required to set up ETL (Extract, Transform, Load) processes, and they may even match the format of other feeds already in place.
4 essential data governance rules explained
1. Naming conventions
Firstly, you need to use naming conventions to define what each field should be called. ‘Forename’, for example, will be captured by different suppliers as ‘Firstname’, ‘First name’ or ‘First_name’, which is why you need to ask for fields to be supplied with names that conform with your policy.
Where suppliers are unable to conform, document the differences and ensure everyone concerned is aware of the divergence. Alternatively, your ETL product may allow aliases to be used, e.g. define the field as ‘First_name’ but with an alias of ‘Forename’. When the field is supplied as ‘Forename’, it will then automatically map to the ‘First_name’ field, as sketched below.
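Here’s a minimal illustration of the alias idea in Python. It’s purely a sketch, not any particular ETL product’s API, and the field names are examples:

```python
# Sketch of field-alias mapping (illustrative only, not a specific
# ETL product's API). Field names here are example assumptions.
ALIASES = {
    "Forename": "First_name",
    "Firstname": "First_name",
    "First name": "First_name",
    "Surname": "Last_name",
}

def normalise_record(record: dict) -> dict:
    """Rename incoming fields to the names defined in the governance policy."""
    return {ALIASES.get(field, field): value for field, value in record.items()}

# e.g. {"Forename": "Ada", "Surname": "Lovelace"}
#   -> {"First_name": "Ada", "Last_name": "Lovelace"}
print(normalise_record({"Forename": "Ada", "Surname": "Lovelace"}))
```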
2. Minimum data requirements
Missing or incomplete data attributes cause issues for downstream processes across a broad range of business areas, such as call handlers, digital teams, data management, point of sale, and more.
Entity management software (single customer/supporter view) relies on many attributes to identify matching criteria across records; when these are present and correct, the result is an accurate, compliant and trusted dataset.
Here are a couple of examples. Firstly, capture full names rather than just initials. And when capturing email addresses, also capture name attributes, not email addresses alone – the classic email matching conundrum! A minimal completeness check is sketched below.
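As a sketch (the required fields here are assumptions – yours would come from your own governance policy), a simple minimum-data check might look like this:

```python
# Sketch of a minimum-data-requirements check. The required field
# names are illustrative assumptions, not a fixed standard.
REQUIRED_FIELDS = ["First_name", "Last_name", "Email"]

def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in a record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]

record = {"First_name": "Ada", "Last_name": "", "Email": "ada@example.org"}
print(missing_fields(record))  # ['Last_name'] - flag for correction at source
```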
3. Permitted values
A simple example: if it’s a postcode field, stipulate that it contains a valid postcode. Wherever possible, validation of the value should be configured at the source – though, of course, international data must be considered too.
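As an illustration, here’s a simplified UK postcode check. The regex below is an assumption that skips some edge cases of the full UK rules, and international addresses would need their own validation:

```python
import re

# Simplified UK postcode pattern (illustrative - the real rules have
# edge cases, and international formats need different validation).
UK_POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$", re.IGNORECASE)

def is_valid_uk_postcode(value: str) -> bool:
    """Check a value against the simplified UK postcode pattern."""
    return bool(UK_POSTCODE.match(value.strip()))

print(is_valid_uk_postcode("SW1A 1AA"))        # True
print(is_valid_uk_postcode("not a postcode"))  # False
```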
Another common permitted-values scenario relates to consent and ensuring it’s collected and stored correctly. Historically, one feed may have collected consent as an opt-in and another as an opt-out, both with a yes and no option – and the two conflict when put together.
Standardising the values is as important as validating them. Ideally, all data capture points would implement the same standards and capture consistent values, but these aren’t always within the control of the receiving organisation.
The pain is therefore often accepted at the point of standardisation. In some cases, we’ve seen recoding of the historical data, although this needs to be done with care and with full awareness of all data users.
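To make the opt-in/opt-out conflict concrete, here’s a sketch of how the two capture styles might be standardised into a single consent flag. The function and capture-type labels are hypothetical:

```python
# Sketch of standardising consent values from two feeds (the capture-type
# labels are hypothetical). An opt-in "Yes" and an opt-out "No" both mean
# consent was given, so both must map to the same standard value.
def standardise_consent(value: str, capture_type: str) -> bool:
    """Return True if the supporter has consented, under either capture style."""
    answered_yes = value.strip().lower() in {"y", "yes", "true", "1"}
    if capture_type == "opt_in":    # "Yes" means: I consent
        return answered_yes
    if capture_type == "opt_out":   # "Yes" means: I withdraw consent
        return not answered_yes
    raise ValueError(f"Unknown capture type: {capture_type}")

print(standardise_consent("Yes", "opt_in"))   # True  (consented)
print(standardise_consent("Yes", "opt_out"))  # False (opted out)
```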
4. Change control/management
Finally, an obvious item to point out: any amendments to existing feeds must be developed and tested before being allowed anywhere near your live environment.
How data governance can benefit your business
If the above rules are scoped out, implemented and adhered to, the need for manual intervention to fix data will be massively reduced, if not (in some cases) removed completely.
The benefits will be felt right across your business, from the data team constantly fixing issues to the teams responsible for selections, analysis and data insights reports. They’ll be able to spend less time QA-ing and validating results, and more time building trust in the dataset available and using it to their advantage!
Need help getting the most out of your data?
At Wood for Trees, we are experts in data processing frameworks and data governance, and have years of experience helping charities get the most out of their data.
We help make things happen through data analytics, insights, and systems, and have collaborated with some of the world’s best-known charities and not-for-profits to improve fundraising efficiency and performance and boost ROI.
If you’d like to find out more about how we can help you with data governance so you can get the most out of your data, contact our team today.