Introduction to data
Data is usually thought of as numbers, but any kind of information is made up of data. Information is data that has been organised or analysed to create useful insights or provide knowledge of some kind.
Data could be numbers, but it can also be images, videos, maps, social media posts, online content or paper records.
Working with data
Good data is created, stored, managed and used through recognised and approved processes. It follows agreed standards and procedures, so its quality can be trusted: its value, accuracy, consistency and level of detail are all known.
To support your data management, use the FAIR data principles from GO FAIR. FAIR data is:
- Findable – the data is documented and stored in a way that means you and others know how to find it
- Accessible – once you can find the data, you know how to access it
- Interoperable – your data can be used across different systems and combined with other data sets
- Reusable – your data is collected or recorded in a way that means it can be used again in different projects or settings
To get good FAIR data, you need to think about how policies and ways of working in your project or organisation ensure the data can be used. These policies and ways of working are known as data management. It is essential to set up appropriate governance from the start of your project to reduce the chance of expensive mistakes later.
Using data effectively
You should use data to inform your decision-making and improve the experience for users. To make sure you use your data effectively, you should complete a data management plan.
You need to plan for the management and use of data throughout the lifecycle of your service. When using data, you will either be collecting the data yourself or reusing data that already exists.
Two hypothetical examples show why this is important:
- issues accessing existing data
- duplicating data collection and not sharing your data
Issues accessing existing data
Your hypothetical service has been developed to map soil types in Scotland to help farmers manage their land. A mobile application lets farmers enter a location and find out their soil type. However, the data informing your service does not have a data management plan.
The soil data has been collected by different government agencies, charity sector workers, private sector companies and citizens. With no plan, the terms for accessing the data are not clear. This means:
- different organisations have different policies for sharing the data
- some teams have concerns about personal or commercially sensitive information and cannot share the data you need
- where data has been shared with you, the information is stored in different formats and can be contradictory
As a result, users of your mobile application find that soil information is missing or wrong. After an initial spike in usage, your service stops being used entirely. Finding these data access issues earlier could have prevented costly mistakes. A data management plan would help you identify potential issues before they happen.
Duplicating data collection and not sharing your data
Your hypothetical service to map soil types in Scotland collects data on farmers’ soil every time they submit a request. This data is already held by another government agency, but instead of working with them, you collect your own version of the data.
With no data management plan:
- your service team is unaware of any existing soil datasets
- you have no data sharing agreements with other teams
- farmers find themselves being asked for the same soil information repeatedly
- you spend more money through unnecessary data collection
As a result, your service becomes slow and expensive. Publishing or cataloguing the existing soil data would help you and other teams avoid repeating the same work. A shared soil database could remove duplication, improve data quality and speed up service delivery.
Managing your data
To use data effectively and avoid problems like those in the examples above, you should complete a data management plan. Following this plan helps you consider data throughout your service’s lifecycle, preventing costly and time-consuming changes later.