Data loss in recent migration
We recently undertook a large-scale migration of DataHub. This migration included upgrading CKAN to the latest stable release, running DataHub on a completely new cloud infrastructure, and moving to redundant S3 buckets for storing dataset resources.
The migration occurred in November 2015, after lots of hard work by our infrastructure and development teams, and we’ve since seen a more performant and stable DataHub, which will serve as a basis for new developments in 2016.
On the 11th of January 2016, we were alerted by a post on the Open Knowledge Discuss Forum to the potential loss of data. On immediate investigation, we found that 38 datasets, with a total of 78 resources, had been irretrievably lost.
These resources were, unlike the rest of the data on DataHub, stored on the local file system of the server, and not on a cloud-based storage backend. We have no way to retrieve these resources.
We deeply regret any inconvenience caused by this data loss. While DataHub is a free service without any specific guarantees for data persistence, we have let ourselves and the community of users down by this loss of data.
We have taken steps to prevent this happening in the future. In fact, the move to the new infrastructure is the solution, as resources are now stored on fully redundant S3 storage.
Additionally, DataHub is now more resilient and secure overall, with a robust backup system in place for the database and stateless application servers running in Docker in a new cluster. We appreciate the trust the community puts in us to host their data, and we’ll be doing more in 2016 to make DataHub a stable, reliable and free solution for storing and accessing open data.
We are happy to assist owners of these datasets in republishing the data if they wish. Please get in touch.