This talk presents the efforts of a team of criminologists to develop a critical infrastructure ransomware dataset. It addresses how we created (and continue to maintain) this dataset using responsible data science practices: collecting data, recoding the collected data into meaningful variables, and developing codebooks for those variables to promote transparency and trustworthiness. We then present the challenges and limitations we encountered in generating this dataset, such as missing data, establishing consistency across varying levels and types of information, and relying only on publicly disclosed incidents. We also describe the different communities that have requested this dataset and how they might use it. Finally, we discuss community feedback that has prompted major revisions to this dataset and how further engagement might inform future iterations. We hope to demonstrate that this dataset is not only valuable as a free resource but also dynamic, evolving in response to community engagement.
- Understanding the relevance of using responsible data science principles to develop and maintain datasets.
- Understanding reasonable expectations for datasets based on publicly disclosed incidents.
- Demonstrating the need for academic-industry-government dialog and partnerships.