Dataset Overview

Research on provenance visualization requires analysis records to serve as samples. We conducted sets of user studies using text analysis scenarios and multidimensional data analysis to generate provenance test data. The data sets consist of interaction logs over time, the data and interface elements interacted with, think-aloud comments from participant "analysts", and researcher-annotated labels.

Project Lead: Eric Ragan, PhD; Indie Lab at University of Florida

Data is free to use for research with acknowledgement:

Mohseni, S., Pachuilo, A., Nirjhar, E. H., Linder, R., Pena, A., & Ragan, E. D. (2018). Analytic Provenance Datasets: A Data Repository of Human Analysis Activity and Interaction Logs. arXiv:1801.05076.

Text Analysis for Intelligence Analysis

Designing provenance visualizations requires analysis records. We conducted a set of user studies using text analysis scenarios to provide provenance test data. Data logs were captured from 24 analysis sessions. Each analysis session involved one of three intelligence analysis scenarios selected from the VAST Challenge data sets, a collection of synthetically created data sets and analysis scenarios designed to resemble real-world cases and problems.

Text Analysis Provenance Data


Cyber Threat Analysis

We conducted a user study to gather interaction logs and eye-tracking information on how users analyze multidimensional data in applications with multiple coordinated views. Each session lasted about 90 minutes, during which participants used an application we created to explore data from the VAST 2009 Mini Challenge 1. Participants used a simple visualization tool with coordinated views to investigate any network events they found suspicious. The tool offered ways to filter by source IP and by time. Users were also able to view where employees were in the office during given time periods. In addition to the interaction logs and eye-tracking data, we recorded each participant's screen and audio for think-aloud data.
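As a rough illustration of the kind of filtering described above, the following minimal sketch selects network events by source IP and by time window. It is not the study tool's code and does not reflect the dataset's actual schema: the `NetworkEvent` class, its field names, the `filter_events` function, and the sample IP addresses are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class NetworkEvent:
    # Hypothetical fields for illustration; not the dataset's real column names.
    timestamp: datetime
    source_ip: str
    dest_ip: str


def filter_events(
    events: Iterable[NetworkEvent],
    source_ip: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[NetworkEvent]:
    """Keep events matching the source IP (if given) within [start, end)."""
    selected = []
    for event in events:
        if source_ip is not None and event.source_ip != source_ip:
            continue
        if start is not None and event.timestamp < start:
            continue
        if end is not None and event.timestamp >= end:
            continue
        selected.append(event)
    return selected


# Made-up sample records, only to show how the filter is called.
sample_events = [
    NetworkEvent(datetime(2008, 1, 1, 9, 0), "10.0.0.1", "10.0.0.9"),
    NetworkEvent(datetime(2008, 1, 1, 10, 30), "10.0.0.2", "10.0.0.9"),
    NetworkEvent(datetime(2008, 1, 1, 11, 15), "10.0.0.1", "10.0.0.8"),
]

morning_from_first_host = filter_events(
    sample_events,
    source_ip="10.0.0.1",
    start=datetime(2008, 1, 1, 8, 0),
    end=datetime(2008, 1, 1, 10, 0),
)
```

Treating the end of the time window as exclusive keeps adjacent windows from double-counting an event that falls exactly on a boundary; an inclusive bound would work equally well if that matches the analysis better.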

Cyber Analysis Provenance Data