TY - JOUR
T1 - TraceGen
T2 - User activity emulation for digital forensic test image generation
AU - Du, Xiaoyu
AU - Hargreaves, Christopher
AU - Sheppard, John
AU - Scanlon, Mark
N1 - Publisher Copyright:
© 2021 The Authors
PY - 2021/10
Y1 - 2021/10
N2 - Digital forensic test images are commonly used across a variety of digital forensic use cases including education and training, tool testing and validation, proficiency testing, malware analysis, and research and development. Using real digital evidence for these purposes is often not viable or permissible, especially when factoring in the ethical and in some cases legal considerations of working with individuals' personal data. Furthermore, when using real data it is not usually known what actions were performed when, i.e., what was the ‘ground truth’. The creation of synthetic digital forensic test images typically involves an arduous, time-consuming process of manually performing a list of actions, or following a ‘story’ to generate artefacts in a subsequently imaged disk. Besides the manual effort and time needed in executing the relevant actions in the scenario, there is often little room to build a realistic volume of non-pertinent wear-and-tear or ‘background noise’ on the suspect device, meaning the resulting disk images are inherently limited and to a certain extent simplistic. This work presents the TraceGen framework, an automated system focused on the emulation of user actions to create realistic and comprehensive artefacts in an auditable and reproducible manner. The framework consists of a series of actions contained within scripts that are executed both externally and internally to a target virtual machine. These actions use existing automation APIs to emulate a real user's behaviour on a Windows system to generate realistic and comprehensive artefacts. These actions can be quickly scripted together to form complex stories or to emulate wear-and-tear on the test image. In addition to the development of the framework, evaluation is also performed in terms of the ability to produce background artefacts at scale, and also the realism of the artefacts compared with their human-generated counterparts.
AB - Digital forensic test images are commonly used across a variety of digital forensic use cases including education and training, tool testing and validation, proficiency testing, malware analysis, and research and development. Using real digital evidence for these purposes is often not viable or permissible, especially when factoring in the ethical and in some cases legal considerations of working with individuals' personal data. Furthermore, when using real data it is not usually known what actions were performed when, i.e., what was the ‘ground truth’. The creation of synthetic digital forensic test images typically involves an arduous, time-consuming process of manually performing a list of actions, or following a ‘story’ to generate artefacts in a subsequently imaged disk. Besides the manual effort and time needed in executing the relevant actions in the scenario, there is often little room to build a realistic volume of non-pertinent wear-and-tear or ‘background noise’ on the suspect device, meaning the resulting disk images are inherently limited and to a certain extent simplistic. This work presents the TraceGen framework, an automated system focused on the emulation of user actions to create realistic and comprehensive artefacts in an auditable and reproducible manner. The framework consists of a series of actions contained within scripts that are executed both externally and internally to a target virtual machine. These actions use existing automation APIs to emulate a real user's behaviour on a Windows system to generate realistic and comprehensive artefacts. These actions can be quickly scripted together to form complex stories or to emulate wear-and-tear on the test image. In addition to the development of the framework, evaluation is also performed in terms of the ability to produce background artefacts at scale, and also the realism of the artefacts compared with their human-generated counterparts.
KW - Evidence planting
KW - Forensic disk image creation
KW - Forensic education
KW - Tool testing and validation
KW - User emulation
UR - http://www.scopus.com/inward/record.url?scp=85119476863&partnerID=8YFLogxK
U2 - 10.1016/j.fsidi.2021.301133
DO - 10.1016/j.fsidi.2021.301133
M3 - Article
AN - SCOPUS:85119476863
SN - 2666-2825
VL - 38
JO - Forensic Science International: Digital Investigation
JF - Forensic Science International: Digital Investigation
M1 - 301133
ER -