TY - GEN
T1 - A COMPARISON OF CONVOLUTIONAL NEURAL NETWORKS AND VISION TRANSFORMERS AS MODELS FOR LEARNING TO PLAY COMPUTER GAMES
AU - Dudon, Adrien
AU - Cawley, Oisin
N1 - Publisher Copyright:
© 2023 EUROSIS-ETI. All Rights Reserved.
PY - 2023
Y1 - 2023
N2 - The Convolutional Neural Network (CNN) architecture, coupled with the Double Deep Q Network (DQN) algorithm, has been widely employed to solve complex video game environments. However, Vision Transformer (ViT) architectures have recently demonstrated superior performance in various tasks previously dominated by CNNs. This research replicates the study conducted by Meng et al. and assesses whether the Swin Transformer, a ViT variant, can effectively learn to play video games and achieve results comparable to a CNN within fewer training steps. The findings reveal that the Swin Transformer architecture performs notably well; however, contrary to the findings of Meng et al., the CNN architecture outperforms it when the number of training steps is limited. The CNN architecture is also more computationally efficient, requiring less computing power, running well on older hardware, and consuming a reasonable amount of memory. To surpass CNN performance, the Swin Transformer requires a substantial number of training steps, which supports Meng et al.'s conclusions.
AB - The Convolutional Neural Network (CNN) architecture, coupled with the Double Deep Q Network (DQN) algorithm, has been widely employed to solve complex video game environments. However, Vision Transformer (ViT) architectures have recently demonstrated superior performance in various tasks previously dominated by CNNs. This research replicates the study conducted by Meng et al. and assesses whether the Swin Transformer, a ViT variant, can effectively learn to play video games and achieve results comparable to a CNN within fewer training steps. The findings reveal that the Swin Transformer architecture performs notably well; however, contrary to the findings of Meng et al., the CNN architecture outperforms it when the number of training steps is limited. The CNN architecture is also more computationally efficient, requiring less computing power, running well on older hardware, and consuming a reasonable amount of memory. To surpass CNN performance, the Swin Transformer requires a substantial number of training steps, which supports Meng et al.'s conclusions.
KW - Artificial Intelligence
KW - Computer Game Programming
KW - Convolutional Neural Networks
KW - Machine Learning
KW - Vision Transformer
UR - http://www.scopus.com/inward/record.url?scp=85183864995&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85183864995
T3 - 24th International Conference on Intelligent Games and Simulation, GAME-ON 2023
SP - 5
EP - 9
BT - 24th International Conference on Intelligent Games and Simulation, GAME-ON 2023
A2 - Kehoe, Joseph
A2 - Bourke, Philip
A2 - Cawley, Oisin
PB - EUROSIS
T2 - 24th International Conference on Intelligent Games and Simulation, GAME-ON 2023
Y2 - 6 September 2023 through 8 September 2023
ER -