Publication Date

Spring 2020

Advisor(s) - Committee Chair

Dr. Lukun Zheng, Dr. Mark Robinson, Dr. Ngoc Nguyen & Dr. Qi Li

Degree Program

Department of Mathematics and Computer Science

Degree Type

Master of Science


An interesting problem in visual analysis is determining the genre of a book from its cover. The cover is the first thing a reader sees, and it shapes the reader’s expectations about the type of book. Cover designers and typographers carefully design each cover to convey a visual representation of the book’s content. In this study, we explore several deep learning approaches for predicting genre from the cover image alone: MobileNet V1, MobileNet V2, ResNet50, and Inception V2. We then add a second modality by extracting text from the cover image, and we explore text classification models such as LSTM and Universal Sentence Encoder. Next, we study multi-modal fusion based on the two best-performing models, one for cover images and one for text. Finally, we propose using Deep Canonical Correlation Analysis (DCCA) to jointly learn features from the two modalities, image and text, and then use a support vector machine classifier to predict the genre of the book. Overall, concatenating the two models without DCCA yields the best result. However, our analysis reveals the weakness of these models on this task with the dataset we used. Our results suggest that solving this task to a satisfactory level requires significant further effort and a much more accurate dataset.
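
The concatenation-based fusion mentioned above can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `l2_normalize` and `fuse_by_concatenation`, the feature dimensions, the random stand-in features, and the per-modality L2 normalisation step are all assumptions made for illustration. The fused vectors would then be passed to a classifier such as a support vector machine.

```python
import numpy as np


def l2_normalize(features, eps=1e-12):
    """Scale each row to unit L2 norm (an assumed preprocessing step)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, eps)


def fuse_by_concatenation(image_features, text_features):
    """Concatenate per-book image and text features along the feature axis."""
    if image_features.shape[0] != text_features.shape[0]:
        raise ValueError("both modalities must describe the same books")
    return np.concatenate(
        [l2_normalize(image_features), l2_normalize(text_features)], axis=1
    )


# Synthetic stand-ins for real embeddings; dimensions are illustrative only.
rng = np.random.default_rng(0)
image_features = rng.normal(size=(8, 128))  # hypothetical image-model features
text_features = rng.normal(size=(8, 64))    # hypothetical text-model features

fused = fuse_by_concatenation(image_features, text_features)
print(fused.shape)  # (8, 192)
```

Normalising each modality before concatenation keeps the larger-magnitude feature set from dominating the fused vector; whether this helps on real cover data is an open question that this sketch does not address.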


Artificial Intelligence and Robotics | Other Computer Sciences | Other Physical Sciences and Mathematics | Statistics and Probability