Mahurin Honors College Capstone Experience/Thesis Projects

Department

Mathematics

Additional Departmental Affiliation

Computer Science

Document Type

Thesis

Abstract

There is currently no easy way to view the relationships among a collection of written items (e.g., sports articles, diary entries, research papers). In recent years, new machine learning methods have been developed that are highly effective at extracting semantic relationships from large numbers of documents. One of these is Doc2Vec, an unsupervised machine learning model that constructs a vector representation for each document. The research project detailed in this thesis uses Doc2Vec and other existing algorithms to analyze the relationships between pieces of text. We set forth a broader ambition for this project before discussing the use of, and need for, Doc2Vec. We set and evaluate criteria to examine the feasibility of using Doc2Vec to accomplish this broader ambition.

Advisor(s) or Committee Chair

Uta Ziegler, Ph.D.

Disciplines

Artificial Intelligence and Robotics | Computer Sciences | Linguistics | Mathematics | Other Computer Sciences
