You are looking at archived content from my "Bookworm" blog, an experiment that ran from 2014-2016. Not all content may work. For current posts, see here.

Posts with tag word2vec


Dec 09 2015

Nov 19 2015

Oct 29 2015

My last post provided a general introduction to the new word embedding models of language (WEMs), and introduced an R package for easily performing basic operations on them. It was geared mostly towards people in the Digital Humanities community. This post looks more closely at a single word2vec model I've trained on about 14 million reviews of faculty members from ratemyprofessors.com. The point of this one is to provide a more concrete exploration of how these models can help us think about gendered language. I hope it will be interesting even to people who aren't interested in training a machine learning model themselves; there's code in here, but it's freely skippable.

Oct 24 2015

Recent advances in vector-space representations of vocabularies have created an extremely interesting set of opportunities for digital humanists. These models, known collectively as word embedding models, may hold nearly as many possibilities for digital humanists modeling texts as do topic models. Yet although they're gaining some headway, they remain far less used than other methods (such as modeling a text as a network of words based on co-occurrence) that have considerably less flexibility. "As useful as topic modeling" is a large claim, given that topic models are used so widely. DHers use topic models because it seems at least possible that each individual topic can offer a useful operationalization of some basic and real element of humanities vocabulary: topics (Blei), themes (Jockers), or discourses (Underwood/Rhody). The word embedding models offer something slightly more abstract, but equally compelling: a spatial analogy to relationships between words. WEMs (to make up, for this post, a blanket abbreviation for the two major methods) take an entire corpus and try to encode the various relations between words into a spatial analogue.
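To make the "spatial analogue" idea a little more concrete, here is a minimal sketch in Python. It is not a trained model: the three-dimensional vectors in `toy_vectors` are hypothetical numbers I made up for illustration, and the helper names `cosine` and `analogy` are my own, not part of any package. A real WEM would learn vectors with hundreds of dimensions from a corpus. The point is only to show how a relationship between words can be treated as a direction in space.

```python
import math

# Hypothetical, hand-made vectors (NOT output from a trained model).
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.2, 0.5, 0.5],
}

def cosine(a, b):
    """Cosine similarity: how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c, vectors):
    """Return the word whose vector is closest to vec(b) - vec(a) + vec(c).

    The three input words are excluded from the candidates.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# "man" is to "king" as "woman" is to ...?
result = analogy("man", "king", "woman", toy_vectors)
print(result)
```

With these made-up numbers, the step from "man" to "king" is the same direction as the step from "woman" to "queen", which is the kind of relation a trained model tries to capture from co-occurrence patterns in text.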