Reverse engineering Signs@40 – Hacking the Humanities 2021

Signs@40 documents and analyzes 40 years of feminist scholarship. This open-access DH project has many interesting subsections, such as an exhibit of Signs’ past cover designs and a cocitation network analysis, but I will focus on the Topic Model Browser, a data visualization of the words that occur in the journal.

What sources did this project use?
- The archive of Signs from 1975-2014.

What’s its purpose?
- To celebrate Signs’ 40th anniversary.
- To visualize its contribution to the field from multiple perspectives: individual topics over time; articles that are most strongly associated with particular topics; multiple topics contained in each article; and the words that contribute most significantly to each topic.

Awesome features of the Topic Model Browser:
- The basic unit of this section is the “topic”, or “a collection of words that tend to occur in the same documents”. In the grid view below, each circle represents a “topic”. The topic “Lesbian, gay, queer” contains words that frequently co-occur in articles, for example, heterosexual, identity, queer, lesbian(s), community, etc.
- In the “Topics over time” view, the height of each topic stream (differentiated by color) shows the proportion of the topic in a given year. Interestingly, topics like “Feminist movements” and “Political movements” made up only 3.7% of the corpus in total (I believe the second wave of feminism ran from around the 1960s to the 1980s, and we’re currently in the fourth wave). Unsurprisingly, words related to traditionally marginalized groups, like the LGBTQ+ community and indigenous peoples, each made up less than 1% of the corpus.
- Each topic is also accompanied by several editorial commentaries, some of which were written by “Future Signs Editors”!
- Note that each topic was labeled by the contributors according to their interpretive judgments. And yes, there were word associations that weren’t labeled, for instance, “rice, Dutch, Jewish, Somali” (I wonder what the connection between them could be 🤔).
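
To make the “Topics over time” idea concrete, here is a minimal sketch of how a topic’s share of each year could be computed. It assumes the model gives us, for every article, a count of how many words were assigned to each topic. The function name `topic_share_by_year` and the toy numbers are hypothetical and are not taken from the Signs@40 code.

```python
from collections import defaultdict


def topic_share_by_year(doc_years, doc_topic_counts):
    """Return {year: [share of topic 0, share of topic 1, ...]}.

    doc_years: publication year of each document.
    doc_topic_counts: for each document, the number of words the
        model assigned to each topic (same topic order everywhere).
    """
    n_topics = len(doc_topic_counts[0])
    totals = defaultdict(lambda: [0] * n_topics)

    # Add up the word counts of every topic within each year.
    for year, counts in zip(doc_years, doc_topic_counts):
        for k, count in enumerate(counts):
            totals[year][k] += count

    # Divide by the yearly total so the shares of one year sum to 1.
    shares = {}
    for year, counts in totals.items():
        year_total = sum(counts)
        shares[year] = [c / year_total if year_total else 0.0 for c in counts]
    return shares


# Hypothetical toy data: three articles, two topics.
doc_years = [1975, 1975, 2014]
doc_topic_counts = [[30, 10], [10, 30], [45, 15]]
shares = topic_share_by_year(doc_years, doc_topic_counts)
```

In this toy example the two topics split 1975 evenly, while the first topic takes three quarters of 2014. The height of a stream in the browser plays the same role as one of these yearly shares.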

What are the processes that helped create this model?
- Simplification of archival material: word order, author identity, composition dates, titles, very rare words, and other information were all excluded.
- The remaining words from the articles were then fed into the Latent Dirichlet Allocation (LDA) algorithm.
- The algorithm requires the user to specify the number of topics in advance, so the contributors settled on 70 topics (I’m not sure how this number was chosen).
- More details on the software used to build Signs@40’s website and topic model:
  - Source code and R scripts used.
  - MALLET by Andrew K. McCallum et al.
  - R mallet package by David Mimno.
  - dfrtopics package by Andrew Goldstone.
  - dfr-browser by Andrew Goldstone.
  - Code from the following open-source projects: d3 by Mike Bostock; bootstrap by Twitter, Inc.; jQuery by the jQuery Foundation; and JSZip by Stuart Knightley.
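
The process above can be sketched in a few lines of Python. This is only a toy illustration and not the MALLET-based pipeline the project used. It reduces each text to a bag of words, drops rare words, and then runs a small collapsed Gibbs sampler, which is one common way to fit LDA. The function names, the example sentences, and the values of `min_count`, `alpha`, `beta`, and `n_iter` are all assumptions made for this example.

```python
import random
import re
from collections import Counter


def bag_of_words(texts, min_count=2):
    """Lowercase and tokenize, then drop words rarer than min_count."""
    docs = [re.findall(r"[a-z]+", t.lower()) for t in texts]
    freq = Counter(w for doc in docs for w in doc)
    return [[w for w in doc if freq[w] >= min_count] for doc in docs]


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts. n_topics must be chosen in advance.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start by assigning every word occurrence to a random topic.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment of this word occurrence.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # A topic is likely if it is common in this document
                # and if this word is common in that topic.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


# Hypothetical mini-corpus standing in for journal articles.
texts = [
    "queer identity and lesbian community",
    "lesbian identity, queer community",
    "labor wages and the labor market",
    "wages, market, labor",
]
docs = bag_of_words(texts)
doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
```

Each row of `doc_topic` says how many words of one text ended up in each topic, and `topic_word[k].most_common(5)` lists the words that contribute most to topic `k`. Note that nothing in the algorithm names the topics. As in Signs@40, a label like “Lesbian, gay, queer” has to come from a human reader.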

One Comment


January 31, 2021
Ray

Hello Jadie! I like how you discussed the importance of textual analysis to this project and how it is organized by topics. I wonder to what extent the contributors’ judgments influenced the network.

Jadie
