Minor in Data Science in the Humanities

Requirements & Course Descriptions


The 15-unit minor is necessarily flexible, accommodating the varied backgrounds and goals of its students. The curriculum addresses data management, statistics, text analysis, geospatial analysis, digital prosopography, data visualization, and information design. It includes hands-on experience in digital project work and substantial cross-disciplinary engagement. Our goal is to enrich the analytic skills that students can bring to bear on traditional and emerging topics across the humanities.

Course Requirements

Take 6 units from the Core Curriculum:

  • IPH 312: Introduction to Digital Humanities (3 units)
  • IPH 431, DASH 1: Statistics for Humanities (3 units), or
  • IPH 430, DAMS: Data Management Skills (1 unit) and
  • IPH 432, PROTA: Programming for Text Analysis (2 units)

Take 3 units of work on a faculty project in the HDW. Most students will earn these units by participating in the HDW summer workshop; others will earn them through independent study during the academic year.

Complete the minor by taking 6 units from the following:

  • CSE 131: Introduction to Computer Science I (3 units)
  • CSE 247: Data Structures and Algorithms (3 units)
  • CSE 330S: Rapid Prototype Development and Creative Programming (3 units)     
  • Ling 317: Introduction to Computational Linguistics (3 units)
  • IPH 425: Humanities by the Numbers: Shakespeare (3 units)
  • U13 221: Spatial Data Modeling and Design (3 units)
  • U69 3003: Digital Cartography (3 units)
  • XCore 304: Visualizing Data (3 units)
  • L93 399: Internship in Digital Humanities (independent research or additional work on a faculty project; up to 6 units)
  • Additional courses from the Core Curriculum (up to 6 units)
  • Digital Humanities courses, 300- or 400-level, based in a humanities department (up to 6 units)

Course Descriptions

IPH 312 Introduction to Digital Humanities: Cultural Analysis
(3 units)

While computers have changed the way we think and interact, systematic efforts to apply current technologies to the study of history and culture have been rare. This course will consider how these technologies might transform the humanities. We will explore the various ways in which ideas and data in the humanities can be represented, analyzed, and communicated using computational tools and techniques. We will also reflect on how the expansion of information technology has transformed, and continues to transform, the humanities, both in their role within the university and in society at large. Readings and class work will be supplemented by small assigned digital projects, culminating in a final project of the students' own choosing. No prior experience with technology is required.

IPH 431, DASH 1: Statistics for Humanities Scholars
(3 units)

A survey of statistical ideas and principles. The course will expose students to tools and techniques useful for quantitative research in the humanities, many of which are addressed more extensively in other courses: tools for text processing and information extraction, natural language processing techniques, clustering and classification, and graphics. The course will consider how to use qualitative data and media as input for modeling and will address the use of statistics and data visualization in academic and public discourse. By the end of the course, students should be able to evaluate statistical arguments and visualizations in the humanities with appropriate appreciation and skepticism.

Details: Core topics include sampling, experimentation, chance phenomena, distributions, exploratory data analysis, measures of central tendency and variability, and methods of statistical testing and inference. In the early weeks, students will develop some facility with Excel; thereafter, they will learn to use Python or R for statistical analysis.
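As a rough illustration of the measures of central tendency and variability listed above, and of one simple method of statistical testing, the following Python sketch uses only the standard library. The permutation test is one testing method chosen here for illustration, not necessarily the one the course teaches; the function names and the sample numbers below are hypothetical.

```python
import random
import statistics


def summarize(values):
    """Return common measures of central tendency and variability."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "range": max(values) - min(values),
    }


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the share of random relabelings whose absolute difference in
    means is at least as large as the observed one (an estimated p-value).
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        left, right = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.mean(left) - statistics.mean(right)) >= observed:
            hits += 1
    return hits / n_resamples
```

For example, summarize([12, 15, 9, 20, 14]) gives a mean and median of 14, a range of 11, and a sample standard deviation of about 4.06.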

IPH 430, DAMS: Data Manipulation for the Humanities
(1 unit; ideally offered in the same semester as DASH 1)

The course will present basic data-modeling concepts and focus on their application to cleaning up and organizing data (text markup, Excel, and SQL). Aiming to give humanities students the tools they need to assemble and manage large data sets for their research, the course will teach fundamental programming skills for data management (using Python), as well as database design and querying (using SQL).

Details: The course will cover a number of “basics”: the differences among word-processing files, plain-text files, and structured XML; best practices for version control and software “hygiene”; methods for cleaning up data; and regular expressions (and similar tools built into most word processors). It will then proceed to data modeling: lists (Excel, Python); identifiers/keys and values (Excel, Python, SQL); tables/relations (SQL and/or data frames); joins (a problem in Excel, solved in SQL or with data frames); hierarchies (a problem in SQL databases, solved in XML); and network graph structures (nodes and edges in CSV). It will also entail basic scripting in Python, concentrating on using scripts to retrieve data from the web and on mastering string handling.
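The join problem and the nodes-and-edges structure mentioned above can be sketched with Python's built-in sqlite3 and csv modules. This is a minimal sketch under a hypothetical schema: the persons and letters tables, their columns, and the sample names are invented for illustration, and the source/target header is one common edge-list convention rather than a required format.

```python
import csv
import io
import sqlite3


def build_edge_list():
    """Join a hypothetical letters table to a persons table and return
    (sender, recipient) name pairs suitable for a network edge list."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Identifiers (keys) live in one table; rows that refer to them live in another.
    cur.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute(
        "CREATE TABLE letters (id INTEGER PRIMARY KEY, sender_id INTEGER, recipient_id INTEGER)"
    )
    cur.executemany("INSERT INTO persons VALUES (?, ?)", [(1, "Ada"), (2, "Ben"), (3, "Cleo")])
    cur.executemany("INSERT INTO letters VALUES (?, ?, ?)", [(1, 1, 2), (2, 2, 3), (3, 1, 3)])
    # The persons table is joined twice: once for each end of the edge.
    rows = cur.execute(
        """
        SELECT s.name, r.name
        FROM letters AS l
        JOIN persons AS s ON l.sender_id = s.id
        JOIN persons AS r ON l.recipient_id = r.id
        ORDER BY l.id
        """
    ).fetchall()
    conn.close()
    return rows


def edges_to_csv(edges):
    """Write (source, target) pairs as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", "target"])
    writer.writerows(edges)
    return buf.getvalue()
```

Because each letter row stores only identifiers, the names of sender and recipient are recovered by joining the persons table twice, the kind of lookup that is awkward to maintain in a spreadsheet.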

IPH 432, PROTA: Programming for Text Analysis
(2 units; offered as an independent sequel to DAMS. We will try to schedule it in most summer terms.)

This course will cover the core data-scientific concepts required for analyzing large corpora of texts and will introduce basic programming together with text-analysis techniques relevant to the humanities. (Its programming instruction will overlap only slightly with that of the statistics and data-management courses.)

Details: Students will learn to calculate basic corpus statistics and will develop facility with such techniques as tokenization, chunking, extraction of thematically significant words, stylometrics, and authorship attribution. Later in the course, more advanced natural language processing topics, such as stemming, lemmatization, named-entity recognition, and part-of-speech tagging, will be introduced along with a survey of text-classification terminology.
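A few of the basic corpus statistics named above can be computed with the Python standard library alone. The sketch below assumes a deliberately simple regular-expression tokenizer (lowercased ASCII words, with internal apostrophes kept) and uses tf-idf as one common way to surface thematically distinctive words; it is not necessarily the method taught in the course, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of ASCII letters, keeping internal
    apostrophes (so "cat's" stays one token). A deliberately simple tokenizer."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def corpus_stats(tokens):
    """Token count, type (distinct word) count, type-token ratio, top words."""
    counts = Counter(tokens)
    return {
        "tokens": len(tokens),
        "types": len(counts),
        "type_token_ratio": len(counts) / len(tokens) if tokens else 0.0,
        "most_common": counts.most_common(3),
    }


def tf_idf(docs):
    """Score every word in each tokenized document by tf-idf.

    tf = count in document / document length; idf = log(N / document frequency).
    Words that occur in every document get a score of 0.
    """
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()} if doc else {}
        )
    return scores
```

For example, tokenizing "The cat's hat, the CAT." yields the, cat's, hat, the, cat: 5 tokens, 4 types, and a type-token ratio of 0.8.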

IPH 4xx, DASH 2: Advanced Data Science for the Humanities
(3 units; prerequisite: DASH 1, DAMS, or PROTA)

This course will offer a broad survey of advanced data-analysis techniques widely used in digital humanities scholarship. It will present basic data-mining and machine-learning terminology and techniques, along with overviews of network analysis and visualization and of spatial analysis. Designed for students with some familiarity with programming, text analysis, and statistics, the course will examine a wide range of techniques for analyzing, visualizing, and perhaps sonifying qualitative humanistic data. Specific techniques and algorithms widely used in the digital humanities literature, such as principal component analysis, topic modeling, and force-directed network layouts, will be covered in detail. The course will focus not on a rigorous understanding of the mathematical foundations of these techniques but on a broader survey that allows students to engage critically with scholarship in the field and to gain a clear sense of which approaches might apply to their own work.

Details: As a prerequisite, students should take one of the three courses listed above (in statistics, data management, or text analysis). Topics will include vector spaces; data mining and pattern identification using clustering and classification; cross-validation; the extraction and analysis of relationships using networks and basic graph-theoretic techniques; and a survey of spatial thinking and computational modeling of geospatial data in the humanities. Attention will be given to techniques for linking the results of analyses to other resources, e.g., transforming recognized named entities into triples and mapping them to a shared, unified ontology schema. Other topics may be added.
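To make vector spaces, classification, and cross-validation concrete, here is a minimal Python sketch using only the standard library. It represents documents as bag-of-words vectors, compares them by cosine similarity, classifies with a nearest-centroid rule (one simple classifier chosen for illustration, not necessarily one covered in the course), and estimates accuracy with k-fold cross-validation. All function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(tokens):
    """Represent a tokenized document as a bag-of-words vector (word -> count)."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Average a list of sparse vectors component by component."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # adds counts word by word
    return {w: c / len(vectors) for w, c in total.items()}


def nearest_centroid(train, vec):
    """Predict the label whose training centroid is most cosine-similar to vec."""
    by_label = {}
    for v, label in train:
        by_label.setdefault(label, []).append(v)
    centroids = {label: centroid(vs) for label, vs in by_label.items()}
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


def k_fold_accuracy(data, k):
    """Mean accuracy over k folds; each fold is held out once as the test set.

    Folds are formed by striding through the data (data[i::k]) without
    shuffling, so k must not exceed len(data).
    """
    folds = [data[i::k] for i in range(k)]
    accuracies = []
    for i, test in enumerate(folds):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        correct = sum(nearest_centroid(train, vec) == label for vec, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k
```

Applied to six toy documents, three labeled "sea" and three labeled "court" with no vocabulary shared across the labels, k_fold_accuracy(data, 3) returns 1.0; real corpora will rarely separate so cleanly, which is exactly why held-out evaluation matters.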