Data Manipulation for the Humanities
The course will present basic data modeling concepts and focus on applying them to cleaning up and organizing data (text markup, Excel, and SQL). It aims to give humanities students the tools they need to assemble and manage the large data sets their research requires. To that end, it will teach fundamental programming skills for data management (using Python), as well as database design and querying (using SQL). The course will cover a number of "basics": the differences between word-processing files, plain text files, and structured XML; best practices for version control and software "hygiene"; methods for cleaning up data; and regular expressions (along with the similar search-and-replace tools built into most word processors). It will then proceed to data modeling: lists (Excel, Python); identifiers/keys and values (Excel, Python, SQL); tables/relations (SQL and/or data frames); joins (a problem in Excel, solved in SQL or with data frames); hierarchies (a problem in SQL databases, solved in XML); and network graph structures (nodes and edges stored as CSV). Finally, it will include basic scripting in Python, concentrating on using scripts to retrieve data from the web and on mastering string handling.
