Computer Vision in Digital Humanities
Martijn Kleppe, Matthew Lincoln, Melvin Wevers, Mark Williams, Benoit Seguin, Thomas Smits
This workshop will focus on how computer vision can be applied to audiovisual materials in the digital humanities. Attendees will both present (ongoing) work applying computer vision and experiment with computer vision on their own material in a hands-on session.
The workshop will consist of:

  • A keynote by Lindsay King & Peter Leonard (Yale University) on “Processing Pixels: Towards Visual Culture Computation”.
  • Paper presentations. Papers will be selected by a review commission.
  • A hands-on session to experiment with open-source computer vision tools. This session will be led by Benoit Seguin of the École Polytechnique Fédérale de Lausanne (EPFL).
  • Lightning talks allowing participants to share their ideas, projects, or ongoing work in a short presentation of two minutes.

All details can be found in the Call for Abstracts on the workshop website.

Introductory Session on Video for Research in the Humanities and Social Sciences (SHS)
Christian Dury

Framing the problem. Still and moving images are increasingly important tools in humanities and social science research. Their use raises questions for the researcher at every stage of the research process: data collection, analysis, and presentation of results. Starting from this observation, this introductory session is intended as a space for reflection on the place of the moving image in research. How can it be integrated into scholarly work? What can it offer the researcher? What changes can it bring about in the researcher’s relationship to the field? From a knowledge-building perspective, a video can convey research results explicitly and can take several forms. For example, raw footage can be used in experimental research, for instance to elicit reactions to a phenomenon. A video can also be built as a documentary to explain, communicate, or showcase a theory. Whether writing a documentary, filming an event, recording an interview, or capturing an experiment, the goal of putting research into images leads researchers to ask: “What does it mean to film in the humanities and social sciences?”

Objectives. Nothing beats practice for becoming familiar with audiovisual technique. This session introduces ways of analysing, presenting, communicating, and showcasing research themes in the humanities and social sciences through the moving image. Its objective is for participants to gain autonomy with the equipment on light shoots and to cover all phases of production, from writing to distribution. The main rules of video capture will be addressed through practical exercises, for example around the filmed interview. The resulting footage will then serve as the basis for an introduction to digital editing of video data in image- and sound-processing software.


Let’s Develop an Infrastructure for Historical Research Tools
Julia Luise Damerow, Dirk Wintergrün, Robert Casties
Scholars conducting historical research are provided with a growing range of digital humanities tools, supporting different phases of the research process. However, many of these tools cannot easily be combined into an integrated workflow by a researcher because of incompatible interfaces or because they require programming skills. We propose a full-day workshop to connect different tools and services in order to build a tool infrastructure for historical research. We would like to gather developers and programming-literate scholars to share their tool-building experiences and to present our first practical steps to create a system that integrates multiple applications to work with historical documents from scan to analysis. We envision the results of this workshop to be a concrete integration roadmap and an organizational strategy for cooperation and collaboration among tool developers.

The Design of Historical Data Projects: Les Registres de la Comédie Française and Le Laboratoire Paris XVIII
Jamie Folsom, Pascal Bastien, Jeffrey Ravel, Sara Harvey, Julien Puget, Benjamin Deruelle
The Comédie Française Registers Project (CFRP) and the Laboratoire Paris XVIII (LP18) aim to understand social and cultural phenomena by collecting and utilizing data from relevant historical documents, and by creating communities of practice around those data using web technologies.
Through hands-on engagement with the CFRP dataset, participants will gain a shared context for discussion of historical data projects.
We will introduce LP18, which aims to create a collaborative workspace to support the compilation of datasets about life in 18th-century Paris from primary source materials, and visualizations of those data in time and space.
Participants in the workshop will then engage in a discussion about how LP18 and similar projects can maximize accessibility and utility.
This workshop is aimed at researchers and developers, and presented in French and English.

Transkribus: Handwritten Text Recognition Technology for Historical Documents
Louise Seaward, Maria Kallio
Transkribus offers the first freely available implementation of a Handwritten Text Recognition (HTR) engine, capable of being trained to recognise handwriting of all types and languages. This technology facilitates the full-text transcription and searching of historical collections, ultimately making it easier for archivists, researchers, and members of the public to access and explore cultural heritage records. This workshop will show how the Transkribus transcription platform can be used to perform automated transcription and searching of handwritten documents.

Hands on Text Analytics with Orange
Ajda Pretnar, Niko Colnerič, Lan Žagar
This tutorial will introduce participants to Orange, a visual programming environment for data mining, suitable for both beginners and experts. A particular emphasis will be placed on its Text add-on, which offers components for text mining, visualization and deep-learning-based embedding.
This is a hands-on tutorial in which participants will actively construct analytical workflows and work through case studies with the help of the instructors. By the end of the workshop, they will know how to manage and preprocess textual data and how to use machine learning, data projection, and visualisation techniques to expose hidden patterns and evaluate the resulting models. Above all, they will learn how to use visual programming to seamlessly construct powerful data-analysis workflows that can be applied to a wide range of challenges in the digital humanities.
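The preprocessing step described above is normally assembled visually in Orange; as a rough illustration of what such a step does, here is a minimal bag-of-words sketch in plain Python (the tokenisation rule and stop-word list are invented for the example, not Orange's own):

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real workflow would use a fuller one.
STOP_WORDS = {"the", "a", "of", "and", "to", "in"}

def preprocess(text):
    """Lowercase, tokenise on letter runs, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(documents):
    """Return one token-frequency Counter per document."""
    return [Counter(preprocess(doc)) for doc in documents]

docs = ["The Digital Humanities and the study of texts.",
        "Text mining of digital collections."]
vectors = bag_of_words(docs)
print(vectors[0])  # each Counter maps token -> frequency
```

In Orange, the same pipeline (corpus, preprocessing, bag of words) is built by connecting widgets rather than writing code.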

Shaping Humanities Data: Use, Reuse, and Paths Toward Computationally Amenable Cultural Heritage Collections
Thomas Padilla, Sarah Potvin, Laurie Allen, Stewart Varner
Galleries, libraries, archives, and museums (GLAMs) increasingly seek to make their collections accessible as data, optimized for computational methods and tools common to the Digital Humanities. GLAMs are hampered in this approach by the absence of best practices and an incomplete understanding of how digital humanists, among others, are using and reusing cultural heritage data. This day-long workshop aims to make progress towards bridging that gap by engaging directly with digital humanists’ existing and projected research and pedagogical practices that draw upon collections-as-data. Blending talks and demos (solicited through a CFP), guided discussion, and workshopping of the draft “Always Already Computational” community framework and guides, the session will focus on how researchers and educators use cultural heritage collections that have been made accessible as data, and will extend to consider how these uses should inform collection creation and access.

From Texts to Networks: Combining Entity and Segment Annotations in the Analysis of Large Text Corpora
Nils Reiter, Maximilian Overbeck, Sandra Murr
In this half-day tutorial we will offer a full-fledged, implemented, and tested workflow that has been developed in the interdisciplinary Center for Reflected Text Analytics (CRETA). Our focus is the valid and reliable identification of various kinds of entities and segments in raw, unannotated texts and the extraction of specific relational information via network visualizations. Given the recent interest in networks for data representation and visualization, we argue that the following three-step workflow is applicable to many research questions in the social sciences and humanities.
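The relational-extraction step can be pictured as turning per-segment entity annotations into a weighted co-occurrence network; a minimal sketch in Python (the segmentation and entity names are invented toy data, and CRETA's actual pipeline is considerably richer):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(segments):
    """Weight an edge between two entities by the number of segments
    (e.g. chapters or speeches) in which both are annotated."""
    edges = Counter()
    for entities in segments:
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges

# Invented toy annotations: each inner list is one text segment's entities.
segments = [["Faust", "Mephistopheles"],
            ["Faust", "Gretchen"],
            ["Faust", "Mephistopheles", "Gretchen"]]
edges = cooccurrence_edges(segments)
print(edges[("Faust", "Mephistopheles")])  # co-occur in 2 segments
```

The resulting edge weights can then be handed to any network-visualization tool.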

CATMA 5.0 Tutorial
Jan Christoph Meister, Evelyn Gius, Jan Horstmann, Janina Jacke, Marco Petris
CATMA (Computer Aided Textual Markup and Analysis) is a web-based text analysis and annotation tool that combines explorative text analysis via (semi-)automated functions with flexible manual annotation options. What sets CATMA apart from other digital annotation methods is its ‘undogmatic’ approach: the system neither prescribes fixed annotation schemata or rules, nor forces the user to apply rigid yes/no, right/wrong taxonomies to texts (even though it allows for more prescriptive schemata as well). The tool is designed especially for humanists with little technical knowledge. Participants in the tutorial will be taken, in a step-by-step, hands-on approach, through the full cycle of a CATMA-based text investigation, and will finally have the opportunity to test the tool with regard to their own research interests.

XQuery for Digital Humanists
Joseph Wicentowski, Clifford Anderson
This half-day tutorial introduces digital humanists at any level of experience to XQuery, a mature, high-level programming language that is widely used in DH projects because it is purpose-built for analyzing, manipulating, and publishing data stored in XML-based formats such as TEI, EAD, MODS, and METS. Prominent XQuery-based projects include the Carl Maria von Weber Gesamtausgabe and Foreign Relations of the United States, among others.
Led by two experts who each have a decade of experience using and teaching XQuery and who have co-authored XQuery for Humanists, the tutorial introduces key concepts underlying the XQuery language and the kinds of analysis it makes possible. The focus will be on exploring TEI-encoded editions with simple XQuery expressions, giving participants sufficient hands-on experience to start exploring their own scholarly editions and metadata with XQuery. We presuppose that participants will come with a basic understanding of XML and TEI.
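The tutorial itself works in XQuery; as a rough stand-in for readers without an XQuery engine, the same kind of query — counting speeches per speaker in a TEI-encoded passage — can be sketched with the limited XPath support in Python's standard library (the TEI fragment below is a small invented example):

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A tiny TEI fragment for illustration; real editions are far richer.
TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <sp who="#barnardo"><speaker>Barnardo</speaker><l>Who's there?</l></sp>
    <sp who="#francisco"><speaker>Francisco</speaker><l>Nay, answer me.</l></sp>
    <sp who="#barnardo"><speaker>Barnardo</speaker><l>Long live the king!</l></sp>
  </body></text>
</TEI>"""

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def count_speakers(tei_xml):
    """Count how many <sp> speeches each speaker has in a TEI document."""
    root = ET.fromstring(tei_xml)
    names = [sp.findtext("tei:speaker", namespaces=NS)
             for sp in root.iterfind(".//tei:sp", NS)]
    return Counter(names)

print(count_speakers(TEI))
```

In XQuery the equivalent is a one-line FLWOR or path expression over the edition, which is precisely the kind of expression the tutorial teaches.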

XQuery for Data Integration
Joseph Wicentowski, Clifford Anderson
This half-day tutorial shows how XQuery integrates digital humanities data from multiple sources and formats. Drawing on the latest features of XQuery 3.1, the instructors demonstrate how to draw together information from the most common structured data formats, namely, JSON, CSV, RDF, and XML. We will teach some of the latest features of the XQuery language, including how to work with maps, arrays, and new functions like json-doc(), parse-json() and json-to-xml().
Specifically, we will explore the following data sources: (1) dictionary data (in JSON) from the Oxford English Dictionary; (2) an Open Publication Distribution System (OPDS)-based ebook catalog that makes publications of the U.S. Department of State searchable, browsable, and downloadable via OPDS-compliant ereader apps such as Shubook and Hyphen; (3) an OpenRefine reconciliation endpoint API built to let people run their own lists of people against biographical databases; and (4) IIIF APIs.
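XQuery 3.1's json-to-xml() turns parsed JSON into an XML tree that path expressions can traverse; as a rough Python analogue of that mapping (the element names follow the fn:json-to-xml() vocabulary of map/array/string/number/boolean/null, but the xpath-functions namespace is omitted here for brevity):

```python
import json
import xml.etree.ElementTree as ET

def json_to_xml(value):
    """Map a parsed JSON value to elements named after the
    fn:json-to-xml() vocabulary (simplified: no namespace)."""
    if isinstance(value, dict):
        el = ET.Element("map")
        for key, child in value.items():
            child_el = json_to_xml(child)
            child_el.set("key", key)
            el.append(child_el)
    elif isinstance(value, list):
        el = ET.Element("array")
        for item in value:
            el.append(json_to_xml(item))
    elif isinstance(value, bool):          # bool before int: bool is an int subtype
        el = ET.Element("boolean")
        el.text = "true" if value else "false"
    elif isinstance(value, (int, float)):
        el = ET.Element("number")
        el.text = str(value)
    elif value is None:
        el = ET.Element("null")
    else:
        el = ET.Element("string")
        el.text = value
    return el

# Invented dictionary-style record, loosely in the spirit of source (1).
record = json.loads('{"headword": "humanist", "senses": ["scholar", "classicist"]}')
print(ET.tostring(json_to_xml(record), encoding="unicode"))
```

Once the JSON is in XML form, the same XQuery expressions used for TEI or EAD apply to it unchanged, which is what makes the function useful for data integration.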

High Performance Computing for Photogrammetry and OCR Made Easy
Quinn Dombrowski, Tassie Gniady, Megan Meredith-Lobay, John Simpson
Computationally intensive research methods have seen increasing adoption among digital humanities scholars, but for scholars outside R1 institutions with robust computing environments, techniques like photogrammetry or text recognition within images can easily monopolize desktop computers for days at a time. National compute infrastructures in North America (Compute Canada and XSEDE) are a compelling alternative, providing no-cost compute allocations for researchers and offering support from technical staff interested in and familiar with humanities computing needs. This workshop will introduce participants to Compute Canada and XSEDE, cover how to obtain a compute allocation (including for researchers outside of the US and Canada), and proceed through two hands-on tutorials on research methods that benefit from the additional compute power provided by these infrastructures: 1) photogrammetry using PhotoScan, and 2) OCR with Tesseract to extract metadata from images.
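The Tesseract step is typically scripted so a batch job can fan out over many images on a cluster node; a minimal sketch of assembling such a batch in Python (the directory layout and language code are placeholders, and the subprocess call is only attempted when the tesseract binary is actually on PATH):

```python
import shutil
import subprocess
from pathlib import Path

def tesseract_command(image, out_base, lang="eng"):
    """Build the CLI invocation `tesseract <image> <out_base> -l <lang>`,
    which writes recognised text to <out_base>.txt."""
    return ["tesseract", str(image), str(out_base), "-l", lang]

def ocr_batch(image_dir, out_dir, lang="eng"):
    """Run Tesseract over every .png in image_dir (placeholder layout)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(image_dir).glob("*.png")):
        cmd = tesseract_command(image, out_dir / image.stem, lang)
        if shutil.which("tesseract"):   # skip quietly on machines without it
            subprocess.run(cmd, check=True)
        else:
            print("would run:", " ".join(cmd))
```

On a shared cluster, the loop body would normally be submitted as one scheduler job per image rather than run serially — which is exactly where the compute allocations discussed above pay off.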

Introduction to Electronic Books: EPub 3.0 and Beyond
Michael Sperberg-McQueen, Liam Quin
This tutorial offers an introduction to the EPub 3.0 standard for electronic books and the outlook for further work in the area. An introductory consideration of electronic books in general places them (and EPub) in the context of a general discussion of textuality and digitization; an overview of the various existing technologies used by the EPub specification places EPub in the context of Web standards and the development of the web. Various challenges in text presentation will be considered, and the capabilities and limits of EPub will be discussed. The tutorial will end with a discussion of the current state of EPub standardization and the prospects for further progress, after the recent merger of the International Digital Publishing Federation (IDPF), which developed the EPub specification, with the World Wide Web Consortium.
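At the packaging level, an EPub file is a ZIP container whose first entry must be an uncompressed `mimetype` file; a minimal sketch of that outer structure in Python (the content document is a stub, and a real EPub additionally needs a package document with metadata, manifest, and spine):

```python
import zipfile

CONTAINER = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="EPUB/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

def write_epub_shell(path):
    """Write the outer EPub container: stored mimetype first, then META-INF."""
    with zipfile.ZipFile(path, "w") as z:
        # The mimetype entry must come first and must not be compressed.
        z.writestr("mimetype", "application/epub+zip", zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER, zipfile.ZIP_DEFLATED)
        # Stub content document; a real book adds package.opf, nav, CSS, etc.
        z.writestr("EPUB/chapter1.xhtml",
                   "<html><body><p>Hello.</p></body></html>",
                   zipfile.ZIP_DEFLATED)

write_epub_shell("minimal.epub")
print(zipfile.ZipFile("minimal.epub").namelist()[0])  # "mimetype" comes first
```

Most of the tutorial's subject matter — text presentation, Web standards, the capabilities and limits of EPub — lives inside the XHTML and package files that this container merely wraps.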

Building Your Own Digital Library with the Open-Source Tool Omeka to Showcase Your Digitised Documents
Daniel Berthereau
This introductory workshop aims to enable participants to build their first digital library with the open-source software Omeka S.
Libraries, and their digital counterparts, remain an essential resource for students and researchers in all disciplines. They are also a place where researchers and an informed amateur public can meet, notably in the spirit of participatory research (identification of information, folksonomy, comments, etc.).
The workshop aims to show that creating a virtual library is within everyone’s reach, whether one is a large institution, an archives service, a research laboratory, any other organisation holding documentary or archival collections, or simply an individual librarian.
It will be divided into four parts: reflections on the design of one’s digital library; discovery and installation of Omeka and its modules; creation of documents and import of metadata and files; and production of a virtual exhibition.

Advancing Linked Open Data in the Humanities
Susan Brown, Abigel Lemak, Kim Martin, Robert Warren
Since its inception, Linked Open Data (LOD) has been primarily about publishing data and defining data standards. As the technologies have matured and the amount of data available for consumption has dramatically increased, questions of consumption and processing have moved to the forefront.
We invite scholars working on LOD projects within the larger spectrum of the humanities to participate in a workshop that aims to understand the limits of current work in this area.
Participants will circulate summaries of their work and a pitch for an idea that would help advance LOD for the humanities. Pitches will be ranked and the top few selected for pecha kuchas at the workshop, the bulk of which will be devoted to smaller groups working towards advancing specific ideas. Outcomes will be summarized and published as a report or white paper along with contributions from participants and other members of the LOD community.

Digital Scholarship and Privacy-sensitive Collections
Unmil Karadkar, King Davis
Recent developments in digitization and dissemination technologies present the possibility of making restricted or privacy-sensitive archival collections broadly available for humanities scholarship. However, broad exposure can exacerbate threats to the privacy of individuals named in these records as well as their families and descendants, who may bear no responsibility for the acts or afflictions contained therein. In the physical world, access to these records is protected by distance, physical access, and a variety of national and local legal statutes. The legal framework for digital records is substantially behind that for physical records.
This workshop will invite broad participation from scholars and practitioners who work with or are interested in issues surrounding humanities scholarship supported or enhanced by digital, privacy-sensitive collections.