Data Sources


Sharon Howard

This is a list of historical datasets, APIs and other data resources that I may use on this site. Some I created myself, or worked on the projects that produced them, and I have already used them for research or plan to do so. But I know very little about most of them; the list is a resource to dip into, for trying out new things with a variety of data. There is also a list of resources for code, tools, methodologies etc.

The vast majority of the datasets are English-language textual data, and focus on British, North American or Australian sources and history. Further information should be found at the links (and via Google searches). Many will be licensed as open data, but this should be verified before use.

I’ll add more as I find them. In part it’s a reference for my own use - to give me ideas and draw together scattered bookmarks and links in a place I might actually be able to find them when I want them. But others may also find it useful.

Reflecting the diversity of historical sources and the priorities of the projects that have digitised them, the datasets can be expected to range from relatively small ‘boutique’ data to large text corpora. They might have been transcribed by hand or using OCR. They’ll undoubtedly vary in complexity, structure, cleanness/tidiness and ease of use.

My data projects


Literary/linguistic text corpora

Women and gender

Social history


Economic/environmental/material culture

Visual/literary/performing arts


Art/museum collections


Geospatial data

Useful R data packages for historians

More data sources

Code resources, tools, tutorials

What I’m (not getting round to) reading



Working with data

Data mining and textual analysis

R resources


Digital history: discussion, methodology