This is a list of historical datasets, APIs and other data resources that I may use on this site. I created some of them, or worked on the projects that produced them, and have already used them for research or plan to do so. But I know very little about most of them; the list is a resource to dip into and try out new things with a variety of data. There is also a list of resources for code, tools, methodologies etc.
The vast majority of the datasets are English-language textual data, and focus on British, North American or Australian sources and history. Further information can be found at the links (and via Google searches). Many are likely to be licensed as open data, but this needs to be verified for each one.
I'll add more as I find them. In part it's a reference for my own use - to give me ideas and draw together scattered bookmarks and links in a place where I might actually be able to find them when I want them. But others may also find it useful.
Reflecting the diversity of historical sources and the priorities of the projects that have digitised them, the datasets can be expected to range from relatively small 'boutique' data to large text corpora. They might have been transcribed by hand or using OCR. They'll undoubtedly vary in complexity, structure, cleanliness/tidiness and ease of use.