

The Trove books archive

Trove's 'book' zone includes books (of course), but also ephemera (like pamphlets and leaflets) and theses. You can access metadata from the book zone through the Trove API.

See below for information on running these notebooks in a live computing environment, or just take them for a spin using Binder.

Harvesting data

Harvesting the text of digitised books (and ephemera)

This notebook harvests metadata and OCRd text from digitised works in Trove's book zone. Results of the harvest are available below.

In poking around to try to find a way of automating the download of OCR text from Trove's digitised books, I discovered that there's lots of useful metadata embedded in the page of a digitised work. Most of this metadata isn't available through the Trove API.

Getting the text of Trove books from the Internet Archive

Previously I've harvested the text of books digitised by the National Library of Australia and made available through Trove. But it occurred to me it might be possible to get the full text of other books in Trove by making use of the links to the Open Library.

Exploring harvested books

Counting words and phrases

This notebook provides a simple example of extracting word and ngram frequencies from the OCRd text of a digitised book using TextBlob and Wordcloud. The text is downloaded from the Cloudstor repository created by the full harvest of Trove digitised books.

In this notebook we use TextBlob to extract nouns, verbs, and sentences from the OCRd text of a 19th century cookery book. We try to clean things up a bit, using regular expressions to discard likely OCR errors. Then we recombine the various parts in random combinations to create delicious recipes for all occasions. Enjoy!

Exploring the Digitised Books Collection from Trove by Adel Rahmani
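Book-zone metadata is reached through the Trove API. The sketch below only builds a search URL and does not send it.

It assumes version 2 of the API. That means the `https://api.trove.nla.gov.au/v2/result` endpoint and the `key`, `zone`, `q` and `encoding` query parameters. These names come from that older API version, not from this page, and newer versions may use different ones, so check the current Trove API documentation first. `YOUR_API_KEY` is a placeholder.

```python
from urllib.parse import urlencode

# Assumption: Trove API v2 endpoint and parameter names; later versions may differ.
TROVE_V2_RESULT = "https://api.trove.nla.gov.au/v2/result"


def book_zone_search_url(api_key, query):
    """Build (but do not send) a search URL restricted to Trove's 'book' zone."""
    params = {
        "key": api_key,      # personal API key (a placeholder in the example below)
        "zone": "book",      # restrict results to the book zone
        "q": query,          # search terms
        "encoding": "json",  # request JSON rather than XML
    }
    return f"{TROVE_V2_RESULT}?{urlencode(params)}"


url = book_zone_search_url("YOUR_API_KEY", "cookery book")
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) needs a valid key and network access, so that step is left out of the sketch.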
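The word and phrase counting notebook uses TextBlob. As a dependency-free sketch of the same idea, the block below swaps in Python's standard library (`re` and `collections.Counter`) to count words and n-grams.

The tokenizer (lowercase, letters only) is a simplifying assumption rather than the notebook's own tokenization. `sample` is an invented sentence standing in for a book's OCRd text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(words, n):
    """Count n-grams, joined with spaces, across a list of tokens."""
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


sample = "Boil the eggs. Boil the milk and add the eggs."
words = tokenize(sample)

print(Counter(words).most_common(2))      # most frequent single words
print(ngram_counts(words, 2).most_common(1))  # most frequent two-word phrase
```

`Counter.most_common` orders ties by first appearance, so equally frequent words keep the order they occur in the text.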
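The cookery-book notebook discards likely OCR errors with regular expressions, then recombines the extracted words at random. The notebook's actual patterns aren't given here, so this sketch makes three stand-in choices:

- It uses a hypothetical heuristic: a token is kept only if it is purely alphabetic and contains a vowel.
- It fills a hypothetical one-sentence template.
- It skips TextBlob's part-of-speech tagging, so the noun and verb lists are supplied by hand.

A seeded `random.Random` keeps each run reproducible.

```python
import random
import re

# Hypothetical heuristic: treat a token as clean only if it is all letters
# and contains at least one vowel (OCR noise often breaks one or the other).
ALPHABETIC = re.compile(r"[A-Za-z]+")
HAS_VOWEL = re.compile(r"[aeiouyAEIOUY]")


def looks_clean(token):
    """Return True if the token is purely alphabetic and contains a vowel."""
    return bool(ALPHABETIC.fullmatch(token)) and bool(HAS_VOWEL.search(token))


def clean_tokens(tokens):
    """Drop tokens that look like OCR errors."""
    return [t for t in tokens if looks_clean(t)]


def make_recipe(nouns, verbs, rng):
    """Fill a fixed (hypothetical) sentence template with randomly chosen words."""
    verb = rng.choice(verbs).capitalize()
    return f"{verb} the {rng.choice(nouns)} with the {rng.choice(nouns)}."


nouns = clean_tokens(["butter", "fl0ur", "eggs", "sgr", "sugar"])
verbs = clean_tokens(["beat", "st1r", "boil"])

rng = random.Random(42)
print(nouns)
print(make_recipe(nouns, verbs, rng))
```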
