T&D - Assignment

Picking the Text

Pick material that’s out of copyright or has been placed in the public domain. Make sure to verify the parameters specifically associated with your class.

Suggestions

When picking your text, think about whose writing and what kind of text you might be interested in working with.

When searching any of the following catalogs, consider looking for (travel OR voyage OR description) AND ([COUNTRY] OR [CITY]) AND ([EVENT] OR [DATE]). Example search.
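
The search pattern above can be assembled programmatically if you want to try several places or dates. The following is a minimal sketch, not a catalog-specific tool: the function names and the example terms are hypothetical, and it assumes the catalog accepts AND/OR operators and quoted phrases, which varies from one catalog to another.

```python
def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(places, events_or_dates, keywords=("travel", "voyage", "description")):
    """Join the terms of each group with OR, then join the groups with AND."""
    groups = [keywords, places, events_or_dates]
    # Skip empty groups so the query never contains "()".
    parts = ["(" + " OR ".join(quote(t) for t in group) + ")" for group in groups if group]
    return " AND ".join(parts)


# Placeholder terms for illustration only.
print(build_query(["Japan", "Nagasaki"], ["1850"]))
# (travel OR voyage OR description) AND (Japan OR Nagasaki) AND (1850)
```

Check the help page of the catalog you are using before relying on the result, since some catalogs use different operators.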

Note: Also think about the quality of the actual typeface, since clean print is easier to read and to run OCR on.

Ask yourself:

  • Has a human already transcribed the text?  You can often find some of those texts in services like Project Gutenberg (https://www.gutenberg.org/). 
  • Has the text already had OCR (optical character recognition; see Wikipedia for details) run across it?
    You can find several of these texts in the Internet Archive (http://archive.org) or HathiTrust (if your institution has an account, https://www.hathitrust.org/).
    If the text is in a clean typeface and you want to run OCR across it, go to Transkribus Lite (https://transkribus.eu/lite/).
    Remember that OCR is machine-generated and will often include mistakes. You will want to make sure to clean up the transcription.
  • Can you read the text well enough to transcribe it yourself?
    If you want to transcribe a text from the original, you’ll want to use a program like Scripto through Omeka S (talk to the project manager).
    Consider submitting the final transcriptions to Dataverse for preservation as a plain-text file (.txt).
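
Some of the routine OCR cleanup mentioned above can be scripted before you proofread by hand. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the three rules it applies (rejoin words split by a hyphen at a line break, unwrap single line breaks, collapse extra spaces) are common first steps rather than a complete cleanup.

```python
import re


def clean_ocr_text(raw):
    """Apply a few mechanical fixes to OCR output; proofreading is still required."""
    # Rejoin words hyphenated across a line break: "ar-\nrived" -> "arrived".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Replace single line breaks inside a paragraph with a space; keep blank lines.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces or tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


sample = "We ar-\nrived at the  port\nin May.\n\nThe next day"
print(clean_ocr_text(sample))
```

Note that the first rule will also merge words that are genuinely hyphenated and happen to break at the end of a line, so compare the result against the page images.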

Transform your text into a structured data set

Once you have your chosen text cleaned up in a machine-readable format, you’re ready to restructure it in spreadsheet form.

Why?

Transforming semi-structured data (like a book) into structured data (like a Google Sheet) is necessary for working with most computer software, including Leaflet, which is what the World Travel and Description project uses. Leaflet, the open-source software that processes the data behind Travel & Description, cannot read a book. It can read columns and rows. As a result, any data to be added must be entered in a spreadsheet (Google Sheets or Excel) following a set of precise rules. If the rules aren’t followed, Leaflet won’t be able to process the data.
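
To make "columns and rows" concrete, here is a minimal sketch of two passages from a book expressed as structured rows and written out as CSV. The column names and the sample values are invented placeholders for illustration; the real field names and rules come from the Travel & Description Data Set Template and its guidelines.

```python
import csv
import io

# Hypothetical column names; use the ones in the project template instead.
FIELDS = ["author", "book_title", "place", "date", "page", "excerpt"]

# Invented placeholder rows: one row per place mentioned in the text.
rows = [
    {"author": "Example Author", "book_title": "Example Travels",
     "place": "Lisbon", "date": "1755", "page": "12",
     "excerpt": "We reached the harbour at dawn."},
    {"author": "Example Author", "book_title": "Example Travels",
     "place": "Porto", "date": "1755", "page": "40",
     "excerpt": "The road north was slow."},
]


def rows_to_csv(rows, fields=FIELDS):
    """Serialize a list of row dictionaries as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv(rows))
```

Every row has the same columns in the same order, which is what allows software to process the data without reading the book itself.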

Steps

To move forward,

  1. Download a copy of the Travel & Description Data Set Template.
  2. Rename it to Travel Description–[Author]–[Date].
  3. Set up your book or other material as a structured dataset following these guidelines for the different fields.

Note that for fields (like book title) that are the same in every row, you can fill in just the first row, fill down using ⌘ + D (Ctrl + D in Windows), and then hide those columns.
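
If you prepare your rows in a script rather than directly in the spreadsheet, the same fill-down idea can be sketched as follows. This is an illustration only: the function name and the column names are hypothetical, and it assumes blank cells are stored as empty strings.

```python
def fill_down(rows, fields):
    """Copy the last non-empty value of each listed field into blank cells below it."""
    last_seen = {}
    filled = []
    for row in rows:
        new_row = dict(row)  # Copy so the original rows are left unchanged.
        for field in fields:
            if new_row.get(field):
                last_seen[field] = new_row[field]
            elif field in last_seen:
                new_row[field] = last_seen[field]
        filled.append(new_row)
    return filled


# Placeholder rows: the title is entered once and left blank afterwards.
rows = [
    {"book_title": "Example Travels", "place": "Lisbon"},
    {"book_title": "", "place": "Porto"},
    {"book_title": "", "place": "Braga"},
]
for row in fill_down(rows, ["book_title"]):
    print(row)
```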

Do up a short author bio-set

In the Travel & Description Data Set Template, there is a second tabbed sheet for author bios. For information on each field, look at the guide.

For each traveler you work with, you should put together a short dataset looking at who they were. A lot of the information can be found in the texts themselves, or in encyclopedias and other scholarship.

Why?

The traveler bio-set allows the Travel & Description interface to let users search by different groups of people traveling for different reasons. Without this kind of dataset, the only information users could search by would be place and date.

Steps

For “factual” biographical data, pay particular attention to the traveler’s first name, their family name, and the years they were born and died.

On the geographic side, you should then include the person’s city of birth and the modern name of the country where that city is located.

You should also do your best to include a series of identity labels that may be uncertain. Those identity labels include the person’s likely sex and gender, their probable religion, and their occupation.

You should also include their type of voyage as one of the following:

  • personal
  • diplomatic
  • forced, military
  • religious
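
Before submitting, you may want to check a bio entry against the fields and voyage types described above. The sketch below is one possible way to do that, not the project’s own validation: the class, the field names, and the checks are hypothetical, it covers only a subset of the bio fields, and treating "forced, military" as a single label is an assumption based on how the list is written.

```python
from dataclasses import dataclass
from typing import Optional

# Labels copied from the list above. Treating "forced, military" as one
# label is an assumption; confirm with the project manager.
VOYAGE_TYPES = {"personal", "diplomatic", "forced, military", "religious"}


@dataclass
class TravelerBio:
    # Hypothetical field names; the template's second sheet defines the real ones.
    first_name: str
    family_name: str
    birth_year: Optional[int]
    death_year: Optional[int]
    birth_city: str
    birth_country_modern: str
    voyage_type: str


def check_bio(bio):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not bio.family_name.strip():
        problems.append("family name is missing")
    if bio.voyage_type not in VOYAGE_TYPES:
        problems.append(f"unknown voyage type: {bio.voyage_type!r}")
    if (bio.birth_year is not None and bio.death_year is not None
            and bio.death_year < bio.birth_year):
        problems.append("death year is before birth year")
    return problems


# Placeholder traveler for illustration only.
bio = TravelerBio("Ana", "Example", 1700, 1760, "Lisbon", "Portugal", "personal")
print(check_bio(bio))
```

Unknown years are allowed as None here because some biographical details may not be recoverable.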

If you aren’t sure how to handle a field, speak with your instructor or the T&D project manager.

Submit your dataset

Send the link to your dataset to the project manager for review and approval. The project manager will check your fields and make sure the data is clean. 

The project manager may write back and request changes. The most common request is to update citations and include page numbers.

Wait for upload

The project manager updates the map and data inclusions twice a year, at the end of each U.S. academic semester. That is usually early December for the fall semester and early May for the spring semester.

Note: For background, this dataset is hosted on GitHub and then embedded in the dhprojects.bc.edu site.

Share your work

You are encouraged to share your work.