--- title: "An Introduction to stoRy" author: - Paul Sheridan - Mikael Onsjö - Oshan Modi date: "`r format(Sys.time(), '%B %d, %Y')`" bibliography: assets/stoRy_refs.bib output: rmarkdown::html_vignette description: This vignette introduces stoRy package functions for downloading, exploring, and analyzing Literary Theme Ontology (LTO) data. Familiarity with LTO themes and the related collaborative system of story thematic annotation is helpful, but not absolutely necessary for working through this vignette. vignette: > %\VignetteIndexEntry{stoRy} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Background The *Theme Ontology Project* is an open access, community-based fiction studies undertaking to * define common literary themes (or "themes" for short), * classify defined themes into a hierarchically structured controlled-vocabulary, and * annotate works of fiction with the themes within a collaborative framework. The LTO refers to the hierarchically structured collection of literary themes [@Sheridan2019b]. The current developmental version of the LTO contains over 3,000 carefully defined themes. To date over 3,000 stories (e.g., films, novels, TV series episodes, video game plots, etc.) have been annotated with LTO themes. All data is hosted on the Theme Ontology GitHub repository [theming](https://github.com/theme-ontology/theming). It can be explored on the Theme Ontology [website](https://www.themeontology.org/). The `stoRy` package is used to perform various statistical analyses on the data. These tells us interesting things about the kind of stories we humans invent. ## Installation The package is hosted on CRAN and can be installed by running the command ```{r, eval = FALSE} install.packages("stoRy") ``` The developmental version is hosted on GitHub and can be installed using the devtools package: ```{r, eval = FALSE} # install.packages("devtools") # devtools::install_github("theme-ontology/stoRy") ``` Once installed, the package can be loaded by running the standard library command ```{r, eval = FALSE} library(stoRy) ``` ## Accessing Documentation Each function in the package is documented. The command ```{r, eval = FALSE} help(package = "stoRy") ``` gives a cursory overview of the package and a complete list of package functions. Help with using functions is obtained in the usual R manner. For instance, the documentation for the `get_similar_stories` function can be accessed with the command ```{r, eval = FALSE} ?get_similar_stories ``` The command ```{r, eval = FALSE} citation("stoRy") ``` prints to console everything needed to cite the package in a publication. ## Quick Start: The Twilight Zone Franchise Demo Data This section is a good starting point for first time users. Included in the package is a toy dataset, extracted from the latest LTO version, comprising some 2,945 LTO themes and 335 thematically annotated *The Twilight Zone* American media franchise stories [@Wikipedia2021]. The themes are hierarchically arranged into three domains descended from an abstract root theme: ``` literary thematic entity ├── the human world ├── the natural world └── alternate reality ``` Contained in the demo data are the following *Twilight Zone* thematically annotated stories: * 156 *The Twilight Zone* (1959) television series episodes * 3 *Twilight Zone: The Movie* (1983) film sub-stories * 110 *The Twilight Zone* (1985) television series episodes * 3 *Twilight Zone: Rod Serling's Lost Classics* (1994) film sub-stories * 43 *The Twilight Zone* (2002) television series episodes * 20 *The Twilight Zone* (2019) television series episodes Users may avail themselves of the demo data to experiment with package functions without having to download any official LTO version data to their local machine. To begin check that the `demo` LTO version is active ```{r, eval = FALSE} which_lto() ``` Should it be that another LTO version is actively loaded, switch to the `demo` version ```{r, eval = FALSE} set_lto(version = "demo") ``` Print LTO `demo` version summary information to console ```{r, eval = FALSE} print_lto() ``` A detailed description of the demo data included in the package can be viewed by running the command ```{r, eval = FALSE} ?`lto-demo` ``` **Pro Tip**: The demo data, and LTO version data more generally, is internally stored in such a way that prohibits users from modifying it. The following sequence of commands, however, clones the data: ```{r, eval = FALSE} demo_metadata_tbl <- clone_active_metadata_tbl() demo_themes_tbl <- clone_active_themes_tbl() demo_stories_tbl <- clone_active_stories_tbl() demo_collections_tbl <- clone_active_collections_tbl() ``` See `?lto-demo` for more details on exploring the output tibble contents. ### Example: Initialize a Theme The social phenomenon of dangers, be they real or imagined, spreading through a community as a result of rumors and fear is explored in numerous works of fiction, including several *Twilight Zone* episodes. The LTO captures hysteria of this kind with the theme [mass hysteria](https://www.themeontology.org/theme/mass%20hysteria). To explore "mass hysteria" theme ("demo" version) initialize the `Theme` class object ```{r, eval = FALSE} theme <- Theme$new(theme_name = "mass hysteria") ``` The theme entry can be printed to console in two ways ```{r, eval = FALSE} # Print stylized text: theme # Print in plain text .th.txt file format: theme$print(canonical = TRUE) ``` Story thematic annotations are stored as a tibble ```{r, eval = FALSE} theme$annotations() ``` See `?Theme` for more on `Theme` class objects. **A Note on Finding Themes**: LTO developmental themes are easily explored using the [theme search box](https://www.themeontology.org/themes) on the project website. Chances are that any theme found in the developmental version will also exist in the demo version. So searching for themes on the website offers a practical approach to finding interesting themes to initialize in an R session. **Pro Tip**: Demo version themes are explorable in tibble format. For example, here is one way to search for "mass hysteria" directly in the demo themes: ```{r, eval = FALSE} # install.packages("dplyr") suppressMessages(library(dplyr)) # install.packages("stringr") library(stringr) demo_themes_tbl <- clone_active_themes_tbl() demo_themes_tbl %>% filter(str_detect(theme_name, "mass")) ``` Notice that all themes containing the substring `"mass"` are returned. The `dplyr` package is required to run the `%>%` mediated pipeline. ### Example: Initialize a Story Thematically annotated stories are initialized by story ID. For example, run ```{r, eval = FALSE} story <- Story$new(story_id = "tz1959e1x22") ``` to initialize a `Story` class object representing the "mass hysteria" featuring classic *Twilight Zone* (1959) television series episode *The Monsters Are Due on Maple Street*. Story thematic annotations along with episode identifying metadata can be printed to console ```{r, eval = FALSE} # In stylized text format: story # In plain text .st.txt file format: story$print(canonical = TRUE) ``` A tibble of thematic annotations is obtained by running ```{r, eval = FALSE} themes <- story$themes() themes ``` See `?Theme` for more on `Theme` class objects. **A Note on Finding Story IDs**: The project website [story search box](https://www.themeontology.org/stories) offers a quick-and-dirty way of locating LTO developmental version story IDs of interest. Since story IDs are stable, developmental version *The Twilight Zone* story IDs can be expected to agree with their demo data counterparts. **Pro Tip**: A demo data story ID is directly obtained from an episode title as follows: ```{r, eval = FALSE} title <- "The Monsters Are Due on Maple Street" demo_stories_tbl <- clone_active_stories_tbl() story_id <- demo_stories_tbl %>% filter(title == !!title) %>% pull(story_id) story_id ``` The `dplyr` package is again required to run the `%>%` mediated pipeline. ### Example: Initialize a Collection Each story belongs to at least one collection (i.e. a set of related stories). *The Monsters Are Due on Maple Street*, for instance, belongs to the two collections ```{r, eval = FALSE} story$collections() ``` To initialize a `Collection` class object for *The Twilight Zone* (1959) television series, of which *The Monsters Are Due on Maple Street* is an episode, run: ```{r, eval = FALSE} collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)") ``` Collection info is printed to console in the same way as with themes and stories ```{r, eval = FALSE} # Print stylized text: collection # Print in plain text .st.txt file format: collection$print(canonical = TRUE) ``` **A Note on Finding Collection IDs**: As with stories, LTO developmental version collections can be explored from the project website [story search box](https://www.themeontology.org/stories). Developmental and demo version collection IDs should generally match up. This is in particular the case with *Twilight Zone* collection IDs. **Pro Tip**: Demo version collections can be directly explored in the usual way ```{r, eval = FALSE} demo_collections_tbl <- clone_active_collections_tbl() demo_collections_tbl ``` ### Example: Find the Topmost Featured Themes in a Collection To view the top 10 most featured themes in the *The Twilight Zone* (1959) series run: ```{r, eval = FALSE} collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)") result_tbl <- get_featured_themes(collection) result_tbl ``` To view the top 10 most featured themes in the demo data as a whole run ```{r, eval = FALSE} result_tbl <- get_featured_themes() result_tbl ``` ### Example: Find the Topmost Enriched Themes in a Collection To view the top 10 most enriched, or over-represented themes in *The Twilight Zone* (1959) series with all *The Twilight Zone* stories as background run ```{r, eval = FALSE} test_collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)") result_tbl <- get_enriched_themes(test_collection) result_tbl ``` To run the same analysis not counting *minor* level themes run ```{r, eval = FALSE} result_tbl <- get_enriched_themes(test_collection, weights = list(choice = 1, major = 1, minor = 0)) result_tbl ``` The theory and methods implemented in the `get_enriched_themes` function are described in [@Onsjo2020]. ### Example: Find Similar Stories to a Story of Interest To view the top 10 most thematically similar *Twilight Zone* franchise stories to *The Monsters Are Due on Maple Street* run ```{r, eval = FALSE} query_story <- Story$new(story_id = "tz1959e1x22") result_tbl <- get_similar_stories(query_story) result_tbl ``` The theory and methods implemented in the `get_similar_stories` function are described in [@Sheridan2019a]. ### Example : Find Clusters of Similar Stories Cluster *The Twilight Zone* franchise stories according to thematic similarity by running ```{r, eval = FALSE} set.seed(123) result_tbl <- get_story_clusters() result_tbl ``` The command `set.seed(123)` is run here for the purpose of reproducibility. Explore a cluster of stories related to traveling back in time ```{r, eval = FALSE} cluster_id <- 3 pull(result_tbl, stories)[[cluster_id]] pull(result_tbl, themes)[[cluster_id]] ``` Explore a cluster of stories related to mass panics ```{r, eval = FALSE} cluster_id <- 5 pull(result_tbl, stories)[[cluster_id]] pull(result_tbl, themes)[[cluster_id]] ``` Explore a cluster of stories related to executions ```{r, eval = FALSE} cluster_id <- 7 pull(result_tbl, stories)[[cluster_id]] pull(result_tbl, themes)[[cluster_id]] ``` Explore a cluster of stories related to space aliens ```{r, eval = FALSE} cluster_id <- 10 pull(result_tbl, stories)[[cluster_id]] pull(result_tbl, themes)[[cluster_id]] ``` Explore a cluster of stories related to old people wanting to be young ```{r, eval = FALSE} cluster_id <- 11 pull(result_tbl, stories)[[cluster_id]] pull(result_tbl, themes)[[cluster_id]] ``` Explore a cluster of stories related to wish making ```{r, eval = FALSE} cluster_id <- 13 pull(result_tbl, stories)[[cluster_id]] pull(result_tbl, themes)[[cluster_id]] ``` ## Download and Configure LTO Version Data The package works with data from these LTO versions ```{r, eval = FALSE} lto_version_statuses() ``` ### Example: Download and Configure the Latest LTO Version To download and cache the latest versioned LTO release run ```{r, eval = FALSE} configure_lto(version = "latest") ``` This can take awhile. Load the newly configured LTO version as the active version in the R session: ```{r, eval = FALSE} set_lto(version = "latest") ``` To double check that it has been loaded successfully run ```{r, eval = FALSE} which_lto() ``` Now that the latest LTO version is loaded into the R session, its stories and themes can be analyzed in the same way as with the "demo" LTO version data as shown above. ## References