Title: | Download, Explore, and Analyze Literary Theme Ontology Data |
---|---|
Description: | Download, explore, and analyze Literary Theme Ontology themes and thematically annotated story data. To learn more about the project visit <https://github.com/theme-ontology/theming> and <https://www.themeontology.org>. |
Authors: | Paul Sheridan [aut, cre] |
Maintainer: | Paul Sheridan <[email protected]> |
License: | GPL-3 |
Version: | 0.2.2 |
Built: | 2025-02-11 04:58:23 UTC |
Source: | https://github.com/theme-ontology/story |
The stoRy package provides utilities for working with LTO data. The LTO is a hierarchically organized collection of carefully defined "themes" that can be expected to arise in multiple "stories" (i.e. works of fiction). Included in the package are functions to download and cache LTO data, explore LTO themes and thematically annotated stories, and analyze the thematically annotated story data in interesting ways.
General resources:
stoRy package GitHub repository: https://github.com/theme-ontology/stoRy
LTO project website: https://www.themeontology.org
LTO project GitHub repositories: https://github.com/theme-ontology
LTO conference paper in Proceedings of the Joint Ontology Workshops 2019 Episode V: The Styrian Autumn of Ontology
Maintainer: Paul Sheridan [email protected] (ORCID)
Authors:
Oshan Modi
Mikael Onsjö
Useful links:
Report bugs at https://github.com/theme-ontology/stoRy/issues
Clone internally stored active LTO version data.
clone_active_themes_tbl() returns a tibble of active LTO version themes.
clone_active_stories_tbl() returns a tibble of active LTO version stories.
clone_active_collections_tbl() returns a tibble of active LTO version collections.
clone_active_metadata_tbl() returns a tibble of active LTO version metadata.
clone_active_themes_tbl()
clone_active_stories_tbl()
clone_active_collections_tbl()
clone_active_metadata_tbl()
Run lto-demo to view the LTO demo data help page.
Run lto to find out how to load LTO versions.
## Not run:
# Make copies of the LTO demo data:
set_lto("demo")
themes_tbl <- clone_active_themes_tbl()
stories_tbl <- clone_active_stories_tbl()
collections_tbl <- clone_active_collections_tbl()
metadata_tbl <- clone_active_metadata_tbl()
## End(Not run)
The stoRy package uses the Collection R6 class to represent a set of related LTO thematically annotated stories. This class is mostly useful for accessing information about a collection of stories for which the collection ID is known in advance.
The class operates on the story collection of whichever LTO version happens to be actively loaded into the stoRy package level environment. This is the LTO demo version by default. Run which_lto() to check which LTO version is active in your R session.
Search the latest LTO dev version collections on the Theme Ontology website at https://www.themeontology.org/stories.
Alternatively, it is possible to read in a user-defined collection from file. In this case, the collection ID as defined in the file must match the collection_id input parameter.
new()
Initialize a collection of LTO thematically annotated stories.
Collection$new(collection_id, file = NULL, verbose = TRUE)
collection_id
A length-one character vector corresponding to the ID of an LTO collection of stories.
file
A file name of a collection file, a path to a collection file, or a string. Files must end with the standard .st.txt extension used for story and collection files. If file is a file name, then the file is assumed to reside in the current working directory.
verbose
A logical value indicating whether status messages should be output to console.
A new Collection object.
collection_id()
Returns a length-one character vector corresponding to the collection ID.
Collection$collection_id()
title()
Returns a length-one character vector corresponding to the collection title.
Collection$title()
description()
Returns a length-one character vector of collection defining text.
Collection$description()
date()
Returns a length-one character vector, typically of the form "yyyy-yyyy", indicating the start and end year for stories in the collection.
Collection$date()
references()
Returns a tibble of collection reference URLs, if any.
Collection$references()
component_story_ids()
Returns a tibble of member story IDs.
Collection$component_story_ids()
themes()
Returns a tibble of thematic annotations.
Collection$themes()
source()
Returns the path of the st.txt collection file. This is the file path as it occurs on the Theme Ontology GitHub repository at https://github.com/theme-ontology/theming.
Collection$source()
size()
Returns a length-one numeric vector containing the number of stories in the collection.
Collection$size()
obj_internal_tbl()
Returns a special tibble that is used internally by package functions.
Collection$obj_internal_tbl()
print()
Print collection object info to console.
Collection$print(canonical = FALSE, n = NULL, width = NULL, ...)
canonical
Set to FALSE for pretty output.
n
Maximum number of component story IDs to print to console. This defaults to NULL, which means the stoRy_opt("print_min") value is used. Run options(stoRy.print_min = 25L) to set the minimum number of printed component story IDs to 25. Run stoRy_opt("print_max") to check the maximum number of stories that can be printed to console. This value can be changed in the same way as stoRy.print_min.
width
Width of text output to generate. This defaults to NULL, which means the stoRy_opt("width") value is used. Run options(stoRy.width = 120L) to change the column width to 120 characters, etc.
...
Additional arguments
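The option handling described for n and width can be tried with base R alone. The following is a minimal sketch that only sets and reads back the two option names mentioned above; it does not call stoRy_opt(), and the variable names are illustrative.

```r
# Set the two stoRy print options named above and keep the previous values.
old_opts <- options(stoRy.print_min = 25L, stoRy.width = 120L)

# Base R's getOption() reads the raw option values back.
print_min <- getOption("stoRy.print_min")
text_width <- getOption("stoRy.width")
print_min
text_width

# Restore whatever was set before (previously unset options become NULL again).
options(old_opts)
```

Saving the return value of options() and passing it back afterwards keeps the change local to one block of code.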
clone()
The objects of this class are cloneable with this method.
Collection$clone(deep = FALSE)
deep
Whether to make a deep clone.
Use Story() to initialize an LTO thematically annotated story.
Use Theme() to initialize an LTO theme.
Use Themeset() to initialize a set of related LTO themes.
## Not run:
# Initialize a collection:
set_lto("demo")
collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)")

# Print collection info to console:
collection

# Print collection info in canonical st.txt format:
collection$print(canonical = TRUE)

# Initialize a collection from file:
set_lto("demo")
file <- system.file("extdata/rolling-stone-best-ttz1959-episodes.st.txt", package = "stoRy")
collection_id <- "Collection: Rolling Stone 25 Best Twilight Zone Original Series Episodes"
collection <- Collection$new(collection_id, file)
collection

# Initialize a collection from a string:
set_lto("demo")
file <- I("Collection: Rolling Stone 25 Best Twilight Zone Original Series Episodes
========================================================================

:: Title
Rolling Stone 25 Best Twilight Zone Original Series Episodes

:: Date
1959-1964

:: Description
Rolling Stone Magazine's list of the 25 best episodes from the original Twilight Zone anthology television series, which ran for five seasons on CBS from 1959 to 1964, as compiled by David Fear, Sean T. Collins, and Angie Martoccio.

:: References
https://www.rollingstone.com/tv/tv-features/25-best-twilight-zone-episodes-list-812043/

:: Collections
Collection: Rolling Stone 25 Best Twilight Zone Original Series Episodes

:: Component Stories
tz1959e3x24
tz1959e1x22
tz1959e2x06
tz1959e5x03
tz1959e2x15
tz1959e2x28
tz1959e1x08
tz1959e3x14
tz1959e3x05
tz1959e5x06
tz1959e3x08
tz1959e1x01
tz1959e1x21
tz1959e1x34
tz1959e2x07
tz1959e1x13
tz1959e1x09
tz1959e3x10
tz1959e1x16
tz1959e1x28
tz1959e1x30
tz1959e3x33
tz1959e3x01
tz1959e2x22
tz1959e5x25")
collection_id <- unlist(strsplit(file, split = "\n"))[1]
collection <- Collection$new(collection_id, file)
collection
## End(Not run)
get_enriched_themes() calculates the top m most over-represented (or enriched) themes in a sub-collection of interest from a background collection.
get_enriched_themes(
  test_collection,
  background_collection = NULL,
  top_m = 10,
  weights = list(choice = 3, major = 2, minor = 1),
  explicit = TRUE,
  min_freq = 1,
  blacklist = NULL,
  metric = c("hgt", "tfidf")
)
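test_collection | A Collection class object. |
background_collection | A Collection class object. If NULL (the default), all stories in the active LTO version serve as the background. |
top_m | Maximum number of themes to report. The default is 10. |
weights | A list assigning nonnegative weights to choice, major, and minor theme levels. The default weighting is list(choice = 3, major = 2, minor = 1). |
explicit | Set to FALSE to include implicitly featured themes. The default is TRUE. |
min_freq | Drop themes occurring fewer than this number of times from the analysis. The default is 1. |
blacklist | Themes to exclude from the analysis. The default is NULL. |
metric | A character vector specifying the choice of scoring function. Use "hgt" for the hypergeometric test (the default) or "tfidf" for TF-IDF. |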
The test collection of n stories, \(S_1, \ldots, S_n\), is represented as a weighted bag-of-words, where each choice theme in story \(S_i\) is counted weights$choice times, each major theme weights$major times, and each minor theme weights$minor times.
The background collection of N stories, \(S_1, \ldots, S_N\), is a superset of the test collection that is likewise represented as a weighted bag-of-words.
Theme enrichment scores are calculated according to the hypergeometric test by default. Set metric = "tfidf" to use TF-IDF weights for the enrichment scores.
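The hypergeometric scoring can be illustrated in base R. The sketch below assumes the score is the negative base 10 logarithm of the upper-tail hypergeometric probability of seeing at least k theme-featuring stories among n test stories, given K theme-featuring stories among N background stories. The helper name hgt_score and the toy counts are hypothetical, and the package may derive its p-value from the weighted counts instead, so treat this as an illustration of the idea rather than the stoRy implementation.

```r
# Hypothetical helper: enrichment score as -log10 of the hypergeometric
# upper-tail probability P(X >= k). Not the stoRy implementation.
hgt_score <- function(k, n, K, N) {
  # phyper(q, m, n, k): m = theme-featuring background stories (K),
  # n = remaining background stories (N - K), k = stories drawn (test size).
  p_value <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
  -log10(p_value)
}

# Toy counts: 8 of 20 test stories feature the theme, against 30 of 335
# background stories.
enriched <- hgt_score(k = 8, n = 20, K = 30, N = 335)

# A theme present at roughly the background rate scores lower.
typical <- hgt_score(k = 2, n = 20, K = 30, N = 335)

enriched > typical
```

Passing k - 1 with lower.tail = FALSE is what turns phyper()'s strict "greater than" tail into the "at least k" probability.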
Returns a tibble with top_m rows (themes) and 10 columns:
theme_name: | m-th most over-represented theme in the test collection |
k: | Number of test collection stories featuring the theme |
k_bar: | Weighted counts of the theme summed over the test collection stories |
n: | Number of stories in the test collection |
n_bar: | Sum of all weighted counts of test collection themes |
K: | Number of background collection stories featuring the theme |
K_bar: | Weighted counts of the theme summed over the background collection stories |
N: | Number of stories in the background collection |
N_bar: | Sum of all weighted counts of background collection themes |
score: | Either the negative base 10 logarithm of the hypergeometric test p-value (if metric = "hgt") or the TF-IDF weight (if metric = "tfidf") |
Mikael Onsjö, Paul Sheridan (2020). Theme Enrichment Analysis: A Statistical Test for Identifying Significantly Enriched Themes in a List of Stories with an Application to the Star Trek Television Franchise. Digital Studies/le Champ Numérique, 10(1), 1. DOI: 10.16995/dscn.316
## Not run:
# Retrieve the top 10 most enriched themes in "The Twilight Zone" (1959)
# series episodes with all demo version stories as background:
set_lto("demo")
test_collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)")
result_tbl <- get_enriched_themes(test_collection)
result_tbl

# Run the same analysis on "The Twilight Zone" (1959) series without
# including minor level themes:
result_tbl <- get_enriched_themes(test_collection, weights = list(choice = 1, major = 1, minor = 0))
result_tbl
## End(Not run)
get_featured_themes() calculates the top m most frequently occurring themes in a collection.
get_featured_themes(
  collection = NULL,
  top_m = 10,
  weights = list(choice = 3, major = 2, minor = 1),
  explicit = TRUE,
  min_freq = 1,
  blacklist = NULL
)
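collection | A Collection class object. If NULL (the default), all stories in the active LTO version are used. |
top_m | Maximum number of themes to report. The default is 10. |
weights | A list assigning nonnegative weights to choice, major, and minor theme levels. The default weighting is list(choice = 3, major = 2, minor = 1). |
explicit | Set to FALSE to include implicitly featured themes. The default is TRUE. |
min_freq | Drop themes occurring fewer than this number of times from the analysis. The default is 1. |
blacklist | Themes to exclude from the analysis. The default is NULL. |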
The input collection of n stories, \(S_1, \ldots, S_n\), is represented as a weighted bag-of-words, where each choice theme in story \(S_i\) is counted weights$choice times, each major theme weights$major times, and each minor theme weights$minor times.
Returns a tibble with top_m rows (themes) and 6 columns:
theme_name: | m-th most frequently occurring theme in the collection |
k: | Number of collection stories featuring the theme |
k_bar: | Weighted counts of the theme summed over the collection stories |
n: | Number of stories in the collection |
n_bar: | Sum of all weighted counts of collection themes |
tp: | Theme weighted term proportion (i.e. k_bar/n_bar) |
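The weighted bag-of-words counts and the term proportion tp can be worked through by hand on a small table. The following is a minimal base R sketch on hypothetical annotation data (the story IDs, themes, and data frame layout are invented for illustration and do not reflect how the package stores annotations).

```r
# Toy annotations: one row per (story, theme, level) triple. Hypothetical data.
annotations <- data.frame(
  story_id = c("s1", "s1", "s2", "s2", "s3"),
  theme    = c("time travel", "loneliness", "time travel", "greed", "greed"),
  level    = c("choice", "minor", "major", "major", "minor"),
  stringsAsFactors = FALSE
)

weights <- list(choice = 3, major = 2, minor = 1)

# Look up the weight of each annotation from its level.
annotations$weight <- unname(unlist(weights[annotations$level]))

# k: number of stories featuring each theme.
k <- tapply(annotations$story_id, annotations$theme,
            function(ids) length(unique(ids)))

# k_bar: weighted count of each theme summed over the stories.
k_bar <- tapply(annotations$weight, annotations$theme, sum)

# n_bar: sum of all weighted counts; tp: weighted term proportion.
n_bar <- sum(annotations$weight)
tp <- k_bar / n_bar

sort(tp, decreasing = TRUE)
```

Setting a level's weight to 0, as in the minor = 0 examples for this function, removes that level's annotations from k_bar and n_bar.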
## Not run:
# Retrieve the top 10 most featured themes in "The Twilight Zone" franchise
# stories:
set_lto("demo")
result_tbl <- get_featured_themes()
result_tbl

# Retrieve the top 10 most featured themes in "The Twilight Zone" franchise
# stories not including any minor level themes:
set_lto("demo")
result_tbl <- get_featured_themes(weights = list(choice = 1, major = 1, minor = 0))
result_tbl

# Retrieve the top 10 most featured themes in "The Twilight Zone" (1959)
# television series episodes:
collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)")
result_tbl <- get_featured_themes(collection)
result_tbl
## End(Not run)
get_similar_stories() calculates the top n most thematically similar stories to a given story.
get_similar_stories(
  query_story,
  background_collection = NULL,
  top_n = 10,
  weights = list(choice = 3, major = 2, minor = 1),
  explicit = TRUE,
  min_freq = 1,
  blacklist = NULL,
  metric = c("hgt", "tfidf")
)
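query_story | A Story class object. |
background_collection | A Collection class object. The default is NULL. |
top_n | Maximum number of similar stories to report. The default is 10. |
weights | A list assigning nonnegative weights to choice, major, and minor theme levels. The default weighting is list(choice = 3, major = 2, minor = 1). |
explicit | Set to FALSE to include implicitly featured themes. The default is TRUE. |
min_freq | Drop themes occurring fewer than this number of times from the analysis. The default is 1. |
blacklist | Themes to exclude from the analysis. The default is NULL. |
metric | A character vector specifying the choice of weighting to use in the cosine similarity measure used to evaluate story thematic similarity. Use "hgt" for hypergeometric test weights or "tfidf" for TF-IDF weights. The default specification is c("hgt", "tfidf"). |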
Returns a tibble with top_n rows (stories) and 5 columns:
story_id: | n-th most thematically similar story to the query story |
title: | Reference story title |
description: | Reference story description |
score: | Cosine similarity score with hypergeometric test weights (if metric = "hgt") or TF-IDF weights (if metric = "tfidf") |
common_themes: | List of themes common to both the query and reference story |
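The TF-IDF variant of the similarity score can be sketched in base R. The example below assumes a plain log(N / document frequency) IDF and the standard cosine similarity; the story-by-theme matrix is hypothetical toy data, and the package's exact weighting formula (and its "hgt" weighting, which is not shown) may differ.

```r
# Toy story-by-theme matrix of weighted theme counts. Hypothetical data.
counts <- matrix(
  c(3, 0, 1,
    2, 0, 0,
    0, 2, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"),
                  c("time travel", "greed", "loneliness"))
)

# Inverse document frequency: log(N / number of stories featuring the theme).
idf <- log(nrow(counts) / colSums(counts > 0))

# TF-IDF weights: scale each theme column by its IDF.
tfidf <- sweep(counts, 2, idf, `*`)

# Cosine similarity between two numeric vectors.
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Score every story against the query story "s1", best match first.
query <- tfidf["s1", ]
scores <- apply(tfidf, 1, cosine, y = query)
sort(scores[names(scores) != "s1"], decreasing = TRUE)
```

Here "s2" ranks above "s3" because it shares the heavily weighted "time travel" theme with the query, while "s3" shares only "loneliness".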
Paul Sheridan, Mikael Onsjö, Claudia Becerra, Sergio Jimenez, Georg Dueñas (2019). An Ontology-Based Recommender System with an Application to the Star Trek Television Franchise. Future Internet, 11(9), 182. DOI: 10.3390/fi11090182
## Not run:
# Retrieve the top 10 most similar stories to the classic "The Twilight
# Zone" series episode "Nightmare at 20,000 Feet" (1959):
set_lto("demo")
query_story <- Story$new(story_id = "tz1959e5x03")
result_tbl <- get_similar_stories(query_story)
result_tbl

# Retrieve the top 10 most similar stories to the classic "The Twilight
# Zone" series episode "Nightmare at 20,000 Feet" (1959) without taking
# minor themes into account:
set_lto("demo")
query_story <- Story$new(story_id = "tz1959e5x03")
result_tbl <- get_similar_stories(query_story, weights = list(choice = 3, major = 2, minor = 0))
result_tbl

# Retrieve the top 10 most similar stories to the classic "The Twilight
# Zone" series episode "Nightmare at 20,000 Feet" (1959) when implicitly
# featured themes are included in the similarity calculation:
set_lto("demo")
query_story <- Story$new(story_id = "tz1959e5x03")
result_tbl <- get_similar_stories(query_story, explicit = FALSE)
result_tbl
## End(Not run)
get_story_clusters() classifies the stories in a collection according to thematic similarity.
get_story_clusters(
  collection = NULL,
  weights = list(choice = 3, major = 2, minor = 1),
  explicit = TRUE,
  min_freq = 1,
  min_size = 3,
  blacklist = NULL
)
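collection | A Collection class object. If NULL (the default), all stories in the active LTO version are used. |
weights | A list assigning nonnegative weights to choice, major, and minor theme levels. The default weighting is list(choice = 3, major = 2, minor = 1). |
explicit | Set to FALSE to include implicitly featured themes. The default is TRUE. |
min_freq | Drop themes occurring fewer than this number of times from the analysis. The default is 1. |
min_size | Minimum cluster size. The default is 3. |
blacklist | Themes to exclude from the analysis. The default is NULL. |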
The input collection of n stories, \(S_1, \ldots, S_n\), is represented as a weighted bag-of-words, where each choice theme in story \(S_i\) is counted weights$choice times, each major theme weights$major times, and each minor theme weights$minor times.
The function classifies the stories according to thematic similarity using the Iterative Signature Algorithm (ISA) biclustering algorithm as implemented in the isa2 R package. The clusters are "soft", meaning that a story can appear in multiple clusters.
Install the isa2 package by running the command install.packages("isa2") before calling this function.
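A biclustering algorithm operates on a matrix, so the weighted bag-of-words representation can be pictured as a stories-by-themes matrix of weighted counts. The sketch below builds such a matrix in base R from hypothetical annotation data and applies a min_freq-style filter; the data layout, the reading of min_freq as a count of stories featuring a theme, and the variable names are assumptions made for illustration, and the package's internal preprocessing may differ.

```r
# Toy annotations: one row per (story, theme, level) triple. Hypothetical data.
annotations <- data.frame(
  story_id = c("s1", "s1", "s2", "s2", "s3"),
  theme    = c("time travel", "loneliness", "time travel", "greed", "greed"),
  level    = c("choice", "minor", "major", "major", "minor"),
  stringsAsFactors = FALSE
)

weights <- c(choice = 3, major = 2, minor = 1)
annotations$weight <- unname(weights[annotations$level])

# Sum the weights into a stories-by-themes matrix; absent pairs become 0.
story_theme <- tapply(
  annotations$weight,
  list(annotations$story_id, annotations$theme),
  sum,
  default = 0
)

# Assumed min_freq-style filter: keep themes featured in at least
# min_freq stories.
min_freq <- 2
keep <- colSums(story_theme > 0) >= min_freq
filtered <- story_theme[, keep, drop = FALSE]

filtered
```

A matrix of this shape, with stories as rows and themes as columns, is the kind of input a biclustering routine would then search for groups of stories that share groups of themes.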
Returns a tibble with r rows (story clusters) and 4 columns:
cluster_id: | Story cluster integer ID |
stories: | A tibble of stories comprising the cluster |
themes: | A tibble of themes common to the clustered stories |
size: | Number of stories in the cluster |
Gábor Csárdi, Zoltán Kutalik, Sven Bergmann (2010). Modular analysis of gene expression data with R. Bioinformatics, 26, 1376-7.
Sven Bergmann, Jan Ihmels, Naama Barkai (2003). Iterative signature algorithm for the analysis of large-scale gene expression data. Physical Review E, 67, 031902.
Gábor Csárdi (2017). isa2: The Iterative Signature Algorithm. R package version 0.3.5. https://cran.r-project.org/package=isa2
## Not run:
# Cluster "The Twilight Zone" franchise stories according to thematic
# similarity:
library(dplyr)
set_lto("demo")
set.seed(123)
result_tbl <- get_story_clusters()
result_tbl

# Explore a cluster of stories related to traveling back in time:
cluster_id <- 3
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]

# Explore a cluster of stories related to mass panics:
cluster_id <- 5
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]

# Explore a cluster of stories related to executions:
cluster_id <- 7
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]

# Explore a cluster of stories related to space aliens:
cluster_id <- 10
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]

# Explore a cluster of stories related to old people wanting to be young:
cluster_id <- 11
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]

# Explore a cluster of stories related to wish making:
cluster_id <- 13
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
## End(Not run)
Download, configure, view metadata about, and navigate among different LTO versions.
which_lto() returns a length-one character vector corresponding to the active LTO version. This is the version that is loaded into the stoRy package environment. It is the demo version by default.
print_lto() prints basic LTO version metadata to console.
fetch_lto_version_tags() returns a character vector consisting of all existing LTO version tags. The LTO versioned release tags are downloaded from the Theme Ontology GitHub website at https://github.com/theme-ontology/theming/releases.
lto_version_statuses() prints to console the status (available for download/cached/defunct) for each LTO version.
configure_lto() downloads and configures LTO version releases hosted on the Theme Ontology website at https://www.themeontology.org/data.
set_lto() sets an LTO version as the active version. This means that package functions will act on this version. The active version is set to demo by default.
which_lto()
print_lto()
fetch_lto_version_tags(verbose = TRUE)
lto_version_statuses(verbose = TRUE)
configure_lto(
  version,
  verbose = TRUE,
  overwrite_json = FALSE,
  overwrite_rds = FALSE
)
set_lto(version, verbose = TRUE, load_background_collection = TRUE)
Paul Sheridan, Mikael Onsjö, and Janna Hastings, The Literary Theme Ontology for Media Annotation and Information Retrieval, Proceedings of JOWO2019: The Joint Ontology Workshops, Graz, Austria, September 22-26 (https://ceur-ws.org/Vol-2518/paper-WODHSA8.pdf).
Run lto-demo to view the LTO demo data help page.
Run print_stoRy_cache_info() to list all cached files.
## Not run:
# Check which LTO version is active:
which_lto()

# Print summary info about active LTO version to console:
print_lto()

# Print summary of existing LTO versions:
fetch_lto_version_tags()

# Store LTO version tags as a character vector:
lto_version_tags <- fetch_lto_version_tags()
lto_version_tags

# Configure the latest LTO version, only if it is not already set up:
configure_lto(version = "latest")

# Reconfigure the latest LTO version from scratch:
configure_lto(version = "latest", overwrite_json = TRUE, overwrite_rds = TRUE)

# Change to latest LTO version:
set_lto(version = "latest")
## End(Not run)
A family of datasets extracted from LTO v0.3.3, comprising 2872 LTO themes and 335 thematically annotated The Twilight Zone American media franchise stories.
Found in the data are thematic annotations for
156 The Twilight Zone (1959) television series episodes
3 Twilight Zone: The Movie (1983) film sub-stories
110 The Twilight Zone (1985) television series episodes
3 Twilight Zone: Rod Serling's Lost Classics (1994) film sub-stories
43 The Twilight Zone (2002) television series episodes
20 The Twilight Zone (2019) television series episodes
The data consists of four tibble class objects:
metadata_tbl: a tibble of LTO demo data summary information with 7 rows and 2 columns:
name: | Metadatum name (e.g. version, encoding, etc.) |
value: | Metadatum value (e.g. "demo", "UTF-8", etc.) |
themes_tbl: a tibble of LTO demo data themes with 2872 rows (themes) and 11 columns:
theme_index: | Integer ID (unique) |
theme_name: | Theme name (unique) |
description: | Theme definition |
notes: | Theme definition elaborations and caveats |
aliases: | Theme name aliases |
template: | Theme special cases |
parents: | Parent themes |
ancestors: | Ancestor themes |
examples: | Example usages |
references: | Reference URLs |
source: | Path to file where theme is defined |
stories_tbl: a tibble of LTO demo data stories with 335 rows (stories) and 10 columns:
story_index: | Integer ID (unique) |
story_id: | Story ID (unique) |
title: | Official title |
date: | Release date |
description: | Information used for identifying the story |
component_story_ids: | Sub-story story IDs (used for frame stories) |
collections: | Collection IDs of collections to which the story belongs |
references: | Reference URLs |
themes: | Thematic annotations |
source: | Path to file where story is defined |
collections_tbl: a tibble of LTO demo data collections with 5 rows (story collections) and 9 columns:
collection_index: | Integer ID (unique) |
collection_id: | Collection ID (unique) |
title: | The collection ID stripped of all colon separated prefixes |
date: | Earliest and latest dates of stories in the collection |
description: | Minimal information defining the collection |
component_story_ids: | Story IDs of member stories of the collection |
references: | Reference URLs |
themes: | Collection level thematic annotations |
source: | Path to file where collection is defined |
The data is stored internally in the package ‘R/sysdata.rda’ file and cannot be accessed directly. Check the examples below for more on how to best explore the data.
Theme Ontology. (2021, October 31). LTO v0.3.3. https://github.com/theme-ontology/theming/releases/tag/v0.3.3
The Twilight Zone. (2021, July 25). In Wikipedia https://en.wikipedia.org/wiki/The_Twilight_Zone
# Print a copy of LTO demo version metadata to console:
set_lto("demo")
demo_metadata_tbl <- clone_active_metadata_tbl()
demo_metadata_tbl

# Print a copy of LTO demo themes to console:
set_lto("demo")
demo_themes_tbl <- clone_active_themes_tbl()
demo_themes_tbl

# Print a copy of LTO demo stories to console:
set_lto("demo")
demo_stories_tbl <- clone_active_stories_tbl()
demo_stories_tbl

# Print a copy of LTO demo collections to console:
set_lto("demo")
demo_collections_tbl <- clone_active_collections_tbl()
demo_collections_tbl

# Print collection IDs to console:
demo_collections_tbl$collection_id

# Print The Twilight Zone (2019) component story IDs to console:
library(dplyr)
collection_id <- "Collection: tvseries: The Twilight Zone (2019)"
demo_collections_tbl %>%
  filter(collection_id %in% !!collection_id) %>%
  pull(component_story_ids) %>%
  unlist(use.names = FALSE)
The stoRy package uses the Story
R6 class to represent the LTO
thematic annotations for individual works of fiction. This class is mostly
useful for accessing information about an LTO thematically annotated story
for which the story ID is known in advance.
The class operates on the stories of whichever LTO version happens to be
actively loaded into the stoRy package level environment. This is
the LTO demo
version by default. Run which_lto()
to check which LTO
version is active in your R session.
Search the latest LTO dev
version stories on the Theme Ontology website
at https://www.themeontology.org/stories.
new()
Initialize an LTO thematically annotated story.
Story$new(story_id)
story_id
A length-one character vector corresponding to the ID of an LTO thematically annotated work of fiction.
A new Story
object.
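Because the class expects a story ID that is known in advance, it can help to guard the constructor call when IDs come from user input. The following is a minimal sketch, assuming that the constructor raises an error for an ID it cannot resolve; `try_new()` is a hypothetical helper and is not part of the stoRy package.

```r
# Hypothetical wrapper (not part of stoRy): call a constructor and return NULL
# instead of stopping when the constructor raises an error.
try_new <- function(constructor, id) {
  tryCatch(
    constructor(id),
    error = function(e) {
      message("Could not initialize '", id, "': ", conditionMessage(e))
      NULL
    }
  )
}

# Usage with the package, if it is installed:
if (requireNamespace("stoRy", quietly = TRUE)) {
  stoRy::set_lto("demo")
  story <- try_new(stoRy::Story$new, "tz1959e1x22")
  if (!is.null(story)) print(story$title())
}
```

The same wrapper can be passed `Theme$new` or any other single-argument constructor.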
story_id()
return A length-one character vector corresponding to the story ID.
Story$story_id()
title()
return A length-one character vector corresponding to the story title.
Story$title()
description()
return A length-one character vector corresponding to some summary information about the story. This is typically a synopsis and/or details about the authorship, production, distribution, etc.
Story$description()
date()
return A length-one character vector corresponding to the story release date.
Story$date()
references()
return A tibble of story reference URLs, if any.
Story$references()
collections()
return A tibble of LTO collections to which the story belongs, if any.
Story$collections()
themes()
return A tibble of thematic annotations.
Story$themes()
source()
return The path of the st.txt file containing the story thematic annotations. This is the file path as it occurs on the Theme Ontology GitHub repository at https://github.com/theme-ontology/theming.
Story$source()
obj_internal_tbl()
return A special tibble that is used internally by package functions.
Story$obj_internal_tbl()
print()
Print story object info to console.
Story$print(canonical = FALSE, width = NULL, ...)
canonical
Set to FALSE for pretty output.
width
Width of text output to generate. This defaults to NULL,
which means the stoRy_opt("width")
value is used. Run
options(stoRy.width = 120L)
to change the column width to be 120
characters, etc.
...
Additional arguments.
clone()
The objects of this class are cloneable with this method.
Story$clone(deep = FALSE)
deep
Whether to make a deep clone.
Use Collection()
to initialize a collection of LTO thematically
annotated stories.
Use Theme()
to initialize an LTO theme.
Use Themeset()
to initialize a set of related LTO themes.
## Not run:
# Initialize the LTO `demo` version of a classic The Twilight Zone (1959) story:
set_lto("demo")
story <- Story$new(story_id = "tz1959e1x22")

# Print story info and thematic annotations to console:
story

# Print story info and thematic annotations in st.txt format:
story$print(canonical = TRUE)

# Return the story title:
story$title()

# Return the story description:
story$description()

# Return a tibble of thematic annotations:
story$themes()
## End(Not run)
Manage stoRy package cached files.
stoRy_cache_path()
returns the file path where stoRy package files
are cached.
stoRy_cache_details()
returns a tibble of cached file names with
accompanying metadata. File size is in MB.
delete_lto_version_cached_files()
deletes cached files associated with an
LTO version.
delete_stoRy_cached_files()
deletes cached files. Takes a vector of file
path strings as input.
delete_all_cached_stoRy_files()
clears the cache.
print_stoRy_cache_info()
prints the cache contents to console.
stoRy_cache_path()
stoRy_cache_details()
delete_lto_version_cached_files(version, force = TRUE, verbose = TRUE)
delete_stoRy_cached_files(files, force = TRUE)
delete_all_cached_stoRy_files(force = TRUE)
print_stoRy_cache_info()
version |
A length-one character vector specifying an LTO version tag. Set to "latest" to configure the latest numbered version, and "dev" to configure the LTO developmental version. |
force |
Set to TRUE to force delete files. |
verbose |
A logical value indicating whether status messages should be output to console. |
files |
A vector of file path strings to delete from cache. |
## Not run:
# List files in cache, with accompanying metadata
stoRy_cache_details()

# Print info for all cached files
print_stoRy_cache_info()

# Delete all files in cache
# delete_all_cached_stoRy_files()
## End(Not run)
stoRy_opt()
sets stoRy package global options.
stoRy_opt(x)
x |
A character string holding an option name. The possible values are "width", "print_min", and "print_max". |
The package options control the formatting of the Story()
, Theme()
,
Collection()
, and Themeset()
R6 classes when printing to console. The
options are as follows:
width : |
The column width, in characters, of output printed to console (78 characters by default) |
print_min : |
The minimum number of entries to print to console (10 entries by default) |
print_max : |
The maximum number of entries to print to console (100 entries by default) |
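The options above are ordinary R options, so they can be changed for part of a session and then restored. The following is a minimal sketch using only base R; the fallback of 78 mirrors the default width stated above and is used in case the option has not been set in the current session.

```r
# Read the option with a fallback, in case it has not been set yet.
current_width <- getOption("stoRy.width", default = 78L)

# options() returns the previous values, so they can be restored later.
old_opts <- options(stoRy.width = 120L, stoRy.print_min = 25L)
widened <- getOption("stoRy.width")

# ... print Story, Theme, Collection, or Themeset objects here ...

# Restore whatever was in effect before the change.
options(old_opts)
restored <- getOption("stoRy.width", default = 78L)
```

Saving the return value of `options()` avoids hard-coding the previous settings when switching back.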
Use Story()
to initialize an LTO thematically annotated story.
Use Theme()
to initialize an LTO theme.
Use Collection()
to initialize a collection of LTO thematically
annotated stories.
Use Themeset()
to initialize a set of related LTO themes.
## Not run:
# Check the current option values:
stoRy_opt("width")
stoRy_opt("print_min")
stoRy_opt("print_max")

# Set the column width to 120 characters:
options(stoRy.width = 120L)

# Set the minimum number of printed entries to be 25:
options(stoRy.print_min = 25L)

# Set the maximum number of printed entries to be 250:
options(stoRy.print_max = 250L)
## End(Not run)
The stoRy package uses the Theme
R6 class to represent individual
LTO literary themes. This class is mostly useful for accessing information
about an LTO theme for which the theme name is known in advance.
The class operates on the themes of whichever LTO version happens to be
actively loaded into the stoRy package level environment. This is
the LTO demo
version by default. Run which_lto()
to check which LTO
version is active in your R session.
Search the latest LTO dev
version themes on the Theme Ontology website at
https://www.themeontology.org/themes.
new()
Initialize an LTO theme.
Theme$new(theme_name)
theme_name
A length-one character vector corresponding to an LTO theme name.
A new Theme
object.
theme_name()
return A length-one character vector corresponding to the theme name.
Theme$theme_name()
aliases()
return A tibble of theme name aliases.
Theme$aliases()
description()
return A length-one character vector corresponding to the theme description.
Theme$description()
notes()
return A tibble of caveats that accompany the theme description. This is empty for the vast majority of themes.
Theme$notes()
references()
return A tibble of references.
Theme$references()
examples()
return A tibble of example usages of the theme, if any.
Theme$examples()
parents()
return A tibble of the theme's parent theme names.
Theme$parents()
ancestors()
return A tibble of the theme's ancestor theme names.
Theme$ancestors()
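The difference between parents and ancestors can be illustrated on a toy hierarchy. This is a minimal sketch, assuming that "ancestors" means every theme reachable by repeatedly following parent links; the theme names and the `toy_ancestors()` helper are hypothetical and do not come from the LTO or the stoRy package.

```r
# Toy hierarchy: each name maps to its direct parents (hypothetical themes).
toy_parents <- list(
  "theme C" = "theme B",
  "theme B" = "theme A",
  "theme A" = character(0)
)

# Collect every theme reachable by repeatedly following parent links.
toy_ancestors <- function(name, parents) {
  found <- character(0)
  queue <- parents[[name]]
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!(current %in% found)) {
      found <- c(found, current)
      queue <- c(queue, parents[[current]])
    }
  }
  found
}

toy_parents[["theme C"]]               # direct parents only
toy_ancestors("theme C", toy_parents)  # parents, grandparents, and so on
```

The `found` check keeps a theme from being listed twice when it is reachable through more than one parent.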
source()
return The path of the th.txt file containing the theme. This is the file path as it occurs on the Theme Ontology GitHub repository at https://github.com/theme-ontology/theming.
Theme$source()
annotations()
return A tibble with one row for each story in which the theme is featured. The first column is the LTO story ID, the second is the release date, the third is the level (choice/major/minor) at which the theme is featured, and the fourth is a justification for applying the theme. Each column is of type character.
Theme$annotations()
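A common first step with the annotations tibble is to count how often the theme is used at each level. The following is a minimal sketch; `tally_levels()` is a hypothetical helper, and it addresses the level column by position (third column, as described above) because the column names are not given here.

```r
# Hypothetical helper (not part of stoRy): tabulate the third column, which
# holds the level at which the theme is featured.
tally_levels <- function(annotations) {
  levels_col <- as.character(annotations[[3]])
  table(factor(levels_col, levels = c("choice", "major", "minor")))
}

# Usage with the package, if it is installed:
if (requireNamespace("stoRy", quietly = TRUE)) {
  stoRy::set_lto("demo")
  theme <- stoRy::Theme$new(theme_name = "Martian extraterrestrial")
  print(tally_levels(theme$annotations()))
}
```

Fixing the factor levels means that a level with no annotations still appears in the result with a count of zero.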
print()
Print theme object info to console.
Theme$print(canonical = FALSE, width = NULL, ...)
canonical
Set to FALSE for pretty output.
width
Width of text output to generate. This defaults to NULL,
which means the stoRy_opt("width")
value is used. Run
options(stoRy.width = 120L)
to change the column width to be 120
characters, etc.
...
Additional arguments.
clone()
The objects of this class are cloneable with this method.
Theme$clone(deep = FALSE)
deep
Whether to make a deep clone.
Use Collection()
to initialize a collection of LTO thematically
annotated stories.
Use Story()
to initialize an LTO thematically annotated story.
Use Themeset()
to initialize a set of related LTO themes.
## Not run:
# Initialize an LTO `demo` version theme pertaining to Martians:
set_lto("demo")
theme <- Theme$new(theme_name = "Martian extraterrestrial")

# Print theme info to console:
theme

# Print theme info to console in the canonical th.txt format:
theme$print(canonical = TRUE)

# Return the theme description:
theme$description()

# Return references associated with the description, if any:
theme$references()

# Return theme name aliases, if any:
theme$aliases()

# Return the theme's parent theme names:
theme$parents()

# Return the theme's ancestor theme names:
theme$ancestors()

# Return a tibble of thematically annotated stories:
theme$annotations()
## End(Not run)
The stoRy package uses the Themeset
R6 class to represent user
defined collections of themes. Themesets are currently in an experimental
stage of development, but can be expected to become an integral part of
future stoRy package analysis functions.
Various themesets are hosted on the themesets Theme Ontology GitHub repository https://github.com/theme-ontology/themesets.
new()
Initialize a collection of LTO themes.
Themeset$new(file, verbose = TRUE)
file
Either a file name, a path to a file, a URL, or a single string. A string must contain at least one newline to be recognized as such (as opposed to a path or URL). Files must end with the standard .thset.txt extension used for themeset files.
If file
is a file name, then the file is assumed to reside in the
current working directory.
verbose
A logical value indicating whether status messages should be output to console.
A new Themeset
object.
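The rules for the file argument can be summarized in a few lines of base R. This is a minimal sketch of the rules as described above, not the package's actual dispatch logic; both helper functions are hypothetical, and the check for an http(s) prefix is an assumption about what counts as a URL.

```r
# Hypothetical helper (not part of stoRy): guess how a `file` argument would
# be interpreted, following the rules described above.
classify_themeset_input <- function(file) {
  stopifnot(is.character(file), length(file) == 1L)
  if (grepl("\n", file, fixed = TRUE)) {
    "string"   # contains a newline, so it is treated as themeset text
  } else if (grepl("^https?://", file)) {
    "url"      # assumption: URLs start with http:// or https://
  } else {
    "path"     # a bare file name resolves against the working directory
  }
}

# A path or URL should also carry the standard themeset extension.
has_themeset_extension <- function(file) {
  endsWith(file, ".thset.txt")
}

classify_themeset_input("immortality.thset.txt")
classify_themeset_input("Themeset: immortality\n=====================")
has_themeset_extension("immortality.thset.txt")
```

Checking for the newline first matters, since themeset text may itself contain something that looks like a URL.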
themeset_id()
return A length-one character vector corresponding to the themeset ID.
Themeset$themeset_id()
description()
return A length-one character vector corresponding to the themeset description.
Themeset$description()
component_theme_names()
return A tibble of member themes.
Themeset$component_theme_names()
size()
return A length-one numeric vector containing the number of themes in the themeset.
Themeset$size()
obj_internal_tbl()
return A pre-computed table that is used internally by package functions.
Themeset$obj_internal_tbl()
print()
Print themeset object info to console.
Themeset$print(canonical = FALSE, n = NULL, width = NULL, ...)
canonical
Set to FALSE for pretty output.
n
Maximum number of component theme names to print to console.
This defaults to NULL which means the
getOption("stoRy.print_min")
value is used. Run
options(stoRy.print_min = 25L)
to set the minimum number of
printed component theme names to be 25. Run
stoRy_opt("print_max")
to check the maximum number of themes
that can be printed to console. This value can be changed in the same
way as with stoRy.print_min
.
width
Width of text output to generate. This defaults to NULL,
which means the stoRy_opt("width")
value is used. Run
options(stoRy.width = 120L)
to change the column width to be 120
characters, etc.
...
Additional arguments.
clone()
The objects of this class are cloneable with this method.
Themeset$clone(deep = FALSE)
deep
Whether to make a deep clone.
Use Collection()
to initialize a collection of LTO thematically
annotated stories.
Use Story()
to initialize an LTO thematically annotated story.
Use Theme()
to initialize an LTO theme.
## Not run:
# Initialize a themeset from file:
set_lto("demo")
file <- system.file("extdata/immortality.thset.txt", package = "stoRy")
themeset <- Themeset$new(file)

# Print themeset info to console:
themeset

# Read themeset from a url and print to console:
set_lto("demo")
file <- paste0(
  "https://raw.githubusercontent.com/theme-ontology/",
  "master/demo/immortality.thset.txt"
)
themeset <- Themeset$new(file)
themeset

# Initialize a themeset directly from a string and print to console:
set_lto("demo")
file <- I("Themeset: immortality
=====================

:: Description
Themes related to people living on well beyond what is considered to be a
normal human lifespan.

:: Component Themes
immortality
the flip side of immortality
the quest for immortality")
themeset <- Themeset$new(file)
themeset
## End(Not run)