Title: | USGS INL Project Office Publications |
---|---|
Description: | Contains bibliographic information for the U.S. Geological Survey (USGS) Idaho National Laboratory (INL) Project Office. |
Authors: | Jason C. Fisher [aut, cre] , Kerri C. Treinen [aut] , Allison R. Trcka [aut] |
Maintainer: | Jason C. Fisher <[email protected]> |
License: | CC0 |
Version: | 1.1.3 |
Built: | 2024-10-31 03:20:48 UTC |
Source: | https://github.com/cran/inlpubs |
Authors who have contributed to the publications by the U.S. Geological Survey (USGS), Idaho Water Science Center, Idaho National Laboratory Project Office (INLPO).
authors
An object of class 'author' that inherits behavior from the 'data.frame' class and includes the following columns:
author_id
Unique identifier for the author.
name
Name of author, surname first and initials or given name.
person
Information about the person like email address and ORCiD identifier.
pub_id
Identifier(s) of the publication(s) the author has contributed to; refers to the primary key of the pubs data table.
total_pub
Total number of publications.
single_authored
Number of single-authored publications.
multi_authored
Number of multi-authored publications.
first_authored
Number of multi-authored publications where the researcher appears as first author.
first_year
First year author published.
last_year
Last year author published.
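The count and year columns above can be read as summaries over the publication list. A minimal sketch in base R, using hypothetical toy data (`pubs_toy`, `summarize_author`, and the identifiers are illustrative names, not part of the package), shows one way such columns could be derived; the package's own curation process may differ.

```r
# Hypothetical toy data: each publication lists its author identifiers in order.
pubs_toy <- data.frame(
  pub_id = c("P1", "P2", "P3"),
  year = c(2010L, 2012L, 2015L)
)
pubs_toy$author_id <- list(c("a", "b"), "a", c("b", "a", "c"))

# Summarize one author across all publications (illustrative helper).
summarize_author <- function(id) {
  is_in <- vapply(pubs_toy$author_id, function(x) id %in% x, logical(1))
  is_first <- vapply(pubs_toy$author_id, function(x) x[1] == id, logical(1))
  n_auth <- lengths(pubs_toy$author_id)
  data.frame(
    author_id = id,
    total_pub = sum(is_in),
    single_authored = sum(is_in & n_auth == 1L),
    multi_authored = sum(is_in & n_auth > 1L),
    # Counted only among multi-authored publications, as in the column description.
    first_authored = sum(is_first & n_auth > 1L),
    first_year = min(pubs_toy$year[is_in]),
    last_year = max(pubs_toy$year[is_in])
  )
}

ids <- sort(unique(unlist(pubs_toy$author_id)))
authors_toy <- do.call(rbind, lapply(ids, summarize_author))
print(authors_toy)
```

In this toy table, total_pub always equals single_authored plus multi_authored.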
Curated by INLPO staff.
# Subset Jason Fisher's information and display structure:
author <- authors["jfisher", ]
str(author, max.level = 3, width = 75, strict.width = "cut")

# Print author's given name:
author$person |> format(include = "given")
Extract an image from any PDF document. Requires that the pdftools and magick packages are available.
extract_pdf_image(
  input,
  output = tempfile(fileext = ".jpg"),
  page = 1,
  width = 300,
  depth = 8,
  quality = 70
)
input | 'character' string. File path to PDF document. |
output | 'character' string. Location to write the JPEG image file. |
page | 'integer' number. Page number in the document. Defaults to page 1. |
width | 'integer' number. Image width in pixels. |
depth | 'integer' number. Image color depth (either 8 or 16). Defaults to 8. |
quality | 'integer' number. JPEG quality, a number between 0 and 100. Defaults to 70. |
Returns the path to the image file.
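As a rough illustration of how the arguments fit together, the sketch below renders one PDF page with pdftools and writes a scaled JPEG with magick. The helper name `pdf_to_jpeg` and the rendering resolution of 150 DPI are assumptions made for this example; the actual implementation of extract_pdf_image may differ.

```r
# Sketch only: a hypothetical helper, not the package's implementation.
pdf_to_jpeg <- function(input, output = tempfile(fileext = ".jpg"),
                        page = 1, width = 300, depth = 8, quality = 70) {
  # Render the requested page to a raw bitmap array (150 DPI is an assumed value).
  bitmap <- pdftools::pdf_render_page(input, page = page, dpi = 150)
  img <- magick::image_read(bitmap)
  # JPEG has no alpha channel, so flatten onto a white background first.
  img <- magick::image_background(img, "white")
  # A geometry string holding only a number scales the image to that width.
  img <- magick::image_scale(img, geometry = as.character(width))
  magick::image_write(img, path = output, format = "jpeg",
                      quality = quality, depth = depth)
  output
}

path <- NULL
if (requireNamespace("pdftools", quietly = TRUE) &&
    requireNamespace("magick", quietly = TRUE)) {
  # Build a throwaway one-page PDF with base R so no external file is needed.
  input <- tempfile(fileext = ".pdf")
  grDevices::pdf(input)
  plot(1:10)
  grDevices::dev.off()
  path <- pdf_to_jpeg(input)
  unlink(input)
}
```

When either package is missing, `path` stays NULL and nothing is written.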
J.C. Fisher, U.S. Geological Survey, Idaho Water Science Center
add_content function to add cover images to the inlpubs package.
input <- system.file("extdata", "test.pdf", package = "inlpubs")
path <- extract_pdf_image(input)
unlink(path)
Extract text from any PDF document. Requires that the pdftools and tesseract packages are available.
extract_pdf_text(
  input,
  output = tempfile(fileext = ".txt"),
  dpi = 600,
  psm = 1
)
input | 'character' string. File path to PDF document. |
output | 'character' string. Location to write the text file. |
dpi | 'integer' number between 100 and 1200. Dots per inch (DPI). The resolution of an image, specifically the number of pixels per inch. For optimal optical character recognition (OCR) accuracy, 600 DPI (the default) is recommended. |
psm | |
Returns the path to the text file. Each page from the PDF is transcribed as a separate line in the file.
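Because each page occupies one line, the output file can be read back with readLines and indexed by page number. The sketch below uses a stand-in file with hypothetical text rather than real OCR output, so it runs without pdftools or tesseract.

```r
# Stand-in for a file written by extract_pdf_text(): one line per page.
# The page text here is hypothetical.
path <- tempfile(fileext = ".txt")
writeLines(c("Text of page one.", "Text of page two.", ""), path)

pages <- readLines(path)
n_pages <- length(pages)                 # number of pages in the document
is_blank <- !nzchar(trimws(pages))       # pages with no recognized text
page_two <- pages[2]                     # text of the second page

unlink(path)
```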
J.C. Fisher, U.S. Geological Survey, Idaho Water Science Center
add_content function to add texts to the inlpubs-package corpus.
## Not run:
input <- system.file("extdata", "test.pdf", package = "inlpubs")
path <- extract_pdf_text(input)
unlink(path)
## End(Not run)
Create a word cloud from a frequency table of words, and save it to a PNG file.
Requires that the R packages htmltools, htmlwidgets, magick, webshot2, and wordcloud2 are available.
System dependencies include the following:
ImageMagick for displaying the PNG image,
OptiPNG for PNG file compression, and
Chrome or a Chromium-based browser with support for the Chrome DevTools protocol.
Use the find_chromate function to find the path to the Chrome browser.
make_wordcloud(
  x,
  max_terms = 200,
  size = 1,
  shape = "circle",
  ellipticity = 0.65,
  ...,
  width = 910,
  output = NULL,
  display = FALSE
)
x | 'data.frame'. A frequency table of terms that includes the columns "term" and "freq". |
max_terms | 'integer' number. Maximum number of terms to include in the word cloud. |
size | 'numeric' number. Font size. |
shape | 'character' string. Shape of the “cloud” to draw. Possible shapes include "circle", "cardioid", "diamond", "triangle-forward", "triangle", "pentagon", and "star". |
ellipticity | 'numeric' number. Degree of “flatness” of the shape to draw, a value between 0 and 1. |
... | Additional arguments to be passed to the |
width | 'integer' number. Desired image width in pixels. |
output | 'character' string. Path to the output file. By default, the word cloud is copied to a temporary file. |
display | 'logical' flag. Whether to display the saved PNG file in a graphics window. Requires access to the magick package. |
File path to the word cloud plot in PNG format.
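The x argument expects a table with "term" and "freq" columns. A minimal base R sketch of preparing such a table follows; the data are hypothetical, and keeping the most frequent terms up to max_terms is an assumption about how the limit is applied, not a statement of the function's internals.

```r
# Hypothetical frequency table with the two required columns.
freq_toy <- data.frame(
  term = c("water", "aquifer", "well", "basalt"),
  freq = c(40L, 25L, 60L, 5L)
)

max_terms <- 3L

# Sort by decreasing frequency and keep at most max_terms rows (assumed behavior).
x <- freq_toy[order(freq_toy$freq, decreasing = TRUE), ]
x <- head(x, n = max_terms)
rownames(x) <- NULL

# Check the column names before passing the table on.
stopifnot(all(c("term", "freq") %in% colnames(x)))
print(x)
```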
J.C. Fisher, U.S. Geological Survey, Idaho Water Science Center
mine_text function to perform a term frequency text analysis.
## Not run:
d <- wordcloud2::demoFreq |> head(n = 10)
colnames(d) <- c("term", "freq")
file <- make_wordcloud(d, display = interactive())
unlink(file)
## End(Not run)
Performs a term frequency text analysis. A term is defined as a word or group of words.
mine_text(docs, ngmin = 1, ngmax = ngmin, sparse = NULL)
docs | 'list' or 'character' vector. Document text to analyze. Each list item contains the extracted text from a single document. |
ngmin, ngmax | 'integer' number. Splits strings into n-grams with given minimal and maximal numbers of grams. An n-gram is an ordered sequence of n words taken from the body of a text. Requires that the RWeka package is available and that the environment variable JAVA_HOME points to where the Java software is located. Recommended for single text components only. |
sparse | 'numeric' number that is greater than 0 and less than 1. A threshold of relative document frequency for a term. It specifies the proportion of documents in which a term must appear to be retained. For example if you specify |
HTML entities are decoded when the textutils package is available.
A term-frequency data table giving the number of times each word occurs in the text. A column in the table represents a single component in the docs argument, and each row provides frequency counts for a particular word (also known as a 'term').
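To make the table layout concrete, here is a minimal base R sketch that builds a term-by-document count matrix and the relative document frequency of each term. The `tokenize` helper is illustrative and deliberately simple (lowercasing and splitting on non-letters); it is not the package's text-processing pipeline, which may clean the text differently.

```r
docs <- c(
  A = "The quick brown fox jumps over the lazy lazy dog.",
  B = "Pack my brown box.",
  C = "Jazz fly brown dog."
)

# Illustrative tokenizer: lowercase, replace non-letters with spaces, split.
tokenize <- function(x) {
  x <- gsub("[^a-z]+", " ", tolower(x))
  words <- strsplit(trimws(x), " +")[[1]]
  words[nzchar(words)]
}

tokens <- lapply(docs, tokenize)
vocab <- sort(unique(unlist(tokens)))

# Rows are terms, columns are documents, cells are counts.
tf <- vapply(
  tokens,
  function(w) as.integer(table(factor(w, levels = vocab))),
  integer(length(vocab))
)
rownames(tf) <- vocab

# Relative document frequency: proportion of documents containing each term.
doc_freq <- rowSums(tf > 0) / ncol(tf)

print(tf["lazy", ])
print(doc_freq[c("brown", "dog", "fox")])
```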
J.C. Fisher, U.S. Geological Survey, Idaho Water Science Center
search_terms function to search for terms within the resulting term-frequency data table.
make_wordcloud function to create a word cloud.
d <- c(
  "The quick brown fox jumps over the lazy lazy dog.",
  "Pack my brown box.",
  "Jazz fly brown dog."
) |> mine_text()

d <- list(
  "A" = "The quick brown fox jumps over the lazy lazy dog.",
  "B" = c("Pack my brown box.", NA, "Jazz fly brown dog."),
  "C" = NA_character_
) |> mine_text()
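The n-gram definition used by the ngmin and ngmax arguments can be illustrated without RWeka or Java. The `ngrams` helper below is a hypothetical base R sketch of the concept, not the tokenizer the package calls.

```r
# Return all ordered sequences of n consecutive words (illustrative helper).
ngrams <- function(words, n) {
  if (length(words) < n) {
    return(character(0))
  }
  starts <- seq_len(length(words) - n + 1L)
  vapply(
    starts,
    function(i) paste(words[i:(i + n - 1L)], collapse = " "),
    character(1)
  )
}

words <- c("the", "quick", "brown", "fox")

bigrams <- ngrams(words, 2)

# Analogous to ngmin = 1 and ngmax = 2: unigrams followed by bigrams.
uni_and_bi <- unlist(lapply(1:2, function(n) ngrams(words, n)))

print(bigrams)
print(uni_and_bi)
```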
Bibliographic information for reports, articles, maps, and theses related to scientific monitoring and research conducted by the U.S. Geological Survey (USGS), Idaho Water Science Center, Idaho National Laboratory Project Office (INLPO).
pubs
An object of class 'pub' that inherits behavior from the 'data.frame' class and includes the following columns:
pub_id
Unique identifier for the publication.
institution
Name of the institution that published and/or sponsored the report.
type
Type of publication.
text_ref
Text reference (also known as the in-text citation) that excludes the year of publication.
year
Year of publication.
author_id
Identifier(s) of the author(s); refers to the primary key of the authors data table.
title
Title of publication.
bibentry
Bibliographic entry of class bibentry.
abstract
Abstract of publication.
annotation
Annotation of publication.
annotation_src
Identifier for the annotation source publication (Knobel and others, 2005; Bartholomay, 2022).
files
File names associated with the publication.
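The author_id and pub_id columns cross-reference the two data tables. The sketch below mimics that relationship with hypothetical toy tables in base R; it assumes that the primary keys are stored as row names and that the identifier columns are list columns, which is an illustration rather than a description of the package's internal storage.

```r
# Hypothetical toy tables keyed by row name.
pubs_toy <- data.frame(
  year = c(2010L, 2012L),
  row.names = c("P1", "P2")
)
pubs_toy$author_id <- list(c("a", "b"), "a")

authors_toy <- data.frame(
  name = c("Alpha, A.", "Beta, B."),
  row.names = c("a", "b")
)
authors_toy$pub_id <- list(c("P1", "P2"), "P1")

# Look up the names of the authors of publication "P1".
ids <- pubs_toy["P1", "author_id"][[1]]
p1_authors <- authors_toy[ids, "name"]

# Look up the publication years for author "a".
a_pubs <- authors_toy["a", "pub_id"][[1]]
a_years <- pubs_toy[a_pubs, "year"]

print(p1_authors)
print(a_years)
```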
Many of these publications are available through the USGS Publications Warehouse.
Bartholomay, R.C., 2022, Historical development of the U.S. Geological Survey hydrological monitoring and investigative programs at the Idaho National Laboratory, Idaho, 2002–2020: U.S. Geological Survey Open-File Report 2022–1027 (DOE/ID–22256), 54 p., doi:10.3133/ofr20221027.
Knobel, L.L., Bartholomay, R.C., and Rousseau, J.P., 2005, Historical development of the U.S. Geological Survey hydrologic monitoring and investigative programs at the Idaho National Engineering and Environmental Laboratory, Idaho, 1949 to 2001: U.S. Geological Survey Open-File Report 2005–1223 (DOE/ID–22195), 93 p., doi:10.3133/ofr20051223.
# Subset Fisher and others (2012) and display structure:
id <- "FisherOthers2012"
pub <- pubs[id, ]
str(pub, max.level = 3, width = 75, strict.width = "cut")

# Print suggested citation:
attr(unclass(pub$bibentry[[1]])[[1]], which = "textVersion")

# Print authors full name:
format(pub$bibentry[[1]]$author, include = c("given", "family"))

# Print abstract:
pub$abstract
Pattern matches a search term within the term-frequency data table.
search_terms(
  x,
  data = inlpubs::terms,
  ignore.case = TRUE,
  ...,
  low_freq = 1,
  high_freq = Inf,
  simplify = TRUE
)
x | 'character' string. Term searched for in the term-frequency data table. |
data | 'term' and 'data.frame' class. Term-frequency data table. Defaults to using the term frequencies from the INLPO publications, see |
ignore.case | 'logical' flag. Whether to ignore character case during pattern matching. |
... | Additional arguments passed to the |
low_freq | 'numeric' number. Lower frequency bound. |
high_freq | 'numeric' number. Upper frequency bound. |
simplify | 'logical' flag. Whether to return only the unique publication identifiers. |
A subset of the data table sorted by decreasing frequency.
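The interplay of pattern matching, frequency bounds, and the simplify flag can be sketched in base R. The function `search_toy` and the toy table are hypothetical; the use of grepl and of inclusive bounds are assumptions made for illustration and may not match the package's implementation.

```r
# Hypothetical term-frequency table in the same long format as 'terms'.
terms_toy <- data.frame(
  term = c("mlms", "mlms", "aquifer", "mlms"),
  pub_id = c("P1", "P2", "P1", "P3"),
  freq = c(12L, 1L, 40L, 5L)
)

# Illustrative search: match the pattern, apply inclusive bounds (assumed),
# sort by decreasing frequency, and optionally reduce to publication identifiers.
search_toy <- function(x, data, ignore.case = TRUE,
                       low_freq = 1, high_freq = Inf, simplify = TRUE) {
  keep <- grepl(x, data$term, ignore.case = ignore.case) &
    data$freq >= low_freq & data$freq <= high_freq
  out <- data[keep, , drop = FALSE]
  out <- out[order(out$freq, decreasing = TRUE), , drop = FALSE]
  if (simplify) unique(out$pub_id) else out
}

ids_found <- search_toy("MLMS", terms_toy)
hits <- search_toy("mlms", terms_toy, low_freq = 2, simplify = FALSE)

print(ids_found)
print(hits)
```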
J.C. Fisher, U.S. Geological Survey, Idaho Water Science Center
mine_text function to perform a term frequency text analysis.
search_terms("mlms")

out <- search_terms("mlms", simplify = FALSE)
head(out)
Term frequency from publications by the U.S. Geological Survey (USGS), Idaho Water Science Center, Idaho National Laboratory Project Office (INLPO).
terms
An object of class 'term' that inherits behavior from the 'data.frame' class and includes the following columns:
term
Term, a word or group of words, represented by an ASCII character string in lowercase.
pub_id
Identifier for a publication; refers to the primary key of the pubs data table.
freq
Frequency count from text analysis.
The publication text was sourced from the original PDF documents using the extract_pdf_text function, and term frequencies were extracted from the text using the mine_text function.
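Since the table holds one row per term and publication, corpus-wide summaries come from aggregating over pub_id. A short base R sketch with a hypothetical toy table in the same three-column layout:

```r
# Hypothetical table with the same columns as 'terms'.
terms_toy <- data.frame(
  term = c("mlms", "mlms", "aquifer", "mlms"),
  pub_id = c("P1", "P2", "P1", "P3"),
  freq = c(12L, 1L, 40L, 5L)
)

# Total frequency of each term across all publications.
total <- tapply(terms_toy$freq, terms_toy$term, sum)

# Number of distinct publications in which each term appears.
n_pubs <- tapply(
  terms_toy$pub_id, terms_toy$term,
  function(x) length(unique(x))
)

print(sort(total, decreasing = TRUE))
print(n_pubs)
```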
str(terms, max.level = 3, width = 75, strict.width = "cut")