Download PDFs — download

Takes the results of an arXiv API query and downloads the papers, to disk by default. Saving to an AWS S3 bucket is also supported; in that case, AWS credentials should be set as environment variables before calling download_pdf (see example). Papers are saved under their arXiv id by default, e.g. "2001.12345v1.pdf". This function imposes a minimum five-second delay between PDF downloads to avoid abusing shared web resources.

download_pdf(
  data,
  links = "link_pdf",
  fnames = "id",
  dir = ".",
  bucket = NULL,
  delay = 5
)

Arguments

data	data frame of arXiv records.
links	name of column containing the PDF locations. `link_pdf` by default.
fnames	name of column whose values should be used for saved file names. `id` by default.
dir	directory to download files to. Current working directory by default.
bucket	name of AWS S3 bucket to save files to. `NULL` by default, in which case files are saved to disk.
delay	number of seconds to wait between paper downloads. Default (and minimum) is 5.

Value

None -- files are saved to the specified location

Examples

if (FALSE) {
# get paper metadata, then download it to working directory
ml <- get_records("cs.LG", "20200801 TO 20200802", 1)
download_pdf(ml)

# set up S3 credentials, then download to bucket
Sys.setenv("AWS_ACCESS_KEY_ID" = "key")
Sys.setenv("AWS_SECRET_ACCESS_KEY" = "secretkey")
Sys.setenv("AWS_DEFAULT_REGION" = "us-east-2")
download_pdf(ml, bucket = "mybucket")
}
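
The default file naming and the minimum delay described above can be illustrated with a small standalone sketch. The helpers `build_pdf_paths` and `effective_delay` are hypothetical and are not part of the package; the record ids and links are made up. The page does not say whether a delay below five seconds is raised to five or rejected, so the sketch assumes it is raised.

```r
# Hypothetical helpers for illustration only; they are not part of the package.

# Build the local path each PDF would be saved to, given the column that
# holds the file names and the target directory.
build_pdf_paths <- function(data, fnames = "id", dir = ".") {
  file.path(dir, paste0(data[[fnames]], ".pdf"))
}

# Assumption: a requested delay below the minimum is raised to the minimum.
effective_delay <- function(delay = 5, minimum = 5) {
  max(delay, minimum)
}

# Toy records with made-up ids and links, standing in for an API query result.
records <- data.frame(
  id = c("2001.12345v1", "2001.54321v2"),
  link_pdf = c(
    "https://arxiv.org/pdf/2001.12345v1",
    "https://arxiv.org/pdf/2001.54321v2"
  ),
  stringsAsFactors = FALSE
)

paths <- build_pdf_paths(records, dir = "papers")
print(paths)

print(effective_delay(2))   # below the minimum, so the minimum is used
print(effective_delay(10))  # above the minimum, so it is kept as given
```

Passing a different column name as `fnames` would change the saved file names in the same way, provided that column exists in `data`.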