Takes the results of an arXiv API query and downloads the papers, to disk by default. Saving to an AWS S3 bucket is also supported; AWS credentials must be set as environment variables before calling download_pdf (see Examples). Files are named by arXiv id by default, e.g. "2001.12345v1.pdf". The function enforces a minimum five-second delay between PDF downloads to avoid abusing shared web resources.

download_pdf(
  data,
  links = "link_pdf",
  fnames = "id",
  dir = ".",
  bucket = NULL,
  delay = 5
)

Arguments

data

data frame of arXiv records.

links

name of column containing the PDF locations. link_pdf by default.

fnames

name of column whose values should be used for saved file names. id by default.

dir

directory to download files to. Current working directory by default.

bucket

name of AWS S3 bucket to save files to. NULL by default, in which case files are saved locally to dir.

delay

number of seconds to wait between paper downloads. Default (and minimum) is 5.
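
The way links, fnames, dir, and delay fit together can be sketched in base R. This is an illustration only, not the package's implementation: plan_downloads is a hypothetical helper, the record values are made up, and both the ".pdf" suffix and the raising of short delays to 5 are assumptions inferred from the description above. Nothing is downloaded.

```r
# Hypothetical helper (not part of the package): maps the arguments to
# source URLs and destination paths without downloading anything.
plan_downloads <- function(data, links = "link_pdf", fnames = "id",
                           dir = ".", delay = 5) {
  stopifnot(is.data.frame(data),
            links %in% names(data),
            fnames %in% names(data))
  # Assumption: delays below the documented minimum are raised to 5 seconds.
  delay <- max(delay, 5)
  data.frame(
    url = as.character(data[[links]]),
    dest = file.path(dir, paste0(data[[fnames]], ".pdf")),
    wait = delay,
    stringsAsFactors = FALSE
  )
}

# Made-up records with the default column names.
records <- data.frame(
  id = c("2001.12345v1", "2001.54321v2"),
  link_pdf = c("http://arxiv.org/pdf/2001.12345v1",
               "http://arxiv.org/pdf/2001.54321v2"),
  stringsAsFactors = FALSE
)

plan <- plan_downloads(records, dir = "papers", delay = 1)
print(plan)
```

In this sketch a requested delay of 1 second still yields a wait of 5, mirroring the documented minimum.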

Value

None. Files are saved to the specified location.

Examples

if (FALSE) {
  # get paper metadata, then download it to working directory
  ml <- get_records("cs.LG", "20200801 TO 20200802", 1)
  download_pdf(ml)

  # set up S3 credentials, then download to bucket
  Sys.setenv("AWS_ACCESS_KEY_ID" = "key")
  Sys.setenv("AWS_SECRET_ACCESS_KEY" = "secretkey")
  Sys.setenv("AWS_DEFAULT_REGION" = "us-east-2")
  download_pdf(ml, bucket = "mybucket")
}
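
Because the S3 path depends on credentials being present in the environment, it may help to check for them before calling download_pdf with a bucket. The following is a minimal sketch using only base R; missing_aws_vars is a hypothetical helper, not part of the package, and the variable names are simply those used in the example above.

```r
# Hypothetical helper (not part of the package): returns the names of the
# listed environment variables that are unset or empty.
missing_aws_vars <- function(vars = c("AWS_ACCESS_KEY_ID",
                                      "AWS_SECRET_ACCESS_KEY",
                                      "AWS_DEFAULT_REGION")) {
  values <- Sys.getenv(vars, unset = "")
  vars[values == ""]
}

# Placeholder values, as in the example above; the region is left unset
# here to show what the check reports.
Sys.setenv("AWS_ACCESS_KEY_ID" = "key")
Sys.setenv("AWS_SECRET_ACCESS_KEY" = "secretkey")
Sys.unsetenv("AWS_DEFAULT_REGION")

missing <- missing_aws_vars()
if (length(missing) > 0) {
  message("Set before calling download_pdf with bucket: ",
          paste(missing, collapse = ", "))
}
```

The check only confirms that the variables are non-empty; it does not verify that the credentials are valid for the bucket.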