Takes the results of an arXiv API query and downloads the corresponding PDFs, saving them to local disk by default. Saving to an AWS S3 bucket is also supported; in that case, AWS credentials should be set as environment variables before calling download_pdf (see the example below).
By default, papers are saved using their arXiv id as the file name, e.g. "2001.12345v1.pdf".
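As an illustration of the naming convention, the sketch below builds output paths from the column named by `fnames`. `pdf_destinations()` is a hypothetical helper written for this example, not a function exported by the package, and it assumes the saved name is simply the column value with ".pdf" appended inside `dir`.

```r
# Hypothetical helper (not part of the package): build output file paths
# from the column named by `fnames`, appending ".pdf" to each value.
pdf_destinations <- function(data, fnames = "id", dir = ".") {
  # as.character() also handles columns stored as factors
  ids <- as.character(data[[fnames]])
  # e.g. "2001.12345v1" -> "<dir>/2001.12345v1.pdf"
  file.path(dir, paste0(ids, ".pdf"))
}

# Illustrative records with made-up ids
records <- data.frame(id = c("2001.12345v1", "2008.00001v2"))
pdf_destinations(records, dir = "papers")
# [1] "papers/2001.12345v1.pdf" "papers/2008.00001v2.pdf"
```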
This function imposes a minimum five-second delay between PDF downloads to avoid overloading shared web resources.
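The rate-limiting idea can be sketched as a loop that pauses only between consecutive downloads. This is a minimal sketch under stated assumptions, not the package's implementation: `polite_download()` is hypothetical, and it assumes a `delay` below the documented minimum is raised to 5 seconds rather than rejected. The `fetch` and `wait` arguments are injectable so the loop can be exercised without network access.

```r
# Hypothetical sketch (not the package's code): download URLs one at a time,
# sleeping `delay` seconds between consecutive downloads.
polite_download <- function(urls, destfiles, delay = 5,
                            fetch = function(url, destfile) {
                              # mode = "wb" keeps PDFs intact on Windows
                              utils::download.file(url, destfile, mode = "wb")
                            },
                            wait = Sys.sleep) {
  stopifnot(length(urls) == length(destfiles))
  # Assumption: values below the documented minimum are raised to 5 seconds
  delay <- max(delay, 5)
  for (i in seq_along(urls)) {
    if (i > 1) wait(delay)  # pause only between downloads, not before the first
    fetch(urls[i], destfiles[i])
  }
  invisible(destfiles)
}
```

Passing a mock `fetch` and `wait` records the calls instead of hitting the network; with three URLs and `delay = 1`, the loop makes three fetches and two 5-second waits.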
```r
download_pdf(data, links = "link_pdf", fnames = "id", dir = ".", bucket = NULL, delay = 5)
```
Argument | Description |
---|---|
data | data frame of arXiv records. |
links | name of the column containing the PDF locations. |
fnames | name of the column whose values are used as the saved file names. |
dir | directory to download files to. Current working directory by default. |
bucket | name of the AWS S3 bucket to save files to. If NULL (the default), files are saved locally in dir. |
delay | number of seconds to wait between paper downloads. Default (and minimum) is 5. |
Return value: none; files are saved to the specified location.
```r
if (FALSE) {
# get paper metadata, then download it to working directory
ml <- get_records("cs.LG", "20200801 TO 20200802", 1)
download_pdf(ml)

# set up S3 credentials, then download to bucket
Sys.setenv("AWS_ACCESS_KEY_ID" = "key")
Sys.setenv("AWS_SECRET_ACCESS_KEY" = "secretkey")
Sys.setenv("AWS_DEFAULT_REGION" = "us-east-2")
download_pdf(ml, bucket = "mybucket")
}
```