Title: | Work with Chunks of Data |
---|---|
Description: | Helps to work with chunks of data in parallel and to cache the results of each chunk. It's a basic approach to handling somewhat large datasets and long runtimes. |
Authors: | Mauro Lepore [aut, cre] , 2 Degrees Investing Initiative [cph, fnd] |
Maintainer: | Mauro Lepore <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.0.9001 |
Built: | 2024-11-12 03:10:03 UTC |
Source: | https://github.com/maurolepore/dchunkr |
Adds the file column to a data frame
add_file(data, parent, ext = ".rds")
data |
A data frame. |
parent |
A directory. |
ext |
An extension. |
A data frame.
data <- tibble::tibble(chunk = 1)
add_file(data, parent = tempdir())
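As a rough illustration of what adding a file column could look like, the base-R sketch below builds one path per row from the parent directory, the chunk column, and the extension. The helper add_file_sketch() is hypothetical, and the path layout (parent/chunk followed by ext) is an assumption, not necessarily what add_file() does.

```r
# Hypothetical helper: not part of dchunkr.
# Assumption: each row gets the path <parent>/<chunk><ext>.
add_file_sketch <- function(data, parent, ext = ".rds") {
  data$file <- file.path(parent, paste0(data$chunk, ext))
  data
}

data <- data.frame(chunk = 1:2)
out <- add_file_sketch(data, parent = tempdir())
basename(out$file)
```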
path() constructs a relative path, path_wd() constructs an absolute path from the current working directory.
cache_path(..., cache_dir = rappdirs::user_cache_dir("dchunkr"))
... |
Character vectors. If any values are NA, the result will also be NA. The paths follow the recycling rules used in the tibble package, namely that only length 1 arguments are recycled. |
cache_dir |
Character. A directory for the cache. |
Character. A path.
path_home(), path_package() for functions to construct paths relative to the home and package directories respectively.
dir <- withr::local_tempfile()
fs::dir_exists(dir)
cache_path("b", cache_dir = dir)
fs::dir_exists(dir)
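The example above checks whether the cache directory exists before and after the call, which suggests that the directory may be created on demand. The base-R sketch below illustrates that idea only; cache_path_sketch() is a hypothetical helper, and creating the directory when it is missing is an assumption about the behavior, not a statement of how cache_path() is implemented.

```r
# Hypothetical helper: not part of dchunkr.
# Assumption: the cache directory is created when it does not exist yet.
cache_path_sketch <- function(..., cache_dir) {
  if (!dir.exists(cache_dir)) {
    dir.create(cache_dir, recursive = TRUE)
  }
  file.path(cache_dir, ...)
}

dir <- tempfile("cache_")
before <- dir.exists(dir)
path <- cache_path_sketch("b", cache_dir = dir)
after <- dir.exists(dir)
c(before = before, after = after)
```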
Nest a data frame by chunks containing all elements of a group
nest_chunk(data, .by, chunks)
data |
A data frame. |
.by |
A column whose values define the groups to keep whole within a chunk. |
chunks |
Integer. Number of chunks. |
A nested data frame.
data <- tibble::tibble(id = c(1, 1, 1, 2, 3))
out <- nest_chunk(data, .by = "id", chunks = 3)
out$data
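To make the idea of chunks that contain all elements of a group more concrete, here is a base-R sketch that assigns whole groups to chunks and then splits the data. The helper chunk_by_group() is hypothetical, and the round-robin assignment rule is an assumption chosen for simplicity; nest_chunk() may distribute groups differently and returns a nested data frame rather than a list.

```r
# Hypothetical helper: not part of dchunkr.
# Assumption: groups are dealt out to chunks in order of first appearance,
# cycling through 1..chunks, so rows sharing a value always share a chunk.
chunk_by_group <- function(data, by, chunks) {
  groups <- unique(data[[by]])
  chunk_of_group <- rep_len(seq_len(chunks), length(groups))
  data$chunk <- chunk_of_group[match(data[[by]], groups)]
  split(data, data$chunk)
}

data <- data.frame(id = c(1, 1, 1, 2, 3), x = 1:5)
out <- chunk_by_group(data, by = "id", chunks = 2)
lapply(out, function(chunk) unique(chunk$id))
```

Whatever the assignment rule, the property that matters is that no group is ever split across two chunks.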
Order the rows of a data frame
order_rows(data, .fun = c("identity", "rev", "sample"))
data |
Data frame. |
.fun |
Character. A function name. |
Data frame.
withr::local_seed(123)
data <- tibble::tibble(x = 1:5)
order_rows(data)
order_rows(data, "rev")
order_rows(data, "sample")
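One way to picture the three options is as functions applied to the row indices. The base-R sketch below does that; order_rows_sketch() is a hypothetical helper, and applying the named function to the row index is an assumption about how order_rows() might work.

```r
# Hypothetical helper: not part of dchunkr.
# Assumption: .fun names a function that is applied to the row indices.
order_rows_sketch <- function(data, .fun = c("identity", "rev", "sample")) {
  .fun <- match.arg(.fun)
  index <- seq_len(nrow(data))
  index <- switch(.fun,
    identity = index,       # keep the current order
    rev = rev(index),       # reverse the rows
    sample = sample(index)  # shuffle the rows
  )
  data[index, , drop = FALSE]
}

set.seed(123)
data <- data.frame(x = 1:5)
kept <- order_rows_sketch(data)$x
reversed <- order_rows_sketch(data, "rev")$x
shuffled <- order_rows_sketch(data, "sample")$x
```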
Add and filter (un)done files in a data frame
pick_undone(data)
data |
A data frame with the column file. |
A data frame.
data <- tibble::tibble(file = c(
  withr::local_tempdir(pattern = "exists_"),
  withr::local_tempfile(pattern = "doesnt_exist_")
))
data
pick_undone(data)
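The base-R sketch below shows how filtering undone files can support a resumable, cached run: only chunks without a result file are computed, and each result is saved as soon as it is ready. The helper pick_undone_sketch() is hypothetical, the squaring step stands in for a real computation, and treating a chunk as undone when its file does not exist is an assumption based on the example above.

```r
# Hypothetical helper: not part of dchunkr.
# Assumption: a chunk is "undone" when its file does not exist yet.
pick_undone_sketch <- function(data) {
  data[!file.exists(data$file), , drop = FALSE]
}

parent <- tempfile("cache_")
dir.create(parent)
jobs <- data.frame(chunk = 1:3)
jobs$file <- file.path(parent, paste0(jobs$chunk, ".rds"))

# Pretend chunk 1 was computed in an earlier run.
saveRDS(1^2, jobs$file[1])

# Compute and cache only what is still missing.
todo <- pick_undone_sketch(jobs)
for (i in seq_len(nrow(todo))) {
  result <- todo$chunk[i]^2  # stand-in for an expensive computation
  saveRDS(result, todo$file[i])
}

# Nothing is left to do, and all results can be read back from the cache.
remaining <- pick_undone_sketch(jobs)
results <- vapply(jobs$file, readRDS, numeric(1), USE.NAMES = FALSE)
```

If the loop is interrupted, rerunning the same script skips every chunk that already has a file, so finished work is not repeated.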