| Title: | Work with Chunks of Data |
|---|---|
| Description: | Helps to work with chunks of data in parallel and to cache the results of each chunk. It's a basic approach to handling somewhat large datasets and long runtimes. |
| Authors: | Mauro Lepore [aut, cre] (ORCID: <https://orcid.org/0000-0002-1986-7988>), 2 Degrees Investing Initiative [cph, fnd] |
| Maintainer: | Mauro Lepore <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.0.9001 |
| Built: | 2026-05-23 22:20:04 UTC |
| Source: | https://github.com/maurolepore/dchunkr |
Adds the file column to a data frame

    add_file(data, parent, ext = ".rds")


| Argument | Description |
|---|---|
| data | A data frame. |
| parent | A directory. |
| ext | An extension. |

Value: A data frame.

    data <- tibble::tibble(chunk = 1)
    add_file(data, parent = tempdir())

path() constructs a relative path; path_wd() constructs an absolute path from the current working directory.

    cache_path(..., cache_dir = rappdirs::user_cache_dir("dchunkr"))


| Argument | Description |
|---|---|
| ... | Character vectors. If any values are NA, the result will also be NA. The paths follow the recycling rules used in the tibble package, namely that only length 1 arguments are recycled. |
| cache_dir | Character. A directory for the cache. |

Value: Character. A path.

path_home(), path_package() for functions to construct paths
relative to the home and package directories respectively.

    dir <- withr::local_tempfile()
    fs::dir_exists(dir)
    cache_path("b", cache_dir = dir)
    fs::dir_exists(dir)

Nest a data frame by chunks containing all elements of a group

    nest_chunk(data, .by, chunks)


| Argument | Description |
|---|---|
| data | A data frame. |
| .by | The column whose values define the groups to keep together. |
| chunks | Integer. Number of chunks. |

Value: A nested data frame.

    data <- tibble::tibble(id = c(1, 1, 1, 2, 3))
    out <- nest_chunk(data, .by = "id", chunks = 3)
    out$data

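
To make the idea of "chunks containing all elements of a group" concrete, here is a minimal base-R sketch. It is not dchunkr's implementation: the round-robin assignment, the `value` column, and the variable names are assumptions chosen only for illustration.

```r
# Hypothetical sketch in base R; not how nest_chunk() is implemented.
data <- data.frame(id = c(1, 1, 1, 2, 3), value = c(10, 20, 30, 40, 50))
chunks <- 2

# Assign each distinct group to a chunk, round-robin.
groups <- unique(data$id)
chunk_of_group <- (seq_along(groups) - 1) %% chunks + 1

# Look up the chunk of every row through the group it belongs to.
data$chunk <- chunk_of_group[match(data$id, groups)]

# One data frame per chunk; no group is split across chunks.
by_chunk <- split(data, data$chunk)
```

Because the chunk is decided per group rather than per row, chunk sizes can be uneven when group sizes differ.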
Order the rows of a data frame

    order_rows(data, .fun = c("identity", "rev", "sample"))


| Argument | Description |
|---|---|
| data | Data frame. |
| .fun | Character. A function name. |

Value: Data frame.

    withr::local_seed(123)
    data <- tibble::tibble(x = 1:5)
    order_rows(data)
    order_rows(data, "rev")
    order_rows(data, "sample")

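
The usage above lists three function names as the default for `.fun`. One common way to handle such an argument in R is `match.arg()`, sketched below in base R. The function `order_rows_sketch()` is hypothetical and only illustrates the pattern; whether order_rows() works this way internally is an assumption.

```r
# Hypothetical sketch; the real order_rows() may be implemented differently.
order_rows_sketch <- function(data, .fun = c("identity", "rev", "sample")) {
  # Pick the first choice by default and reject unknown names.
  .fun <- match.arg(.fun)
  n <- nrow(data)
  idx <- switch(.fun,
    identity = seq_len(n),      # keep the current order
    rev      = rev(seq_len(n)), # reverse the rows
    sample   = sample.int(n)    # shuffle the rows
  )
  data[idx, , drop = FALSE]
}

d <- data.frame(x = 1:5)
same     <- order_rows_sketch(d)
reversed <- order_rows_sketch(d, "rev")
shuffled <- order_rows_sketch(d, "sample")
```

Shuffling the rows can be useful when several processes work through the same list of chunks, since each process then tends to start on a different chunk.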
Add and filter (un)done files in a data frame

    pick_undone(data)


| Argument | Description |
|---|---|
| data | A data frame with the column file. |

Value: A data frame.

    data <- tibble::tibble(file = c(
      withr::local_tempdir(pattern = "exists_"),
      withr::local_tempfile(pattern = "doesnt_exist_")
    ))
    data
    pick_undone(data)

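
The caching workflow that the package description mentions can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's own code: `run_chunk()` is a hypothetical stand-in for an expensive computation, and naming each file after its chunk number is an assumption.

```r
# Hypothetical sketch in base R of "skip chunks whose result is already cached".
cache_dir <- tempfile("chunk-cache-")
dir.create(cache_dir)

jobs <- data.frame(chunk = 1:3)
jobs$file <- file.path(cache_dir, paste0(jobs$chunk, ".rds"))

# Pretend chunk 1 was finished by an earlier run.
saveRDS("cached earlier", jobs$file[1])

# Stand-in for the real, slow work on one chunk (an assumption).
run_chunk <- function(chunk) chunk^2

# Keep only the chunks whose result file does not exist yet.
undone <- jobs[!file.exists(jobs$file), , drop = FALSE]

# Compute and cache each remaining chunk.
for (i in seq_len(nrow(undone))) {
  saveRDS(run_chunk(undone$chunk[i]), undone$file[i])
}
```

Rerunning the script after an interruption only repeats the chunks whose files are still missing, which is what makes this pattern suited to long runtimes.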