Package 'dchunkr'

Title: Work with Chunks of Data
Description: Helps to work with chunks of data in parallel and to cache the results of each chunk. It's a basic approach to handling somewhat large datasets and long runtimes.
Authors: Mauro Lepore [aut, cre], 2 Degrees Investing Initiative [cph, fnd]
Maintainer: Mauro Lepore <[email protected]>
License: MIT + file LICENSE
Version: 0.0.0.9001
Built: 2024-11-12 03:10:03 UTC
Source: https://github.com/maurolepore/dchunkr
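The workflow in the description (split the data into chunks, process each chunk, cache each result) can be sketched in base R as follows. This is a hypothetical process_chunks() helper, not part of dchunkr's API. It assumes one .rds file per chunk in a cache directory, and it uses lapply() where the package's description mentions parallel processing.

```r
# Hypothetical workflow, not dchunkr's API: split `data` into chunks, process
# each chunk with `fun`, and cache each result in one .rds file per chunk so
# that a rerun skips chunks that are already done.
process_chunks <- function(data, chunks, cache_dir, fun) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  data$chunk <- cut(seq_len(nrow(data)), breaks = chunks, labels = FALSE)
  pieces <- split(data, data$chunk)
  files <- file.path(cache_dir, paste0(names(pieces), ".rds"))
  todo <- !file.exists(files)
  # Only undone chunks are processed. On Unix-alikes, lapply() could be
  # swapped for parallel::mclapply() to run chunks concurrently.
  invisible(lapply(which(todo), function(i) saveRDS(fun(pieces[[i]]), files[[i]])))
  # Read every cached result back; `fun` must return a data frame.
  do.call(rbind, lapply(files, readRDS))
}

cache <- tempfile("dchunkr_cache_")
data <- data.frame(x = 1:10)
out <- process_chunks(data, chunks = 3, cache_dir = cache,
                      fun = function(d) data.frame(x = d$x, y = d$x^2))
```

Running the same call again reads the three cached files instead of calling fun, which is the point of caching per chunk: an interrupted run resumes where it stopped.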

Help Index


Adds the file column to a data frame

Description

Adds the file column to a data frame

Usage

add_file(data, parent, ext = ".rds")

Arguments

data

A data frame.

parent

A directory.

ext

Character. A file extension, including the leading dot.

Value

A data frame.

Examples

data <- tibble::tibble(chunk = 1)
add_file(data, parent = tempdir())
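A minimal sketch of what the file column might contain, assuming (this page does not confirm it) that each file is named after the value in the chunk column plus ext and sits under parent. add_file_sketch() is a hypothetical base R helper, not the package's implementation.

```r
# Hypothetical helper, not dchunkr's implementation.
# Assumes one output file per chunk, named "<chunk><ext>" under `parent`.
add_file_sketch <- function(data, parent, ext = ".rds") {
  data$file <- file.path(parent, paste0(data$chunk, ext))
  data
}

data <- data.frame(chunk = 1:3)
out <- add_file_sketch(data, parent = "cache")
out$file  # "cache/1.rds" "cache/2.rds" "cache/3.rds"
```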

Create a path to the cache directory, ensuring it exists

Description

cache_path() constructs a path inside the cache directory, creating the cache directory first if it does not already exist.

Usage

cache_path(..., cache_dir = rappdirs::user_cache_dir("dchunkr"))

Arguments

...

Character vectors. If any values are NA, the result will also be NA. The paths follow the recycling rules used in the tibble package, namely that only length 1 arguments are recycled.

cache_dir

Character. A directory for the cache.

Value

Character. A path.

See Also

fs::path_home() and fs::path_package(), for functions that construct paths relative to the home and package directories, respectively.

Examples

dir <- withr::local_tempfile()
# The directory doesn't exist yet
fs::dir_exists(dir)
cache_path("b", cache_dir = dir)
# cache_path() has created it
fs::dir_exists(dir)
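The behaviour in the title (build a path, ensure the cache directory exists) can be sketched in base R. cache_path_sketch() is hypothetical, not dchunkr's implementation, and it takes cache_dir without a default to avoid depending on rappdirs.

```r
# Hypothetical helper, not dchunkr's implementation.
# Creates `cache_dir` if it is missing, then builds a path inside it.
cache_path_sketch <- function(..., cache_dir) {
  if (!dir.exists(cache_dir)) {
    dir.create(cache_dir, recursive = TRUE)
  }
  file.path(cache_dir, ...)
}

dir <- tempfile("cache_")  # tempfile() returns a path but creates nothing
dir.exists(dir)            # FALSE
p <- cache_path_sketch("b", cache_dir = dir)
dir.exists(dir)            # TRUE
```

Only the cache directory is created; the returned path (here ending in "b") is left for the caller to write to.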

Nest a data frame by chunks containing all elements of a group

Description

Nest a data frame by chunks containing all elements of a group

Usage

nest_chunk(data, .by, chunks)

Arguments

data

A data frame.

.by

Character. The name of a column whose values define the groups. All rows with the same value end up in the same chunk.

chunks

Integer. Number of chunks.

Value

A nested data frame.

Examples

data <- tibble::tibble(id = c(1, 1, 1, 2, 3))
out <- nest_chunk(data, .by = "id", chunks = 3)
out$data
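The key idea is to assign whole groups, not individual rows, to chunks. A minimal base R sketch, assuming groups are spread evenly over chunks in order of first appearance; nest_chunk_sketch() is hypothetical, and it returns a plain list of data frames rather than the nested data frame that the example's out$data suggests nest_chunk() returns.

```r
# Hypothetical helper, not dchunkr's implementation.
# Maps each distinct value of `.by` to one of `chunks` chunks, so all rows of
# a group land in the same chunk, then splits the rows by chunk.
nest_chunk_sketch <- function(data, .by, chunks) {
  groups <- unique(data[[.by]])
  # Spread the groups as evenly as possible over the chunks.
  group_chunk <- cut(seq_along(groups), breaks = chunks, labels = FALSE)
  data$chunk <- group_chunk[match(data[[.by]], groups)]
  split(data, data$chunk)
}

data <- data.frame(id = c(1, 1, 1, 2, 3))
out <- nest_chunk_sketch(data, .by = "id", chunks = 3)
vapply(out, nrow, integer(1))  # 3 1 1: all three rows with id 1 share a chunk
```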

Order the rows of a data frame

Description

Order the rows of a data frame

Usage

order_rows(data, .fun = c("identity", "rev", "sample"))

Arguments

data

Data frame.

.fun

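Character. The name of the function used to reorder the rows: "identity" keeps the original order, "rev" reverses it, and "sample" shuffles it randomly.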

Value

Data frame.

Examples

withr::local_seed(123)

data <- tibble::tibble(x = 1:5)
order_rows(data)
order_rows(data, "rev")
order_rows(data, "sample")
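The usage line suggests the common R idiom of listing the allowed choices in the default and resolving them with match.arg(). A minimal sketch under that assumption; order_rows_sketch() is hypothetical, not the package's code.

```r
# Hypothetical helper, not dchunkr's implementation.
# match.arg() picks the first choice ("identity") by default and errors on
# any name outside the allowed set.
order_rows_sketch <- function(data, .fun = c("identity", "rev", "sample")) {
  .fun <- match.arg(.fun)
  idx <- switch(.fun,
    identity = seq_len(nrow(data)),
    rev = rev(seq_len(nrow(data))),
    sample = sample(nrow(data))
  )
  data[idx, , drop = FALSE]
}

data <- data.frame(x = 1:5)
order_rows_sketch(data)$x            # 1 2 3 4 5
order_rows_sketch(data, "rev")$x     # 5 4 3 2 1
set.seed(123)
order_rows_sketch(data, "sample")$x  # a random permutation of 1:5
```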

Add and filter (un)done files in a data frame

Description

Add and filter (un)done files in a data frame

Usage

pick_undone(data)

Arguments

data

A data frame with a column named file, containing file paths.

Value

A data frame.

Examples

data <- tibble::tibble(file = c(
  withr::local_tempdir(pattern = "exists_"),
  withr::local_tempfile(pattern = "doesnt_exist_")
))
data

pick_undone(data)
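Going by the title ("Add and filter (un)done files"), one plausible reading is: add a logical column marking files that already exist, then keep only the rows whose file is missing. The sketch below assumes exactly that; pick_undone_sketch() and its done column are hypothetical, not confirmed by this page.

```r
# Hypothetical helper, not dchunkr's implementation.
# Marks each file as done if it already exists on disk, then keeps only the
# rows whose file is still missing.
pick_undone_sketch <- function(data) {
  data$done <- file.exists(data$file)
  data[!data$done, , drop = FALSE]
}

done_file <- tempfile("exists_")
file.create(done_file)
data <- data.frame(file = c(done_file, tempfile("doesnt_exist_")))
out <- pick_undone_sketch(data)
out  # only the row whose file doesn't exist
```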