
[R] Bug: Partial matching on $metadata$r causes errors with schema metadata keys starting with "r" #50163

@pdmetcalfe


Description

Three methods in the arrow R package read the R metadata with x$metadata$r. Because $ on a named list uses partial matching, a schema-level metadata key that starts with "r" but is not "r" (e.g. "rachel", "row_count", "result") is matched instead whenever there is no exact "r" key and that key is the only name beginning with "r". Its value is then passed to apply_arrow_r_metadata() or used as group var metadata, which causes spurious "Invalid metadata$r" warnings or hard errors, depending on the matched value.

The fix in all three locations is to replace x$metadata$r with x$metadata[["r"]].

Affected code

  • collect.ArrowTabular: apply_arrow_r_metadata(df, x$metadata$r)
  • as.data.frame.ArrowTabular: apply_arrow_r_metadata(df, x$metadata$r)
  • group_vars.ArrowTabular: x$metadata$r$attributes$.group_vars
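To make the before/after behaviour of these three call sites concrete without depending on arrow internals, here is a minimal pure-R sketch. The functions old_r_metadata(), old_group_vars(), new_r_metadata() and new_group_vars() are hypothetical stand-ins, and a plain named list plays the role of x$metadata. The character(0) fallback in new_group_vars() is an assumption that mirrors the expected behaviour described below, not a copy of arrow's code.

```r
# Hypothetical stand-ins for the affected call sites. A plain named list
# plays the role of x$metadata; no arrow objects or internals are used.

# Null-default helper (base R >= 4.4.0 also ships its own %||%)
`%||%` <- function(a, b) if (is.null(a)) b else a

# Before: `$` partially matches "r" to a unique name starting with "r"
old_r_metadata <- function(metadata) metadata$r
old_group_vars <- function(metadata) metadata$r$attributes$.group_vars

# After: `[[` uses exact = TRUE by default, so "r" only matches "r".
# The character(0) fallback is an assumption taken from the expected
# behaviour in this issue, not arrow's actual implementation.
new_r_metadata <- function(metadata) metadata[["r"]]
new_group_vars <- function(metadata) {
  metadata[["r"]]$attributes$.group_vars %||% character(0)
}

# Schema metadata with no "r" key, only a key that starts with "r"
foreign <- list(rachel = "some_value")
old_r_metadata(foreign)         # "some_value" (wrong value picked up)
new_r_metadata(foreign)         # NULL
try(old_group_vars(foreign))    # Error: $ operator is invalid for atomic vectors
new_group_vars(foreign)         # character(0)

# With a genuine "r" entry, the exact match wins in both versions
grouped <- list(r = list(attributes = list(.group_vars = "g")), rachel = "x")
old_group_vars(grouped)         # "g"
new_group_vars(grouped)         # "g"
```

The last two calls show that the change only affects tables without an exact "r" key; existing R metadata is read the same way as before.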

Reprex

```r
library(arrow)
library(dplyr)

# Build a table with a schema metadata key that starts with "r" but isn't "r".
# This can happen when integrating with systems that attach their own metadata
# (e.g., a key called "rachel", "row_count", "result", etc.).
tbl <- arrow_table(x = 1:3)
tbl_rachel <- tbl$cast(
  tbl$schema$WithMetadata(list(rachel = "some_value"))
)

# Confirm that $r partial-matches to $rachel, while [["r"]] correctly returns NULL
meta <- tbl_rachel$metadata
meta$r        # "some_value"  <-- partial match: WRONG
meta[["r"]]   # NULL          <-- exact match: correct

# as.data.frame() spuriously warns "Invalid metadata$r"
as.data.frame(tbl_rachel)
#> Warning message: Invalid metadata$r

# collect() emits the same spurious warning
collect(tbl_rachel)
#> Warning message: Invalid metadata$r

# group_vars() errors outright: it evaluates x$metadata$r$attributes$.group_vars,
# and $ is invalid on the atomic vector that x$metadata$r returns
group_vars(tbl_rachel)
#> Error in x$metadata$r$attributes : $ operator is invalid for atomic vectors
```

Expected behaviour

  • as.data.frame() and collect() should return the data without any warning — there is no "r" metadata key, so no R metadata should be applied.
  • group_vars() should return character(0) — there are no group vars encoded.

Actual behaviour

  • as.data.frame() and collect() emit a spurious "Invalid metadata$r" warning.
  • group_vars() throws "$ operator is invalid for atomic vectors".

Root cause

schema$metadata returns a plain R list. R's $ operator performs partial matching on lists, so meta$r resolves to meta$rachel when there is no exact "r" key and "rachel" is the only name starting with "r". The fix is to use [[, which does not partially match by default (exact = TRUE), everywhere $metadata$r appears:

```r
# Before (all three methods)
x$metadata$r

# After
x$metadata[["r"]]
```
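The following base-R sketch (no arrow needed) illustrates the lookup rules this fix relies on: $ partially matches only when exactly one name starts with the given prefix and no exact match exists, while [[ stays exact unless exact = FALSE is passed. It also shows the standard warnPartialMatchDollar option, which could be one way to surface any other $ partial matches while running the package's tests; the list names used here are purely illustrative.

```r
# Plain lists standing in for schema metadata
meta <- list(rachel = "a")
meta$r                           # "a": unique partial match
meta[["r"]]                      # NULL: exact = TRUE by default
meta[["r", exact = FALSE]]       # "a": [[ partial-matches only when asked

# Ambiguous partial match: two names start with "r", so `$` returns NULL
meta2 <- list(rachel = "a", result = "b")
meta2$r                          # NULL

# An exact "r" name always wins over partial matches
meta3 <- list(r = "serialized", rachel = "a")
meta3$r                          # "serialized"

# Base R can turn partial matching by `$` into a warning
op <- options(warnPartialMatchDollar = TRUE)
msg <- tryCatch(meta$r, warning = function(w) conditionMessage(w))
options(op)                      # restore the previous setting
msg                              # warning text naming 'r' and 'rachel'
```

The ambiguous case explains why the bug only appears when a single foreign key begins with "r": with two such keys, $ silently returns NULL and the R metadata is simply skipped.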

Session Info

R version 4.6.0 (2026-04-24)
Platform: aarch64-apple-darwin25.4.0
Running under: macOS Tahoe 26.5.1

Matrix products: default
BLAS:   /opt/homebrew/Cellar/openblas/0.3.33/lib/libopenblasp-r0.3.33.dylib
LAPACK: /opt/homebrew/Cellar/r/4.6.0/lib/R/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] dplyr_1.2.1  arrow_24.0.0

loaded via a namespace (and not attached):
 [1] assertthat_0.2.1 R6_2.6.1         bit_4.6.0        tidyselect_1.2.1
 [5] magrittr_2.0.5   glue_1.8.1       tibble_3.3.1     pkgconfig_2.0.3
 [9] bit64_4.8.2      generics_0.1.4   lifecycle_1.0.5  cli_3.6.6
[13] vctrs_0.7.3      compiler_4.6.0   purrr_1.2.2      pillar_1.11.1
[17] rlang_1.2.0

Component(s)

R
