Addition of R table objects with the frab package

Robin K. S. Hankin

To cite in publications, please use Hankin (2023).

TLDR: In R, adding two objects of class table has a natural interpretation: sum the counts label by label. However, base R adds tables elementwise by position, which can give plausible but incorrect results. The frab package provides a consistent and efficient way to add table objects, subject to disordR discipline (Hankin 2022). The underlying mathematical structure is the free Abelian group, hence “frab”.

Prologue: table()

Suppose we have three R tables:

x <- table(c("a","a","b","c","d","d","a"))
y <- table(c("a","a","b","d","d","d","e"))
z <- table(c("a","a","b","d","d","e","f"))

Can we ascribe any meaning to x+y without referring to the arguments sent to table()? Well yes, we should simply sum the counts of the various letters. However:

x
## 
## a b c d 
## 3 1 1 2
y
## 
## a b d e 
## 2 1 3 1
x+y
## 
## a b c d 
## 5 2 4 3

The sum is defined in this case, but inspection shows that the result is incorrect. Although the entries for a and b are correct, the third and fourth entries are not: base R simply adds the entries elementwise by position, with no regard to labels, and takes the labels of the result from x. We would expect x+y to respect the fact that we have 5 d entries (2 in x and 3 in y), even though element d is the fourth entry of x and the third of y. Further:

x
## 
## a b c d 
## 3 1 1 2
z
## 
## a b d e f 
## 2 1 2 1 1
x+z
## Error in x + z: non-conformable arrays

Above we see that x and z do not have a well-defined sum, in the sense that x+z returns, quite reasonably, an error.
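
For comparison, the label-aware sum described above can be sketched in base R alone. The helper add_by_label() below is hypothetical: it is not part of base R or of the frab package, and it is only meant to show the intended semantics under the assumption that both arguments are one-dimensional tables.

```r
# Hypothetical helper: add two one-dimensional tables label by label.
add_by_label <- function(x, y) {
  labels <- sort(union(names(x), names(y)))          # every label seen in either table
  out <- setNames(numeric(length(labels)), labels)   # start from zero counts
  out[names(x)] <- out[names(x)] + as.vector(x)      # add the counts of x by name
  out[names(y)] <- out[names(y)] + as.vector(y)      # add the counts of y by name
  out
}

x <- table(c("a","a","b","c","d","d","a"))
y <- table(c("a","a","b","d","d","d","e"))
z <- table(c("a","a","b","d","d","e","f"))

add_by_label(x, y)   # a=5 b=2 c=1 d=5 e=1
add_by_label(x, z)   # a=5 b=2 c=1 d=4 e=1 f=1
```

Both sums are now defined, and the count for d in x+y is 5, as expected.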

Named vectors

A named vector is a vector with a names attribute: each element is associated with a name, or label, which need not be unique. Names make it easier to refer to specific values within the vector. Named vectors are a convenient and useful feature of the R programming language (R Core Team 2022). However, consider the following two named vectors:

x <- c(a=1,b=2,c=3)
y <- c(c=4,b=1,a=1)

Given that x+y returns a named vector, there are at least two plausible values that it might give, viz:

c(a=5,b=3,c=4)

or

c(a=2,b=3,c=7).

In the first case the elements of x and y are added pairwise, and the names attribute is taken from the first of the addends. In the second, the names are considered to be primary and the value of each name in the sum is the sum of the values of that name of the addends. Note further that there is no good reason why the first answer could not be c(c=5,b=3,a=4), obtained by using the names attribute of y instead of x.
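
The two candidate answers can be reproduced in base R. The snippet below shows that base R's + takes the first interpretation, so the names of the result depend on the order of the addends. The second interpretation is obtained here by reordering y to match the names of x; this only works because the two vectors share exactly the same set of unique names.

```r
x <- c(a=1,b=2,c=3)
y <- c(c=4,b=1,a=1)

# Base R adds by position and keeps the names of the first addend.
x + y             # a=5 b=3 c=4
y + x             # c=5 b=3 a=4  (same values, different names)

# Name-keyed sum: reorder y by the names of x before adding.
x + y[names(x)]   # a=2 b=3 c=7
```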

The frab package

The frab package furnishes efficient methods to give a consistent and meaningful way of adding two R tables together, using standard R syntax. It uses the names of a named vector as the indexing mechanism. Package idiom is straightforward:

(x <- frab(c(a=1,b=2,d=7)))
## A frab object with entries
## a b d 
## 1 2 7
(y <- frab(c(c=4,b=1,a=-1)))
## A frab object with entries
##  a  b  c 
## -1  1  4
x+y
## A frab object with entries
## b c d 
## 3 4 7

Above, note how y is defined with its entries in non-alphabetical order, but the resulting frab object has its entries ordered alphabetically. The numeric entries for each letter are summed, accounting for the different names (a,b,d and a,b,c respectively). In x+y, the entry for a has vanished because its values, 1 and -1, cancel in the summation. The result is presented using the frab print method.

Package idiom includes extraction and replacement methods, all of which should work as expected:

x <- frab(c(x=5,d=1,e=2,f=4,a=3,c=3,g=9))
x
## A frab object with entries
## a c d e f g x 
## 3 3 1 2 4 9 5
x>3
## A disord object with hash 8e06d464d006d7ce8c6fa1e5101a1e042bddadf6 and elements
## [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
## (in some order)
x<3
## A disord object with hash 8e06d464d006d7ce8c6fa1e5101a1e042bddadf6 and elements
## [1] FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE
## (in some order)
x[x>3]
## A frab object with entries
## f g x 
## 4 9 5
x[x<3]
## A frab object with entries
## d e 
## 1 2
x[x<3] <- 100
x
## A frab object with entries
##   a   c   d   e   f   g   x 
##   3   3 100 100   4   9   5

Above we see that extraction and replacement methods follow disordR discipline (Hankin 2022). Results are coerced to disord objects if needed. Tables may be added to frab objects:

a <- rfrab()
b <- table(sample(letters[1:8],12,replace=TRUE))
a
## A frab object with entries
## a b c d g i 
## 3 6 1 5 7 5
b
## 
## a b c d e f g 
## 2 2 1 2 2 2 1
a+b
## A frab object with entries
## a b c d e f g i 
## 5 8 2 7 2 2 8 5

Above we see the + operator is defined between a frab and a table, coercing R tables to frab objects to give consistent results.
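
The disordR discipline followed by the extraction methods earlier in this section can be motivated with a small base-R sketch. It assumes, following Hankin (2022), that the storage order of the entries is an implementation detail that the user should not rely on. The vector v2 and the helper by_name() are hypothetical and serve only to compare two orderings of the same name-value pairs.

```r
v  <- c(x=5,d=1,e=2,f=4,a=3,c=3,g=9)
v2 <- rev(v)   # same name-value pairs, stored in a different order

# Hypothetical helper: put a named vector into a canonical (name-sorted) order.
by_name <- function(u) u[order(names(u))]

# Positional extraction depends on the storage order ...
names(v[1])    # "x"
names(v2[1])   # "g"

# ... whereas extraction by a condition on the values does not.
identical(by_name(v[v > 3]), by_name(v2[v2 > 3]))   # TRUE
```

Idiom such as x[x>3] is well defined whatever the internal order, which is why it is permitted, while extraction by position is not meaningful for an object whose order is unspecified.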

Note on repeated entries

If we pass a named vector with repeated names, the values are added:

frab(c(a=4,b=2,a=1))
## A frab object with entries
## a b 
## 5 2

The general rule is, given two named vectors x and y, that frab(x)+frab(y) is identical to frab(c(x,y)).
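
This rule can be illustrated with a minimal base-R model of a frab object: a named numeric vector with unique, sorted names and no zero entries. The functions model_frab() and model_add() are hypothetical and are not how the package is implemented. They only mimic the behaviour shown in this document.

```r
# Hypothetical model: collapse a named vector to unique names, dropping zeros.
model_frab <- function(v) {
  s <- tapply(v, names(v), sum)    # sum the values that share a name
  s <- s[s != 0]                   # entries that cancel are dropped
  setNames(as.vector(s), names(s))
}

# In the model, addition is concatenation followed by collapsing.
model_add <- function(x, y) model_frab(c(x, y))

model_frab(c(a=4,b=2,a=1))                    # a=5 b=2
model_add(c(a=1,b=2,d=7), c(c=4,b=1,a=-1))    # b=3 c=4 d=7
```

The second call reproduces the x+y example given earlier, including the disappearance of the entry for a.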

Two dimensional R tables

The ideas above have a natural generalization to two-dimensional R tables.

(x <- rspar2(9))
##    bar
## foo A  B C D F
##   b 3  0 8 0 2
##   d 5 16 0 0 6
##   f 1  0 0 4 0
(y <- rspar2(9))
##    bar
## foo A C D E F
##   a 0 0 0 9 0
##   b 0 0 0 0 8
##   e 0 0 4 0 0
##   f 7 9 8 0 0
x+y
##    bar
## foo A  B C  D E  F
##   a 0  0 0  0 9  0
##   b 3  0 8  0 0 10
##   d 5 16 0  0 0  6
##   e 0  0 0  4 0  0
##   f 8  0 9 12 0  0

Above, note that the resulting sum is automatically resized to accommodate both addends, and also that entries with nonzero values in both x and y are correctly summed.
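
The resizing can be sketched in base R with dense matrices. The helper add_2d() and the small matrices p and q below are hypothetical, chosen only to show the idea. The package works with sparse objects, so this is an illustration of the semantics, not of the implementation.

```r
# Hypothetical helper: add two matrices by row and column labels.
add_2d <- function(x, y) {
  rn <- sort(union(rownames(x), rownames(y)))   # all row labels
  cn <- sort(union(colnames(x), colnames(y)))   # all column labels
  out <- matrix(0, length(rn), length(cn), dimnames = list(rn, cn))
  out[rownames(x), colnames(x)] <- out[rownames(x), colnames(x)] + x
  out[rownames(y), colnames(y)] <- out[rownames(y), colnames(y)] + y
  out
}

p <- matrix(c(3,5,0,16), 2, 2, dimnames = list(c("b","d"), c("A","B")))
q <- matrix(c(7,0,0,8),  2, 2, dimnames = list(c("b","f"), c("A","F")))

# The result has rows b, d, f and columns A, B, F; the shared cell (b,A) is 3+7.
add_2d(p, q)
```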

Arbitrary-dimensioned R tables

The one- and two-dimensional R tables above have somewhat specialized print methods. The general case, with dimension \(\geqslant 3\), uses methods similar to those of the spray package. We can generate a sparsetable object quite easily:

library("mvtnorm")  # provides rmvnorm()
A <- matrix(0.95,3,3)
diag(A) <- 1
x <- round(rmvnorm(300,mean=rep(10,3),sigma=A/7))
x[] <- letters[x]
head(x)
##      [,1] [,2] [,3]
## [1,] "i"  "i"  "i" 
## [2,] "j"  "j"  "j" 
## [3,] "j"  "j"  "k" 
## [4,] "j"  "j"  "j" 
## [5,] "j"  "j"  "i" 
## [6,] "j"  "j"  "j"
(sx  <- sparsetable(x))
##            val
##  i i i  =   22
##  i i j  =    2
##  i j i  =    5
##  i j j  =    4
##  j i i  =    2
##  j i j  =    1
##  j j i  =    3
##  j j j  =  223
##  j j k  =    7
##  j k j  =    3
##  j k k  =    1
##  k j j  =    2
##  k j k  =    4
##  k k j  =    1
##  k k k  =   20

We can add sx to other sparsetable objects, such as sz:

(sz <- sparsetable(matrix(sample(letters[9:11],12,replace=TRUE),ncol=3),1001:1004))
##             val
##  i k k  =  1003
##  j j j  =  1004
##  j j k  =  1001
##  k k j  =  1002

Then the usual semantics for addition operate:

sx + sz
##             val
##  i i i  =    22
##  i i j  =     2
##  i j i  =     5
##  i j j  =     4
##  i k k  =  1003
##  j i i  =     2
##  j i j  =     1
##  j j i  =     3
##  j j j  =  1227
##  j j k  =  1008
##  j k j  =     3
##  j k k  =     1
##  k j j  =     2
##  k j k  =     4
##  k k j  =  1003
##  k k k  =    20
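
The same semantics can be sketched in base R by treating each row of labels as a single key. The helper add_sparse() is hypothetical and is not the package's method. The inputs are a subset of the entries of sx, together with all of sz, copied from the output above.

```r
# Hypothetical helper: each row of an index matrix is a key; values with equal keys are summed.
add_sparse <- function(index1, val1, index2, val2) {
  keys <- c(apply(index1, 1, paste, collapse = " "),
            apply(index2, 1, paste, collapse = " "))
  s <- tapply(c(val1, val2), keys, sum)   # sum the values that share a key
  s[s != 0]                               # drop entries that cancel
}

I1 <- rbind(c("j","j","j"), c("j","j","k"), c("k","k","j"))   # three entries of sx
v1 <- c(223, 7, 1)
I2 <- rbind(c("i","k","k"), c("j","j","j"), c("j","j","k"), c("k","k","j"))   # entries of sz
v2 <- c(1003, 1004, 1001, 1002)

# Keys "i k k", "j j j", "j j k", "k k j" with values 1003, 1227, 1008, 1003.
add_sparse(I1, v1, I2, v2)
```

These four values agree with the corresponding rows of sx + sz.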

Note on nomenclature

The word “table” means something unrelated in SQL. A short discussion of frab functionality implemented in SQL “table” objects is given in inst/sql.Rmd.

References

Hankin, Robin K. S. 2022. “Disordered Vectors in R: Introducing the disordR Package.” arXiv. https://doi.org/10.48550/ARXIV.2210.03856.
———. 2023. “The Free Abelian Group in R: The frab Package.” arXiv. https://doi.org/10.48550/ARXIV.2307.13184.
R Core Team. 2022. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.