sqlhelper
was written to streamline SQL integration in
packages designed to meet organizations’ particular analytical needs, in
stable data contexts. The pattern of integrating SQL with R code in
analysis packages is intended to:
- make use of the data-wrangling capacity of databases and SQL;
- take advantage of R’s packaging system to write reliable, understandable code;
- keep all the code for a task in one place.
This note describes how to use sqlhelper
safely inside a
package.
Connections
If more than one connection is needed for a project, the easiest
approach will be to define them in a yaml file as described in
vignette("connections")
. The yaml file is placed under
inst
in the root of the package and accessed with
devtools::system.file()
(see r-pkgs on using
installed files).
sqlhelper::connect(
system.file( "sqlhelper_connection_conf.yml" ),
exclusive=TRUE
)
Connections that cannot be defined this way may be defined within the package. How this is handled will depend on considerations such as how long each connection will be needed for, how widely it will need to be shared between other package components, and whether it will need to be exposed to users.
If a connection is needed once and may be used and closed during the execution of a single function it may be sufficient to define it within the namespace of that function and close it on or before the function exits, for example:
get_some_data <- function(){
conn <- DBI::dbConnect(a_driver, "a connection string")
d <- sqlhelper::run_files(
system.file("SQL/my_sql_file.SQL"),
default.conn = conn
)
DBI::dbDisconnect(conn)
d
}
A connection that needs to be shared across functions or function calls, but not exposed to users, may be stored in an environment in the package’s top-level namespace, for example:
assign("connection_store",
new.env(parent = emptyenv()),
environment())
<- function(){
connect assign(
"c1",
::dbConnect(a_driver, "a connection string"),
DBIenvir = connection_store
)
}
<- function(){
get_some_data ::run_files(
sqlhelpersystem.file("SQL/my_sql_file.SQL"),
default.conn = connection_store$c1
)
}
# This is a bit belt-and-braces, but thorough.
<- function(){
disconnect ::dbDisconnect(connection_store$c1)
DBI$c1 <- NULL
connection_storerm(list=c("c1), envir=connection_store)
}
(this approach is in fact more or less the one take by
sqlhelper
itself)
SQL files
If multiple connections are required but cannot be defined in yaml it
will somewhat diminish the ability to control execution on a
query-by-query basis and may produce a need to split queries into more
files than would otherwise be necessary. mf a package contains many SQL
files, it may be useful to store them in a SQL directory under
inst/
. They may be accessed in the same way, with
system.file()
, either
system.file("SQL", "file_name.SQL")
or
system.file("SQL/file_name.SQL")
(the latter may
be somewhat less portable).
It is often convenient to define SQL parameters in a different scope
to the calling scope of run_files()
. In this case it is
important to ensure that the package does not interfere with the user’s
global environment. The easiest way to do this is to store them in an
environment (e.g. in the same way as illustrated for connections, above)
and pass them to the values
parameter of
run_files()
, run_queries()
or
prepare_sql()
. See run_queries()
or
vignette("execution")
for examples.
Exit
It may be desirable to close sqlhelper
’s connections
when a calling package has completed it’s operations. This can be
achieved easily with sqlhelper::disconnect()
.