sqlhelper's main purpose is the execution of files of SQL, with options for controlling the execution of individual SQL queries within each file. The function run_files() executes .SQL files. Internally, run_files() calls the functions read_sql(), prepare_sql() and run_queries(); these functions can also be used to read and prepare SQL files without executing them, and to execute SQL query strings.
Executing SQL files
Executing SQL code requires a connection to a database, and sqlhelper provides ways to automate creating and managing connections. These are described in vignette("connections"). Once connections are configured, the run_files() command can be used to execute files of SQL code.
library(sqlhelper)
connect("examples/sqlhelper_db_conf.yml", exclusive = TRUE)
DBI::dbWriteTable( default_conn(), name = "IRIS", iris)
readLines("examples/example.sql") |>
writeLines()
#> -- qname = how_many_irises
#> SELECT count(*) as N FROM IRIS;
#>
#> -- qname = short_petal_setosa
#> select Species, `Petal.Length`
#> FROM IRIS
#> WHERE Species = "setosa"
#> AND `Petal.Length` < {petal_length}
petal_length <- 1.3
results <- run_files("examples/example.sql")
results
#> $how_many_irises
#> N
#> 1 150
#>
#> $short_petal_setosa
#> Species Petal.Length
#> 1 setosa 1.1
#> 2 setosa 1.2
#> 3 setosa 1.0
#> 4 setosa 1.2
As well as individual file names, run_files() accepts a vector of file names.
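For example, queries from several files can be executed in a single call (a sketch; the second file name here is hypothetical):

```r
# Queries from all files are executed in order; results are returned in
# one list, named by query name where names were given
results <- run_files(c("examples/example.sql", "examples/more_queries.sql"))
```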
Accessing results of queries
run_files() returns a list of results of the same length as the number of queries. In the above example, names are assigned to queries with interpreted comments of the form -- qname = my_query_name. If queries are named, individual results can be accessed by the same name:
results$short_petal_setosa
#> Species Petal.Length
#> 1 setosa 1.1
#> 2 setosa 1.2
#> 3 setosa 1.0
#> 4 setosa 1.2
Results returned as a list may also be accessed by index, of course. However, if a file contains a single query, the result of that query is returned as-is (i.e. an object, not a single-element list).
Beware of the usual gotchas around list names. sqlhelper will not complain if you give two queries the same name, but if you then try to access the results by name, you will only get the result of the first query with that name. This is particularly relevant if your project executes queries from multiple files and those files are developed by different people. Similarly, be careful not to use anything in query names that R will interpret as an operator or special character. In the above example, naming the query short-petal-setosa would cause an error when accessing the result by name, because R would interpret - as a subtraction.
Controlling execution of individual queries
As well as naming queries, interpreted comments can be used to control aspects of execution on a per-query basis. For example, queries are executed with DBI::dbGetQuery() by default, but SQLite will complain if you use this to send a statement to the database. You can control the execution function with the execmethod keyword:
-- qname = bobby_tables
-- execmethod = execute
DROP TABLE Students;
All interpreted comments follow the form -- <keyword> = <value>. Interpretable keywords are:
- qname A name for this query
- interpolate "yes" or "no" - should this query be parameterized?
- quotesql "yes" or "no" - should interpolated characters be quoted?
- execmethod One of "get", "execute", "sendq", "sends" or "spatial" - which method should be used to execute the query? "get" means DBI::dbGetQuery(); "execute" means DBI::dbExecute(); "sendq" means DBI::dbSendQuery(); "sends" means DBI::dbSendStatement(); "spatial" means sf::st_read()
- geometry The name of a spatial column. Ignored if execmethod is not "spatial"
- conn_name The name of a connection to execute this query against
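As a sketch of how these keywords can combine within one file (the table names, geometry column and connection name below are hypothetical; the spatial query assumes a connection named gis_db to a database holding geometries, and that sf is installed):

```sql
-- qname = region_populations
-- execmethod = get
SELECT region_id, population FROM regions;

-- qname = region_boundaries
-- execmethod = spatial
-- geometry = geom
-- conn_name = gis_db
SELECT region_id, geom FROM region_shapes;
```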
Cascaded execution parameters
All interpreted comments except qname are cascaded within their file, meaning that if you want to use the same values throughout, you need only set them for the first query. See also read_sql() for details.
If you want to change the execution parameters for the first query only and retain the defaults for the remainder, you will need to either:
- use interpreted comments to explicitly reset the defaults for the second query; or
- put the second and subsequent queries in a different file.
You can prevent cascading by passing cascade = FALSE.
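As a sketch of cascading (hypothetical statements): execmethod is set once on the first query and applies to the second as well, whereas qname does not cascade and must be given per query:

```sql
-- qname = clear_staging
-- execmethod = execute
DELETE FROM staging;

-- qname = clear_log
-- no execmethod here: the value cascades from the first query
DELETE FROM log;
```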
Interpolation
By default, sqlhelper will attempt to parameterize SQL queries with glue::glue_sql(), using values from the current environment. This means that values from R can be easily inserted in your SQL code, or calculated in situ:
readLines("examples/petal_length_params.sql") |>
writeLines()
#> -- qname = short_petal_setosa
#> select Species, `Petal.Length`
#> FROM IRIS
#> WHERE Species = "setosa"
#> AND `Petal.Length` < {petal_length};
#>
#> -- qname = setosa_petal_1sd_below
#> select Species, `Petal.Length`
#> FROM IRIS
#> WHERE Species = "setosa"
#> AND `Petal.Length` < {
#> mean(iris$Petal.Length[iris$Species == "setosa"])
#> - sd(iris$Petal.Length[iris$Species == "setosa"])
#> }
petal_length = 1.2
run_files("examples/petal_length_params.sql")
#> $short_petal_setosa
#> Species Petal.Length
#> 1 setosa 1.1
#> 2 setosa 1.0
#>
#> $setosa_petal_1sd_below
#> Species Petal.Length
#> 1 setosa 1.1
#> 2 setosa 1.2
#> 3 setosa 1.0
#> 4 setosa 1.2
Interpolation behaviour can be controlled using the keywords interpolate and quotesql, and the values parameter of prepare_sql() (which can also be passed to run_files() and run_queries()).
The default behaviour is to quote SQL strings (i.e. interpolate with glue::glue_sql()); if this is not desired, it can be avoided with -- quotesql = no (for an individual query in a file; but note that this setting cascades - see 'Cascaded execution parameters', above) or by passing quotesql = "no" as a parameter to run_files() (for all queries). If strings are not quoted they will be inserted bare; whilst this is occasionally useful, great care should be taken to sanitize any interpolated values.
If you want to skip interpolation for an individual query, precede it with -- interpolate = no. If you want to skip interpolation altogether, pass interpolate = "no" as a parameter; note again that this keyword cascades (see 'Cascaded execution parameters', above).
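For instance, unquoted interpolation is handy when the interpolated value is an identifier rather than a literal (a sketch; the file content and the table_name variable are hypothetical):

```sql
-- qname = count_rows
-- quotesql = no
SELECT count(*) AS N FROM {table_name};
```

With the default quotesql = yes, a character table_name would be interpolated as a quoted string literal, which is not a valid table reference in most databases; with quotesql = no it is inserted bare, so it must be sanitized by the caller.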
Passing parameter values
Sometimes you may need to parameterize your SQL with values that are not in the calling environment. This is particularly important if you are executing SQL code from within a package: you cannot rely on, and should not risk writing to, your users' globalenv(). To supply interpolation values to run_files() and run_queries(), pass a populated environment as the values parameter.
# reusing the petal length parameter example
# A user may have a petal_length parameter in the globalenv already
print(petal_length)
#> [1] 1.2
result_from_globalenv <-
run_files("examples/petal_length_params.sql")
result_from_globalenv$short_petal_setosa
#> Species Petal.Length
#> 1 setosa 1.1
#> 2 setosa 1.0
# a bespoke environment can provide a specific set of values for interpolation
my_values <- new.env()
my_values$petal_length <- 1.4
result_from_my_values <-
run_files("examples/petal_length_params.sql", values = my_values)
result_from_my_values$short_petal_setosa
#> Species Petal.Length
#> 1 setosa 1.3
#> 2 setosa 1.1
#> 3 setosa 1.2
#> 4 setosa 1.3
#> 5 setosa 1.0
#> 6 setosa 1.2
#> 7 setosa 1.3
#> 8 setosa 1.3
#> 9 setosa 1.3
#> 10 setosa 1.3
#> 11 setosa 1.3
Binding
Binding can be performed alongside interpolation. Queries and
statements are first interpolated and then should then be executed with
DBI::dbSendQuery()
or DBI::dbSendStatement()
.
They may then be bound and the result fetched.
readLines("examples/binding.SQL") |>
writeLines()
#> -- execmethod = sendq
#> -- qname = binding_example
#> SELECT species, [Petal.Width]
#> FROM IRIS
#> WHERE SPECIES = ? AND
#> [Petal.Width] < {petal_width};
petal_width <- 0.2
result <- run_files("examples/binding.SQL")
DBI::dbBind(result$binding_example, list("setosa"))
DBI::dbFetch(result$binding_example)
#> Species Petal.Width
#> 1 setosa 0.1
#> 2 setosa 0.1
#> 3 setosa 0.1
#> 4 setosa 0.1
#> 5 setosa 0.1
DBI::dbClearResult(result$binding_example)
Reading and preparing SQL files
SQL files and strings can be read and prepared without being executed by the read_sql() and prepare_sql() functions. These functions return tibbles containing the prepared SQL, associated metadata (e.g. filename), and execution parameters. They enable both inspection of prepared SQL and parameters for debugging, and further manipulation of SQL queries prior to execution.
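For example (a sketch reusing the example file from above; the exact columns of the returned tibble may vary between versions):

```r
# read_sql() parses the file and its interpreted comments without
# executing anything
sql_tbl <- read_sql("examples/example.sql")

# prepare_sql() interpolates values into the queries, again without
# executing them
petal_length <- 1.3
prepared <- prepare_sql(sql_tbl)

# Each row describes one query: its name, its prepared SQL, and its
# execution parameters, ready for inspection or further manipulation
prepared
```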
Executing SQL strings
One of the main objectives of sqlhelper is to reduce the incidence of SQL written as strings in R code. However, it is occasionally convenient for interactive, exploratory work to type a query as a string. For this you may use run_queries(). This function can also be used to execute queries that have been read from files (e.g. with read_sql()) and then manipulated programmatically before execution.
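For example (a sketch, assuming connections are configured and the IRIS table from above exists):

```r
# A single query string; as with run_files(), a lone query's result is
# returned as-is rather than wrapped in a list
run_queries("SELECT count(*) AS N FROM IRIS")
```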
Passing ad-hoc connections to functions
It may not always be possible or desirable to have
sqlhelper
manage your database connections. For example,
the use of secrets is not yet supported in sqlhelper
connections. In these cases, connections created outside
sqlhelper
may be passed to to run_files()
or
run_queries()
.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
cars <- mtcars
cars$model <- row.names(mtcars)
DBI::dbWriteTable(con, "cars", cars)
minmpg = 30
run_queries("SELECT model, mpg, cyl FROM CARS WHERE mpg >= {minmpg}",
default.conn = con)
#> model mpg cyl
#> 1 Fiat 128 32.4 4
#> 2 Honda Civic 30.4 4
#> 3 Toyota Corolla 33.9 4
#> 4 Lotus Europa 30.4 4