Chapter 19

SQL and JSON

IN THIS CHAPTER

Bullet Bridging the gap between JSON-formatted data and SQL-formatted data

Bullet Understanding the elements of the SQL/JSON data model

Bullet Converting data with SQL/JSON functions

Bullet Testing whether a string is valid JSON data

In the early days of computing, there were no databases. Data was kept in flat files, with no organizing structure. Clearly, there had to be a better way, and database architectures were devised in the 1950s and 1960s, primarily the hierarchical architecture and the network architecture. These early architectures were largely superseded by the relational architecture, which became dominant in the 1980s and the decades since. Relational databases are based on that relational architecture, and SQL is the universally used language for operating on relational databases.

In recent years, primarily with the advent of “big data,” a variety of non-relational architectures have taken hold in different application areas. Collectively known as NoSQL database architectures, each one has its own way of organizing and storing data.

It became clear that there was much to be gained if data stored in one kind of NoSQL database could be sent to and used by a different kind of NoSQL database. To get around the problem of different file formats in different NoSQL databases, a data interchange format was developed that all the NoSQL databases could use to share data with each other. That data interchange format is named JSON, which is an acronym for JavaScript Object Notation.

Although derived from JavaScript, JSON can be used with any programming language that supports it. The Object part of the name refers to the fact that data can be transferred in the form of JSON objects. Don’t get too hung up on that, however. Data can also be transferred in the form of arrays, numbers, and strings, as well as true, false, and null values.

Using JSON with SQL

Now, as NoSQL databases have become increasingly popular, the value of exchanging data with relational databases has become clear. There are decades worth of data stored in relational databases that could be of value to applications designed to operate on NoSQL databases. The reverse is also true. Huge quantities of data are being stored in NoSQL databases that could be used by applications designed to operate on relational databases. To meet this need, functionality has been added to the ISO/IEC SQL specification to enable the translation of JSON data into a form that can be handled by SQL, and conversely, the translation of SQL-compatible relational data into JSON data. The tool for making these translations is the SQL/JSON data model, which defines a variety of data items, as well as a set of built-in functions to operate on those items.

Ingesting and storing JSON data into a relational database

To store JSON data in a relational database, it is ingested into a relational database as either a character string or a binary string. That string is stored as an ordinary database column. Once stored, it can be retrieved and operated on by the SQL/JSON built-in functions, described later in this chapter.

Generating JSON data from relational data

In response to an SQL query, built-in functions can generate JSON objects or arrays regardless of whether the source of the data was originally JSON data that had been parsed into an SQL table column or ordinary SQL data that had not originally been JSON data.

Querying JSON data stored in relational tables

A new language, SQL/JSON path language, is embedded in SQL operators to enable the querying of JSON data stored in SQL-compatible relational database tables.

The SQL/JSON Data Model

Because SQL and JSON store data in fundamentally different ways, if data is to be shared across those two environments, a way of bridging that gap must exist. That bridge is the SQL/JSON data model.

JSON data comes in a variety of forms, including JSON arrays, JSON objects, JSON members, JSON literal null values, JSON literal true values, JSON literal false values, JSON numbers, and JSON strings. SQL has no counterpart for JSON arrays, JSON objects, or JSON members. In addition, JSON nulls, numbers, and strings are not exactly the same as SQL nulls, numbers, and strings. Going the other way, JSON has no counterpart for SQL datetime data. To bridge these gaps, a set of SQL/JSON items have been defined. They reside within the SQL environment, but are able to communicate with JSON data stored outside that environment.

SQL/JSON items

JSON data, stored in the form of character or binary strings, can be parsed into SQL/JSON items. An SQL/JSON item can be

  • An SQL/JSON scalar
  • An SQL/JSON null
  • An SQL/JSON array
  • An SQL/JSON object

SQL/JSON scalar

The SQL/JSON scalar is defined as a non-null value of any of the following SQL types:

  • Character string, using the Unicode character set
  • Numeric
  • Boolean
  • Datetime

SQL/JSON null

The SQL/JSON null is defined as a value distinct from any value of any SQL type. It is even distinct from any SQL null value.

SQL/JSON array

The SQL/JSON array is defined as an ordered list of zero or more SQL/JSON items. These items are called the SQL/JSON elements of the SQL/JSON array. Elements in the array are separated by commas and enclosed in square brackets. For example:

[ 3.1415927, "string theory", false]

SQL arrays are 1-relative, meaning that the first element is called element 1. SQL/JSON, however, follows the JavaScript standard of being 0-relative. The first element of a SQL/JSON array is called element 0.

SQL/JSON object

The SQL/JSON object is defined as an unordered collection of zero or more SQL/JSON members, where a member is a pair whose first value is a character string with character set Unicode and whose second value is an SQL/JSON item. The first value of an SQL/JSON member is called the key, and the second value is called the bound value. Members are sometimes called key/value pairs, and sometimes name/value pairs. SQL/JSON objects can be serialized by separating the members with commas and enclosing the entire object in curly braces. An example might be

{ "name" : "Joe Friday", "badge" : 714, "objective" : "the facts" }

SQL/JSON sequences

An SQL/JSON sequence is defined as an ordered list of zero or more SQL/JSON items. It can be thought of as a container of zero or more SQL/JSON items.

Parsing JSON

Parsing is the importing of data in some storage format into the SQL/JSON data model. Normally the format would be a Unicode character string, although other, implementation-dependent formats are possible.

Serializing JSON

Serializing JSON is the reverse of the parsing operation. It is the exporting of a value from the SQL/JSON data model back to some storage format. One thing to note is that SQL/JSON datetimes cannot be serialized, and another is that SQL/JSON sequences of length greater than one also cannot be serialized.

SQL/JSON Functions

Operations on JSON data are performed by built-in functions. These SQL/JSON functions belong to two groups: query functions and constructor functions. Query functions evaluate SQL/JSON path language expressions against JSON values, producing values of SQL/JSON types, which are then converted to SQL types. SQL/JSON path language is described later in this chapter.

Constructor functions use values of SQL types to produce JSON values (either JSON objects or JSON arrays), which are represented in SQL character or binary string types.

JSON API common syntax

There are several query functions, all of which share a common syntax. They all require a path expression, the JSON value to be input to that path expression for querying and processing, and possibly optional parameter values passed to the path expression.

The syntax is

<JSON API common syntax> ::=

<JSON context item> <comma>

<JSON path specification>

[ AS <JSON table path name> ]

[ <JSON passing clause> ]

<JSON context item> ::= <JSON value expression>

<JSON path specification> ::=

<character string literal>

<JSON passing clause> ::=

PASSING <JSON argument>

[ { <comma> <JSON argument> } ]

<JSON argument> ::=

<JSON value expression> AS <identifier>

The type of the value expression contained in the <JSON value expression> immediately contained in the <JSON context item> (second line above) is a string type.

JSON value expression

As noted in the previous BNF syntax definition, a JSON context item is just a JSON value expression. A JSON value expression can be defined as

<JSON value expression> ::=

<value expression> [ <JSON input clause> ]

<JSON input clause> ::= FORMAT <JSON representation>

<JSON representation> ::=

JSON [ ENCODING { UTF8 | UTF16 | UTF32 } ]

| Implementation-defined JSON representation option>

This shows that a JSON value expression is a value expression with an optional input clause. The input clause specifies the format of the JSON representation, which specifies the encoding as being either one of three Unicode formats or an implementation-defined alternative.

Path expression

Following the JSON context item and a comma is the JSON path specification, which must be a character string literal. The table path name and passing clause are optional parts of the JSON path specification.

PASSING clause

The PASSING clause is used to pass parameters to the SQL/JSON path expression.

JSON OUTPUT clause

When JSON data comes back to an application as the result of the operation of a function, the application author can specify the data type, format, and encoding of the JSON text created by the function. The syntax for the output clause is

<JSON output clause> ::=

RETURNING ,data type>

[ FORMAT <JSON representation> ]

<JSON representation> ::=

JSON [ ENCODING { UTF8 | UTF16 | UTF32 } ]

| Implementation-defined JSON representation option>

If FORMAT is not specified, then FORMAT JSON is the default.

Query functions

The SQL/JSON query functions are JSON_EXISTS, JSON_VALUE, JSON_QUERY, and JSON_TABLE. These functions evaluate path language expressions against JSON values. The results returned are values of SQL/JSON types, which are then converted to SQL types. Path language is described later in this chapter.

JSON_EXISTS

JSON_EXISTS determines whether a JSON value satisfies a search condition in the path specification. The syntax is

<JSON exists predicate> ::=

JSON_EXISTS <left paren>

<JSON API common syntax>

[ <JSON exists error behavior> ON ERROR ]

<right paren>

<JSON exists error behavior> ::=

TRUE | FALSE | UNKNOWN | ERROR

If the optional ON ERROR clause is not included, the default assumption is FALSE ON ERROR. JSON_EXISTS evaluates the SQL/JSON path expression, returning a True result if the path expression finds one or more SQL/JSON items.

Sample data that can be used to learn how to use the query functions, including JSON_EXISTS, can be found on pages 24 and 25 of Section 6 of the SQL Technical Report ISO/IEC TR 19075-6:2017(E), which can be downloaded from

http://standards.iso.org/ittf/PubliclyAvailableStandards/c067367_ISO_IEC_TR_19075-6_2017.zip

The data consists of two columns, K and J of a table T. K is the primary key of the table, and J is the data, consisting of key-value pairs and arrays of key-value pairs. JSON_EXISTS tests for the existence of a given character string literal in the JSON path specification. For example:

SELECT T.K

FROM T

WHERE JSON_EXISTS (T.J, 'lax $.where') ;

The primary keys of the rows that contain the word 'where' are returned as the result set of the SELECT query. The keyword 'lax' refers to error handling that is more forgiving than “strict” error handling. It has no effect on the result of this query. $ is the accessor that accesses the word 'where' in the current JSON object.

JSON_VALUE

The JSON_VALUE function extracts an SQL scalar value from a JSON value. The syntax is

<JSON value function> ::=

JSON VALUE <left paren>

<JSON API common syntax>

[ <JSON returning clause> ]

[ <JSON value empty behavior> ON EMPTY ]

[ <JSON value error behavior ON ERROR ]

<right paren>

<JSON returning clause> ::= RETURNING <data type>

<JSON value empty behavior> ::=

ERROR

| NULL

| DEFAULT <value expression>

<JSON value error behavior> ::=

ERROR

| NULL

| DEFAULT <value expression>

As you can probably surmise, <JSON value empty behavior> tells what to return if the result of the SQL/JSON path expression is empty.

  • NULL ON EMPTY means the result of JSON_VALUE is empty.
  • ERROR ON EMPTY means an exception is raised.
  • DEFAULT <value expression> ON EMPTY means that the value expression is evaluated and cast to the target type.
  • <JSON value error behavior> is similar. It specifies what to do if there is an unhandled error.

In the previous JSON_EXISTS example, all the rows where the keyword 'where' was present in the J value column were returned. With JSON_VALUE, the value associated with a target keyword is returned. Using the same data set as the JSON_EXISTS example, where the keyword 'who' is paired with a person's name, the following SQL code will return the names of people from all the rows where the keyword 'who' is present.

SELECT T.K,

JSON_VALUE (T.J, 'lax $.who') AS Who

FROM T ;

The result set will contain a column named K, containing the primary keys of the rows being returned, and a column named Who, containing the names that were paired with the 'who' keyword in the source data.

By default, JSON_VALUE returns an implementation-defined character string data type. The user can specify other types with a RETURNING clause.

JSON_QUERY

JSON_VALUE does a fine job of extracting a scalar from an SQL/JSON value, but is unable to extract an SQL/JSON array or an SQL/JSON object from an SQL/JSON value. JSON_QUERY is designed to perform those functions. The syntax for JSON_QUERY is

<JSON query> ::=

JSON_QUERY <left paren>

<JSON API common syntax>

[ <JSON output clause> ]

[ <JSON query wrapper behavior> ]

[ <JSON query quotes behavior> QUOTES

[ ON SCALAR STRING ] ]

[ <JSON query empty behavior> ON EMPTY ]

[ <JSON query error behavior> ON ERROR ]

<right paren>

The ON EMPTY and ON ERROR clauses are similar to the ones in JSON_VALUE and are handled the same way, the difference here being that the user can specify behavior when either the empty case or the error case arises.

  • If <JSON output clause> is not specified, RETURNING JSON FORMAT is the default.
  • If <JSON query empty behavior> is not specified, then NULL ON EMPTY is the default.
  • If <JSON query error behavior> is not specified, then NULL ON ERROR is the default.
  • If <JSON query wrapper behavior> is not specified, then WITHOUT ARRAY is the default.
  • If <JSON query wrapper behavior> specifies WITH and if neither CONDITIONAL nor UNCONDITIONAL is specified, then UNCONDITIONAL is the default.
  • If the value of the <JSON context item> simply contained in the <JSON API common syntax> is the null value, then the result of <JSON query> is the null value.

Using the same sample data that was used for the JSON_EXISTS sample query and the JSON_VALUE sample query, you can add array data to the result set, along with the results obtained with the JSON_VALUE clauses.

SELECT T.K,

JSON_VALUE (T.J, 'lax $.who') AS Who,

JSON_VALUE (T.J, 'lax $.where' NULL ON EMPTY)

AS Nali,

JSON_QUERY (T.J, 'lax $.friends') AS Friends

FROM T

WHERE JSON_EXISTS (T.J, 'lax $.friends')

The WHERE JSON_EXISTS clause eliminates any rows that do not have a key-value pair for friends. If WITH ARRAY WRAPPER is specified, then array elements returned are enclosed in a pair of square brackets.

JSON_TABLE

The JSON_TABLE function is significantly more complex than the other query functions. It takes JSON data and generates a relational output table from the valid input data. The syntax definition for the simplest variant of the JSON_TABLE function takes up more than a full page of text. Adding nested paths and plan clauses adds enormous complexity to what is already pretty complex. I do not have enough room here to cover JSON_TABLE in the depth that it deserves, so instead, I refer you to page 35 and following of the SQL Technical Report ISO/IEC TR 19075-6:2017(E), which can be downloaded from

http://standards.iso.org/ittf/PubliclyAvailableStandards/c067367_ISO_IEC_TR_19075-6_2017.zip

Constructor functions

The SQL/JSON constructor functions serve to construct JSON objects, arrays, and aggregates, based on information stored in relational tables. They are performing operations that take data in the opposite direction from the direction that SQL/JSON query functions take.

JSON_OBJECT

The JSON_OBJECT function constructs JSON objects from explicit name/value pairs. The syntax is

<JSON object constructor> ::=

JSON_OBJECT <left paren>

[ <JSON name and value> [ { <comma>

<JSON name and value> } … ]

[ <JSON constructor null clause> ]

[ <JSON key uniqueness constraint> ] ]

[ <JSON output clause> ]

<right paren>

<JSON name and value> ::=

[KEY] <JSON name> VALUE <JSON value expression>

| <JSON name> <colon> <JSON value expression>

<JSON name> ::= <character value expression>

<JSON constructor null clause> ::=

NULL ON NULL

| ABSENT ON NULL

<JSON key uniqueness constraint> ::=

WITH UNIQUE [ KEYS ]

| WITHOUT UNIQUE [ KEYS ]

There are a few rules that must be followed along with the preceding syntax:

  • <JSON name> may not be NULL.
  • <JSON value expression>s may be NULL.
  • The <JSON constructor null clause>, if NULL ON NULL, produces a SQL/JSON null. If ABSENT ON NULL, it omits the key-value pair from the resulting SQL/JSON object.
  • If no JSON constructor null clause is present, the default is NULL ON NULL.

JSON_OBJECTAGG

An application developer may want to construct a JSON object as an aggregation of the data in a relational table. If such a table contains two columns, one with JSON names and the other with JSON values, the JSON_OBJECTAGG function can act on that data to create a JSON object. The syntax to perform this operation is

<JSON object aggregate constructor> ::=

JSON_OBJECTAGG <left paren>

<JSON name and value>

[ <JSON constructor null clause> ]

[ <JSON key uniqueness constraint> ]

[ <JSON output clause> ]

<right paren>

If <JSON constructor clause> is not present, NULL ON NULL is the default.

JSON_ARRAY

To create a JSON array, based on a list of data items in a relational database table, the JSON_ARRAY function can be used. The syntax is

<JSON array constructor> ::=

<JSON array constructor by enumeration>

| <JSON array constructor by query>

<JSON array constructor by enumeration ::=

JSON_ARRAY <left paren>

[ <JSON value expression> [ { <comma>

<JSON value expression> }… ]

<JSON constructor null clause> ] ]

<JSON output clause>

<right paren>

<JSON array constructor by query> ::=

JSON_ARRAY <left paren>

<query expression>

[ <JSON input clause> ]

[ <JSON constructor null clause> ]

[ <JSON output clause> ]

JSON_ARRAY has two variants, one that produces its result from an input list of SQL values, and the other that produces its results from a query expression invoked from within the function. If the optional <JSON constructor null clause> is absent, the default is ABSENT ON NULL, which is the opposite of the default behavior for JSON_OBJECT.

JSON_ARRAYAGG

Just as you may want to construct a JSON object based on an aggregation of relational data, you may also want to construct a JSON array based on an aggregation of relational data. To do that, the JSON_ARRAYAGG function is now part of the SQL standard. The syntax is

<JSON array aggregate constructor> ::=

JSON_ARRAYAGG <left paren>

<JSON value expression>

[ <JSON array aggregate order by clause> ]

[ <JSON constructor null clause> ]

[ <JSON output clause> ]

<right paren>

<JSON array aggregate order by clause> ::=

ORDER BY <sort specification list>

If there is no <JSON constructor null clause>, the default is ABSENT ON NULL. The <JSON array order by clause> enables the developer to order output array elements according to one or more sort specifications, similar to the way ORDER BY works on ordinary SQL data.

IS JSON predicate

The IS JSON predicate tests whether a string purported to be JSON data is indeed valid JSON data. The syntax for use of the IS JSON predicate is

<JSON predicate> ::=

<string value expression> [ <JSON input clause> ]

IS [NOT] JSON

[ <JSON predicate type constraint> ]

[ <JSON key uniqueness constraint> ]

<JSON predicate type constraint> ::=

VALUE

| ARRAY

| OBJECT

| SCALAR

If the optional <JSON input clause> is not specified, then FORMAT JSON is the default. If <JSON key uniqueness constraint> is not specified, then WITHOUT UNIQUE KEYS is the default.

JSON nulls and SQL nulls

JSON nulls are not exactly the same as SQL nulls. In SQL, a zero-length string ("") is distinct from an SQL null value, which represents the absence of a definite value. In JSON, null is an actual value, and is represented by a JSON literal (“null”). JSON nulls must be distinguishable from SQL nulls. SQL/JSON syntax enables the application author to select whether SQL null values are included in a JSON object or array being constructed, or whether they should be omitted from the JSON object or array being constructed.

SQL/JSON Path Language

SQL/JSON Path language is a query language used by the SQL/JSON query functions. It accepts a context item, a path specification, and a PASSING clause as inputs, potentially along with ON ERROR and other clauses, to execute the JSON_EXISTS, JSON_VALUE, JSON_QUERY, and JSON_TABLE functions. These functions are executed by a path “engine” that returns results to the function, and ultimately back to the user.

In path language, the dollar sign ($) represents the current context element, and the period (.) represents an object member. Square brackets enclose array elements. Thus:

  • $.name denotes the value of the name attribute of the current JSON object.
  • $.phones[last] denotes the last element of the array stored in the phones attribute of the current JSON object.

There's More

JSON is a complex subject and there is much more to it than I have had space to cover here. For additional information on how Microsoft uses JSON with their SQL Server database products, you can find information specific to that implementation here:

https://docs.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-2017