Tag Pipeline Extension Steps

Contact: nSymbol support team


1. Introduction

This document describes XProc 3.0 pipeline extension steps developed by nSymbol Technology (nsymbol.com). These steps are only available within the Tag desktop application.


2. tag:connector

The tag:connector step uses a pre-defined Tag connector to call a web API. The connector must be loaded in the Connect app and is referenced using the ref option. Connectors can be imported and exported in the "Manage preferences" panel (top-right Account menu).

<p:declare-step type="tag:connector">
<p:output port="result" content-types="any"/>
<p:option name="ref" required="true" as="xs:QName"/>
</p:declare-step>

Connectors store all the information needed to make a web API call, including the URL, headers and user authentication information. When an API key is required to authenticate web API users, Tag can save it securely in preferences, and this step can then access it.

When content must be uploaded to the web API as part of a call (e.g., for HTTP POST requests), the connector must store the post body to upload. When the tag:connector step is used in a pipeline, the p:insert step can be used to update the post body before the call is made.

The output of this step depends on the web API called. The most common formats are JSON, XML and text. The response received from the web API is copied to the result output port as-is.
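A minimal sketch of a pipeline that calls a connector and saves the response to a file. It assumes that a connector has already been loaded in the Connect app under the hypothetical name `weather`, that the API returns JSON, and that no p:import is needed for tag steps inside Tag. This document does not give the namespace URI bound to the `tag` prefix, so a placeholder is used.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:tag="urn:placeholder:tag-steps"
                version="3.0">
  <!-- Placeholder namespace above: use the URI that Tag binds to the tag prefix. -->
  <p:output port="result"/>

  <!-- Call the web API configured in the hypothetical "weather" connector.
       The ref option takes an xs:QName naming the connector. -->
  <tag:connector ref="weather"/>

  <!-- Save the response exactly as received (weather.json is an example path). -->
  <p:store href="weather.json"/>
</p:declare-step>
```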

3. tag:csv

The tag:csv step converts a CSV (comma-separated values) document into an XML document.

<p:declare-step type="tag:csv">
<p:input port="source" content-types="text"/>
<p:output port="result" content-types="xml"/>
<p:option name="namespace" as="xs:anyURI"/>
<p:option name="read-headers" as="xs:boolean" select="true()"/>
</p:declare-step>

The result is a simple XML structure: a <csv> root element containing one <r> element per row, each with one child element per column.

<csv>
<r>
<name>Joe</name>
<email>joe@example.org</email>
</r>
</csv>

CSV headers are read from the first row unless the read-headers option is false. The header values are used as the names of the child elements of each <r>; if no headers are available, <v> elements are used instead.

The namespace option may be used to define a namespace in the result XML.
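A sketch of converting a local CSV file. The file name contacts.csv and the namespace URI urn:example:contacts are illustrative, and the `tag` namespace URI is a placeholder because this document does not state it. The file is loaded explicitly as text/plain to match the step's text input port.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:tag="urn:placeholder:tag-steps"
                version="3.0">
  <!-- Placeholder namespace above: use the URI that Tag binds to the tag prefix. -->
  <p:output port="result"/>

  <!-- Load the CSV as plain text for the step's text input port. -->
  <p:load href="contacts.csv" content-type="text/plain"/>

  <!-- Headers come from the first row (read-headers defaults to true);
       an example namespace is requested for the result XML. -->
  <tag:csv namespace="urn:example:contacts"/>
</p:declare-step>
```

If the file has no header row, adding read-headers="false" makes the columns appear as <v> elements, as described above.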

Note that tag:google can download CSV content from the Google Sheets API. This step can convert the downloaded CSV to XML for further processing.

A future version of Tag may extend this step to handle XML to CSV conversion.

4. tag:docx

The tag:docx step converts an XSL-FO document (the default rich text format in Tag) into a DOCX document (*.docx file) that can be opened in one of several popular word processors.

<p:declare-step type="tag:docx">
<p:input port="source" content-types="xml"/>
<p:output port="result" content-types="any"/>
</p:declare-step>

The output of this step is considered binary from a pipeline perspective. Typically a p:store step is used to save it to a file.
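A sketch of the typical load, convert and store sequence. The file names are examples, and the `tag` namespace URI is a placeholder since this document does not give it. The XSL-FO file is loaded with an explicit XML content type because a .fo extension may not be recognized as XML.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:tag="urn:placeholder:tag-steps"
                version="3.0">
  <!-- Placeholder namespace above: use the URI that Tag binds to the tag prefix. -->

  <!-- Load the XSL-FO document as XML. -->
  <p:load href="report.fo" content-type="application/xml"/>

  <!-- Convert to DOCX; the result is binary from the pipeline's point of view. -->
  <tag:docx/>

  <!-- Write the binary result to disk. -->
  <p:store href="report.docx"/>
</p:declare-step>
```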

Only a subset of format settings are converted, roughly corresponding to the available format tools in the Tag rich text editor.

A future version of Tag may extend this step to handle DOCX to XSL-FO conversion.

5. tag:google

The tag:google step allows you to call Google APIs if you have a Google business account. Google has a vast selection of APIs available to access Google resources like Drive, Docs, Sheets, Email and much more.

<p:declare-step type="tag:google">
<p:input port="source" content-types="any"/>
<p:output port="result" content-types="any"/>
<p:output port="report" content-types="json"/>
<p:option name="href" required="true" as="xs:anyURI"/>
<p:option name="scope" required="true" as="xs:string"/>
<p:option name="method" select="'GET'" values="('GET','POST','PUT','PATCH','DELETE')"/>
<p:option name="parameters" as="map(xs:QName,item()*)?"/>
<p:option name="user" as="xs:string"/>
</p:declare-step>

At a minimum, you need to provide the href and scope options for each API call; their values are given in Google's documentation for each API. The APIs you call must be enabled in your Google Cloud project (see below).

When a Google API is called for the first time, a login challenge is made, so you must be logged in to your Google account in a web browser. Tag detects this and opens a web page where you can authorize the scope(s) required for that API call (this is the same OAuth 2.0 permission-granting mechanism used by mobile apps).

This permission can be reused many times until it expires, at which point the permission form is displayed again. Importantly, it can also be reused by other API calls that require the same scope.

The user option is normally not needed. It may be useful if you are calling multiple APIs with differing scopes. It is used to cache permissions on your computer.

The response from the API call is stored on the result output port. The report output port is used to store a JSON report if one is returned by the Google API.

To get started, you need a Google Cloud project; be sure to log in with the Google identity that you want to use for the API calls. In the Cloud console (linked at the top right of Google's welcome page), create a new project or select an existing one. On the project dashboard, select APIs & Services > Enabled APIs & Services.

Search for the API you want and enable it. Try it in the browser outside of Tag to confirm what scope is needed to call it.

Select APIs & Services > Credentials and create an "OAuth 2.0 client ID". Click the download icon and save the client ID as a *.json file in a safe location; Tag needs this file.

Finally, in Tag go to Manage preferences > Google API setup and add a link to the *.json file downloaded above. You can now create tag:google steps in a pipeline to call the APIs enabled for your Google Cloud project.
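Once setup is complete, a call might look like the sketch below, which reads a cell range through the Google Sheets API v4 spreadsheets.values.get method using the read-only Sheets scope. SPREADSHEET_ID is a placeholder, as is the `tag` namespace URI. The declaration above requires exactly one document on the source port, so this sketch assumes Tag ignores it for GET requests and supplies a dummy document. Because the step declares two output ports and neither is marked primary, the result port is connected explicitly.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:tag="urn:placeholder:tag-steps"
                version="3.0">
  <!-- Placeholder namespace above: use the URI that Tag binds to the tag prefix. -->
  <p:output port="result"/>

  <!-- SPREADSHEET_ID is a placeholder for the ID in the sheet's URL. -->
  <tag:google name="sheet"
              href="https://sheets.googleapis.com/v4/spreadsheets/SPREADSHEET_ID/values/Sheet1!A1:D10"
              scope="https://www.googleapis.com/auth/spreadsheets.readonly">
    <!-- Assumption: the source document is ignored for GET requests,
         so a dummy document satisfies the required input port. -->
    <p:with-input port="source">
      <p:inline><unused/></p:inline>
    </p:with-input>
  </tag:google>

  <!-- Store the API response from the result port; any JSON report
       returned by Google is available on the report port. -->
  <p:store href="sheet-values.json">
    <p:with-input port="source" pipe="result@sheet"/>
  </p:store>
</p:declare-step>
```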

6. tag:html

The tag:html step converts an XSL-FO document (the default rich text format in Tag) into an HTML document (website page) that can be opened in any web browser.

<p:declare-step type="tag:html">
<p:input port="source" content-types="xml"/>
<p:output port="result" content-types="xml html"/>
<p:option name="save-as-xhtml" as="xs:boolean" select="false()"/>
</p:declare-step>

The save-as-xhtml option saves the result as an XHTML document, which is HTML written as well-formed XML. While Tag tries to treat HTML and XHTML consistently, there may be situations (in particular with other software programs) where using XHTML is an advantage.

The output of this step is HTML or XML, which can both be processed further by other pipeline steps. A p:store step can be used to save it to a file.
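A sketch that converts an XSL-FO document to XHTML and saves it. File names are examples, and the `tag` namespace URI is a placeholder because this document does not give it.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:tag="urn:placeholder:tag-steps"
                version="3.0">
  <!-- Placeholder namespace above: use the URI that Tag binds to the tag prefix. -->

  <!-- Load the XSL-FO source as XML. -->
  <p:load href="report.fo" content-type="application/xml"/>

  <!-- Ask for XHTML (the default is HTML) so later tools can treat it as XML. -->
  <tag:html save-as-xhtml="true"/>

  <p:store href="report.xhtml"/>
</p:declare-step>
```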

Only a subset of format settings are converted, roughly corresponding to the available format tools in the Tag rich text editor.

A future version of Tag may extend this step to handle HTML to XSL-FO conversion.

7. tag:json-as-xml

The tag:json-as-xml step converts a JSON document into an XML document.

<p:declare-step type="tag:json-as-xml">
<p:input port="source" content-types="json"/>
<p:output port="result" content-types="xml"/>
<p:option name="method" select="'jackson'" values="('jackson','xpath')"/>
</p:declare-step>

There are two ways to perform this conversion, selected by the method option. The xpath method uses the same conversion as the XPath json-to-xml() function. It creates accurate, yet verbose, XML to represent the input JSON. For example, the JSON object {"name": "John", "age": 22} becomes:

<map xmlns="http://www.w3.org/2005/xpath-functions">
<string key="name">John</string>
<number key="age">22</number>
</map>

The other conversion method is jackson, which refers to the popular Jackson open source library. The XML created by Jackson is less verbose and may be more suitable for some purposes. This is the default method for this step. For example,

<ObjectNode>
<name>John</name>
<age>22</age>
</ObjectNode>

In some cases, the jackson method will not be possible (due to complexity of the input JSON) and the xpath method will need to be used.
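A sketch that selects the xpath method for a small JSON object matching the examples above. The `tag` namespace URI is a placeholder since this document does not state it. The inline JSON uses expand-text="false" so that XProc does not treat its braces as value templates.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:tag="urn:placeholder:tag-steps"
                version="3.0">
  <!-- Placeholder namespace above: use the URI that Tag binds to the tag prefix. -->
  <p:output port="result"/>

  <!-- Request json-to-xml() style output instead of the jackson default. -->
  <tag:json-as-xml method="xpath">
    <p:with-input port="source">
      <!-- expand-text="false" keeps the JSON braces from being read as value templates. -->
      <p:inline content-type="application/json" expand-text="false">{"name": "John", "age": 22}</p:inline>
    </p:with-input>
  </tag:json-as-xml>
</p:declare-step>
```

Dropping method="xpath" (or setting it to jackson) would request the less verbose Jackson form instead.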

8. tag:prompter

The tag:prompter step pauses execution of a pipeline to prompt the user for input.

<p:declare-step type="tag:prompter">
<p:output port="result" content-types="xml"/>
<p:option name="message" required="true" as="xs:string"/>
<p:option name="prompt" as="xs:string"/>
<p:option name="title" as="xs:string"/>
<p:option name="type" select="'prompt'" values="('confirm','info','prompt','yes-no-cancel')"/>
</p:declare-step>

The type option selects the kind of prompter that appears: confirm, info, prompt (the default) or yes-no-cancel.

If the prompter returns null, the pipeline stops running. Any other value is wrapped in a c:result document and written to the result output port.
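A sketch that asks the user for a report title and reuses the answer in a later step. It assumes the entered text is the string value of the c:result document; the element names report and title are illustrative, and the `tag` namespace URI is a placeholder because this document does not give it.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:tag="urn:placeholder:tag-steps"
                version="3.0">
  <!-- Placeholder namespace above: use the URI that Tag binds to the tag prefix. -->
  <p:output port="result"/>

  <tag:prompter type="prompt" title="New report"
                message="Enter a title for the report:"/>

  <!-- Assumption: the entered text is the string value of the c:result document. -->
  <p:variable name="report-title" select="string(/c:result)"/>

  <!-- Use the answer in a later step through a text value template. -->
  <p:identity>
    <p:with-input port="source">
      <p:inline><report><title>{$report-title}</title></report></p:inline>
    </p:with-input>
  </p:identity>
</p:declare-step>
```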

9. tag:sleep

The tag:sleep step pauses execution of the pipeline for the number of milliseconds given by the millis option (500 by default). It can be used to simulate longer-running steps for demos or during prototype development.

<p:declare-step type="tag:sleep">
<p:option name="millis" select="500" as="xs:integer"/>
</p:declare-step>
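Because tag:sleep has no ports, nothing in the data flow forces it to run at a particular point. The sketch below uses the XProc 3.0 depends attribute (written p:depends on the non-XProc tag:sleep element) to simulate a slow two-second step between two others. The step names and the status document are illustrative, and the `tag` namespace URI is a placeholder.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:tag="urn:placeholder:tag-steps"
                version="3.0">
  <!-- Placeholder namespace above: use the URI that Tag binds to the tag prefix. -->
  <p:output port="result"/>

  <p:identity name="start">
    <p:with-input port="source">
      <p:inline><status>ready</status></p:inline>
    </p:with-input>
  </p:identity>

  <!-- tag:sleep has no ports, so p:depends orders it after "start". -->
  <tag:sleep name="pause" millis="2000" p:depends="start"/>

  <!-- Pass the document on only once the pause has finished. -->
  <p:identity depends="pause">
    <p:with-input port="source" pipe="result@start"/>
  </p:identity>
</p:declare-step>
```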

10. tag:sparql

The tag:sparql step reads remote SPARQL endpoints (semantic databases).

<p:declare-step type="tag:sparql">
<p:input port="source" content-types="text"/>
<p:output port="result" content-types="xml"/>
<p:option name="password" as="xs:string"/>
<p:option name="port" as="xs:string"/>
<p:option name="server" required="true" as="xs:anyURI"/>
<p:option name="user" as="xs:string"/>
</p:declare-step>

A text document containing a SPARQL query is passed to the source port and run against the SPARQL endpoint identified by the server option, together with the optional port, user and password settings.

Note that the query can be generated using logic and/or data by prior steps in the pipeline. This is a very powerful way to access SPARQL endpoints.

The result is saved as XML in a similar way to tag:sql. Each row in the result set creates a repeating element, which has child elements for all returned variables. There is no guarantee that all repeating elements have exactly the same child elements.
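A sketch that sends a fixed SPARQL query to an endpoint. https://example.org/sparql is a placeholder endpoint, and so is the `tag` namespace URI. The query is wrapped in CDATA so its IRI angle brackets are not parsed as markup, and expand-text="false" keeps its braces literal.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:tag="urn:placeholder:tag-steps"
                version="3.0">
  <!-- Placeholder namespace above: use the URI that Tag binds to the tag prefix. -->
  <p:output port="result"/>

  <!-- https://example.org/sparql is a placeholder endpoint. -->
  <tag:sparql server="https://example.org/sparql">
    <p:with-input port="source">
      <!-- expand-text="false" keeps the braces literal;
           CDATA keeps the angle brackets from being parsed as markup. -->
      <p:inline content-type="text/plain" expand-text="false"><![CDATA[
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?label WHERE { ?item rdfs:label ?label } LIMIT 10
]]></p:inline>
    </p:with-input>
  </tag:sparql>
</p:declare-step>
```

Following the description above, each solution row would become one repeating element with children for the item and label variables.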

11. tag:sql

The tag:sql step reads local or remote SQL databases.

<p:declare-step type="tag:sql">
<p:input port="source" content-types="text"/>
<p:output port="result" content-types="xml"/>
<p:option name="database" as="xs:string"/>
<p:option name="password" as="xs:string"/>
<p:option name="port" as="xs:string"/>
<p:option name="server" required="true" as="xs:anyURI"/>
<p:option name="type" required="true" values="('access','mysql','sql-server')"/>
<p:option name="user" as="xs:string"/>
</p:declare-step>

A text document containing a SQL query is passed to the source port and run against the database identified by the type option ("access", "mysql" or "sql-server") and the server option URI, together with the optional database, port, user and password settings.

Note that the query can be generated using logic and/or data by prior steps in the pipeline. This is a very powerful way to access SQL databases.

The result is saved as XML where each row in the result set creates a repeating element, which has child elements for all result columns. All repeating elements have the same child elements, although some may be empty.
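A sketch of a MySQL query whose password is supplied at run time through a pipeline option rather than written into the pipeline. The server, database, user, table and column names are all examples, and the exact form Tag expects for the server URI is not specified in this document, so a bare host name is only an assumption. The `tag` namespace URI is a placeholder.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:tag="urn:placeholder:tag-steps"
                version="3.0">
  <!-- Placeholder namespace above: use the URI that Tag binds to the tag prefix. -->
  <p:option name="db-password" as="xs:string" required="true"/>
  <p:output port="result"/>

  <!-- Example connection values; the server format Tag expects is an assumption.
       3306 is the standard MySQL port. -->
  <tag:sql type="mysql" server="db.example.org" port="3306"
           database="sales" user="report" password="{$db-password}">
    <p:with-input port="source">
      <p:inline content-type="text/plain">SELECT name, email FROM customers WHERE active = 1</p:inline>
    </p:with-input>
  </tag:sql>
</p:declare-step>
```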

12. tag:xml-as-json

The tag:xml-as-json step converts an XML document into a JSON document.

<p:declare-step type="tag:xml-as-json">
<p:input port="source" content-types="xml"/>
<p:output port="result" content-types="json"/>
<p:option name="save-as-array" as="xs:string"/> <!-- XPathExpression -->
</p:declare-step>

The conversion method is chosen based on the input XML. If the XML uses the "http://www.w3.org/2005/xpath-functions" namespace, it is converted to JSON exactly like the XPath xml-to-json() function.

If that namespace is not present, the Jackson open source library is used to perform the conversion. If Jackson is unable to perform the conversion, an error is reported and the pipeline will stop.

The save-as-array option may be used during Jackson conversion. Jackson cannot handle multiple sibling elements with the same name, so some data would not be preserved. To avoid this, the option holds an XPath expression that "flattens" the XML structure into something that converts to an array (e.g., an expression that selects a list of repeating elements from somewhere within the XML hierarchy).
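A sketch of Jackson conversion with repeated sibling elements. The orders document is illustrative, and the `tag` namespace URI is a placeholder because this document does not state it. Since the input does not use the xpath-functions namespace, Jackson is used, and save-as-array selects the repeating order elements so they can be converted to an array. The exact JSON produced depends on Tag and is not shown here.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:tag="urn:placeholder:tag-steps"
                version="3.0">
  <!-- Placeholder namespace above: use the URI that Tag binds to the tag prefix. -->
  <p:output port="result"/>

  <!-- The repeated <order> siblings would collide in Jackson's conversion,
       so save-as-array selects them as the list to turn into an array. -->
  <tag:xml-as-json save-as-array="/orders/order">
    <p:with-input port="source">
      <p:inline>
        <orders>
          <order><id>1</id></order>
          <order><id>2</id></order>
        </orders>
      </p:inline>
    </p:with-input>
  </tag:xml-as-json>
</p:declare-step>
```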