MarkLogic Index Data Types
- 28 July, 2022
- By Dave Cassel
MarkLogic offers several types of indexes: Universal, range, and triples. These indexes provide fast access to your content and can be configured to work with specific data types. MarkLogic will even do some type conversions for you.
Universal Index
Let’s insert a couple of documents. Note the difference between the updated properties (“T” versus no “T”) and the types of the someNumber property.
'use strict';
declareUpdate();

xdmp.documentInsert(
  "/content/doc1.json",
  {
    "updated": "2022-07-13T00:00:00",
    "someNumber": 1
  }
);
xdmp.documentInsert(
  "/content/doc2.json",
  {
    "updated": "2022-07-12 00:00:00",
    "someNumber": "2"
  }
);
The Universal Index will store each of these values, along with the structure, as they are provided to MarkLogic. We can query them as soon as the transaction completes. To do so, we need to query for the specific value with the right type: cts.jsonPropertyValueQuery("someNumber", 1) will find doc1.json, but cts.jsonPropertyValueQuery("someNumber", "1") will not.
Range Indexes
Let’s set up two range indexes:
- On the “updated” property with type “dateTime”
- On the “someNumber” property with type “int”
I remember that at some point in the past, doc2.json would have been rejected, because a valid dateTime has to have a “T” between the date and the time. (In other words, xs.dateTime("2022-07-12 00:00:00") would fail.) MarkLogic changed that at some point; our sample data values, both with and without the “T”, can be passed to the xs.dateTime constructor successfully. If we ask MarkLogic for the values in the range index, we’ll see both dateTimes (with the “T”):
cts.values(cts.jsonPropertyReference("updated"))
=>
2022-07-12T00:00:00
2022-07-13T00:00:00
Likewise, we can do an inequality query whether our input has the “T” or not:
cts.search(
  cts.jsonPropertyRangeQuery(
    "updated", ">=", xs.dateTime("2022-07-12 00:00:00")
  )
)
Triples Index
The triples index, which powers both triples and views, also does this conversion. Let’s add a template:
'use strict';
const tde = require("/MarkLogic/tde.xqy");

const typeTemplate = xdmp.toJSON(
  {
    "template": {
      "context": "/",
      "directories": ["/content/"],
      "rows": [
        {
          "schemaName": "test",
          "viewName": "types",
          "columns": [
            {
              "name": "updated",
              "scalarType": "dateTime",
              "val": "updated",
              "invalidValues": "reject"
            },
            {
              "name": "someNumber",
              "scalarType": "int",
              "val": "someNumber",
              "invalidValues": "reject"
            }
          ]
        }
      ]
    }
  }
);

tde.templateInsert(
  "/test/typeTemplate.json",
  typeTemplate,
  xdmp.defaultPermissions(),
  ["TDE"]
)
Now we can do a simple query and see that the values have been converted to their target types:
select * from test.types
test.types.updated | test.types.someNumber |
2022-07-13T00:00:00 | 1 |
2022-07-12T00:00:00 | 2 |
Note that our template doesn’t have any code to explicitly convert the values; MarkLogic just does it for us.
Impact
I find this implicit conversion especially helpful for xs.dateTime. Relational databases often use the format without the “T” in the middle. Without the implicit conversion, an ingest process pulling data from such sources (or a service accepting queries from consumers that expect that format) would need to add the “T” itself in order to match the expected format.
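To give a sense of the work the implicit conversion saves, here is a minimal sketch of the kind of normalization an ingest step would otherwise need. The helper name toIsoDateTime is hypothetical and not part of MarkLogic; it is plain JavaScript and only handles the simple "date, space, time" shape used in this post.

```javascript
'use strict';

// Hypothetical helper (not a MarkLogic API): rewrite "YYYY-MM-DD hh:mm:ss"
// as "YYYY-MM-DDThh:mm:ss". Anything that does not match that shape,
// including values that already contain the "T", is returned unchanged.
function toIsoDateTime(value) {
  const match = /^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}(?:\.\d+)?)$/.exec(value);
  return match ? `${match[1]}T${match[2]}` : value;
}

console.log(toIsoDateTime("2022-07-12 00:00:00")); // 2022-07-12T00:00:00
console.log(toIsoDateTime("2022-07-13T00:00:00")); // 2022-07-13T00:00:00
```

A step like this would have to run on every incoming value and every incoming query parameter, which is exactly the bookkeeping the range and triples indexes spare us.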
The key thing to remember is that the value in the document (and in the Universal Index) hasn’t changed: MarkLogic stores whatever is provided. If you have a property where the source doesn’t reliably provide the same type, remember that your value queries will need to match both type and value (as with the someNumber property above).
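One way to cope with a property like someNumber is to query for both typed forms of a value. The sketch below is plain JavaScript; typedVariants is a hypothetical helper, and the MarkLogic usage in the trailing comment is an untested assumption about how you might combine it with the value queries shown earlier.

```javascript
'use strict';

// Hypothetical helper (not a MarkLogic API): given a number or a numeric
// string, return the value followed by its counterpart of the other type.
// Non-numeric strings and other types come back as a single-item array.
function typedVariants(value) {
  const variants = [value];
  if (typeof value === "number") {
    variants.push(String(value));
  } else if (typeof value === "string" && value.trim() !== "" &&
             Number.isFinite(Number(value))) {
    variants.push(Number(value));
  }
  return variants;
}

console.log(typedVariants(1));   // [ 1, '1' ]
console.log(typedVariants("2")); // [ '2', 2 ]

// Assumed usage inside MarkLogic (not run here): one value query per variant.
// cts.search(cts.orQuery(
//   typedVariants(1).map(v => cts.jsonPropertyValueQuery("someNumber", v))
// ))
```

Cleaning up the type at ingest time is usually the tidier fix; a query-side workaround like this is mainly useful when you can't change the stored documents.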