Technical Overview
An Informal Introduction to MarkLogic Server, XQuery, and Developer Resources
This document gives you an informal introduction to MarkLogic Server, the XQuery language, and the developer resource site dedicated to both.
Tour Stops
- Getting Your Bearings
- Getting MarkLogic Server
- Administering MarkLogic Server
- Build Your First App in Minutes
- Writing XQuery
- Understanding XQuery
- Loading Documents
- Searching Documents
- Java-Based and .NET-Based Queries with XCC
- Continuing On
Getting Your Bearings
The first stop on our tour today is MarkLogic Server, the industry's leading XML Server. What's an XML Server?
An XML server is a platform that provides a set of services used to build applications and support business processes based on information and content. The native data format is XML, and the server works best with semi-structured content: either XML itself or something that can be converted into XML, such as HTML, email, and Word documents.
The server contains indexes for rapid searching across content, structure, and values, which makes possible features you won't find in an ordinary database or search engine. It's designed for content, so it includes features like format conversion, enrichment, and pipeline processing.
The server manages its own repository and is accessed using the W3C XQuery language. But unlike a database that only services SQL queries, MarkLogic Server includes a robust application server, including SSL and URL rewriting, as well as Application Services APIs that make it easier to build complete apps.
It's probably easiest to understand the concept of an "XML server" with a demonstration. At MarkMail.org you'll find a web-based application that allows you to explore some 40 million messages from public mailing lists focused on technology and open source. You can drill down into the database based on search terms (including stemming, where searching for 'win' also matches 'wins' and 'won'), or on specific data facets like author, mailing list, date, attachment type, or message type.
We have several other interesting demos: you can see some of our customer implementations by looking at the demos section of our website. If you're in touch with Mark Logic's sales team, ask to see some of the others. Some of my favorites let you dig into medical textbooks and Arabic newspapers.
Getting MarkLogic Server
The best way to understand what you can do with an XML server is to use one. Under the Community License, you can use a free copy of MarkLogic Server for non-commercial use. You'll find a big button on the front page of the developer site, which points to http://developer.marklogic.com/download/. Check the information on that page for specific system requirements.
There's one binary download for each platform. The license key you enter after install determines the functionality exposed. You can upgrade simply by changing license keys. To learn about the product editions and capabilities, I'll refer you to the What Is MarkLogic Server page.
For all platforms, there is a single shared installation guide, which we highly recommend reading. We'll pause the tour now while you follow the guide's instructions and install the database. It takes just a couple of minutes.
Administering MarkLogic Server
The install guide should have walked you through the process of browsing to the admin interface at http://localhost:8001 to enter the license key. Now you can go to the same web address to administer the server. The admin interface lets you control the creation, management, and configuration of databases, forests, servers, and hosts. The left navigation bar contains the "nouns". Use it to select the item you want to act upon. The top right tabs contain the "verbs". Select the verb after selecting the noun. Under the tab is a data entry area for making changes.
The main thing you need to understand when using the admin pages is the database topology. Documents are stored in forests. One or more forests are gathered together to form a database. Databases are logical units against which you can assign HTTP, WebDAV, and XDBC (for XCC Java and .NET connectivity) servers and set various runtime configuration options. The name "forest" comes from the fact that XML documents are tree structures, and a collection of trees is a forest. Databases exist as a logical abstraction because in a distributed environment it can be useful to have the same logical database spread across different hosts, perhaps one host with two forests and another with three.
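The forest-to-database relationship can also be inspected from XQuery once you have a place to run queries (covered later in the tour). The following is a minimal sketch, assuming the built-in functions xdmp:database(), xdmp:database-name(), xdmp:database-forests(), and xdmp:forest-name() are available in your release. The <database> and <forest> element names are made up for this illustration.

```xquery
xquery version "1.0-ml";

(: Sketch: list the forests attached to the database this query runs against. :)
let $db := xdmp:database()
return
  <database name="{ xdmp:database-name($db) }">
  {
    (: xdmp:database-forests() returns forest IDs; turn each into its name :)
    for $forest in xdmp:database-forests($db)
    return <forest>{ xdmp:forest-name($forest) }</forest>
  }
  </database>
```

On a fresh single-host install you would expect one forest per database. A distributed setup like the one described above would list several.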
There's a full Administrator's Guide available from http://developer.marklogic.com/pubs/, which you can use as your guide through the port 8001 administration pages.
Build Your First App in Minutes
Starting with version 4.1, MarkLogic ships with Application Services, which includes MarkLogic Application Builder, running on port 8002 by default. The main use of the Application Builder is to quickly build an app around your own content, though it does include a demonstration app and some sample content, which can be loaded into a database of your choosing.
From the Application Builder main page click the "Add New" button, and on the dialog that appears, provide a name and select "New Database" (also giving it a name) and "Oscars® Sample". Click the "Create Project" button and you will be presented with a wizard of six screens, each of which controls one aspect of the app that it will ultimately generate:
- Appearance
- Search
- Sorting
- Results
- Content
- Deploy
You can experiment with the settings on these pages as much as you like. When you're ready to run the app, click the Deploy tab, enter a port number for the to-be-created App Server, and click the "Deploy" button. The app will launch on the port of your choosing.
Check out the rest of the tutorials and the official documentation for more details on the Application Builder and how it can be used to rapidly get your apps off the ground.
Writing XQuery
Now that you have a database installed and have had a chance to poke around the admin screens and the Application Builder, I'll bet you're itching to dive in deeper and write your own XQuery code. There's more than one way to do it: in the browser or by creating files on the file system, to name two.
One of the samples that ships with MarkLogic Server 4.1 is a web-based tool called CQ (client query), useful for writing quick queries with minimal set-up.
To set it up, create a new App Server: from the main Admin page, click "App Servers", then "Create HTTP" and fill in
- "cq" for the server name
- "Samples/cq" for the root
- "8027" for the port (I use 8027 because 27 spells CQ on my phone; you can use any conveniently available port)
- Click OK to create the App Server
Then visit localhost:8027 in your browser to see a form where XQuery can be entered and evaluated. A drop-list on the page lets you choose which database and App Server to use when evaluating your XQuery.
Note: CQ is a powerful tool. Use it with care.
Another easy way to get started is to make use of the pre-existing Docs
HTTP server that's set up by default and running on port 8000 against the Documents
database. To write a query, go (via the command line or with a file explorer) to the
Docs directory under your server root. That's
C:\Program Files\MarkLogic\Docs
on Windows and /opt/MarkLogic/Docs on Unix. There, write some XQuery in
a file, say welcome.xqy, and in a browser visit
http://localhost:8000/welcome.xqy
substituting your real server name if it's different.
Whichever way is most convenient, try the following query:
(: This is welcome.xqy :)
<big xmlns="http://www.w3.org/1999/xhtml">
Welcome to { xdmp:product-name() }
{ xdmp:version() }
{ xdmp:product-edition() } Edition!
</big>
When a client requests a query file using the special extension .xqy, the server executes the query file's content and returns the result. It's basically CGI for XQuery. And because XQuery so easily constructs dynamic XHTML output, it's an amazingly convenient development and deployment model. There's no need to use something like Java classes in processing the result (although you can, as we'll see later).
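To make the "CGI for XQuery" idea concrete, here is a minimal sketch of a page that reads a query string field and builds its XHTML response on the fly. The file name hello.xqy and the "name" field are hypothetical, and the sketch assumes xdmp:get-request-field() accepts a default value as its second argument in your release.

```xquery
xquery version "1.0-ml";

(: hello.xqy (hypothetical file name): greet whoever is named in the
   "name" query string field, falling back to "world". :)
let $name := xdmp:get-request-field("name", "world")
return (
  xdmp:set-response-content-type("text/html"),
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>Hello</title></head>
    <body>
      <p>Hello, { $name }! The server time is { fn:current-dateTime() }.</p>
    </body>
  </html>
)
```

Saved in the Docs directory, a page like this would be requested as http://localhost:8000/hello.xqy?name=Tourist.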
Understanding XQuery
XQuery is a potentially huge subject, but as the prior section showed, it can be very easy to get started. If you looked around on the default App Server installed on port 8000, you might have run into the XQuery use case documents at a URL like localhost:8000/use-cases. Here you will see a simple demo built from the XQuery Use Cases specification. To get started:
- Click the "Load source XML into database" link.
- Click on an example on the left and it populates the query textarea on the top right.
- Click Submit and you see the results (in XML or XHTML format) in the bottom right.
You can use these examples to get a taste for what XQuery code looks like. You can enter your own custom query into the textarea. Here's an example:
(: Try this in the textarea :)
for $i in collection() return document-uri($i)
Running this gives you an unsorted listing of all the documents held within
the database. The collection() function returns a sequence of document nodes while
document-uri($i) returns the URI (the identifier) for document $i.
This query might time out when run against larger databases--more efficient means
to iterate URIs exist for larger data sets.
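One such approach, sketched here under assumptions, is to read URIs from an index instead of touching every document. The sketch assumes the cts:uris() function is available in your release and that the URI lexicon has been enabled in the database configuration; without that setting the call raises an error.

```xquery
xquery version "1.0-ml";

(: Sketch: read the first 100 document URIs straight from the URI lexicon.
   Assumes the "uri lexicon" setting is enabled on the database. :)
fn:subsequence(cts:uris(), 1, 100)
```

Limiting the result with fn:subsequence() also keeps the output manageable when the database holds many documents.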
For some explanation on the XQuery Use Cases, I'll point you toward the Getting Started with MarkLogic Server document available at http://developer.marklogic.com/pubs/.
XQuery is a language designed to efficiently query large collections of XML data, such as medical records, textbook content, office documents, or web pages. In this model, you store the documents directly in the XQuery database, possibly after converting them to XML. Then you query the documents to extract the bits and pieces deemed important. Increasingly, XQuery is being used for application logic as well, and is thus becoming a one-stop-shopping language for building web apps.
Loading Documents
As many people have pointed out, vanilla XQuery leaves certain areas underspecified. An additional standard called "XQuery Update", still unfinished as of this writing, provides a way to put things into the database, though it's not yet widely implemented. Plain old XQuery also lacks built-in support for efficient full-text search, though the W3C is working on that too. Mark Logic addresses these gaps with numerous built-in functions. This section explains a few of the methods you need to understand in order to make the most of MarkLogic Server.
Watch out! If you're new to XQuery and skipped over the Getting Started links above, you're going to find the XQuery code in this section a little heavy. That's OK. I'll just assume you're having such a great time here that you can't wait to continue. Learn what you can. You can always come back.
When it comes to getting your content into the database, the most important MarkLogic Server
built-in function is xdmp:document-load(). The first argument points to
a local file to load, and by default also forms the database URI used to store
the file. But if you desire the document to reside at a different database URI, you can
pass in a second parameter with options (see the online docs for an example).
Here's a simple example:
xdmp:document-load("/tmp/bib.xml")
This loads the file /tmp/bib.xml to the database under the name
/tmp/bib.xml.
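To store the file under a different database URI, the options node can carry a uri element. This is a minimal sketch: the target URI /books/bib.xml is made up for illustration, and the options namespace shown is the one I believe the function expects, so check it against the documentation for your release.

```xquery
xquery version "1.0-ml";

(: Sketch: load /tmp/bib.xml from the file system but store it in the
   database at a different, illustrative URI. :)
xdmp:document-load("/tmp/bib.xml",
  <options xmlns="xdmp:document-load">
    <uri>/books/bib.xml</uri>
  </options>)
```

Once the load has committed, a separate query for doc("/books/bib.xml") should return the document.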
The xdmp:document-load()
call returns the empty sequence on success and throws an error
in case of problems. To print "Loaded" after a load, use the following trick:
xdmp:document-load("/tmp/bib.xml"),
"Loaded"
When you start writing code like this you'll know you're an XQuery master.
This bit of code evaluates two comma-separated expressions: the first yields the empty
sequence (the output from the xdmp:document-load() function) and the second yields a string
("Loaded"). Because sequences are flattened, the combined result is the simple string "Loaded".
In case of error, the xdmp:document-load()
call errors out and the trailing "Loaded"
gets ignored. To handle errors, you can use try/catch (another extension
to the language):
try {
  xdmp:document-load("/tmp/bib.xml"),
  "Loaded"
}
catch ($e) {
  (: * below matches the element without need of declaring the namespace :)
  <span>Problem loading { $e/*:message/text() }.</span>
}
The caught error is an XML node with elements like <message> that explain the reason for the error.
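If you prefer to address the error elements by name instead of using the * wildcard, you can declare the error namespace. The sketch below assumes the namespace URI http://marklogic.com/xdmp/error and a <code> child alongside <message>. The file path and the <problem> wrapper element are made up for illustration.

```xquery
xquery version "1.0-ml";

declare namespace error = "http://marklogic.com/xdmp/error";

try {
  (: Hypothetical path, chosen so that the load fails :)
  xdmp:document-load("/tmp/no-such-file.xml"),
  "Loaded"
}
catch ($e) {
  <problem>
    <code>{ $e/error:code/text() }</code>
    <message>{ $e/error:message/text() }</message>
  </problem>
}
```

While exploring, returning $e itself from the catch block is a quick way to see everything the error node contains.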
To view the content of a loaded document, use the standard doc()
function:
doc("/tmp/bib.xml")
This returns the document node associated with the given URI. To view a list of all loaded documents:
for $i in collection() return document-uri($i)
You saw this query earlier when you typed it into the use-cases textarea. Bringing the two queries together lets you produce a "list and view" script:
let $uri := xdmp:get-request-field("uri")
return
  if (empty($uri) or $uri eq "") then
  (
    xdmp:set-response-content-type("text/html"),
    <ul>
    {
      for $i in collection()
      let $doc := document-uri($i)
      return
        <li><a href="view.xqy?uri={xdmp:url-encode($doc)}">{$doc}</a></li>
    }
    </ul>
  )
  else
  (
    xdmp:set-response-content-type("text/xml"),
    if (empty(doc($uri)))
    then <error>No content</error>
    else doc($uri)
  )
To give this query a spin, paste it into CQ or, better, save it as view.xqy (the file name its links point to) where it will be served by an app server. You'll see a <ul> listing of all the documents in the database. Each is clickable, and when you click on a document you see its raw content. (Because the script doesn't have any throttle support, be careful not to use it with long listings or large documents. Browsers don't always like showing <ul> lists of more than a thousand items or XML files of more than a megabyte.)
The script first fetches the uri
query string parameter. If it's empty,
then it treats it as a request for a listing. If it's not empty, then it's a
request for the given URI to be displayed. To handle listings, we set the
content type to text/html
and print every document-uri() linking to itself.
To handle a document view, we set the content type to text/xml
and print the
doc($uri)
result or give a polite error note if the document couldn't be found
for any reason.
Guru Tip: The parentheses (notice they're not curly braces) are required because the expression within a then or else clause has to be a single expression, and parentheses make the multiple items into a single, comma-separated sequence.
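The missing throttle can be approximated by listing one page of URIs at a time. This is a minimal sketch: the "start" request field and the page size of 100 are assumptions made for illustration, and the sketch does no validation, so a non-numeric start value raises an error.

```xquery
xquery version "1.0-ml";

(: Sketch: show at most $page-size URIs per request, starting at the
   position given in a hypothetical "start" request field (default 1). :)
let $page-size := 100
let $start := xs:integer(xdmp:get-request-field("start", "1"))
return (
  xdmp:set-response-content-type("text/html"),
  <ul>
  {
    for $i in fn:subsequence(collection(), $start, $page-size)
    let $doc := document-uri($i)
    return <li>{ $doc }</li>
  }
  </ul>
)
```

Requesting the page with start=101 would then show the second hundred documents.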
Searching Documents
Text search forms the core of a database. Search is the process of selecting, from a collection of elements, those that are "relevant" to some search condition. Starting with version 4.1, MarkLogic Server includes Application Services, which has a slick component called the Search API that makes it simple to perform powerful "Google-like" searches.
A simple search can be quite straightforward:
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search" at
"/MarkLogic/appservices/search/search.xqy";
search:search("for sale")
This returns the top ten most relevant mentions of for and sale
across all documents.
By passing in an options element node as the second argument you can get as sophisticated as your searching needs require. You can find more details about the Search API in the documentation or separate tutorials.
Hint: by default searches are case insensitive for all-lowercase tokens but case sensitive for tokens containing any uppercase characters. The logic is, if you bothered enough to capitalize, you probably meant it. To override this behavior, you can pass in settings via an options element node.
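As an illustration of such an options node, the sketch below asks for case-insensitive matching even though the search text contains capitals. It assumes the Search API options accept a <term> element containing <term-option> children in your release; consult the Search API documentation before relying on it.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search" at
  "/MarkLogic/appservices/search/search.xqy";

(: Sketch: force case-insensitive matching for every term, so that
   "For Sale" is treated like "for sale". :)
search:search("For Sale",
  <options xmlns="http://marklogic.com/appservices/search">
    <term>
      <term-option>case-insensitive</term-option>
    </term>
  </options>)
```

Swapping in case-sensitive as the option value would do the reverse and make all-lowercase search terms match only lowercase text.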
Java-Based and .NET-Based Queries with XCC
While it's easy to develop complete apps from within MarkLogic, there are times when you want to directly connect to the database from a separate application. For this, the database exposes an interface to Java and .NET clients called XDBC, and a client library in both Java and .NET languages called XCC. To get it working you need just a few things:
- The appropriate XCC client-side package files, downloadable from http://developer.marklogic.com/download/.
- The server configured to listen for XDBC connections. Use the admin pages to set this up.
- Java or .NET code written against XCC that connects to the server, executes your query, and (optionally) iterates the result.
It's that easy. The full details are explained in the XCC Developer's Guide [pdf]. You'll find Javadocs and the .NET documentation for the XCC classes included in the distribution and also online on the developer network. For connecting back to Java, see the tutorial and documentation for the MLJAM library.
Continuing On
Well, our tour's coming to an end. Let me leave you with one piece of advice: Join the developer network mailing list.
When new content is posted and new releases come out, the list is where the releases are announced. If you have questions, it's where you ask them. And if you have answers, it's where you share them. Here's the link:
http://developer.marklogic.com/discuss/
Hope to see you around!