The Oracle database is one of the most successful software products of all time. In the past 40 years, it has evolved to become a dominant technology for storing and managing data, and the RDBMS and SQL standards are now entrenched in every large enterprise. Alternatives like MarkLogic were built more recently to address new data management challenges, particularly with data integration.
Today, organizations are confronting new challenges that did not exist in the 1980s and 1990s. Data is big, fast, varied, and changing. Instead of a small handful of systems, organizations have hundreds of systems and petabytes of data. Business needs are also changing more quickly, and are more heavily regulated, than ever before. This means rethinking how data is managed in order to meet quickly evolving business needs:
- Organizations need agility to integrate data, get a unified view of information, and deliver durable data assets quickly. Organizations cannot wait months or years to integrate data
- Organizations need to reduce costs — including the amount of development time on new systems and apps and the time spent managing infrastructure. As organizations make their journey to the cloud, they must keep cloud costs under control
- Organizations need to improve security and governance for better data sharing. Organizations do not have time to waste on patching up data silos and shadow IT. They need to proactively manage their data throughout the integration lifecycle
MarkLogic provides distinct advantages compared to Oracle in all of the above areas. MarkLogic provides greater agility to integrate data with less risk, makes cloud costs significantly lower and more predictable, speeds up delivery of new applications, and does not sacrifice data security or governance.
This comparison looks at the underlying differences between Oracle and MarkLogic databases, and also how MarkLogic Data Hub Service stacks up against Oracle’s suite of cloud products. In summary, the main underlying differences are the following:
- Oracle and MarkLogic have fundamentally different data models (relational versus multi-model)
- Oracle and MarkLogic have fundamentally different approaches to indexing and data access (relational indexing and SQL versus MarkLogic’s universal, multi-model indexing and search-driven approach)
- Oracle and MarkLogic scale very differently (scale-up system versus MarkLogic’s distributed, scale-out system)
- Oracle and MarkLogic have very different approaches to data integration in the cloud (suite of traditional relational products that require heavy maintenance versus MarkLogic’s unified, agile approach that requires little tuning and maintenance)
What is Oracle?
Oracle is one of the ten largest software companies in the world and provides the most widely adopted relational database. Oracle continues to derive a relatively large percentage of its revenue from licensing the Oracle database (and its many derivatives), and in recent years has made significant investment in building out Oracle Cloud, a suite of over 120 different products spanning SaaS, PaaS, and IaaS.
The Oracle relational database was first released in 1979 and the latest release of that software is Oracle Database 19c (the long term support release of Oracle 12c and 18c). There are various iterations of this core product, and Oracle’s product suite now includes specific products for different workloads (analytics vs transactional), engineered systems (combined hardware/software appliances), and fully managed cloud services.
Oracle’s Key Product Lines Include
- Oracle Database 19c – Released in 2019, Oracle 19c is the latest version of the original Oracle relational database and is the flagship offering for on-premises environments. Oracle calls version 19c the long term support, or “terminal patch” version, in that it is the final release of the Oracle 12c family of products, which includes Oracle 18c. It is also possible to run it as an engineered system using Oracle Exadata.
- Oracle Autonomous Database – First available in 2018, this is Oracle’s cloud service offering. While it runs on Oracle database, it is built and marketed as a separate product family. Users get to choose between two options: Oracle Autonomous Data Warehouse for analyzing data, and Oracle Autonomous Transaction Processing for processing mixed workloads.
Additional Oracle Database Products (Software and Cloud)
- Oracle NoSQL Database – A key-value store (comparable to a two-column relational table designed to be queried only on the primary key) that added native JSON support in 2018. Open-source databases like Redis offer similar functionality and the same data model.
- Oracle NoSQL Database Cloud Service — Fully managed service with similar capabilities to the on-premises version mentioned above. It was launched in March 2019.
- Oracle Berkeley DB – Includes Oracle Berkeley DB XML for storing and querying XML documents with XQuery. This came from the acquisition of Sleepycat, was last updated in 2017, and has a small customer base. No cloud service version is available.
- Oracle XML DB (cloud service capability) – XML storage and query added May 2019. Many features not available. See Oracle docs for updates.
- Oracle Spatial and Graph – An option for 19c, this includes the ability to store RDF triples and build knowledge graphs. Underneath, the triples are stored in relational tables.
- Oracle Spatial and Graph (cloud service capability) – Capabilities being added over time. See Oracle docs for updates.
- Oracle Text – An option for 19c that includes the ability to build full-text indexes and then search and analyze using standard SQL
- Oracle Text (cloud service capability) – Full-text search added May 2019. Many legacy features not available. See Oracle docs for updates.
- Oracle TimesTen In-Memory Database – Oracle’s in-memory relational database similar to Redis, SAP HANA, or MemSQL. By running in-memory, these databases serve as excellent caching layers. Oracle also sells it as an engineered system called Exalytics.
- MySQL RDBMS – Offered as community and enterprise editions, it joined Oracle’s product line when Oracle acquired Sun Microsystems in 2010.
- Oracle Big Data Appliances and Connectors – Oracle also sells multiple products for customers who want to integrate their Oracle estate with tools like Hadoop, enabling them to connect and access data that might not be stored directly in Oracle.
Additional Oracle Data Integration Products
- Oracle Fusion Middleware – Software designed for on-premises, SOA integration using ESBs. Mostly a compilation of products from acquisitions.
- Oracle GoldenGate – Software for change data capture, distribution, transformation, and delivery. Often used for real-time data flows from operational to analytical systems. Also available as a cloud service.
- Oracle Data Integrator Cloud Service – Part of Oracle’s newer suite of cloud products, this is a cloud service for traditional ETL work in cloud environments.
- Other Big Data Cloud Service – Cloud service for data analytics, leveraging Hadoop and Spark
What is MarkLogic?
MarkLogic Data Hub Service is MarkLogic’s flagship product. It is a fully managed cloud data hub for agile data integration and data management. Built on MarkLogic Server, it has all the same multi-model, security, and scale-out capabilities.
MarkLogic Server is a multi-model database with modern NoSQL and trusted enterprise capabilities. It can be deployed as part of MarkLogic Data Hub Service, or alone in any environment (on-premises, cloud, hybrid).
MarkLogic also develops associated tools and connectors for the ecosystem, which includes various APIs and connectors.
What's the Difference Between MarkLogic and Oracle?
There are two ways of comparing MarkLogic to Oracle:
- MarkLogic Server vs Oracle Database – The main focus of this comparison is the data model. MarkLogic has a multi-model approach and Oracle has a relational approach. In sum, Oracle offers a fantastic relational database, and it is a good choice for organizations that need a relational database and are committed to the Oracle ecosystem. But for organizations needing the flexibility to integrate data from silos, MarkLogic’s multi-model approach has some distinct advantages. This is discussed in more detail below (including why Oracle’s most recent claims of being multi-model are somewhat lacking).
- Suite of Oracle cloud services to create a “data hub” vs MarkLogic Data Hub Service – For data integration in the cloud, to get functionality similar to MarkLogic Data Hub Service, you would need Oracle Autonomous Database in combination with other tools such as Oracle Data Integrator Cloud Service, Oracle GoldenGate Cloud Service, Oracle NoSQL Database Cloud Service, and other services. And, regardless of environment, organizations still run into the underlying relational versus multi-model database issues.
- Note that Oracle did have a “Data Hub Service” product, announced in 2017, but it is no longer marketed by the company.
- Also, note that you can run a MarkLogic Data Hub on-premises, and the comparison is very similar. In that case, organizations still need a large suite of Oracle products such as Oracle NoSQL Database, Oracle Berkeley DB XML, Oracle Spatial and Graph, ETL tools like GoldenGate, and other big data tools to match the functionality of the MarkLogic Data Hub.
Comparing MarkLogic and Oracle Databases
MarkLogic Server was the first modern, multi-model database on the market. MarkLogic has multiple ways to model data (e.g., documents, graphs, relational), even data that represents the same entity. And, MarkLogic supports storing data in multiple schemas at the same time — all in the same database — with a single integrated back end.
Only a few years ago, the term “multi-model” was relatively new, and it required significant effort to explain what it was and why you needed it. Today, that is not the case — there is widespread acknowledgment that multi-model databases should be part of any modern data architecture.
The following resources provide a deeper dive into understanding the multi-model advantage provided by MarkLogic:
- O’Reilly book on multi-model databases
- Webpage on multi-model
- Escape the Matrix virtual tour
Here’s the summary of MarkLogic’s multi-model benefits:
- Flexibility to handle ever-changing data and schemas and ability to do schema on read and handle multiple schemas (versus having to define your strict schema in advance)
- Richer representation of your business entities and richer querying and apps as a result (versus having to define your entities in the context of strict relational schemas)
- Eliminating data silos (versus requiring a new database for every new application)
- Agile data integration and curation (versus lengthy ETL and upfront data modeling)
- Overall simplification of your data architecture (versus having to knit together various databases; polyglot persistence in other words)
Oracle is a relational database that stores data in rows and columns. Here are three specific examples to highlight how this approach differs from a multi-model approach:
Data Modeling and Access — With Oracle, users need to understand relational schemas that are often very complex. With this structure, data defining a single business entity may be split across a large number of tables. This usually results in cryptic column and field names (or VARCHAR columns) that only the database administrator understands, which means only they know how to properly access the data. For example, if data about drugs is stored, users must know whether to query on “aspirin”, “acetylsalicylic acid”, “Excedrin”, or “Bufferin” (all names for the same thing). If users query on the wrong term they miss most of the results.
MarkLogic Server solves these issues by using the document model, which is more human readable and does not require shredding entity data. Also, users of MarkLogic Server can rely on its built-in search and semantic capabilities to search across the data like a knowledge graph, making it much easier for domain experts who are not database specialists to query the data.
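The drug-name example can be illustrated with a toy sketch in Python. This is not MarkLogic's API: the `SYNONYMS` table, the `expand` and `search` helpers, and the sample documents are all hypothetical, and a real system would draw the synonyms from a curated vocabulary or triple store. The sketch only shows why an exact-match query misses records that a synonym-aware search finds.

```python
# Hypothetical synonym table (assumption for illustration only).
SYNONYMS = {
    "aspirin": {"aspirin", "acetylsalicylic acid", "Excedrin", "Bufferin"},
}

# Reverse lookup: any known name maps to its full synonym set.
LOOKUP = {name.lower(): names for names in SYNONYMS.values() for name in names}

# Hypothetical sample documents.
DOCS = [
    {"id": 1, "drug": "Bufferin", "note": "taken daily"},
    {"id": 2, "drug": "acetylsalicylic acid", "note": "low dose"},
    {"id": 3, "drug": "ibuprofen", "note": "as needed"},
]


def expand(term):
    """Return every known name for a term (lower-cased), or just the term."""
    names = LOOKUP.get(term.lower(), {term})
    return {n.lower() for n in names}


def search(term, docs=DOCS):
    """Return ids of documents whose drug field matches the term or a synonym."""
    wanted = expand(term)
    return [d["id"] for d in docs if d["drug"].lower() in wanted]


# An exact-match query on "aspirin" finds nothing in this sample...
exact_only = [d["id"] for d in DOCS if d["drug"].lower() == "aspirin"]
# ...while the synonym-aware search finds both aspirin records.
with_synonyms = search("aspirin")
```

With this sample data, the exact-match query returns no documents, while the synonym-aware search returns the two records stored under other names.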
Indexing and Performance – With Oracle, there is significant maintenance overhead as database admins spend time constantly optimizing tables and indexes for query performance. For example, Oracle requires regular defragmentation of the tablespaces (perhaps weekly or more often, depending on the number of deletes) in order to maintain insert performance. Also, Oracle indexes usually require constant rebalancing and re-indexing. With Oracle, users can expect that execution plans will frequently degrade and new ones will need to be created to maintain performance. Relying on the Oracle optimizer can make things worse even with a well-maintained system. Also, using Oracle data replication can negatively impact transaction performance.
Unlike a relational database, MarkLogic Server has a Universal Index that automatically indexes words, phrases, relationships, values, and structure. This index requires zero maintenance to build, update, or keep in sync. And, query performance against this index is more like Google search and is consistent even as workloads vary.
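To make the idea of indexing words, values, and structure at load time more concrete, here is a toy inverted index in Python. It is a sketch of the general technique only and says nothing about how MarkLogic's Universal Index is actually implemented; the function names, term layout, and sample documents are assumptions made for illustration.

```python
from collections import defaultdict
import re


def walk(node, path=""):
    """Yield (path, leaf_value) pairs for every leaf in a nested document."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, f"{path}/{key}")
    elif isinstance(node, list):
        for child in node:
            yield from walk(child, path)
    else:
        yield path, node


def build_index(docs):
    """Map word, value, and path terms to the ids of documents containing them."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for path, value in walk(doc):
            index[("path", path)].add(doc_id)          # structure
            index[("value", path, value)].add(doc_id)  # exact value at a path
            for word in re.findall(r"\w+", str(value).lower()):
                index[("word", word)].add(doc_id)      # individual words
    return index


# Hypothetical documents with different shapes; no schema is declared up front.
docs = {
    "d1": {"name": "Ada Lovelace", "phones": [{"type": "home", "number": "555-0100"}]},
    "d2": {"name": "Alan Turing", "address": {"city": "London"}},
}
index = build_index(docs)
```

A lookup such as `index[("word", "ada")]` or `index[("path", "/address/city")]` then resolves to a set of document ids without scanning the documents themselves.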
Metadata and Data Governance — With Oracle, tracking metadata requires upfront planning, changes are complex, and data is often lost. Like other relational databases that have defined schemas, columns must be added to handle new pieces of metadata. But often, metadata is discarded or just stored separately. Provenance and lineage information is metadata that is critical for data governance, but is often too cumbersome to manage, especially across a complex data integration life cycle.
MarkLogic Server stores any amount of metadata right alongside the data itself — it’s just more attributes in the document. The PROV-O standard is used for storing provenance and lineage metadata, so any tool can understand it. Furthermore, as with any data in MarkLogic, it can be harmonized and semantically enriched. The same can’t be said with a relational approach.
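As a rough illustration of metadata living next to the data, the Python sketch below wraps a record in an envelope with a headers section. The envelope layout, the `wrap` and `add_header` helpers, the URI, and the sample values are assumptions for illustration. The `prov:` property names come from the W3C PROV-O vocabulary, but how a given deployment records them may differ.

```python
import json


def wrap(instance, source_uri, agent, generated_at):
    """Return an envelope that keeps provenance metadata next to the data."""
    return {
        "envelope": {
            "headers": {
                "prov:wasDerivedFrom": source_uri,
                "prov:wasAttributedTo": agent,
                "prov:generatedAtTime": generated_at,
            },
            "instance": instance,
        }
    }


def add_header(envelope_doc, key, value):
    """New metadata is one more attribute; no schema change is involved."""
    envelope_doc["envelope"]["headers"][key] = value
    return envelope_doc


# Hypothetical record and provenance values.
doc = wrap(
    {"customerId": 42, "name": "Example Corp"},
    source_uri="/source/crm/42.json",
    agent="ingest-job",
    generated_at="2020-01-01T00:00:00Z",
)
add_header(doc, "reviewedBy", "data-steward")
serialized = json.dumps(doc, indent=2)
```

In a relational schema, the equivalent of `add_header` would typically mean adding a column or a side table before the new piece of metadata could be stored.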
Is Oracle a Multi-Model Database?
In some ways, Oracle’s latest version is a multi-model database. Historically, however, Oracle ridiculed any approach other than a purely relational one. In a 2015 eWeek article, Oracle executive Andy Mendelsohn said that NoSQL databases, including multi-model databases, were “designed for simple data management problems”, have “very low productivity,” and are “limited.” In reality, there has been massive growth in the NoSQL market. And, in 2019, Mendelsohn stated at Oracle OpenWorld that Oracle actually is a NoSQL multi-model database after all: “over the years, relational databases have become multi-model databases. We support JSON as a data type. We support XML…”
He’s right. It is possible to ingest JSON, XML, and RDF in Oracle. But underneath, Oracle is still relational, not truly multi-model, just like every version before it.
Oracle does not natively store JSON. Oracle’s documentation states that: “In Oracle Database, JSON data is stored using the common SQL data type VARCHAR2, CLOB, and BLOB (unlike XML data, which is stored using abstract SQL data type XMLType).”
As a result, in order to retrieve a value from the JSON document, the entire JSON document must be traversed to locate the data. This approach is slow. There are two workarounds Oracle recommends to improve performance. One workaround is to extract the data into a materialized view, pushing values into another table (i.e. shredding). The other workaround is a JSON search index, which does not maintain ACID compliance (it is only updated periodically when triggered).
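The trade-off described above can be sketched with Python's built-in `sqlite3` module. SQLite is used here purely as a stand-in for "JSON stored as text in a relational column"; the sketch does not reproduce Oracle's behavior or performance, and the table names and sample data are hypothetical. It shows the general pattern: scanning and parsing every document, versus shredding a value into a separate indexed table that must then be kept in sync.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
people = [
    {"name": "Ana", "address": {"city": "Paris"}},
    {"name": "Bo", "address": {"city": "Oslo"}},
    {"name": "Cy", "address": {"city": "Paris"}},
]
conn.executemany(
    "INSERT INTO docs (id, body) VALUES (?, ?)",
    [(i, json.dumps(p)) for i, p in enumerate(people, start=1)],
)


def find_by_scan(city):
    """Parse every stored document to test a single value (a full scan)."""
    hits = []
    for doc_id, body in conn.execute("SELECT id, body FROM docs ORDER BY id"):
        if json.loads(body)["address"]["city"] == city:
            hits.append(doc_id)
    return hits


# Workaround: shred the value into its own indexed table.
conn.execute("CREATE TABLE doc_city (id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("CREATE INDEX idx_doc_city ON doc_city (city)")
rows = [
    (doc_id, json.loads(body)["address"]["city"])
    for doc_id, body in conn.execute("SELECT id, body FROM docs")
]
conn.executemany("INSERT INTO doc_city (id, city) VALUES (?, ?)", rows)


def find_by_index(city):
    """Answer the same question from the shredded, indexed copy."""
    cur = conn.execute(
        "SELECT id FROM doc_city WHERE city = ? ORDER BY id", (city,)
    )
    return [row[0] for row in cur]


scan_before = find_by_scan("Paris")
index_before = find_by_index("Paris")

# Update the source document without refreshing the shredded copy.
moved = {"name": "Bo", "address": {"city": "Paris"}}
conn.execute("UPDATE docs SET body = ? WHERE id = 2", (json.dumps(moved),))
scan_after = find_by_scan("Paris")    # sees the change
index_after = find_by_index("Paris")  # stale until the copy is refreshed
```

Both lookups agree at first. After the update, the scan reflects the new city while the shredded copy still returns the old answer, which is the kind of lag a periodically refreshed derived structure can introduce.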
In general, handling multi-model workloads with a relational database like Oracle will be hard, brittle, or both. In addition to running into simple challenges like querying documents, a relational database cannot do more advanced functions like link documents together with triples or query XML and JSON together – tasks that come easy with MarkLogic Server.
In the past, developers were often forced to use relational databases because of their broad adoption. As Jeff Bezos pointed out in his 2018 letter to shareholders, “the broad familiarity with relational databases among developers made this technology the go-to even when it wasn’t ideal.” Today, more than ever, users have a choice about which database to use and the adoption barrier is growing smaller as multi-model gains widespread popularity.
Database Comparison Table
|MarkLogic Server||Oracle Database|
|Search & Query|
Note: Oracle does have another product called Oracle NoSQL database. That database is a Key Value store, which is very different from a document-oriented database like MarkLogic Server. Key Value stores are most often used as caching layers where they are optimized for simple low latency processing, not for data integration. For that reason, we do not cover it in this comparison.
Comparing MarkLogic and Oracle Cloud Data Hubs
This comparison looks at the similarities and differences between MarkLogic Data Hub Service and an Oracle “cloud data hub.” That is in quotes because Oracle does not provide one unified product. If an organization wants a cloud data hub with Oracle, they must stitch together a collection of Oracle Cloud products (or other third party products) to achieve similar functionality.
Here’s a list of some of the Oracle Products that may need to be stitched together to create an Oracle “cloud data hub”:
- Oracle Autonomous Data Warehouse – Oracle database optimized for OLAP
- Oracle Autonomous Transaction Processing – Oracle database optimized for OLTP
- Oracle Spatial and Graph – Geospatial data/query and some support for RDF graph storage
- Oracle Text – Full-text search
- Oracle NoSQL Database Cloud Service – Provides support for JSON
- Oracle XML DB – XML storage and query
- Oracle GoldenGate Cloud Service – Operational data store, some transformation work
- Oracle Data Integrator Cloud Services – ETL pipelines and connections
- Other Big Data Cloud Service – Data analytics, leveraging Hadoop and Spark
Why is MarkLogic a Better Cloud Service Option?
MarkLogic Data Hub Service is a better choice compared to the collection of Oracle’s products because of the following advantages:
- MarkLogic helps organizations integrate data 10x faster with a modern, agile approach (versus using a traditional approach with lots of upfront modeling)
- MarkLogic is more flexible (multi-model versus relational foundation) and better optimized to store documents and triples
- MarkLogic is simpler and faster to deploy, operate, and maintain (1 product versus a large handful of Oracle products)
- MarkLogic is lower cost and more transparent (who really wants another Oracle ELA?). There are numerous examples of customers saving millions of dollars by choosing MarkLogic (read the Chevron case study for example)
With MarkLogic Data Hub Service, organizations can skip the big upfront modeling steps required to load data. When loading data, users simply add the data they have as needed to meet the immediate business need. Users can represent repeating hierarchical attributes like phone numbers and addresses naturally as JSON or XML documents, without having to build out separate tables. And, because MarkLogic indexes data as users add it, it’s immediately discoverable.
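The point about repeating attributes can be illustrated with a short Python sketch. The customer record, the field names, and the `shred` helper are hypothetical. The sketch only contrasts one self-contained document with the rows a normalized relational schema would need for the same entity.

```python
# One business entity as a single document (hypothetical sample data).
customer = {
    "id": 7,
    "name": "Example Corp",
    "phones": [
        {"type": "office", "number": "555-0100"},
        {"type": "mobile", "number": "555-0101"},
    ],
    "addresses": [{"type": "billing", "city": "Austin"}],
}


def shred(doc):
    """Split one document into the rows a normalized schema would need."""
    return {
        "customer": [(doc["id"], doc["name"])],
        "phone": [(doc["id"], p["type"], p["number"]) for p in doc["phones"]],
        "address": [(doc["id"], a["type"], a["city"]) for a in doc["addresses"]],
    }


tables = shred(customer)
row_count = sum(len(rows) for rows in tables.values())
```

Here the single document becomes four rows spread over three tables, each of which would need to be designed before the first record could be loaded.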
To integrate data, MarkLogic users iteratively build a canonical model of the data, master the data, enrich it with metadata, add semantics, and govern the entire process. This makes it much faster to create data services for downstream business needs, with less risk when things change. And, MarkLogic supports real-time, operational applications, in addition to traditional BI and analytics.
Deploying cloud infrastructure is fast, but it is often complex and can get expensive quickly without the proper considerations. It is important to understand what inputs determine cost and how variable a cloud service is to prevent cost overruns. Different services handle bursting (when demand spikes past your normal predicted consumption) very differently, and that leads to large variations with consumption-based economics.
MarkLogic Cloud Consumption Model
MarkLogic Data Hub Service uses a consumption model and is priced along three dimensions:
- Bandwidth – priced in $/TB
- Storage – priced in $/GB/Month
- Compute – priced in $/MCU/Hour
The service takes any workload’s context into account when upscaling or downscaling the cluster. It independently scales operational, analytical, and curation workloads to provide a high degree of reliability and responsiveness. This consumption-based pricing frees organizations from unpredictable spending that often comes with complex cloud services.
MarkLogic Data Hub Service makes scaling and bursting simple and predictable. MarkLogic uses a system like rollover minutes so that unused units get saved, and “rolled over” to the next billing cycle. Organizations can store up to 12x of their capacity (not 2x), and do not have to pay extra to use those credits that were already paid for. With this approach, organizations avoid the costly mistake of provisioning for the peak, but also avoid letting the bill get out of control when a spike happens.
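The rollover idea can be worked through with a small simulation. This is a simplified model built only from the description above, not the service's actual billing logic: the function name, the unit counts, and the rule that a spike draws on banked units before counting as overage are all assumptions.

```python
def simulate_rollover(monthly_units, usage_by_month, cap_multiple=12):
    """Track banked units and overage per month under a capped rollover scheme.

    Assumption: unused units are banked up to cap_multiple * monthly_units,
    and a spike draws on the bank before any usage counts as overage.
    """
    bank = 0
    history = []
    for used in usage_by_month:
        available = monthly_units + bank
        overage = max(0, used - available)
        bank = min(max(0, available - used), cap_multiple * monthly_units)
        history.append({"used": used, "bank": bank, "overage": overage})
    return history


# Two quiet months build up credit that absorbs a spike in month three.
history = simulate_rollover(100, [40, 50, 200, 100])
# Fifteen idle months show the bank stopping at 12x the monthly allotment.
idle = simulate_rollover(100, [0] * 15)
```

In this example the month-three spike of 200 units is twice the monthly allotment, yet it produces no overage because 110 units had been banked in the two quieter months before it.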
Oracle Cloud Consumption Model
Oracle also uses a consumption-based billing model with its own cloud credit units, named OCPUs (Oracle Compute Units). Similarly, organizations also have to pay a nominal rate for bandwidth and storage on top of that.
Oracle’s set of cloud services, put together, will be more expensive than MarkLogic Data Hub Service. This is because organizations will simply be running more services that require orders of magnitude more infrastructure (data duplication, backup and recovery, redundant indexing, etc.). Organizations have to pay for capacity for each of Oracle’s services whereas with MarkLogic, you’re only paying for one comprehensive service. Cloud credit units are somewhat arbitrary so it is difficult to be precise, but estimates show that Oracle is 3x the cost of MarkLogic for most use cases. It is well known among both companies and analysts that Oracle’s pricing is a frequent cause for concern, and organizations run the risk of surprise billing in arrears.
And, the cost differences above are not even taking into account how each service handles bursting. Both products can handle bursting, but with Oracle it is more unpredictable, restrictive, and expensive than MarkLogic. With Oracle, organizations are billed pay-as-you-go pricing for excess capacity when bursting, and can only burst to 2x their subscription (See Oracle docs). Also, when organizations exceed their bursting limit, Oracle Cloud notifies them and suspends the account (See Oracle docs).
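For comparison, the bursting rules as described in this section can be expressed as a short Python function. It models only what the text states (pay-as-you-go billing above the subscription, and a limit at 2x the subscription); it is not taken from Oracle's documentation, and the function name and unit counts are hypothetical.

```python
def classify_burst(subscribed, demand, burst_multiple=2):
    """Split demand into subscribed, pay-as-you-go, and over-limit portions.

    Assumption: usage above the subscription is billed pay-as-you-go,
    and demand beyond burst_multiple * subscribed exceeds the burst limit.
    """
    limit = subscribed * burst_multiple
    return {
        "covered": min(demand, subscribed),
        "pay_as_you_go": max(0, min(demand, limit) - subscribed),
        "exceeds_limit": demand > limit,
    }


normal = classify_burst(10, 8)    # within the subscription
burst = classify_burst(10, 15)    # 5 units billed pay-as-you-go
spike = classify_burst(10, 25)    # beyond the 2x limit
```

Under this model, a spike to 2.5x the subscription is both billed at pay-as-you-go rates up to the limit and flagged as exceeding it.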
The Future of Cloud
MarkLogic is a cloud-neutral vendor and a strategic partner of the leading cloud providers. MarkLogic Data Hub Service fits seamlessly into their ecosystems. Rather than just another relational database (AWS and Azure have relational databases you can use), MarkLogic provides a highly differentiated product and provides the flexibility for customers to change cloud providers later if necessary.
Oracle dominated the legacy database software market but that is not the future. The future is cloud data management and that is not Oracle’s strong suit. As an article in Forbes pointed out, “Only 2 percent [of CIOs] surveyed see Oracle as ‘their most integral vendor for cloud computing.'”
Cloud Comparison Table
The following table compares MarkLogic Data Hub Service to the collection of Oracle Cloud components required to achieve similar functionality.
|MarkLogic Data Hub Service||Oracle “Cloud Data Hub” Components*|
|Security & Governance|
*Autonomous DB with options, Data Integrator, GoldenGate, other related cloud services mentioned above
When to Use Oracle vs MarkLogic
Oracle is designed for storing and managing traditional relational data modeled in rows and columns and queried with SQL. This structured approach, combined with the ubiquity of SQL and relational modeling skills in the market, means that Oracle is used to run transactional and analytical applications for which it is well-suited. Given Oracle’s long history as an on-premises software leader, many organizations already have Oracle ELAs in place. In those instances when data is predictable, managed on-premises, and licenses are available, it makes sense to continue using Oracle.
When data management workloads become larger, more varied, and more complex — and as organizations migrate to the cloud — then MarkLogic is a better choice.
When to Use MarkLogic vs Oracle
MarkLogic is a better choice than Oracle for use cases around data integration — especially when it involves large, complex data sets required for both transactional and analytical purposes. This may mean building a data hub for use cases like 360 of anything, operational analytics, or search and discovery. Whenever the data is somewhat messy and rapidly changing, it will work better in a data hub with a multi-model database than in an RDBMS.
While MarkLogic is not a rip and replace option for Oracle, MarkLogic can easily ingest data from Oracle. Also, MarkLogic does support relational views for traditional SQL querying and one-click integration with leading BI tools. For these reasons, there are many organizations that use Oracle alongside MarkLogic, allowing both technologies to excel at what they do best. For example, they may use Oracle GoldenGate to help aggregate data from upstream Oracle systems before ingesting it into a MarkLogic Data Hub — this is a common pattern. Oftentimes, it makes sense to take advantage of an existing Oracle license when first getting started with MarkLogic even if there are longer-term plans to phase Oracle out.
Customers Choosing MarkLogic Over Oracle
Here are some specific examples where organizations specifically chose MarkLogic instead of Oracle:
- Chevron – MarkLogic’s “Asset 360” Data Hub provides a unified view of refinery data, including facilities, equipment, and other information to provide safety and efficiency. According to their Chief Architect, “If you tried to model a thousand types of things in a relational structure, you would really struggle. With this format and managing the data this way, we could solve a master data management problem that has been challenging me for years.”
- ABN Amro – MarkLogic provides real-time analysis of 40+ million trade records. The bank’s principal architect said, “We need the ability to respond quickly to changing regulatory requirements. We chose MarkLogic because trade data is notoriously difficult to handle in relational databases.”
- HealthCare.gov – MarkLogic supports insurance enrollment for over 10 million Americans, handling 280,000 concurrent users and 6,500 transactions per second. According to Dr. Stephen Parente, it is “the largest personal data integration government project in the history of the Republic.” (Also, compare our success to Oracle’s failure in Oregon.)
- L.A. Care – MarkLogic supports the accuracy, validation, analytics, reporting, and publication of provider data. According to the Chief Digital Officer, “We found that the scale of efficiencies with MarkLogic’s technology were 10x, both in terms of competition in the market and in terms of cost and delivery. We are really happy with that.”
Escape the Matrix Virtual Tour
This quick tour walks through in more detail how a relational approach hampers data integration and creates more risk.
MarkLogic from a Relational Perspective
This 3-part blog series, written by an engineering veteran in the financial services industry, discusses why organizations are moving to multi-model and what some of the key concepts are when making the transition.
Data Hub Guide for Architects
This in-depth eBook provides a history of data integration and the underlying problems with a relational and ETL-driven approach, and how MarkLogic simplifies it.