Authors:
Ricardo Jimenez (1); Marta Patino (2); Ivan Brondino (1); Valerio Vianello (2);
Ricardo Vilaca (1); Boyan Kolev (3); Patrick Valduriez (3); Raquel Pau (4);
Apostolos Hatzimanikatis (5); Vassilis Spitadakis (5); Dimitris Bouras (5);
Yorgos Panagiotakis (5); Giorgos Saloustros (6); Anastasios Papagiannis (6);
Pilar Gonzalez-Ferez (6); Angelos Bilas (6); Ying Zhang (7); Pavlos Kranas (8);
Sotiris Stamokostas (8); Vrettos Moulos (8); Fotis Aisopos (8); Francois Sabary (9);
Luis Cortesao (10); Diogo Regateiro (11); Jose Pereira (12) and Rui Oliveira (12)
Affiliations:
(1) LeanXcale, Spain;
(2) Universidad Politecnica de Madrid, Spain;
(3) INRIA, France;
(4) Sparsity Technologies, Spain;
(5) Neurocom, Greece;
(6) Foundation for Research and Technology - Hellas (FORTH) & Institute of Computer Science (ICS), Greece;
(7) Xiamen University Tan Kah Kee College, China;
(8) National Technical University of Athens & ICCS, Greece;
(9) Quartet FS, France;
(10) Altice Labs, Portugal;
(11) Instituto de Telecomunicacoes, DETI, University of Aveiro, Portugal;
(12) INESC TEC, Portugal
Keyword(s):
Polyglot Persistence, Transactions, ACID, Polyglot Queries, Query Processing, SQL, NoSQL, CEP, Big Data.
Abstract:
In recent years, there has been a surge of new data stores addressing new challenges
in the management of structured, semi-structured and unstructured data.
In the semi-structured segment, many different kinds of NoSQL data stores have
emerged, such as document-oriented data stores, key-value data stores and graph
databases, each proposing a new data model with an associated query language
and/or API. In the structured data world, new technologies have emerged such as
NewSQL, columnar data warehouses and in-memory databases. The adoption of these
new data stores has led to a proliferation of data stores within organizations,
raising interesting challenges. On the one hand,
NoSQL data stores were born in a world where scalability was considered a key
attribute, and many of them attained this scalability by giving up transactional
properties, since transaction management was the main bottleneck preventing data
management technology from scaling. This design decision has enabled many
NoSQL data stores to achieve different degrees of scalability, but at the cost of
losing consistency in the event of failures and/or concurrent accesses. On the other
hand, having multiple data stores results in so-called polyglot persistence
environments, which create two interesting problems. The first arises when
a business action requires updating data across multiple data stores: such updates
lack the transactional consistency provided by OLTP databases, so consistency is
lost when conflicting accesses happen or a failure occurs at one or more
nodes. The second is that exploiting the data in a polyglot persistence environment
is an extremely hard task, out of reach for most organizations, because different
data stores provide different query languages or APIs. This forces organizations
either to program a query engine at the application level, which is not feasible
for most of them, or to move all the data to a data warehouse, imposing
a relational schema on data that is schemaless, such as the data stored in NoSQL
data stores. Yet the motivation for introducing these data stores is precisely
to avoid forcing a relational schema on data that is not suited to that model and
requires the flexibility of other data models such as documents, key-value or
graphs. CoherentPaaS addresses the three problems above. On the one hand,
it eases the pain of updates across different data stores by providing a
holistic transactional manager that enriches NoSQL and other transaction-less
data stores with transactional semantics without affecting their scalability, and
that can execute holistic (global) transactions across any subset of the data stores
integrated into the CoherentPaaS platform. On the other hand, it solves the problem
of querying in polyglot persistence environments by enabling the combination of
SQL with native query languages and native APIs to run queries across data
stores with arbitrary data models and query processing frameworks.