
The Zoomdata Query Engine with Pushdown Processing

In our last post, we explained how Zoomdata’s microservices architecture makes it easier to keep pace with advancements in technology and empower users to perform speed-of-thought analytics. Now, we’ll explore the Query Engine with pushdown processing.

The Zoomdata Query Engine is a microservice that sits between the web application (or a client application built with the Zoomdata JavaScript client) and the Zoomdata Smart Data Connectors. It is purpose-built for a broad user population whose requests can be unpredictable and demanding.

The Query Engine has three primary roles:

  1. Deconstruct and convert user requests into distributed query execution plans
  2. Optimize queries and execution plans based on data platform capabilities, in-memory cached results, and its own Query Engine capabilities
  3. Execute data functions:
    1. Communicate with Zoomdata Smart Data Connectors to execute pushdown queries (see below)
    2. Retrieve data from in-memory cached results as appropriate
    3. Use in-memory processing to combine, append, and/or manipulate one or more data sets to produce only the values needed to fulfill the user’s request.
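As a rough illustration of roles 1 through 3, the sketch below splits an ordered list of operations between the data source and the engine. It assumes a simple linear pipeline and a connector that advertises its supported operations as a set; `Plan`, `plan_query`, the operation names, and the cache-key lookup are all hypothetical and do not describe Zoomdata's actual plan format.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Hypothetical execution plan: steps sent to the source vs. run in memory."""
    cache_hit: bool
    pushdown: list = field(default_factory=list)
    in_memory: list = field(default_factory=list)


def plan_query(ops, connector_caps, cached_keys, request_key):
    """Split a request's ordered operations between the data source and the engine.

    ops            -- ordered pipeline, e.g. ["filter", "group", "aggregate", "sort"]
    connector_caps -- operations the (hypothetical) connector reports it supports
    cached_keys    -- keys of results already held in the in-memory cache
    request_key    -- key identifying this request
    """
    if request_key in cached_keys:
        # Serve entirely from cached results; no query goes to the source.
        return Plan(cache_hit=True, in_memory=list(ops))

    plan = Plan(cache_hit=False)
    for op in ops:
        # Push down the longest supported prefix. Once a step must run in memory,
        # every later step depends on its output, so it runs in memory too.
        if not plan.in_memory and op in connector_caps:
            plan.pushdown.append(op)
        else:
            plan.in_memory.append(op)
    return plan


if __name__ == "__main__":
    caps = {"filter", "group", "aggregate", "limit"}  # this connector cannot sort
    plan = plan_query(["filter", "group", "aggregate", "sort", "limit"], caps, set(), "q1")
    print(plan.pushdown)   # ['filter', 'group', 'aggregate']
    print(plan.in_memory)  # ['sort', 'limit']
```

Note that "limit" stays in memory here even though the connector supports it, because limiting before an in-memory sort would return the wrong rows.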

It can be deployed alongside all the other Zoomdata microservices on a single machine, or it can be deployed separately under a modern resource manager such as YARN or Kubernetes (coming soon!) to take advantage of distributed processing.

Pushdown Processing

Unlike most traditional and even newer BI alternatives, Zoomdata is intelligently architected to let users interact directly with data down to row-level detail (within the limits of their security privileges, of course!). Pushdown processing complements our implementation of websocket communication, described in a prior post, and is necessary to support an ad hoc, interactive user experience on fresh data. As users explore data at progressively lower levels of detail, Zoomdata pushes the processing down to the data source as new queries. Other solutions, by contrast, query and work from data extracts. Extracted datasets, whether stored in a cube, as flat rows, or in another format, restrict what analysts can do with their data and how they engage with it.
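The drill-down behavior described above can be pictured as building a new, narrower query at each step rather than slicing a local extract. In this minimal sketch, the `Query` shape, its field names, and `drill_down` are hypothetical stand-ins, not Zoomdata's wire format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Query:
    """Hypothetical request shape; not Zoomdata's actual wire format."""
    filters: tuple = ()                 # (column, value) equality filters
    group_by: tuple = ()                # empty tuple = row-level detail
    metrics: tuple = ("SUM(amount)",)   # ignored when fetching raw rows


def drill_down(query, column, value, next_level):
    """Build the next, more detailed query instead of slicing a local extract.

    The selected value becomes a filter, and grouping moves to the next level,
    so the data source answers a fresh, narrower question each time.
    """
    return replace(
        query,
        filters=query.filters + ((column, value),),
        group_by=tuple(next_level),
    )


if __name__ == "__main__":
    by_region = Query(group_by=("region",))
    by_state = drill_down(by_region, "region", "West", ["state"])
    rows = drill_down(by_state, "state", "CA", [])  # down to row-level detail
    for q in (by_region, by_state, rows):
        print(q.filters, q.group_by)
```

Because each query is immutable, earlier levels stay intact, so moving back up the hierarchy only requires reissuing a previous query.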

With Zoomdata, the database returns only the values that the Query Engine needs to populate the user's visualizations. The pushdown architecture also avoids scaling up the Query Engine unnecessarily when complex processing can be executed better on high-performance database engines or scalable data platforms.

Query Optimizations Reduce Database and Network Load

Zoomdata pushes as much work as possible down to the underlying data sources. The Query Engine evaluates and optimizes each end-user request, and determines whether to submit all or part of the request to the target data sources. Where appropriate, the engine can push down filtering criteria, aggregations, calculations, and offset, limit, sort, and time bucketing operations.
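For a SQL-speaking source, pushing down filters, aggregations, sort, limit, and offset amounts to folding them into one statement. The sketch below assumes such a source; `to_sql` and its argument shapes are hypothetical, and table and column names are assumed to be validated against the schema beforehand, so only values are parameterized.

```python
import sqlite3


def to_sql(table, filters=(), group_by=(), metrics=(), order_by=(), limit=None, offset=None):
    """Render one request as a single SQL statement for the source to execute.

    filters  -- (column, op, value) triples; values become ? placeholders
    metrics  -- (function, column) pairs, e.g. ("SUM", "amount")
    order_by -- raw ORDER BY terms, e.g. "sum_amount DESC"
    Table and column names are assumed to be validated against the source
    schema before reaching this function; only values are parameterized.
    OFFSET is emitted only together with LIMIT.
    """
    columns = list(group_by) + [f"{fn}({col}) AS {fn.lower()}_{col}" for fn, col in metrics]
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = [value for _, _, value in filters]
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} {op} ?" for col, op, _ in filters)
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    if order_by:
        sql += " ORDER BY " + ", ".join(order_by)
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
        if offset is not None:
            sql += f" OFFSET {int(offset)}"
    return sql, params


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (region TEXT, state TEXT, amount INTEGER)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                   [("West", "CA", 10), ("West", "CA", 5), ("West", "OR", 7), ("East", "NY", 20)])
    sql, params = to_sql("sales", filters=[("region", "=", "West")], group_by=["state"],
                         metrics=[("SUM", "amount")], order_by=["sum_amount DESC"], limit=2)
    print(sql)
    print(db.execute(sql, params).fetchall())  # [('CA', 15), ('OR', 7)]
```

Only two aggregated rows cross the connection here, rather than every matching sale.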

Pushing down filters means the data platform engine doesn't need to scan large datasets unnecessarily. It also reduces the amount of data transferred over the network from the data source to Zoomdata. Zoomdata can push down every filter, whether it is required by a user's security profile or requested by the user in the web application.
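One way to picture security filters and user filters travelling together is to AND them into a single parameterized predicate. The profile format (column mapped to allowed values) and the functions `security_filters`, `effective_filters`, and `where_clause` are hypothetical; the post does not say how Zoomdata represents security profiles.

```python
def security_filters(profile):
    """Turn a hypothetical security profile (column -> allowed values) into filters."""
    return [(col, "IN", tuple(sorted(vals))) for col, vals in sorted(profile.items())]


def effective_filters(mandatory, requested):
    """Security filters always come first; user filters are ANDed on top.

    A user can narrow the result further but never remove a mandatory filter.
    """
    combined = list(mandatory)
    for f in requested:
        if f not in combined:
            combined.append(f)
    return combined


def where_clause(filters):
    """Render filters as a parameterized WHERE body for pushdown to the source.

    Returns ("", []) for no filters; the caller then omits the WHERE keyword.
    """
    parts, params = [], []
    for col, op, val in filters:
        if op == "IN":
            if not val:
                parts.append("1 = 0")  # no allowed values: match no rows
                continue
            parts.append(f"{col} IN ({', '.join('?' * len(val))})")
            params.extend(val)
        else:
            parts.append(f"{col} {op} ?")
            params.append(val)
    return " AND ".join(parts), params


if __name__ == "__main__":
    mandatory = security_filters({"region": ["West", "East"]})
    clause, params = where_clause(effective_filters(mandatory, [("year", "=", 2018)]))
    print(clause)   # region IN (?, ?) AND year = ?
    print(params)   # ['East', 'West', 2018]
```

Treating an empty allowed set as "match nothing" fails closed, which is the safer default for row-level security.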

Pushdown of aggregations and calculations optimizes performance for the most resource-intensive operations. Zoomdata always pushes down aggregates: min, max, sum, avg, count, distinct count, last value, and percentiles. Where advantageous, the Query Engine combines several simpler aggregates to compute more complex metrics.
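As an example of building a more complex metric from simpler pushed-down aggregates, the sketch below derives mean and population variance from COUNT, SUM, and SUM of squares. This is a standard technique chosen for illustration, not a confirmed description of Zoomdata's internals; the `Partial` class is hypothetical. Because these aggregates are additive, partial results (say, a cached result and a fresh one) also merge exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    """Simple aggregates a source can compute in one pushed-down query, e.g.
    SELECT COUNT(x), SUM(x), SUM(x * x) FROM t WHERE ...
    """
    count: int
    total: float
    total_sq: float

    def merge(self, other):
        # Counts and sums are additive, so partial results combine exactly.
        return Partial(self.count + other.count,
                       self.total + other.total,
                       self.total_sq + other.total_sq)

    def mean(self):
        return self.total / self.count

    def variance(self):
        # Population variance as E[x^2] - E[x]^2. Fine for a sketch; this form
        # can lose precision when the variance is tiny relative to the mean.
        m = self.mean()
        return self.total_sq / self.count - m * m


if __name__ == "__main__":
    a = Partial(count=3, total=6.0, total_sq=14.0)   # values 1, 2, 3
    b = Partial(count=2, total=9.0, total_sq=41.0)   # values 4, 5
    both = a.merge(b)
    print(both.mean(), both.variance())              # 3.0 2.0
    print((a.mean() + b.mean()) / 2)                 # 3.25: averaging averages is wrong
```

The last line shows why the engine would request sums and counts rather than averages: averages of unequal-sized groups cannot be combined directly.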

Time for Something Different

Zoomdata offers automatic time bucketing, which allows users to group and filter data by time categories such as current or prior week, month and year, rolling time periods, and so on. There’s no need to pre-aggregate or model time buckets, which frees technical personnel to focus on higher value objectives. All that’s needed is a date-time field. The Query Engine does all the work of interpreting and converting user requests to one or more queries and pushing the whole operation down to the data source.
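A relative time bucket such as "prior month" or "last 7 days" can be resolved into a half-open date range that the source filters on directly. The helper names below are hypothetical, weeks are assumed to start on Monday, and time zones are ignored for brevity.

```python
from datetime import date, timedelta


def prior_month(today):
    """Half-open range [start, end) for the calendar month before `today`."""
    end = today.replace(day=1)
    start = (end - timedelta(days=1)).replace(day=1)
    return start, end


def current_week(today):
    """Monday-based week containing `today` (matches date.weekday())."""
    start = today - timedelta(days=today.weekday())
    return start, start + timedelta(days=7)


def rolling_days(today, n):
    """The last n days, including today."""
    end = today + timedelta(days=1)
    return end - timedelta(days=n), end


def range_filter(column, bucket):
    """Render a bucket as a parameterized predicate the source can evaluate."""
    start, end = bucket
    return f"{column} >= ? AND {column} < ?", [start.isoformat(), end.isoformat()]


if __name__ == "__main__":
    today = date(2019, 2, 12)
    print(range_filter("order_date", prior_month(today)))
    print(range_filter("order_date", current_week(today)))
    print(range_filter("order_date", rolling_days(today, 7)))
```

Half-open ranges avoid off-by-one errors at midnight boundaries. Grouping by bucket would be pushed down in the same spirit, using the source's own truncation function, such as PostgreSQL's `date_trunc('month', order_date)`.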

To recap, the benefits of Zoomdata direct-connect with pushdown processing are:

  • The user always has access to fresh data from the source
  • Compute resources are scaled and managed where they make the most sense
  • Network bandwidth is conserved
  • Hybrid-cloud deployments are well supported, since there's no need to move massive amounts of data between systems

Up next, Big Data Exploration with Microqueries & Data Sharpening™.






Zoomdata, 12-02-2019 19:06:30
