Levyx Connectors

Plug-in solutions that seamlessly connect Levyx's software-defined data processors to major Big Data applications

Enabling Spark Analytics on Bare Metal

The Levyx-Spark Connector™ is software that allows Helium/Xenon to run Apache Spark deployments far more efficiently by offering in-memory speed RDDs and data frames that are also persistent. With additional query pushdown features, the Levyx-Spark Connector is able to greatly accelerate operations such as large sorts and joins, streamline multi-job/multi-tenant workflows, and reduce the overall number of Spark nodes needed to execute those jobs.

Carefully designed to not disrupt the underlying Apache Spark deployment, the Levyx-Spark Connector acts as a “plug-in” to the existing platform, seamlessly integrating into it while supercharging its performance.

Connects Xenon to Spark

  • Spark RDD/DataFrame maps to Xenon dataset
  • Pushdown of SQL queries to Xenon layer
    • Spark context extended with three simple API calls (i.e., fromXe, toXe, runXe)
  • JIT “C” level compilation/execution
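
The mapping and the three calls listed above can be pictured with a toy model. The sketch below is plain Python with no Spark or Levyx dependency. Only the names fromXe, toXe, and runXe come from the list above; the class `ToyXenon`, the method signatures, the predicate/sort-key arguments, and the dict-based storage are all assumptions made for illustration, and the real connector's signatures may well differ.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class ToyXenon:
    """Hypothetical in-process stand-in for a Xenon layer (not the real API)."""

    def __init__(self) -> None:
        # Dataset name -> list of rows held by the toy store.
        self._datasets: Dict[str, List[Row]] = {}

    def toXe(self, name: str, rows: List[Row]) -> None:
        # Assumed behavior: copy the rows into a named dataset.
        self._datasets[name] = [dict(r) for r in rows]

    def fromXe(self, name: str) -> List[Row]:
        # Assumed behavior: return a copy of the named dataset.
        return [dict(r) for r in self._datasets[name]]

    def runXe(self, name: str, where: Callable[[Row], bool], order_by: str) -> List[Row]:
        # Assumed "pushdown": filter and sort inside the store,
        # so only the result rows travel back to the caller.
        matched = [dict(r) for r in self._datasets[name] if where(r)]
        return sorted(matched, key=lambda r: r[order_by])


xe = ToyXenon()
xe.toXe("orders", [
    {"id": 3, "amount": 40},
    {"id": 1, "amount": 75},
    {"id": 2, "amount": 10},
])
large_orders = xe.runXe("orders", where=lambda r: r["amount"] >= 40, order_by="id")
```

In this toy, `runXe` returns only the two matching rows, already sorted, which is the point of pushdown: the caller never materializes the rows that the filter discards.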

The combined solution provides superior performance compared with conventional Apache Spark, especially in situations involving:

  • Large datasets that require sorting, joins, or group-by (heavy shuffling)
  • Workloads with small random inserts and point queries
  • Index lookups in place of filtering or full-table scans
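
The last point, an index lookup versus a filter over every row, can be illustrated without Spark. This is a minimal sketch in plain Python, assuming a hypothetical list of rows keyed by `id`; it says nothing about how Xenon builds or stores its indexes and only contrasts the two access patterns.

```python
from typing import Dict, List, Optional

Row = Dict[str, int]

# Hypothetical sample data: 10,000 rows keyed by "id".
rows: List[Row] = [{"id": i, "value": i * i} for i in range(10_000)]


def scan_lookup(data: List[Row], key: int) -> Optional[Row]:
    """Full scan: examine rows one by one until a match is found (O(n))."""
    for row in data:
        if row["id"] == key:
            return row
    return None


def build_index(data: List[Row]) -> Dict[int, int]:
    """Build a hash index once: id -> position in the row list."""
    return {row["id"]: pos for pos, row in enumerate(data)}


def index_lookup(data: List[Row], index: Dict[int, int], key: int) -> Optional[Row]:
    """Point query through the index: one hash probe, no scan."""
    pos = index.get(key)
    return None if pos is None else data[pos]


index = build_index(rows)
```

Both lookups return the same row, but the scan touches up to every row per query while the indexed version pays the build cost once and then answers each point query with a single probe.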

This comparison runs the application for 10 iterations on conventional Apache Spark and on Levyx’s version of Spark (Levyx-Spark™), which uses the Levyx-Spark Connector™ (i.e., the performance plug-in). As the following chart indicates, Levyx-Spark takes less than half as long to complete the iterations; some later iterations take up to 4X longer on conventional Spark than on the Levyx-Spark solution:

Through this implementation we essentially achieve performance comparable to Spark analytics on bare metal, with these added benefits:

    • Persistence: In-memory-speed RDDs and data frames that are also persistent
    • Multi-tenancy: Greatly accelerates operations such as large sorts and joins, and streamlines multi-job/multi-tenant workflows
    • Offload: Analytics are offloaded to our JITC
    • Indexed Data: Data frames are indexed without re-scanning the data
    • Updateable: Updates can be done in place without creating new data frames