At the same time, the demand for big data analytics and machine learning continues to grow as enterprises seek competitive advantage from their data. These demands have inspired a new generation of tools, such as Apache Spark and TensorFlow, that are pushing advanced analytics into the mainstream.
These tools generally rely on a cluster of servers to handle large computations, but cluster scaling alone has limits in its ability to deliver high performance. Scale-up and scale-out strategies work well for smaller workloads, but they run into diminishing returns as cluster size (scale-out) or individual server capability (scale-up) grows.
Hardware accelerators such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) provide a vehicle for high performance and, in fact, enhance the gains of scaling. To date, however, acceleration has had limited success because of a key gap between data scientists and the performance engineers working with the computing infrastructure, as illustrated in Figure 1.
Figure 1: Programming Model Gap Inhibiting Hardware Acceleration
Until recently, there was no automated way for big data platforms such as Spark to leverage advanced field-programmable hardware. Consequently, data scientists and analysts had to work with performance engineers to fill that programming model gap. Though feasible, this process was inefficient and time-consuming.
Data scientists, developers, and quantitative analysts are accustomed to programming big data platforms in a high-level language. Performance engineers, on the other hand, focus on programming at a low level, including field-programmable hardware. The scarcity of performance-engineering expertise, along with the additional implementation time, would significantly lengthen the time to value of accelerated analytics. In addition, the resulting solutions would typically be difficult to update as the analytics evolved.
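To make the gap concrete, the short Spark sketch below shows the kind of high-level, declarative code a data scientist or analyst writes; the dataset path and column names are illustrative only, not drawn from any specific deployment. Nothing in this code exposes, or depends on, the hardware the query ultimately runs on, which is precisely why accelerating it has traditionally required separate low-level engineering.

// A minimal sketch of the high-level programming model data scientists use.
// The dataset path and column names are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HighLevelAnalytics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HighLevelAnalytics")
      .getOrCreate()

    // The analyst expresses intent declaratively and never sees the
    // execution hardware behind the query.
    val sales = spark.read.parquet("/data/sales")
    val summary = sales
      .filter(col("amount") > 100)
      .groupBy("region")
      .agg(sum("amount").alias("total_amount"))

    summary.show()
    spark.stop()
  }
}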
Figure 2 illustrates Bigstream’s architecture to address this gap.
Figure 2: Bigstream Hyperacceleration Addresses the Gap
At a high level, Bigstream Hyperacceleration automates the process of acceleration for users of big data platforms. It includes compiler technology for both software acceleration via native C++ and FPGA acceleration via bitfile templates. This technology yields up to 10x end-to-end performance gains for analytics with zero code change.
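As a rough mental model of template-based acceleration, one can think of a layer that inspects a query's physical plan and substitutes pre-built native C++ or FPGA kernels for operators it recognizes, falling back to normal platform execution otherwise, so the user's code never changes. The sketch below illustrates only that matching step; the operator names, KernelCatalog, and AcceleratedKernel type are hypothetical and do not represent Bigstream's actual interfaces.

// Conceptual sketch only: matching dataflow operators to pre-built
// accelerated kernels (native C++ or FPGA bitfile templates).
sealed trait Backend
case object NativeCpp extends Backend
case object FpgaBitfile extends Backend

final case class AcceleratedKernel(operator: String, backend: Backend)

object KernelCatalog {
  // A catalog of operator templates that an acceleration layer might hold.
  private val catalog: Map[String, AcceleratedKernel] = Map(
    "Filter"        -> AcceleratedKernel("Filter", FpgaBitfile),
    "HashAggregate" -> AcceleratedKernel("HashAggregate", FpgaBitfile),
    "SortMergeJoin" -> AcceleratedKernel("SortMergeJoin", NativeCpp)
  )

  // Return an accelerated implementation for a physical-plan operator if one
  // exists; otherwise the platform keeps its default execution path.
  def lookup(operator: String): Option[AcceleratedKernel] = catalog.get(operator)
}

object Demo extends App {
  val planOperators = Seq("Filter", "HashAggregate", "Exchange")
  planOperators.foreach { op =>
    KernelCatalog.lookup(op) match {
      case Some(k) => println(s"$op -> accelerated via ${k.backend}")
      case None    => println(s"$op -> default Spark execution")
    }
  }
}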