Accelerating Big Data In Financial Services


From anti-money laundering to personalized banking, data is at the heart of innovation for financial services companies. Firms have driven, and benefited from, advances in big data platforms like Apache Spark and in bigger, faster computing infrastructure. These advances have limits, though: only with additional acceleration technology can banks realize the full potential of their data platforms and the underlying infrastructure.

In this whitepaper, we cover:

  • The use cases driving the need for data and computing power
  • The limits to computing power improvements
  • The potential for acceleration across financial use cases
  • Hyperacceleration: maximizing the computing potential for data platforms


Chapter 1

Performance-hungry use cases drive innovation

The financial services sector has long been the leading consumer of computing power and analytics, driven by the need for market and portfolio simulations, high-frequency trading, order execution, and many other workloads. Today, increasing regulatory requirements, rising cybercrime concerns, the growing sophistication of consumers, and the wealth of new relevant datasets have placed big data analytics at the center of every financial firm’s strategy.

What is at stake? Disrupt or be disrupted.

Figure 1: Financial Services - 11 Clusters of Innovation


The World Economic Forum tracks six major financial sectors and eleven "clusters of innovation," and a majority of these disruptions have to do with the better use of data, algorithms and computing infrastructure.

This unrelenting focus on technological innovation and data-driven processes is forcing IT departments to consider non-traditional approaches to building and operating big data infrastructure.

Cloud and Open Source: Engines of Disruption

The sea change that many quants and data scientists are witnessing is that the processing power needed for high-end computing tasks is now much more affordable and increasingly available via private cloud infrastructure and public cloud providers. This, combined with open source software (OSS), has emboldened a new generation of entrepreneurs to challenge financial business as usual with only an algorithm, a business model, and a dream.

The explosion of interest in OSS machine learning and big data platforms is also a clear indicator of the nature of the disruption to the old-world order in financial services. Open source platforms such as Apache Spark, TensorFlow, and other machine learning frameworks are attracting a growing number of developers who want to turn OSS big data innovations into new services. These OSS platforms are designed to scale to the huge data volumes that bring machine learning and big data analytics to life. The scale of these projects puts significant pressure on IT operations and DevOps to maximize the efficiency and performance of their computing resources.


Use Case: Risk is Predictable

Risk continues to be of critical importance across financial services segments. While there are many forms of risk, the form most common across all financial segments involves cybercrime and fraud. There is also a post-financial-crisis regulatory aspect of risk management that forces lenders to know precisely how much capital they need in reserve. Keep too much and you tie up capital unnecessarily, lowering profit. Keep too little and you run afoul of Basel III regulations.
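To make the reserve trade-off concrete, here is a deliberately simplified capital check. It assumes a single requirement of 10.5% of risk-weighted assets (the Basel III 8% minimum total capital ratio plus the 2.5% capital conservation buffer); real calculations involve separate CET1 and Tier 1 ratios, additional buffers, and detailed risk-weighting rules. The function name `capital_position` and the figures are hypothetical.

```python
def capital_position(capital: float, risk_weighted_assets: float,
                     required_ratio: float = 0.105) -> float:
    """Surplus (positive) or shortfall (negative) of capital versus a requirement.

    Simplified illustration only: the default 10.5% is the Basel III 8%
    minimum total capital ratio plus the 2.5% conservation buffer. Real
    calculations apply several ratios, buffers, and risk-weighting rules.
    """
    if risk_weighted_assets <= 0:
        raise ValueError("risk_weighted_assets must be positive")
    required_capital = required_ratio * risk_weighted_assets
    return capital - required_capital


# Hypothetical bank: 12bn of capital against 100bn of risk-weighted assets.
position = capital_position(capital=12e9, risk_weighted_assets=100e9)
if position >= 0:
    print(f"surplus of {position / 1e9:.2f}bn tied up beyond the requirement")
else:
    print(f"shortfall of {-position / 1e9:.2f}bn against the requirement")
```

Both directions are costly: a surplus is capital that is not earning a return, and a shortfall is a compliance problem.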

Figure 2: Big Data in Risk Management

New big data and machine learning technologies hold great promise for lenders, enabling them to tap into an ever-deepening pool of new data to analyze all aspects of risk and fraud. Identifying risk and fraud can require huge data volumes and large compute clusters, which are typical of modern big data systems.

Fraud detection is a classic example of predictive analytics at work. For data scientists, fraud can be quantified by building the right scoring model and tying its scores to actual business costs. These models encode the rules that define fraud, then crunch through the relevant data sets to weigh the cost of the fraud against the cost of detecting it. Because these models are built, scored, and refined over ever-larger data sets, the cluster performance of the big data platform is becoming increasingly critical.
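One common way to tie a scoring model to business costs is to pick the alert threshold that minimizes total cost. The sketch below assumes a hypothetical cost model: each alert costs a flat review fee and stops the fraud if there is one, and each missed fraud loses its full transaction amount. It assumes the model has already produced scores; all names and numbers are illustrative.

```python
import math
from typing import Sequence


def total_cost(scores: Sequence[float], is_fraud: Sequence[bool],
               amounts: Sequence[float], threshold: float,
               review_cost: float) -> float:
    """Total business cost of alerting on every transaction scored >= threshold.

    Hypothetical cost model: each alert costs `review_cost` to investigate
    and stops the fraud if there is one; each fraud that is not alerted
    loses its full transaction amount.
    """
    cost = 0.0
    for score, fraud, amount in zip(scores, is_fraud, amounts):
        if score >= threshold:
            cost += review_cost      # investigated (and stopped, if fraudulent)
        elif fraud:
            cost += amount           # missed fraud: the amount is lost
    return cost


def best_threshold(scores: Sequence[float], is_fraud: Sequence[bool],
                   amounts: Sequence[float], review_cost: float) -> float:
    """Alert threshold with the lowest total cost."""
    # Each observed score is a candidate cut-off; math.inf means "never alert".
    candidates = sorted(set(scores)) + [math.inf]
    return min(candidates,
               key=lambda t: total_cost(scores, is_fraud, amounts, t, review_cost))


# Hypothetical scored transactions (model scores, true labels, amounts in dollars).
scores = [0.05, 0.20, 0.40, 0.70, 0.90, 0.95]
is_fraud = [False, False, True, False, True, True]
amounts = [30.0, 120.0, 500.0, 80.0, 900.0, 250.0]

t = best_threshold(scores, is_fraud, amounts, review_cost=25.0)
print(t, total_cost(scores, is_fraud, amounts, t, review_cost=25.0))
```

In practice a search like this runs over millions of scored transactions and many candidate models, so the speed of each pass directly limits how many alternatives a team can evaluate.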

Use Case: Algorithmic Trading and Analytics

While using advanced algorithms to make informed trades is not new, its widespread adoption and its applicability to a broader range of traders are noteworthy. Today, asset managers, fintech companies, and even retail banks are looking to provide richer analytics, daily forecasts, market advisories, and recommendations to both industry and consumers. Both disruptive startups and well-established financial firms invest heavily in adding speed and sophistication to automated trading systems. This is putting increased stress on cloud and big data infrastructure.


Chapter 2

Looking Beyond Moore’s Law

The growth of machine learning and big data platforms has created a compute bottleneck. The main factors contributing to this bottleneck are higher data volumes, larger memory systems, solid state disks (SSDs), flash arrays, and faster networks. Traditional processors have simply not kept pace with the growing scale of I/O.

While no one is predicting the end of many-core CPUs, there is general agreement that the long-term reliance on Moore’s Law for application scaling is coming to an end. The doubling of transistor counts is slowing dramatically as chips approach physical limits, so compute-hungry firms are looking elsewhere to increase compute density in the data center.

Those charged with running data centers, whether your own IT department or a cloud provider, are tasked with delivering high-performance BI and analytics at scale. This has driven increased adoption of specialized hardware accelerators such as GPUs, FPGAs, and ASICs (like Google’s TPUs) to improve compute density. With public cloud leaders Amazon AWS (F1 instances) and Microsoft Azure (Configurable Cloud) employing FPGAs, many financial services firms are following suit in their own data centers.

Figure 3: The Economist on Moore’s Law

While FPGAs have played a specialized role in financial services for use cases like high-frequency trading, the growing emphasis on predictive analytics in fraud detection and on machine learning in trading and pricing systems makes the flexibility and reprogrammability of FPGAs very attractive to financial firms. The traditional objections to using FPGAs (they are difficult to program, and the necessary skills are scarce) are addressed head-on by Bigstream Hyperacceleration. Bigstream combines software acceleration technology with hardware-based accelerators in a seamless solution that provides 3X-10X acceleration of fast data workloads.

Chapter 3

The Value of Speed

Financial firms are experiencing the same exponential increase in data volumes as everyone else. They are also the first to understand that processing data at scale is proving costly and unpredictable. While tools like Hadoop and Spark enable scaling by growing compute clusters, the limits of CPUs lead to diminishing returns from cluster growth. This is one of the key reasons for the growing adoption of hardware accelerators like GPUs and FPGAs. Hardware and software accelerators can cut processing times significantly, which yields direct business outcomes across a range of financial services applications.
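A standard way to see why adding nodes yields diminishing returns is Amdahl's law: if a fraction p of a job parallelizes perfectly and the rest is serial (or bound by coordination), the ideal speedup on n nodes is 1 / ((1 - p) + p / n). This is a textbook model introduced here for illustration, not a measurement of Spark or of any particular cluster.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Ideal speedup on `nodes` workers when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / nodes)


# A job that is 95% parallel: the speedup can never exceed 1 / 0.05 = 20x.
for n in (10, 100, 500):
    print(n, round(amdahl_speedup(0.95, n), 1))
```

With 95% parallel work, going from 100 to 500 nodes raises the ideal speedup only from about 16.8x to about 19.3x. Making the work on each node faster shrinks both terms of the denominator, which is the argument for per-node acceleration rather than more nodes alone.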

Faster Trades and Insights

In areas like algorithmic trading, complex derivative or option pricing, backtesting, and other compute-intensive big data workloads, bringing more processing power to bear can materially affect returns and fund performance.
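To illustrate why option pricing is compute-hungry, the sketch below prices a European call by Monte Carlo simulation under geometric Brownian motion, a textbook model chosen here for illustration rather than one the text prescribes. The standard error shrinks only with the square root of the number of simulated paths, so halving the error takes four times the computation. The closed-form Black-Scholes price is included only as a check; all parameter values are hypothetical.

```python
import math
import random


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Closed-form Black-Scholes price of a European call (for comparison)."""
    vol_t = vol * math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / vol_t
    d2 = d1 - vol_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def mc_european_call(spot, strike, rate, vol, maturity, paths, seed=42):
    """Monte Carlo price of a European call under geometric Brownian motion.

    Returns (price, standard_error); the error shrinks like 1 / sqrt(paths).
    """
    rng = random.Random(seed)  # seeded for reproducible results
    drift = (rate - 0.5 * vol * vol) * maturity
    vol_t = vol * math.sqrt(maturity)
    discount = math.exp(-rate * maturity)
    total = total_sq = 0.0
    for _ in range(paths):
        # Simulate the terminal price directly: one normal draw per path.
        terminal = spot * math.exp(drift + vol_t * rng.gauss(0.0, 1.0))
        payoff = max(terminal - strike, 0.0)
        total += payoff
        total_sq += payoff * payoff
    mean = total / paths
    variance = max(total_sq / paths - mean * mean, 0.0)
    return discount * mean, discount * math.sqrt(variance / paths)


for n in (10_000, 100_000):
    price, err = mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, n)
    print(f"{n:>7} paths: {price:.3f} +/- {err:.3f}")
print(f"closed form: {black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0):.3f}")
```

Real desks price far more complex, path-dependent instruments with no closed form, where simulation is the only option and its cost grows with every extra digit of precision.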

Empowered Quants and Data Scientists

Simply put, quants and data scientists want to get there faster. They are often looking for any performance edge that can enable them to iterate faster on their models, and to lower latencies for processing tick data and other incoming data streams. Being able to move daily model testing to an intra-day schedule means more accurate models in less time.
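The arithmetic behind moving from daily to intra-day testing is simple: the number of complete model runs that fit in a fixed window scales with the speedup. The function name, window length, and run time below are hypothetical inputs chosen for illustration.

```python
import math


def runs_per_window(window_hours: float, run_hours: float, speedup: float = 1.0) -> int:
    """Number of complete model runs that fit in a time window after a speedup."""
    if run_hours <= 0 or speedup <= 0:
        raise ValueError("run_hours and speedup must be positive")
    return math.floor(window_hours / (run_hours / speedup))


# Hypothetical: a 6-hour backtest and an 8-hour working window.
for s in (1, 3, 10):
    print(f"{s}x speedup -> {runs_per_window(8, 6, s)} run(s) per window")
```

At the baseline only one run fits, so the model can be revisited once a day; at 3X or 10X the same window holds 4 or 13 runs, which is what makes an intra-day schedule possible.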

Maximize IT Infrastructure ROI

The other side of the performance equation is cost. Whether you are talking about public cloud costs or data center costs such as real estate, power, and HVAC, a compute efficiency gain of even 20% can add up to millions in savings or a significant reduction in project backlog. Now consider that Bigstream can deliver 200% to 1000% acceleration, and the ROI numbers become compelling.
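Under the simplifying assumption that compute cost is proportional to runtime (pay-per-hour cloud instances, for example), a speedup factor s cuts the cost of a fixed workload by a fraction 1 - 1/s. Under that model a 20% throughput gain (s = 1.2) saves about 16.7%, and speedups of 3X and 10X save about 67% and 90%. The budget figure below is hypothetical.

```python
def compute_savings(baseline_cost: float, speedup: float) -> float:
    """Savings when the same work runs `speedup` times faster and cost scales with runtime."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_cost * (1.0 - 1.0 / speedup)


annual_budget = 5_000_000  # hypothetical annual compute spend in dollars
for s in (1.2, 3.0, 10.0):
    print(f"{s:>4}x -> saves ${compute_savings(annual_budget, s):,.0f}")
```

Fixed costs such as licenses or staff do not shrink with runtime, so this is an upper bound; alternatively, the freed capacity can be spent on the project backlog instead of banked as savings.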

Simplifying Cloud Scaling

Different workloads have different scaling characteristics, but all workloads have this in common: if each individual cluster node is not performing optimally, the entire cluster cannot scale to its full potential. Data-driven financial firms routinely run clusters of 100 to 500+ nodes to power market simulations and customer behavior analysis in both private and public clouds. Predictable scaling models help IT Ops teams get the most value from their processing time.
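A simple model shows why one underperforming node matters: in a stage that ends at a synchronization barrier, where the next step cannot begin until every node has finished its share, the slowest node sets the pace. The sketch below uses made-up work and throughput figures and ignores scheduling, speculative execution, and skew handling.

```python
def stage_time(work_per_node, throughput_per_node):
    """Time for a barrier-synchronized stage: the slowest node sets the pace."""
    if len(work_per_node) != len(throughput_per_node):
        raise ValueError("need one throughput figure per node")
    return max(w / t for w, t in zip(work_per_node, throughput_per_node))


work = [100.0] * 4                     # equal work units per node (hypothetical)
balanced = [10.0, 10.0, 10.0, 10.0]    # work units per second on each node
one_slow = [10.0, 10.0, 10.0, 5.0]     # one node running at half speed

print(stage_time(work, balanced), stage_time(work, one_slow))
```

A single node at half speed doubles the stage time even though the other three nodes sit idle for half of it.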

Scaling the Benefits of Data Science and Machine Learning

Because of the heightened need for data science, data engineering, and big data development talent, financial services companies must think of scaling in terms of the amount of data, the amount of processing, and the number of end users who can benefit from these large data sets. Given the widely reported shortage of data scientists and machine learning specialists, the open source community has repeatedly turned to SQL as a key enabler for scaling the value of data and analytics.

With the shift to Spark DataFrames, myriad SQL and SQL-like access methods have become compelling options. Virtually every data scientist uses some SQL in their work, and it remains a foundational tool to access, process, and analyze data. SQL access to data, wherever it lives, is a key part of scaling the value of data, the value of your data scientists, and the monetization of your business.

Key SQL Operations for Data Scientists and Quants

  • Aggregations: Data analysis is all about aggregations. Aggregate functions are very useful for understanding the data and presenting a summarized picture of it.
  • Windowing functions: Among the most powerful functions in SQL, these unlock the ability to calculate moving averages, cumulative sums, and much more.
  • Text mining: While many turn to scripting languages for text mining, SQL has powerful built-in capabilities that can benefit from acceleration technologies.
  • Feature extraction: Computing 1-day, 5-day, and 30-day moving averages from daily closing prices is a common machine learning task for investment houses and hedge funds.
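The aggregation, windowing, and moving-average feature operations listed above can be sketched with Python's built-in `sqlite3` module, used here only as a convenient stand-in for Spark SQL (window functions need SQLite 3.25 or later). The table name `daily_close` and the prices are hypothetical. The SQL shapes shown, `GROUP BY` for aggregates and `AVG(...) OVER (... ROWS BETWEEN ...)` for moving averages, are standard SQL that Spark SQL also supports; for 5-day or 30-day features only the window size changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_close (ticker TEXT, day INTEGER, close REAL)")
conn.executemany(
    "INSERT INTO daily_close VALUES (?, ?, ?)",
    [("ABC", 1, 10.0), ("ABC", 2, 11.0), ("ABC", 3, 12.0), ("ABC", 4, 13.0),
     ("XYZ", 1, 50.0), ("XYZ", 2, 48.0), ("XYZ", 3, 49.0), ("XYZ", 4, 51.0)],
)

# Aggregation: one summary row per ticker.
summary = conn.execute(
    "SELECT ticker, AVG(close), MIN(close), MAX(close) "
    "FROM daily_close GROUP BY ticker ORDER BY ticker"
).fetchall()

# Windowing / feature extraction: 3-day moving average and running total per ticker.
features = conn.execute(
    """
    SELECT ticker, day, close,
           AVG(close) OVER (PARTITION BY ticker ORDER BY day
                            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS ma_3d,
           SUM(close) OVER (PARTITION BY ticker ORDER BY day
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_close
    FROM daily_close
    ORDER BY ticker, day
    """
).fetchall()

print(summary)
for row in features:
    print(row)
```

For the first few days of each ticker the "3-day" average covers fewer rows, which is a common edge case to handle (for example by discarding those rows) when building model features.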


In addition to SQL, many financial firms need to incorporate algorithmic constructs into their analytics workloads through open source or proprietary libraries. Bigstream Hyperacceleration is designed to accelerate a wide variety of workloads, including Spark SQL, Spark DataFrames/Datasets, and user-defined functions (UDFs).
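A user-defined function lets custom logic run inside a SQL query. Spark has its own UDF registration mechanism; to keep the example self-contained, the sketch below uses `sqlite3.Connection.create_function` from Python's standard library as an analogous mechanism. The `bps_change` function and the sample quotes are hypothetical.

```python
import sqlite3


def bps_change(old: float, new: float) -> float:
    """Change from old to new, in basis points (1 bp = 0.01%)."""
    return (new - old) / old * 10_000


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it: name, number of args, callable.
conn.create_function("bps_change", 2, bps_change)

conn.execute("CREATE TABLE quotes (ticker TEXT, open REAL, close REAL)")
conn.executemany("INSERT INTO quotes VALUES (?, ?, ?)",
                 [("ABC", 100.0, 101.0), ("XYZ", 50.0, 49.5)])

rows = conn.execute(
    "SELECT ticker, bps_change(open, close) FROM quotes ORDER BY ticker"
).fetchall()
print(rows)
```

Custom functions like this are opaque to a SQL optimizer and run once per row, which is one reason they tend to become performance hot spots in large analytics jobs.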

Chapter 4

Bigstream Hyperacceleration in Financial Services

Bigstream Hyperacceleration enables Apache Spark to make the best use of underlying processing platforms. Bigstream uses advanced compiler technology to provide native scaling of Spark workloads. Bigstream also offers automatic programming for FPGAs to provide frictionless acceleration of big data workloads. This is done without impacting software developers because no application code changes are required, and no FPGA programming skills are necessary.

Figure 4: Bigstream Hyperacceleration business benefits

In a typical big data analytics pipeline, Bigstream can accelerate data ingest, data discovery, data parsing, ETL transformations, SQL analytics, data compression and decompression, User Defined Functions, and numerous other processes where Spark SQL is used.

Unlike other hardware-specific acceleration products, Bigstream Hyperacceleration provides platform-level acceleration, requiring no special APIs, code rewrites, or application redesign. This is accomplished through an intelligent, adaptive combination of acceleration techniques such as zero-copy, in-line code optimizations, locality tuning, vectorization, and native compilation of Spark functions and UDFs.
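Zero-copy, the first technique listed, means that different stages of a pipeline share one memory buffer instead of duplicating it. The Python sketch below illustrates only the general idea with `memoryview`; it is not how Bigstream implements zero-copy, which the text does not describe, and the buffer contents are hypothetical.

```python
import struct

# Four doubles packed into a mutable byte buffer (native byte order),
# standing in for data a reader has just loaded.
prices = bytearray(struct.pack("4d", 10.0, 11.0, 12.0, 13.0))

view = memoryview(prices)            # a window onto the same bytes, no copy
second_half = view[16:]              # slicing a memoryview does not copy either
as_doubles = second_half.cast("d")   # reinterpret the bytes as doubles, still no copy

# Overwrite the third price in the original buffer (same size, so it is
# allowed while views exist)...
prices[16:24] = struct.pack("d", 99.0)

# ...and the view reflects the change because no copy was ever made.
print(as_doubles.tolist())  # [99.0, 13.0]

# By contrast, bytes(...) makes an independent copy.
snapshot = bytes(second_half)
prices[24:32] = struct.pack("d", 0.0)
print(struct.unpack("2d", snapshot))  # (99.0, 13.0): the copy did not change
```

Avoiding copies matters at big data scale because every duplicated buffer costs both memory bandwidth and time that could be spent on computation.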

Bigstream Hyperacceleration can help you accomplish the following:

  • Improved fraud detection through more precise models, which enable data science teams to use data to understand customer behaviors and predict future behavior. How are more precise models achieved? Through continuous tweaking and iteration until the models reach optimum performance. If each iteration of model development can be accelerated by 2X, 5X, or 10X, teams can iterate more often, fraud models improve, and more fraud is detected or prevented.

  • Greater analytical throughput from being able to mine data from diverse sources and get that data organized into a structure that is meaningful to business users. The ability to bring the full power of many-core CPUs and FPGAs together without burdening the developer saves computing time, streamlines DevOps, and reduces uncertainty at the customer site. Questions like “how can I take full advantage of FPGAs, CPUs and Spark to advance my business goals?” now have an answer: deploy Bigstream Hyperacceleration.

  • Modernize Data Warehousing and BI. Ivory-tower systems that are the playground of a chosen few are rapidly fading. The race is on to get more intelligence to more end users faster. Boosting the performance of everything that makes up a modern big data warehouse is critical to achieving that goal.

  • Curb out-of-control big data infrastructure costs. While public and private clouds have undoubtedly made deploying and scaling new applications easier, this level of abstraction usually does not yield the best performance, and lost performance translates quite directly into OPEX costs, whether paid to Amazon, Microsoft, or internal IT.

Figure 5: Bigstream TPC-DS benchmark results (Bigstream speedup with F1 instances across 100 TPC-DS queries; each query was run on 5-node clusters, first on AWS r5.2xlarge instances and then on f1.2xlarge instances with Bigstream)


Bigstream TPC-DS benchmark results show Bigstream-accelerated Spark running, on average, almost 3X faster than Apache Spark on the same hardware. Early benchmark results using FPGAs suggest that 10X acceleration and beyond is possible.
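How an "average speedup" is computed matters. Per-query speedup is baseline runtime divided by accelerated runtime; across a suite of queries, the arithmetic mean of those ratios can be pulled up by a few large wins, so the geometric mean or the ratio of total runtimes is often reported alongside it. The text does not say which average underlies the figure above, and the timings below are invented purely for illustration; they are not Bigstream results.

```python
from statistics import geometric_mean, mean  # geometric_mean needs Python 3.8+


def per_query_speedups(baseline_secs, accelerated_secs):
    """Speedup for each query: baseline runtime divided by accelerated runtime."""
    if len(baseline_secs) != len(accelerated_secs):
        raise ValueError("need one accelerated timing per baseline timing")
    return [b / a for b, a in zip(baseline_secs, accelerated_secs)]


# Hypothetical timings in seconds for four queries.
baseline = [30.0, 60.0, 120.0, 90.0]
accelerated = [15.0, 20.0, 20.0, 90.0]

ratios = per_query_speedups(baseline, accelerated)
print("per-query:", ratios)
print("arithmetic mean:", mean(ratios))
print("geometric mean:", round(geometric_mean(ratios), 3))
print("total-time ratio:", round(sum(baseline) / sum(accelerated), 3))
```

Here one query with no speedup pulls the geometric mean (about 2.45) and the total-time ratio (about 2.07) well below the arithmetic mean of 3.0, which is why it helps to know which average a benchmark reports.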

These are just a few examples of how big data is changing the financial services landscape. Entrepreneurs in FinTech and data-driven development teams are increasingly relying on new-generation open source platforms like Spark to redefine financial analytics. Bigstream Hyperacceleration provides a frictionless way for these professionals to achieve supercomputing performance on the new generation of big data infrastructure.

For more information about how Bigstream Hyperacceleration works, read the Bigstream whitepaper or contact our team with questions or for a demo.
