Why Banks Are Taking Big Data to the Cloud

A conversation with Ed Franklin, Executive Vice President, Veristorm, Inc.

Before we talk about how to take Big Data to the cloud, can you tell us why banks are interested in Big Data?

Financial institutions leverage Big Data because they want better predictive analytics to get to the “market of one.” For example, if I have a checking account at Chase, I might also have a Chase credit or debit card, a retirement account, an auto loan and so on. Financial organizations want to better understand what moves their customers are going to make at different stages of their lives. They want to predict when the customer might be buying and selling stocks, adding to their 401(k), using their credit card more often, taking out loans, and so on. With better insight into their customers, they can offer more attractive services. This is a win for the customer and of course it’s more profitable for the bank as well.

The challenge and opportunity of Big Data come from the fact that over the years banks grew through consolidations, mergers and acquisitions. As a result, these firms inherited and now operate legacy systems that are stitched together, typically with an e-commerce or a mobile front end, so that the client thinks they are dealing with one unified entity. They have more data available, but it’s in many disparate formats and systems. Big Data excels at taking all of these various inputs and analyzing them very quickly.

Why do banks want to use the cloud for Big Data?

Banks use the cloud because of the rapid release of new and updated technology platforms. For example, just a few years ago, Hadoop was made up of several platforms. Now, it covers over 25 different platforms and technologies. But Big Data isn't just Hadoop. It also spans NoSQL platforms, Spark, relational databases, natural language processing, and machine learning, all of which need to be part of the predictive analytics goal.

The cloud helps you keep pace with changes in technology. It's hard to develop a private platform, get it up and running on your own infrastructure, and then keep updating it with new releases and technologies. It is also much easier to do QA, dev and test in the cloud: you can pick multiple platforms for proofs of concept and iterate quickly. Eventually, banks will move to hardened, on-site private solutions in conjunction with the cloud.

Is that because the cloud offers a seemingly unlimited range of platform capabilities that is so difficult to duplicate in-house?

The short answer is “yes.” Years ago, the early adopters of cloud and virtualization, as it was called at the time, were primarily interested in cost savings and application agility. Cost reduction and TCO were the early premises of cloud adoption, along with bypassing the IT department.

QA, test and dev were the first applications in the cloud. No one began their cloud experience by putting production apps, which were safely running on their private systems, into the cloud. Instead, they dipped their toes in the water. They did QA, test and dev first. They did the odd application that the business demanded IT move on quickly. Traditionally, it might take six months to get a server cluster, an operating system and a platform up and running, especially if you want to test performance at scale. The cloud simply gets them there more quickly. They can fail fast, and they can serve their business users far better than they could in a private environment.

That was cloud in the early days. What cloud turned into was not just an ROI model, but a model that allowed for much better business agility and the ability to buy services.

For Big Data, the cloud has the ability to instantly scale to accommodate new data streams, even in cases where they are many times the size of the original enterprise data.

What is the key reason for taking Big Data to the cloud?

In a word, “agility.” For example, should we be using Mongo or Cassandra for a database? There are arguments for either one, so why not try them both? Let's run them side by side and test them both. The cloud is perfect for testing those various systems and determining which one best suits the use case.
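
That kind of side-by-side trial is easy to script against disposable cloud instances. As a rough illustration, and assuming test instances of MongoDB and Cassandra are already reachable (the hostnames, keyspace, and record layout below are placeholders invented for the sketch, not a recommendation), a simple load comparison might look like this:

```python
# Naive side-by-side load test: write the same records to MongoDB and
# Cassandra and time each path. A sketch for a proof of concept, not a
# rigorous benchmark.
import time
from pymongo import MongoClient
from cassandra.cluster import Cluster

ROWS = 10_000
records = [{"account_id": i, "balance": i * 1.25} for i in range(ROWS)]

# --- MongoDB: bulk insert into a throwaway collection ---
mongo = MongoClient("mongodb://mongo-test:27017")  # placeholder host
coll = mongo.poc_db.accounts
start = time.perf_counter()
coll.insert_many(records)
print(f"MongoDB: {ROWS} rows in {time.perf_counter() - start:.2f}s")

# --- Cassandra: prepared-statement inserts into a throwaway table ---
cluster = Cluster(["cassandra-test"])  # placeholder host
session = cluster.connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS poc
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute(
    "CREATE TABLE IF NOT EXISTS poc.accounts (account_id int PRIMARY KEY, balance double)"
)
insert = session.prepare("INSERT INTO poc.accounts (account_id, balance) VALUES (?, ?)")
start = time.perf_counter()
for r in records:
    session.execute(insert, (r["account_id"], r["balance"]))
print(f"Cassandra: {ROWS} rows in {time.perf_counter() - start:.2f}s")
```

The point is less the numbers themselves than the fact that both environments can be stood up, loaded, and torn down in an afternoon.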

As cloud platforms have matured, IT has become comfortable with these technologies and the ability to use multiple platforms. The early security and governance concerns in the cloud have been addressed quite well by the leading vendors, and now everybody feels pretty comfortable with these platforms.

What types of use cases should banks consider first?

We certainly see the advent of predictive analytics use cases and testing new systems for integration into existing platforms, but there is also an opportunity for tremendous ROI in using a new technology to do something for less cost than the old. A good example would be a searchable archive, where we’ve seen mainframe storage savings offset the cost of a new platform that can then become the basis of your Big Data platform.

Banks have a lot of data that they have to maintain for long periods of time. Thus they are always looking for better, faster and cheaper ways to put data in databases that are easily searchable and archivable.
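
To make the searchable-archive idea concrete: one common pattern is to land archived extracts in low-cost object storage in a columnar format and query them with SQL. The sketch below uses Spark, one of the technologies mentioned earlier; the bucket paths, column names, and query are invented for illustration.

```python
# Minimal searchable-archive sketch: convert an archived CSV extract to
# Parquet and query it with Spark SQL instead of restoring records from
# mainframe storage. Paths and field names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("searchable-archive-poc").getOrCreate()

# Load an archived extract and persist it in a columnar, queryable format.
archive = spark.read.csv("s3a://bank-archive/statements/2015/*.csv",
                         header=True, inferSchema=True)
archive.write.mode("overwrite").parquet("s3a://bank-archive/parquet/statements_2015")

# Analysts can now search the archive with ordinary SQL.
spark.read.parquet("s3a://bank-archive/parquet/statements_2015") \
     .createOrReplaceTempView("statements_2015")
spark.sql("""
    SELECT account_id, txn_date, amount
    FROM statements_2015
    WHERE account_id = '12345678' AND amount > 10000
""").show()
```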

What is the biggest challenge banks have in creating Big Data applications?

It’s that banks have so many legacy systems. Going to the cloud is certainly desirable, but those legacy systems present a major challenge. They want to take the data from these different systems and put it in a place where they can perform analytics. They don't care that transaction processing is running on a mainframe or that they have mobile apps that track location or that they have RFID tags on credit cards. They just want all the data in one spot so that data analysts can do the type of analytics needed to meet the business objective of getting to that “market of one.”

What is the biggest challenge that banks face with legacy systems?

The biggest challenge is just integrating the data. Today data analysts spend 80% of their time integrating data and only 20% of their time analyzing it. That gives you an idea of just how hard data integration really is.

The vast majority of Fortune 500s use mainframes to process transactions, such as ATM transactions, credit and debit card transactions, patient records or online sales. A disproportionate percentage of an enterprise's data resides on mainframes, which may be considered old school, but they still process the vast majority of transactions.

What we see ourselves doing is bridging the gap between what the data analyst wants to do, which is predictive analytics, and what the system architect wants to do, which is to maintain control and governance over the data that they've been guarding for the past fifty years.

The data is in many different legacy databases, with different formats, structured or unstructured. The biggest challenge is how to get all of that data from their production operating environment to an environment that works for Big Data.
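
One small, concrete piece of that integration problem is simply decoding legacy record formats. As an illustration only, with an invented field layout (a real job would be driven by the system's actual copybook), converting a fixed-width EBCDIC extract into a delimited file that Big Data tools can ingest might look like this:

```python
# Decode a fixed-width EBCDIC extract into CSV so it can be loaded into a
# Big Data environment. The layout, record length, and file names are
# hypothetical; a real conversion would follow the source copybook.
import csv

LAYOUT = [("account_id", 0, 10), ("txn_date", 10, 8), ("amount", 18, 12)]
RECORD_LEN = 30  # bytes per fixed-width record in this invented layout

with open("daily_txn.ebc", "rb") as src, open("daily_txn.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    writer.writerow([name for name, _, _ in LAYOUT])
    while chunk := src.read(RECORD_LEN):
        text = chunk.decode("cp037")  # EBCDIC code page 037 -> Unicode
        writer.writerow([text[start:start + length].strip()
                         for _, start, length in LAYOUT])
```

Real integrations also have to handle packed-decimal fields, variable-length records, and governance controls, which is where much of that 80% of analyst time goes.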

So how can you get all that data into the cloud?

Our paradigm is to integrate with and utilize the existing mainframe security models and processes, so that data moves to the cloud in a highly secure way. This is the solution for the mainframe IT leader who says, “Wait a minute, my production transaction data is occurring here. How can you possibly maintain security?” Well, the answer is to use the exact security that's on the mainframe to get the data over to the cloud.

What are the steps in taking Big Data to the cloud?

To me, the natural starting point for Big Data in the cloud is more agile test and dev, QA, and proofs of concept. You can set up, replicate, and tear down environments very rapidly, and scale them for performance testing. If it doesn’t work at scale, then it isn’t ready for production.
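
As a rough sketch of what "set up and tear down rapidly" can mean in practice, here is a minimal example using the AWS SDK for Python; the AMI ID, instance type, and tags are placeholders, and any real environment would add networking, storage, and security configuration:

```python
# Spin up a small, disposable cluster for a proof of concept, then tear it
# down so it stops billing. Assumes AWS credentials are already configured;
# all identifiers below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="m5.xlarge",
    MinCount=3,
    MaxCount=3,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "purpose", "Value": "bigdata-poc"}],
    }],
)
instance_ids = [i["InstanceId"] for i in resp["Instances"]]
print("Launched:", instance_ids)

# ...run the proof of concept, collect results, then tear everything down.
ec2.terminate_instances(InstanceIds=instance_ids)
```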

Number two, try ROI and TCO-oriented use cases to gain budget momentum. Show money saved, and you’ll also gain a better understanding of the strengths of different platforms, which use cases are best suited to your organization, and how to prioritize them. Different projects have different constraints and may need different platforms.

You can test in the public cloud and then bring these projects to your in-house platforms, or you can employ a hybrid model. Costs, governance, and agility will all be factors, but the cloud has matured to the point where security and reliability are not concerns.

That's what we’re seeing with our clients regarding Big Data and the cloud. In general, it's going to go through the same lifecycle as “infrastructure as a service,” and I see that happening with analytics and Big Data.