Bringing Hadoop to the Mainframe
Gigaom Research has published this research paper by Paul Miller.

Executive Summary

According to market leader IBM, there is still plenty of work for mainframe computers to do. Indeed, the company frequently cites figures indicating that 60 percent or more of global enterprise transactions are currently undertaken on mainframes built by IBM and remaining competitors such as Bull, Fujitsu, Hitachi, and Unisys.

The figures suggest that a wealth of data is stored and processed on these machines. But as businesses around the world increasingly turn to clusters of commodity servers running Hadoop to analyze the bulk of their data, the cost and time typically involved in extracting data from mainframe-based applications become a cause for concern.

By finding more effective ways to bring mainframe-hosted data and Hadoop-powered analysis closer together, the mainframe-using enterprise stands to benefit from both its existing investment in mainframe infrastructure and the speed and cost-effectiveness of modern data analytics, without necessarily resorting to relatively slow and resource-expensive extract, transform, load (ETL) processes that endlessly move data back and forth between discrete systems.

Key findings include:
Mainframes still account for 60 percent or more of global enterprise transactions.
Traditional ETL processes can make it slow and expensive to move mainframe data into the commodity Hadoop clusters where enterprise data analytics processes are increasingly being run.
In some cases, it may prove cost-effective to run specific Hadoop jobs on the mainframe itself.
In other cases, advances in Hadoop’s stream-processing capabilities can offer a more cost-effective way to push mainframe data to a commodity Hadoop cluster than traditional ETL.
The skills, outlook, and attitudes of typical mainframe system administrators and typical data scientists are quite different, creating challenges for organizations wishing to encourage closer cooperation between the two groups.

About The Author

Paul Miller is an Analyst for Gigaom Research and a consultant, based in the East Yorkshire (UK) market town of Beverley but working with clients worldwide. Paul helps clients understand the opportunities (and pitfalls) around cloud computing, big data, and open data, and also presents, podcasts, and writes for a number of channels. His background includes public policy and standards roles and several years in senior management at a UK software company.

About Gigaom Research

Gigaom Research provides timely, in-depth analysis of emerging technologies for individual and corporate subscribers. Our network of 200+ independent analysts provides new content that bridges the gap between breaking news and long-range research.