It’s time to re-prioritize analytics, Part 1: The unintended consequences of ignoring analytics infrastructure
In 2008, Airbus developed a quieter airplane. However, the planes soon had to be recalled, as the quieter cabins had the unintended consequence of making other noises more noticeable, including bathroom sounds, passenger conversations, coughing, sneezing and crying. This resulted in a less enjoyable experience for both the passengers and crew. The planes then had to be re-engineered to add more noise back into the cabin.
Like Airbus, many organizations are facing the unintended consequences of focusing on infrastructure for transaction processing optimization while overlooking infrastructure for analytics. Businesses are playing catch-up to meet the data needs of the modern enterprise.
Organizations are in the early phases of the next leap in demand for data analysis: cognitive computing, machine learning, IoT, personalization and fraud detection. Enterprises are suffering from years of under-investment in the systems needed to support these efforts. IoT alone is expected to add approximately $5.6 trillion to the global economy between 2014 and 2019, with $2.4 trillion attributed to the enterprise industry. Yet only seven percent of executives who expect the IoT to have a big impact on industries in the next three to five years said they have a clear vision for their IoT strategy and have started implementing it, according to BPI Network.
Analytics workloads have long suffered from neglect by system designers and platform architects. For decades, they prioritized online transaction processing over the needs of business analysis. Then, when they decided to invest in infrastructure for analytics, they were stifled by the outrageous costs of traditional relational databases and the pathetic performance of hard disk-based storage. Starting in the 2000s, necessity became the mother of invention, and enterprises started building new solutions to work around these problems. Now, more than 15 years later, analytics has come a long way. However, it’s not done yet.
Neglect and abuse
When businesses invest, they invest in online transaction processing first. In many data centers, the first and sometimes last question is whether a data analysis process will slow down, destabilize or risk production transaction processing. Even today, it’s not unusual for analysis to run against a backup copy of data taken during a nightly batch process. It’s easy to see how this neglect affects everything from computer hardware to investment in the people needed to do insightful work.
This neglect was in part a function of the times. For decades, the agenda was about what was going on inside the company — capturing business transactions, keeping accounting ledgers updated and paying employees. The neglect was also about cost. To capture and make readily available longitudinal data on customer buying, you need more data storage. You also need to pay more for database licenses.
As data warehousing and data mining came into vogue in the late 1990s, these solutions were stifled by the same technology problems that were gating online transaction processing, notably slow storage.
Necessity is the mother of invention
Pain creates opportunities. The seeds of innovation needed to power what you think of now as your next-generation workloads were planted in the mid-2000s with the invention and subsequent productization of NoSQL databases, document stores, in-memory databases, MapReduce and all-flash arrays.
Necessity is the mother of invention, but different people and companies saw different necessities. Companies such as Amazon, Facebook and Google were among the first to try to solve the scale and cost problem with the invention of NoSQL solutions such as Dynamo, Cassandra (2008) and Bigtable, respectively. Hadoop and HDFS, designed from the start to enable massive-scale data analysis with commodity components, launched in 2006 and were an outgrowth of research done at Google.
Others saw that I/O bottlenecks (where a system does not have fast enough input/output performance) were at the core of their performance pain, leading initially to solutions with RAM disks and then to in-memory databases. One of the earliest in-memory databases was TimesTen, which was spun out of HP in 1996 and acquired by Oracle in 2005. IBM DB2 BLU, another in-memory database, is an outcome of projects started in 2006. SAP HANA, one of the biggest commercial successes thanks to its integration of key SAP application features with an in-memory database, launched in November 2010.
All-flash arrays, first commercially available in 2007, were a response to the I/O bottlenecks and high latency of HDD-based storage systems and a massive cost improvement when compared to available RAM-based storage systems.
Some needed relief
This slew of innovations in the 2000s continues to make implementing data analysis easier, less expensive and often higher-performing. Key-value store databases and Hadoop have lowered the cost for large-scale workloads. In-memory databases are outrageously fast for read-intensive, smaller data sets. All-flash arrays accelerate analytics running in traditional database environments, and in many NoSQL solutions they serve as a storage tier that is slower than RAM but faster than hard disk. They are also used to store write transaction logs for in-memory NoSQL and SQL databases, which accelerates writes and reduces recovery times after server failures. Finally, they are helping to provide some performance boosts for Hadoop environments.
As customers have adopted these technologies and mixed them together liberally, it’s worth noting there are still some gaps. Hadoop is great for its low cost, but it’s not very fast. What can you do to bypass some of the software layers that are slowing it down? In-memory databases are fast but have scale, availability and cost issues. What are the options for improving the scalability of in-memory databases? Until recently, flash solutions did not offer enough capacity and were too expensive for widespread adoption at a massive scale. What changes will make flash more impactful to analytics?
The new solutions and technologies that were first released in the mid-2000s have taken some of the pain out, and they reached market maturity just in time. Of the mid-2000s-era technologies, the one with the most room to grow is flash storage, which continues to get faster and cheaper.
The next article in this series will go over changes to flash storage that enable the next generation of data workloads.