Business users more often than not judge their tools by their looks. Sure, a nice, appealing, user-friendly interface is great to have. However, when it comes to analytics, the user interface plays only a minor part in the complete solution. Data, stored in one or more databases, is the cornerstone of any analytics and decision making. The magic happens below the water surface; users only see the tip of the iceberg (the user interface).
Most of the “dirty work”, such as architecture, data integration and data storage, is invisible to end users. Yet it is the groundwork that allows analytical solutions to perform at their best.
Today I want to shed light on something we rarely talk about. It is a world of its own, yet an integral part of any analytical solution: the database. The basic requirements an analytical database must satisfy are integrity, accuracy, availability, and performance. I will mainly address another one: capacity.
Mankind has forever been trying to properly store and use its data. The first system on record is the 30,000-year-old Ishango bone, most likely used by early humans to count days, domestic animals or warriors. Data was carved into the bone with a sharp rock, as tally marks. This prehistoric system is apparently so good that it is still used today: in pubs, breweries, at card games, and everywhere else.
Of course, modern methods of storing data are entirely different. Most business users don’t know where data is stored, and frankly, they have no interest in knowing. Users only care about the experience: the data they seek should be accurate and accessible immediately, all the time.
When the experience of business users deteriorates, they start exploring the reasons. Usually, their first stop for query performance complaints is the IT department. Complaints about data integrity, speed and inflexible systems are very common once users start doing self-service work. If the IT department does not meet those needs, business users will pick additional tools on their own, since they have to complete their work one way or another, and in a timely fashion.
Quo Vadis Analytical Solutions
Have you ever wondered what database solutions companies use today? Many companies still run data warehouses on classic relational databases from the 1990s (such as Oracle, Microsoft SQL Server, IBM DB2). These last-century solutions are less suitable for direct use in modern business analytics.
In this century, databases have already gone through several (r)evolutions. Specialized analytical appliances appeared first, followed by software-only analytical databases, and in recent years analytical systems based on distributed data processing have emerged.
Currently there is an abundance of database solutions on the market, which makes it difficult for companies to decide which one suits their needs and requirements best. Those requirements are also constantly evolving, which creates demand for even faster development. This pace of technology development has left most companies with very heterogeneous, siloed and often incompatible solutions in the workplace. Complex IT environments carry higher maintenance costs and are less competitive; implementing change becomes a nightmare. Ask any business person and you will learn that ROI is no longer measured in decades but in months. Database and analytical solutions should follow suit.
Databases for Analytics
Let me quickly overview the kinds of database systems one can find in enterprise data warehouses today:
1st generation: classical relational databases
The first generation of analytical solutions used classical relational databases such as Oracle, DB2 (IBM) and SQL Server (Microsoft). These databases were mainly intended for transaction systems and stored the data generated by the business: customers, accounts, production. Since specialized analytical systems were very expensive and rare ten or more years ago, relational databases were also used for analytical purposes. They were not the ideal choice, but data warehouse designers tried to overcome their limitations with various optimization techniques. It soon became clear that classical relational databases are simply not the right solution for analytics, and specialized, dedicated analytical solutions took over.
2nd generation: analytical appliances
Right after Y2K, Netezza introduced a revolutionary analytical appliance design, completely shaking the market and the established database vendors. Netezza’s solution was a cluster of low-cost MPP servers with a fast interconnect, built on PostgreSQL in a Linux environment. Voilà! The fastest, preconfigured and extremely cheap beast was created. It was more than 100x faster than conventional relational databases and superbly easy to use. In its time, it was something of a technological miracle. I still remember one of our customers’ DBAs saying: “It is against physics and impossible!”
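The core MPP idea is simple: split the data across many nodes, let each node aggregate its own slice, and have a coordinator merge the partial results. Here is a toy, single-process Python sketch of that scatter/gather pattern; the function and variable names are made up for illustration, and a real appliance obviously distributes this across physical servers:

```python
# Toy illustration of the MPP (massively parallel processing) idea:
# data is partitioned across nodes, each node aggregates its local slice,
# and a coordinator combines the partial results into the final answer.
# This is a single-process simulation, not how any real appliance is built.

def partition(rows, n_nodes):
    """Distribute rows round-robin across n_nodes simulated 'servers'."""
    slices = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        slices[i % n_nodes].append(row)
    return slices

def node_aggregate(rows):
    """Each node computes a partial (sum, count) over its local slice."""
    return sum(rows), len(rows)

def coordinator_avg(rows, n_nodes=4):
    """Scatter the data, gather partial aggregates, combine into an average."""
    partials = [node_aggregate(s) for s in partition(rows, n_nodes)]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

print(coordinator_avg(list(range(1, 101))))  # 50.5
```

Because each node only ever touches its own slice, adding nodes scales the scan roughly linearly, which is where the dramatic speedups over a single monolithic server came from.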
Soon other manufacturers tried mimicking Netezza’s approach with copies of varying quality. Oracle created Exadata, HP bought Vertica, EMC bought Greenplum, and Microsoft acquired DATAllegro. In 2010, Netezza itself ended up under IBM, as the IT giant caught the moving train at the last minute and thereby protected its mainframe market. In the end, SAP joined the game with its HANA solution.
3rd generation: dedicated analytical software
The importance of big boxes soon diminished as hardware prices declined steeply and capacity continued to grow rapidly. These conditions allowed pure software solutions to appear: analytical databases that ran on any commodity hardware entered the market. This had several positive effects: database solutions became cheaper, speeds increased, and hardware standardization within the company was simplified. Some of these solutions are HP Vertica, EMC Greenplum, EXASOL and MongoDB.
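Much of the speed of software analytical databases such as Vertica and EXASOL comes from column-oriented storage. The sketch below contrasts the two layouts with invented data; a real engine adds compression, vectorized execution and much more, so treat this purely as an illustration of why an aggregate over one column is cheaper in a column store:

```python
# Row store: each record is kept together, which suits transaction systems
# that read and write whole records at a time.
rows = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 80.0},
    {"id": 3, "region": "EU", "amount": 40.0},
]

# Column store: each attribute is a contiguous array, so an aggregate over
# one column touches only that column's data instead of every full record.
columns = {
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "amount": [120.0, 80.0, 40.0],
}

row_total = sum(r["amount"] for r in rows)   # must walk every whole record
col_total = sum(columns["amount"])           # scans one array only
print(row_total, col_total)  # 240.0 240.0
```

On a table with hundreds of columns and billions of rows, scanning one array instead of every record is exactly the kind of analytical workload these databases were built for.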
4th generation: analytical solutions for big data
Despite the rapid development of data management solutions, companies’ needs to store and analyze data grew even faster. Research and development has recently focused on the effective and efficient collection and analysis of the massive amounts of data generated by devices connected to the Internet. Companies that want to leverage data from the Internet of Things, mobile devices, personal fitness trackers, home automation devices, company websites, social networks and e-commerce platforms need so-called big data solutions. The most effective solutions for managing big data have their roots in HDFS-based cluster systems. New HDFS-based solutions, vendors and concepts emerge monthly, which is a challenge in itself for companies and business users who must decide which solution is right for their needs.
Some popular choices are: Cloudera, Hadoop/HBase, MonetDB, HPE Ezmeral…
5th generation: the virtual world
The rapid development of new platforms for storing large quantities of data also brings many challenges. Transferring data to analytical systems slows down, duplicated data consumes a lot of space, and data arrives in warehouses with a delay. Storing and analyzing huge amounts of data is also expensive. New systems have been developed to remedy these challenges: solutions that connect various data sources at the logical level only. Data virtualization platforms hide the complexity of the company’s data management landscape from end users. Business users do not even know that the data they analyze comes from multiple, physically disconnected databases.
Solving the Unsolvable
Untangling the chaos of data management systems is no longer a one-time job; it has become a continuous process of improving the performance and functionality of existing data systems and adding new components when needed. To find optimal solutions and implement them successfully, you will need several broadly educated internal IT people who do not rely solely on information from vendors or “independent” analysts.
Are Research and Advisory Firms just Toothless Tigers?
Even experienced analysts at established research and advisory companies struggle with analytics today. It is such a fast-developing area that their findings and forecasts often contradict each other. Gartner, Ventana Research, Bloor, Howard Dresner, BARC, BI Scorecard, ZDNet, WIRED and others involved in market research usually have their own “versions of the truth”. Their forecasts of future development and winners resemble a guessing game.
To limit the noise coming from new, innovative and somewhat revolutionary companies, market researchers keep raising the bar that determines which vendors and products are included in their research. Consequently, many new and truly innovative companies and products are left off these charts. Organizations seeking a real competitive advantage from the latest and greatest solutions therefore have to invest in their own research. Enterprises that base their analytical technology purchases solely on market research and the opinions of advisory firms, on the other hand, will never use the most up-to-date solutions and will therefore be less competitive.
Purchasing the second-best product available is already a competitive disadvantage.
Take Over Control of Analytical Databases
IT departments face many tough but justified questions, dilemmas and concerns. One of the most important and common ones sounds like this: “Should we retire our 1st generation data warehouse database and replace it with a new one?”
The answer depends on several factors. If you still use BI for static corporate reporting and users do not perform extensive ad-hoc or advanced analytics, then your existing classic database is most likely still good enough.
Otherwise, particularly if you deal with large amounts of data and have already run into performance issues, you should consider different solutions. Some businesses cannot afford to fall behind: today, telecommunication providers, banks, insurance companies, internet retailers, utilities and retail need specialised analytical databases that work hand in hand with their analytical solutions.
Can You Make In-House Technologies Work?
Industry standards make it possible to connect devices with each other quite easily. This applies to all levels of analytics as well, be it data integration, the metadata level, or the analytical tools themselves. Modern tools are able to read data from virtually any electronic source, link it together and present a consolidated result. A great example is MicroStrategy with its Intelligent Enterprise.
Should You Use One or More Platforms for Analytics?
Best-of-breed companies constantly seek new and competitive solutions, often test cutting-edge technologies and introduce them into the business environment. Most likely, these companies use two or more generations of analytical platforms simultaneously. Meanwhile, more conservative companies usually just replace one older technology with a newer one, such as Exasol.
What is Best for Your Business?
Once you get out of your comfort zone and start looking for the best analytical solutions available, you can soon find one that fits your business. It is of the utmost importance, though, that you extend the research beyond traditional software and hardware vendors and stick to your own criteria when choosing a solution. Sure, this requires a lot of additional effort from employees to gather the required knowledge. But once they become capable of seeking out modern solutions on the market, taking into account the current and future needs of the company, the decision on which ones to pick will be an easy one, supported by the creation of additional business value.
Still, it is best to take a month or two, or even half a year, before you make a final decision and purchase. Be sure to throw everything at the POC (proof of concept) and make sure the solution can handle it. It is well worth the effort and beats living with a wrong decision for several years or even a decade.
All solutions work well for vendors
In any case, I advise you to educate yourself and your colleagues before you make any decision about investing in data and analytical infrastructure. Extend your search for potential solutions and seek information from several different sources. Most importantly, do not simply (or blindly) rely on your current technology suppliers. All solutions work well for vendors – but you are after the one that works best for you and your data!