The Big Data ecosystem has grown bigger and better as the large tech companies release many of their in-house software projects as open source through the Apache Software Foundation. Bonus points for them! When these projects need to scale, more people are already familiar with the software.
New software appears on the market every other day, so there is a large pool to choose from when building the desired product. So which one is right for your product? You may want to take the following into consideration when making that decision.
Building blocks for your enterprise application:
When an enterprise-level data product is under construction, there are many moving parts and decision makers. Stakeholders, decision makers, and development teams must work in harmony to make the product a success. So what role does each one play during the decision process and product delivery?
Role of a Director or VP - Big Data:
As a Director or VP of Big Data, the responsibility on your shoulders increases when selecting the right team and products.
From forming the team to delivery, you face many knowns and unknowns, as in any other application development project. But this time the elephant is bigger, and taming it despite all those unknowns is key to the success of the project!
Here are some factors that can be considered when choosing a team, product and services:
Support from the Open Source Community - Does the Apache project have many contributors? How was it built (did it go through the incubator)? What is the road map? Is the product "buggy", with a lot of open issues (check its JIRA or Bugzilla)?
Cross Training - Cross-train existing resources who already know the application and the company's processes and policies. Third-party vendors provide corporate training at your office, or can be negotiated with to do so.
Training Camps - Get instant access to a pool of resources who are already trained: Insight Data Science, Insight Data Engineering, Zipfian Academy, NYC Data Science Academy, Data Incubators, and many others not mentioned here.
Hiring - Consider the availability of resources in your region. This can easily be gauged with a bit of research on LinkedIn.
Conferences - Attend one or two conferences related to Big Data. See what people are working on, their environments, their day-to-day work, and the challenges they face.
Mentoring - Find somebody who has already implemented a Big Data project, within the company, in the region, or among your contacts. Ask them to be a mentor.
Support the Team - Be ready to start from scratch.
Role of an Enterprise Architect and Solutions Architect:
As an Enterprise Architect or a Solutions Architect, you need to pick the right tools for the enterprise application you are building, not just the tools you already know. That requires some unlearning and learning.
From data ingestion to visualization, there are many solutions you could propose for both the data pipeline and the analytics.
Deciding and designing the building blocks of the data pipeline is the key!
Here are a few things to consider when deciding what goes into your data pipeline:
Data Ingestion - What kind of data needs to be ingested? Will ingestion happen from multiple source systems? How many messages must be handled (how about a trillion messages a day)?
Architecture - Batch processing, real-time processing, or a combination of both (the Lambda Architecture).
Data Storage - Mainly the database layer. Get to know the CAP theorem! What does the application need? You cannot get all of C, A, and P in one solution; you choose two, ending up with CA, CP, or AP.
Visualization / Application - Is it used for generating reports sent to executives, or is it an application that will consume the data? Do the data scientists and analysts have particular needs?
"ity" Qualities - Reliability, Stability, Scalability, High Availability and the "-ities" that the product needs.
Defend your selection of tools with strong, data-backed reasoning.