When we’re talking about conventional IT systems, we rarely question the idea of geo-distributed systems and redundancy. And we don’t usually challenge the notion that load balancing among servers and farms is a smart thing to do. So why don’t we routinely think this way about Hadoop?
Customers can set up multiple Hadoop clusters and use each one for a different workload. Companies can then site these clusters in different geographies, for redundancy, load balancing and/or content distribution. The data can be segregated or, using replication technology, it can be synchronized between sites to create a “logical data lake.” Is utilizing multiple Hadoop clusters in this way is folly, or is it just pragmatism?
In this webinar, our panel will discuss:
- Does Apache YARN make all tasks equal or does dedicating clusters to specific workloads make more sense?
- Is the data lake concept best for all, or is partitioning…
View original post 96 mots de plus