Over the last 25 years, McCrory led teams at GE Digital, Basho Technologies, Warner Music Group and others. He also co-founded Hyper9 (acquired by SolarWinds) and Surgient (acquired by Quest Software). McCrory holds more than nine technology patents in virtualization, cloud and systems management and created the concept of data gravity.
How did you come up with the idea of data gravity?
I started working on virtualization almost two decades ago and realized its potential while I was at Surgient. We filed a number of patents, including the first patent on cloud computing: a logical virtualized server cloud. The cloud grew significantly from 2000 to 2010, and by 2010 it had started to take off. Back then, I worked for Dell's Data Center Solutions group. While evaluating cloud providers, I noticed that the data was growing significantly, and as it grew, it attracted services and applications closer to it. That reminded me of gravity and led me to write a blog post about 'data gravity in the clouds'. There was a virtuous cycle: the more data you have, the more data you create, and applications wanted to be closer to that data, because proximity gave them higher bandwidth and lower latency. With that rationale, I coined the term 'data gravity' for the concept.
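The bandwidth-and-latency rationale can be made concrete with a rough calculation. This is an illustrative sketch, not a published data-gravity model; all link speeds and data sizes below are hypothetical assumptions.

```python
# Illustrative sketch: why applications "move toward" large datasets.
# All numbers are hypothetical assumptions, not measured values.

def transfer_seconds(data_gb: float, bandwidth_gbps: float,
                     latency_s: float, round_trips: int = 1) -> float:
    """Time to move data_gb over a link, plus per-round-trip latency."""
    return (data_gb * 8) / bandwidth_gbps + latency_s * round_trips

# An application co-located with a 1,000 GB dataset
# (assumed 100 Gbps local fabric, sub-millisecond latency)
local = transfer_seconds(1000, bandwidth_gbps=100, latency_s=0.0005)

# The same application reading the data across an assumed 1 Gbps WAN link
remote = transfer_seconds(1000, bandwidth_gbps=1, latency_s=0.04)

print(f"co-located: {local:.0f} s, remote: {remote:.0f} s")
```

Under these assumptions the co-located application finishes the transfer in roughly 80 seconds versus roughly 8,000 seconds over the WAN, which is the pull that draws applications toward large data stores.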
What is data gravity’s role in cloud computing?
Cloud providers understand the importance of data gravity and its effects. Initially at least, it is usually less expensive for enterprises to store their data on cloud platforms. The providers realize that enterprises will then want to leverage those same platforms to run their applications, do analytics and let their partners access and use that data. All of that means increasing amounts of data will be attracted to their clouds. The cloud providers therefore benefit from the effects of data gravity, and they build ecosystems that leverage those effects.
How do you take advantage of data gravity in a colocation environment?
When it comes to data, it is important to understand what creates and interacts with it, and where it gets stored. Ideally, data analytics and data processing should be done where the data resides. However, if your consumers are distributed, then you also need to distribute the data as rapidly as possible.
In the case of an enterprise working with large amounts of data, you look for the most efficient ways to work with that data. Some enterprises keep all of their data in one place. If the data is not all being consumed in that one place, then the data architecture needs to change so that data is at the center. The emphasis is on a data-centric architecture instead of backhaul models, where processing is at the center. If there is a reason you need to move the data, then you look for the best ways and locations to do so. Those locations include highly connected facilities where your company and business partners come together to exchange data in a low-latency, high-bandwidth environment.
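The backhaul vs. data-centric tradeoff comes down to how many bytes cross the network. Here is a minimal sketch under hypothetical sizes: backhauling ships the raw data to a central processor, while a data-centric design ships the application to the data and only the results back.

```python
# Hypothetical sketch of the backhaul-vs-data-centric tradeoff.
# All sizes are illustrative assumptions.

raw_data_gb = 500.0   # raw data held at a remote site (assumed)
code_gb = 0.01        # application/code shipped to the data (assumed)
result_gb = 0.5       # analytics results shipped back (assumed)

# Backhaul model: move all raw data to where the processing lives
backhaul_gb = raw_data_gb

# Data-centric model: move the processing to the data instead
data_centric_gb = code_gb + result_gb

print(f"backhaul moves {backhaul_gb} GB; "
      f"data-centric moves {data_centric_gb} GB")
```

With these assumed sizes, the data-centric approach moves three orders of magnitude less over the network, which is why the architecture puts data, not processing, at the center.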
What is the role of data gravity at the edge?
Data processing happens at the core, and the emphasis there is on the low-latency delivery of applications. The edge is important for faster delivery of data, but it has its limitations: you have neither infinite storage nor infinite bandwidth at the edge. You may have edge locations globally, but each location is not going to communicate with all the others. Therefore, the data has to go to an intermediate or core facility to get aggregated. At each stage, data gravity has its effects. At the edge, you are bound by the physical limitations of the location, network bandwidth or latency. It is important to get the data into one place, so that location has a high amount of gravity. That location then attracts partners and customers to connect to the facility, which enables them to access the data quickly and easily.
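The edge-to-core flow described above can be sketched as local reduction followed by central aggregation: since edge sites do not all talk to each other, each site summarizes its own data and forwards only the summary to the core. The site names, readings and summary fields below are illustrative assumptions.

```python
# Hedged sketch of edge-to-core aggregation: each edge site reduces its
# raw readings to a compact summary; the core combines the summaries.
from statistics import mean

def summarize(readings: list[float]) -> dict:
    """Local reduction at one edge site: keep a summary, not raw data."""
    return {"count": len(readings), "mean": mean(readings)}

# Hypothetical raw readings held at three edge locations
edge_sites = {
    "edge-a": [10.0, 12.0, 11.0],
    "edge-b": [20.0, 22.0],
    "edge-c": [30.0],
}

# Each site sends only its small summary upstream, not the raw data
summaries = [summarize(r) for r in edge_sites.values()]

# The core (or an intermediate facility) aggregates into a global view
total = sum(s["count"] for s in summaries)
global_mean = sum(s["mean"] * s["count"] for s in summaries) / total
print(total, round(global_mean, 2))
```

The core thus accumulates the aggregated view of all sites, which is exactly the concentration of data, and gravity, that attracts partners and customers to connect there.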