The Evolution of Digital Infrastructure
The data center is a very physical place. The majority of the work in design, construction, engineering and operations is hands-on. It requires engineers and technicians to apply their experience and know-how to build and operate digital infrastructure systems that serve the world. For example, data center mechanical and electrical equipment, IT hardware, and network devices require technicians and engineers to install and monitor them and to execute planned and unplanned maintenance. For the past 20+ years the approach has been largely the same. There are people on the floor doing the work. There are processes and systems in place that track the builds, changes and break-fix work. Many see this traditional approach as the teams clinging to legacy thinking and not being willing to embrace new ways of doing things. I disagree with that assessment. The owners of these colocation and owned data centers have optimized these processes, enabling a much higher ratio of work completed per person. They have done that through standardization of components to enable modularity and consistent deployments, coupled with monitoring and automation to enable scale. The days of custom designs per location are over. Companies cannot afford to continue to have one-off deployments in their portfolio. The question is: what’s next? How do we take it to the next level of efficiency and scale?
There are many projects underway with robots and other actuated solutions that could ultimately replace these physical human functions on our way towards fully autonomous digital infrastructure. While I believe that will ultimately be the answer for the majority of locations, it will take decades to perfect and reach an effective price point to justify it.
In the meantime, there is still pressure to increase efficiency and output. This comes down to increasing the ratio of work completed per person. For example, in my previous roles at eBay and Uber, my teams were able to manage more than 2,500 devices per technician, per site. That was up 10-fold from previous efforts due to the solutions outlined above. Some of my hyperscale peers have reached 5-10x that amount in their larger portfolios. As I mentioned above, this is achieved through standardization and modularity. We’ve commoditized racks and the supporting power and cooling capacity, enabling them to be built on demand rather than all at once. Bottom line, decoupling hardware dependencies from the shared platforms has enabled each rack to become a failure domain. If a node or an entire rack fails, it’s OK. The failed components can wait in a queue that is serviced during planned maintenance schedules. That node or that rack is no longer a critical factor in the performance of the overall system. Other nodes will fill in to cover the capacity reduction from failures. As portfolios grow, there is more capacity to absorb failures like this.
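To make the failure-domain idea concrete, here is a minimal sketch in Python. It assumes a fleet of identical racks and a known number of racks needed to serve current demand. The `Fleet` class, its field names and the "defer" and "urgent" labels are hypothetical and exist only for illustration. They are not taken from any real operations tool.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Fleet:
    """Hypothetical fleet of identical racks, each treated as a failure domain."""
    total_racks: int
    required_racks: int  # assumed number of racks needed to serve current demand
    repair_queue: deque = field(default_factory=deque)

    @property
    def healthy_racks(self) -> int:
        # Every queued rack is out of service until it is repaired.
        return self.total_racks - len(self.repair_queue)

    def report_failure(self, rack_id: str) -> str:
        """Queue a failed rack and report whether the repair can wait."""
        self.repair_queue.append(rack_id)
        if self.healthy_racks >= self.required_racks:
            return "defer"   # remaining racks still cover the load
        return "urgent"      # spare capacity is exhausted

    def planned_maintenance(self, max_repairs: int) -> list[str]:
        """Repair up to max_repairs queued racks, oldest failure first."""
        repaired = []
        while self.repair_queue and len(repaired) < max_repairs:
            repaired.append(self.repair_queue.popleft())
        return repaired


fleet = Fleet(total_racks=100, required_racks=90)
print(fleet.report_failure("rack-017"))  # 99 healthy racks remain, so the repair can wait
print(fleet.planned_maintenance(max_repairs=5))
```

The larger the gap between `total_racks` and `required_racks`, the more failures the fleet can absorb before a repair becomes urgent. That is the sense in which growing portfolios have more capacity to absorb failures.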
This seems like a highly optimized deployment, but there is so much more we can do. There is another level we can achieve. These efficiency ratios should continue to go up by a factor of 10 or more. To make progress towards our goal of autonomous digital infrastructure, we have to embrace more than robotic process automation (RPA). We need to augment physical labor with cognitive digital labor. OK, I know what you’re thinking. This sounds like pie-in-the-sky stuff that has no practical application in today’s data centers. That’s where you would be wrong. There are solutions that can reach that level today.
For the last five years I have been advising Amelia (formerly IPsoft), which has built the most human AI for the enterprise. OK, I can see you furrowing your brow again, asking what that actually means. It means real AI is here now and can be applied to data center infrastructure. Let me double-click into this to provide some clarity.
Cognitive digital labor consists of digital colleagues that operate like humans. They can understand the intent of requests or conversations the way humans do. When a human receives a request in a ticket, an email, a chat, Slack or over the phone, they break down the words to understand the intent of the request so they can respond appropriately. They interpret the language to get to the answer. That is the difference between a chatbot and cognitive digital labor. Amelia is a Digital Colleague that uses advanced natural language processing to interpret the request and respond appropriately, versus an “if, then, else” chatbot loop or simple robotic process automation. She does this while being able to context switch just like a human. She can bounce between topics, understand intent and get to resolution without losing track of all parallel activities. This means Amelia is able to service millions of requests a month and learn as she goes. A human would do the same work, but nowhere near the scale, efficiency or accuracy of Amelia. Amelia is orders of magnitude more effective than humans at serving these requests.
Amelia has been effectively applied to banking, telecommunications, hospitality and many more industry verticals. For example, Amelia is a Conversational AI agent for a telco provider, averaging more than 7 million conversations per month. This is done globally in German, English, and Spanish, with a 93% resolution rate for inbound requests and 91% satisfaction, and 9 out of 10 customers rating Amelia’s service as good to very good. This is what the next level of efficiency looks like for data centers: Digital Colleagues that can augment the workforce and execute at scale globally. Just to put this in perspective, today you can go to Amelia.ai to test and then download skilled Digital Colleagues like an app from the app store. Amelia is an expert in platforms like Workday, ServiceNow, AWS, Oracle, and more.
The big question for our industry is how do we apply this advancement to digital infrastructure? Let’s look at some real-world examples I’ve been thinking about.
People and automated systems create work orders to complete a task in the data center. That can be a configuration change to a network, compute or storage device, or break-fix work on a node or rack. There are thousands of tasks that happen every day in data centers. In the majority of these ticket systems, a person or an automated system adds tickets to the queue. Those tickets usually have to wait for a human to complete a task or approve the ticket before it moves to the next step. Many tickets get stuck in the queue or are missed because of other priority tickets or just human error. I’ve seen this at every job I’ve held. Submitters are frustrated by the delay, misinterpretation or mistake in the ticket, and they escalate to management to get it resolved or just push the priority above other tickets. All of this causes extra work and delays, decreasing efficiency.
What if Digital Colleagues not only monitored the queues but actually interpreted the ticket requirements, interfaced with the submitter, took the ticket through the process, validated the appropriate steps and executed the request? Amelia could remove the traditional delays in the queue and scale immediately to manage growing request volumes. This is not new. This efficiency has already been applied to IT stacks across many other industry verticals. It’s time for it to be applied to the data center.
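As a rough illustration of the queue handling described here, the Python sketch below decides the next step for each ticket so that nothing sits idle waiting to be noticed. It deliberately leaves out the hard part, which is the language understanding a Digital Colleague would bring. The `Ticket` class, the `REQUIRED_FIELDS` tuple, the four-hour threshold and the step names are all assumptions made for the example. They do not come from Amelia or from any real ticketing system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed minimum fields for a work order; real ticket schemas will differ.
REQUIRED_FIELDS = ("device", "action")


@dataclass
class Ticket:
    ticket_id: str
    fields: dict
    opened_at: datetime


def next_step(ticket: Ticket, now: datetime,
              max_wait: timedelta = timedelta(hours=4)) -> str:
    """Pick the next action for a ticket so it never sits idle in the queue."""
    missing = [name for name in REQUIRED_FIELDS if not ticket.fields.get(name)]
    if missing:
        # Go back to the submitter right away instead of letting the ticket stall.
        return "ask_submitter:" + ",".join(missing)
    if now - ticket.opened_at > max_wait:
        # The ticket is complete but has waited too long; raise it for attention.
        return "escalate"
    return "execute"


now = datetime(2021, 9, 30, 12, 0)
queue = [
    Ticket("T-1", {"device": "rack-12/node-3", "action": "replace-disk"},
           now - timedelta(hours=1)),
    Ticket("T-2", {"device": "tor-switch-7"}, now - timedelta(hours=2)),
    Ticket("T-3", {"device": "rack-40", "action": "power-cycle"},
           now - timedelta(hours=9)),
]
for ticket in queue:
    print(ticket.ticket_id, next_step(ticket, now))
```

In this sketch the first ticket is ready to execute, the second goes back to the submitter for the missing action, and the third is escalated because it has waited past the threshold. A cognitive agent would replace the fixed field check with an interpretation of what the submitter actually asked for.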
Change Review Board
One of the most critical processes in a data center is change control. This process ensures that infrastructure changes follow policy and have checks and balances in place to verify that a change won’t cause unintended consequences like cascading faults. This requires review by humans to verify the steps before deployment, and buddy checks during the execution of that change to minimize mistakes. Like many of you, I’ve experienced outages even with these processes in place. When we count on humans, there will be mistakes. But we do need their experienced eye on things to spot risks. They do this by understanding the environment and applying their expertise to validate the change is safe to make. The issue is humans are inconsistent. They can easily miss things by rushing, being distracted or fatigued. Bottom line, humans are the primary reason for outages in the data center because they are, well, human!
What if you could have digital labor review the changes, validate the content against standard procedures and do mock runs of the change? This may not apply to physical work such as maintenance on switchgear, generator tests or physical reconfigurations of network devices. Even so, a Digital Colleague with the cognitive ability to interpret the change against known configurations, validate it against any other maintenance work or changes being applied in real time, and identify risks that humans could easily miss would be a huge leap in efficiency and uptime. Digital labor will have contextual awareness of the current state and planned changes, and will validate the procedures to be applied, versus relying on a manual review or just running scripts to apply a change. Digital labor can operate like a human but never miss steps or get fatigued. These colleagues can also learn as they go, suggesting changes to increase efficiency or decrease risk.
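One small piece of that review can be sketched in Python: checking a proposed change against other scheduled work for overlapping time windows on shared systems. This is only an illustration under simple assumptions. The `Change` class, its fields and the system names are hypothetical, and a real review would also need to understand dependencies between systems, which this sketch ignores.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Change:
    """Hypothetical change record: which systems it touches and when."""
    change_id: str
    systems: frozenset
    start: datetime
    end: datetime


def find_conflicts(proposed: Change, scheduled: list) -> list:
    """Return scheduled changes that overlap in time and touch a shared system."""
    conflicts = []
    for other in scheduled:
        # Two windows overlap when each one starts before the other ends.
        overlaps = proposed.start < other.end and other.start < proposed.end
        shared = proposed.systems & other.systems
        if overlaps and shared:
            conflicts.append(other)
    return conflicts


scheduled = [
    Change("CHG-100", frozenset({"ups-a", "pdu-3"}),
           datetime(2021, 9, 30, 1, 0), datetime(2021, 9, 30, 3, 0)),
    Change("CHG-101", frozenset({"core-switch-1"}),
           datetime(2021, 9, 30, 2, 0), datetime(2021, 9, 30, 4, 0)),
]
proposed = Change("CHG-102", frozenset({"pdu-3", "rack-22"}),
                  datetime(2021, 9, 30, 2, 30), datetime(2021, 9, 30, 3, 30))

for conflict in find_conflicts(proposed, scheduled):
    print(proposed.change_id, "conflicts with", conflict.change_id)
```

Here the proposed change is flagged against CHG-100 because both touch `pdu-3` during overlapping windows, while CHG-101 overlaps in time but shares no system. A check like this never gets tired or distracted, which is the consistency argument made above.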
The total addressable market (TAM) for digital labor is expected to grow to $2.9 trillion by the end of 2021, recovering 6.2 billion hours of worker productivity. The downside is that digital labor will replace over 52% of human work tasks by 2025. The upside is that digital labor will create more than 130 million new roles adapted to the new division of labor between humans, machines and algorithms. This is the reality of technology advancement. It is no different than previous advancements such as the steam engine, the printing press or factory automation. Jobs will change as technology advances, but history has shown that new opportunities will emerge from these advancements. Digital infrastructure is no different. We need to help current and future displaced workers retrain to take advantage of the new roles that are born from digital labor deployments.
For those of you who know me, I am always curious about what’s next. I believe that digital labor will be disruptive but also extremely effective at getting us to the next level. For our industry to embrace this new technology, we need to work together to define and develop how it will be applied. I see this as a must-have since we are all struggling to serve the explosive growth of digital infrastructure. This is where Infrastructure Masons steps in. Collaboration is one of the core principles in the founding of iMasons. We are a professional association focused on uniting the builders of the digital age. This aligns with the iMasons technology committee vision: every innovation is realized. The technology committee is focused on harnessing iMasons’ collective experience and resources to discover, guide and accelerate digital infrastructure innovation.
To that end, we will be holding multiple working sessions to dive into digital labor technology and define how it can be applied to data center infrastructure. We need your expertise and insights to help us get there.
Our first session will be at the iMasons New York Local Chapter meeting on September 30, 2021. Many thanks to my long-term industry friend Robert Dugdale for helping to establish the New York City chapter and for hosting this kickoff event at Amelia headquarters. I also want to thank Chetan Dube, Founder and CEO of Amelia, for his participation at our Global Member Summit in November of 2020. Our discussion sparked the interest in our community to explore this innovation.
We have an amazing cohort of end users and partner members coming together in person and online to brainstorm on what processes Amelia could be applied to and the roles that will need to be created for these Digital Colleagues.
If you would like to be part of this journey of discovery, invention and innovation, become an iMasons member at https://imasons.org/join. If you’re already a member, join the technology group through our member platform and join us at the NY Local Chapter event on September 30, 2021. You can also learn more about Amelia by listening to my interview with Josh Schechter on episode 43 of the Next Wave Podcast at https://nextwavepodcast.com or https://amelia.ai/podcasts.