Our Bet on Human-in-the-Loop Automation

One of today’s hottest topics is AI’s impact on our daily lives and technology. Many people are asking: where do we stand with the technology we have developed? Years before the public discourse around AI heated up, we at Plus One Robotics embraced the complementary nature of the work that can be shared between humans and advanced robotic systems. We bet heavily on the idea that human input would be crucial at key moments of system operation. We chose to invest in developing our own implementation of Human-in-the-Loop Automation, and through it we have been generating value with our parcel sorting and depalletization applications.

In this article, I discuss why I believe our bet has paid off so far – and refute some of the most common criticisms of the concept that we encounter. I also touch on why we think it is still a valid approach for the foreseeable future. I believe that the variety of instances deployed today proves the robotics field’s commitment to Human-in-the-Loop Automation.

Defining Human-in-the-Loop Automation

We can begin by stating the obvious: most autonomous systems, if not all, have a human component (this is recognized by prolific experts in the field, as stated clearly by Rod Brooks [1]). Human involvement varies in degree of control, timing, and decision-making power. Some successful examples are commercial airplanes’ autopilots with on-board pilots, warehouse robots that depend on field operators, and, more recently, self-driving taxis with remote assistance operators.

On a scale of full teleoperation to full autonomy, the factors below distinguish what we define as “human-in-the-loop” from the alternatives:

  • What data is sent to a person and how frequently – whether streamed live or made visible after the fact (e.g., images, audio)
  • What data a person sends back to the system and how frequently (e.g., interrupts or control input)
  • When the data is generated (e.g., live / “online” or logs / “offline”)
  • What the system does while a person is in the process of responding (e.g., the system halts and waits, hovers over a target, gracefully exits and restarts, etc.)
  • How quickly a person must provide input to the system (i.e., the required response time)
  • At any given time, how many systems a single person can assist, sequentially or in parallel

In addition to the above, another consideration is whether the person in the loop is fully dedicated to this process or has other duties they must pause in order to support a distressed system.

Details of the above often depend on the requirements of a particular product, and there is no silver bullet for the associated challenges. Although there are off-the-shelf solutions that can be customized to specific needs, we believe that embedding a field-iterated, ground-up implementation is part of the secret sauce of a successful deployment, as opposed to trying to address the challenges after a fully automated system is deployed. Effective integration means stitching the human-in-the-loop capability into various points in the software stack; trying to bolt on a third-party supervisory package afterwards won’t be as effective.

Bet Early, Bet Often

Standardized, generic robotic products, when deployed across multiple instances, will quickly encounter a variety of situations for which they were not tested or, sometimes, even designed. This is true even for applications in relatively “constrained” environments, such as warehouses. An example of a challenge in logistics is what is known as “entitlement” – the variety of parcels and material “input” the systems must process.

Due to many factors, it is nearly impossible to describe an exhaustive list of the combinations that constitute the entitlement of a system. Even in a relatively simple system that only has to handle boxes, there are simply too many variables (dimensions, color, presence of tape or labels, weight, etc.) to allow a complete specification of all possible inputs. Often, this specification is made even harder because control of these features can be spread across multiple entities: the manufacturer controls the color and shape of the box, but the shipping company controls the label. Entitlement can then be seen as a measure of confidence in a system’s ability to handle a particular package, and other packages “similar” to it. Under this view, and considering the sheer number of different packages that must be handled, it is only a matter of time before the system encounters something it cannot handle (a disentitled parcel) and requires outside assistance. That is, even in the relatively constrained environment of a warehouse, it is cost-prohibitive to provide a robotic solution designed to work through countless unforeseen challenges 24/7, non-stop, without any human involvement.

I hope that by this point I have convinced you that, at a fundamental level, this is an entropy problem: the level of uncertainty we would like the system to handle depends on the distribution of the system’s possible responses to all possible inputs. Systems that face any level of input uncertainty need to be increasingly sophisticated to stay operational for the expected durations (e.g., several hours between expected human interactions).

Because of all the points above, human-in-the-loop automation is an essential component of any advanced commercial robotic system. I believe it is simply a matter of where, and who, the human interacting with the robot is (for instance, when your Roomba’s brushes get stuck on tangled hair, you are the human in the loop). This is a natural step in the evolution of robotic systems transitioning from highly constrained, isolated domains into interactive, dynamic environments with uncertainty.

To put this in perspective, we calculated that the cost of 300,000 local interventions in a single warehouse is about $15,000,000, compared to $750,000 with a human-in-the-loop intervention system – just 5% of the cost of the local interventions. With a response time of 2 to 5 seconds, human-in-the-loop operators could remotely correct more than 98% of the issues we encountered, reducing local interventions and stopping the robot for just 5-10 seconds per intervention, depending on the application. To illustrate: in a sample of 60 robots across five of our customers, we remotely intervened on 3.8 million picks out of 250 million total items picked, saving them nearly $5,000,000 in 2023 alone. Keep in mind that these numbers apply to the logistics domain; they may not translate directly in dollar terms to other applications, as the scales might differ, but we would expect similar proportions of cost savings.
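The arithmetic above can be reproduced in a few lines (a sketch using only the figures quoted in this article; the per-intervention rates are derived from them, not separately measured):

```python
# Back-of-the-envelope comparison using the figures quoted above.
local_interventions = 300_000
local_cost_total = 15_000_000   # USD: all interventions handled on-site
remote_cost_total = 750_000     # USD: same interventions handled remotely

cost_per_local = local_cost_total / local_interventions    # $50.00 each
cost_per_remote = remote_cost_total / local_interventions  # $2.50 each
savings_ratio = remote_cost_total / local_cost_total       # 0.05 -> 5%

print(f"${cost_per_local:.2f} vs ${cost_per_remote:.2f} per intervention; "
      f"remote costs {savings_ratio:.0%} of local")
```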

We have thus learned from experience that autonomous robotic systems that embrace a human-in-the-loop component are better positioned to provide higher product reliability, lower system outage frequency and duration, and targeted continuous improvements. This is why I believe our bet has paid off so far – not just for us at Plus One Robotics but, more importantly, for our customers. Even so, we’ve encountered some opposition along the way.

3 Most Common Criticisms of Human-in-the-Loop Autonomy

1. The human who is in the loop is a crutch for underperforming machine learning models.

From a bystander’s perspective, the human-in-the-loop seems to “make up” for some deficiency in a product’s decision-making capacity. The robot can’t seem to do the right thing for one reason or another, and the observer may think, “here comes the person to save the day.” In some situations this is an accurate observation, but most of the time it falls short of the truth. In most cases, a system failure is caused by many factors piling on top of each other rather than by a single point of failure (in the case of this criticism, the machine learning (ML) model). Reinforcement learning is one of the subfields of artificial intelligence that focuses on automatically “distributing credit for success among many decisions” a system may have made [13]. Problems of this kind can be difficult for systems to resolve immediately; however, a human-in-the-loop with the right expertise may be able to identify the root cause of the observed problem and not only address it in a timely manner, but also provide labels / data logs the machine learning system can use for future improvement.

Another factor complicating the challenge is the frequency of occurrence of the problematic situations that must be observed before most algorithms can start detecting a pattern of failure. Some inputs are effectively “unlearnable” within the timeframe required to meet product requirements. A good analogy from statistics is heavy-tailed – more specifically, long-tailed – distributions (e.g., the Pareto distribution). Think of all the possible objects that can make their way in front of a robot for a pick-and-place task; some objects may be more difficult to pick and place, but a robot may only encounter them every so often, sometimes operating for days without seeing one. Examples from one of our systems were actual trash from packaging falling apart, and long rolls of paper tangled with parcels on a conveyor belt. Another example is seasonal change in product packaging: packages may get bigger around the end-of-year holidays, or designs may reflect holiday colors and patterns.
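The long-tail intuition can be simulated in a few lines. This is purely illustrative – the Pareto shape and the difficulty threshold are assumptions, not measured parcel data:

```python
import random

random.seed(0)  # deterministic for illustration

# Model parcel "difficulty" as a heavy-tailed (Pareto) draw: most parcels
# are easy, but a thin tail is arbitrarily hard. A threshold marks picks
# the automated system cannot complete on its own.
ALPHA = 3.0        # Pareto shape (assumed); tail gets heavier as it shrinks
THRESHOLD = 5.0    # difficulty beyond which a human must step in (assumed)

picks = [random.paretovariate(ALPHA) for _ in range(1_000_000)]
hard = sum(d > THRESHOLD for d in picks)

# Analytically, P(difficulty > 5) = 5**-ALPHA = 0.8%: under one pick in a
# hundred, rare enough to go unseen for days on a single robot.
print(f"{hard} of {len(picks):,} picks exceeded the threshold "
      f"({hard / len(picks):.2%})")
```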

We know that human-in-the-loop leads to superior outcomes, and we believe that product designers who avoid it are ultimately going to box themselves in. When there is increased process complexity and uncertainty, our experience shows human-in-the-loop automation is the best possible insurance against downtime.

2. Isn’t “online machine learning” a better solution than human-in-the-loop autonomy?

This argument assumes that online machine learning and human-in-the-loop autonomy are functionally equivalent, which is not true. It is, however, understandable why customers and end-users might think this way. After all, if the person in the loop is providing data, and that data is being used for system improvements (one way or another), why not use a better machine learning algorithm that can improve itself and do away with the person in the loop?

There are two major issues with this point. The first is that applications that need fast improvement often present the most adversarial environments and the most demanding system requirements. State-of-the-art self-supervised machine learning algorithms that employ “online learning” make many mistakes in the process, and those errors are not welcome in production – a Catch-22. Examples include various flavors of reinforcement learning, generative AI, and even more fundamental heuristic- and parameter-based approaches such as hierarchical clustering. The output can be faulty for different reasons, and those mistakes, if they go unchecked, can make a bad situation worse in production.

Examples from our domain are situations where the rate of change in the parcel mix outpaces the ability of affordable machine learning model updates to keep up. This is related to the distribution of parcels I mentioned above: the distribution’s parameters can shift over time, and there will always be a lag between the first time a system sees a package from the new distribution and the time the system can tolerate it. Human-in-the-loop is how you assure uptime during that interval.
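The lag can be illustrated with a toy simulation (every number in it is an assumption chosen for illustration, not real parcel data): a “model” tolerates parcels close to the mean of its trailing training window, while the parcel-size distribution shifts mid-stream.

```python
from collections import deque
import random

random.seed(1)  # deterministic for illustration

WINDOW = 500        # picks per assumed retraining window
TOLERANCE = 10.0    # assumed: how far from the training mean the model copes

window = deque(maxlen=WINDOW)
misses = []

for t in range(3000):
    mean = 30.0 if t < 1500 else 45.0       # parcel mix shifts at t = 1500
    size = random.gauss(mean, 2.0)
    if len(window) == WINDOW:
        trained_mean = sum(window) / WINDOW
        if abs(size - trained_mean) > TOLERANCE:
            misses.append(t)                # a human would step in here
    window.append(size)

# Misses cluster right after the shift and fade as the window refills with
# post-shift data -- the interval a human-in-the-loop has to cover.
print(f"misses span t={misses[0]}..{misses[-1]} ({len(misses)} total)")
```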

The second issue is that when a person provides input during an intervention, that input is not always used only for machine learning model updates. There is much to consider, from mechanical design to system kinematics and dynamics. Unless every component of the system (down to firmware) is updated via machine learning models (plausible in the future, but highly doubtful in today’s production environments), self- or semi-supervised learning methods simply cannot provide enough coverage for continuous system improvement.

Not every production issue is due to a machine learning failure, and unless everything can adapt, there will always be aspects a human operator might need to “work around”. Examples of this type of situation are hard-to-detect mechanical failures that affect the behavior of the software. A suction cup that is in the process of failing may cause repetitive pick failures that are obvious to an operator, who in turn might provide usable input to keep the system performing reasonably. A machine learning algorithm that does not understand the causal effect of suction cup wear and tear may not be able to compensate for the observed pick failures in a timely manner.
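One simple shape this takes in practice is streak-based escalation. The sketch below is a hypothetical minimal version (the threshold and function name are mine, not our production logic): a run of consecutive failures on otherwise-normal parcels hints at a systemic cause, such as a worn suction cup, so the system asks a human instead of retrying forever.

```python
STREAK_LIMIT = 3   # assumed: consecutive failures before asking for help

def next_action(recent_outcomes):
    """Decide what to do after the latest pick attempt.

    recent_outcomes: list of booleans, True = successful pick, oldest first.
    """
    streak = 0
    for ok in reversed(recent_outcomes):
        if ok:
            break
        streak += 1
    if streak >= STREAK_LIMIT:
        return "escalate_to_human"   # likely systemic (e.g. mechanical) issue
    if streak > 0:
        return "retry"               # isolated miss, try again
    return "continue"

print(next_action([True, True, False]))          # -> retry
print(next_action([True, False, False, False]))  # -> escalate_to_human
```

The point of the human in this loop is exactly the causal judgment the model lacks: the operator sees three identical misses and suspects the gripper, not the parcel.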

3. Why not use the local operator to respond to faults?

This point is valid in a limited context, under special circumstances: if there were only one system, a local operator available to respond most of the time, and system faults were the only issues expected to arise. This is a typical scenario in many industrial applications, where there is an enclosed cell and local operators are responsible for its operation, so it is understandable that similar expectations would be transferred onto next-generation systems. In many cases, however, there are several systems (from single digits to triple digits, depending on the type of system), and the real questions are the frequency of local intervention and the ratio of local people to robot cells. For sites where hundreds of AMRs are expected to operate smoothly, both the intervention frequency and the people-to-robot ratio stop making sense for local operators to be the sole first responders to exceptions. Another factor is the sheer size of distribution centers and warehouses: in a 500,000 sq ft warehouse, the travel time for an operator to reach a paused system can be non-negligible. Remote operators can respond faster, helping with uptime and throughput.
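A quick back-of-the-envelope estimate shows why the travel time matters (the walking speed and square-layout assumption are mine, purely illustrative; only the 500,000 sq ft figure comes from the text above):

```python
import math

AREA_SQFT = 500_000       # warehouse footprint from the example above
WALK_SPEED_FPS = 4.0      # assumed brisk walking pace, roughly 2.7 mph

side = math.sqrt(AREA_SQFT)        # ~707 ft if the footprint were square
avg_one_way = side / 2             # rough average distance to a paused robot
seconds = avg_one_way / WALK_SPEED_FPS

print(f"~{seconds:.0f} s one-way on foot, vs. a 2-5 s remote response")
```

Even this optimistic estimate (no aisles, no doors, no hand-offs) puts a local walk more than an order of magnitude slower than a remote intervention.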

Despite our advocacy for remote human-in-the-loop operators, there are technical bounds on how many (and what types of) issues a remote operator can address, and there will almost always be a need for local operators to handle some limited number of exceptions and faults.

Examples of Human-in-the-Loop Autonomy in Industry

Early examples of human-in-the-loop autonomy have their roots in academic work on machine learning algorithms that can take advantage of contextual information provided by human experts [6, 7, 8, 10]. More recently, the scale of contextual data collected from humans has reached unprecedented levels. Many research groups in the field collaborate through government and private funding to push the boundaries of what robots can do by leveraging task demonstration data [5, 9]. Meanwhile, this culture shift made its way from academia into commercial applications of robotics, and we started seeing well-polished remote assistance solutions: Waymo Fleet Response [3], Plus One Robotics Yonder [11], and Contoro Robotics’ InteleOp [2] are a few worth mentioning.

In addition to the developments and releases above, many humanoid robot makers revealed their products working in real-world environments this year in particular. It may not be immediately clear which parts of their videos involve full autonomy versus human-in-the-loop autonomy; however, a deeper look at the careers pages of various tech companies may hint at how they are attempting to solve their most common technical problems [4, 12].

Why We Will Continue to Bet on Human-in-the-Loop Autonomy

Human-in-the-loop autonomy systems may seem to consist of entirely separate components: people + robots + data processing power. However, a system designed with human-in-the-loop automation in mind delivers KPIs greater than the sum of its components’.

With the rate of technological advancement in edge computing, connectivity, machine learning, and robotics, human-in-the-loop automation provides fast and superior return on investment. It is unrealistic to expect a fully self-enclosed “offline” system to handle all uncertainty in the wild while meeting all operational requirements. Much of the uncertainty is caused by factors outside the control of the system, or even of any single party involved. Some applications can benefit from generalizing over unexpected observations (e.g., treat all objects as obstacles while driving and slow down or stop, no matter what the object is); other applications cannot tolerate such generalizations. For problems caused by humans, it makes sense that humans are part of the solution as well, especially when timing requirements are strict.

It is worth thinking about the cost of not having human-in-the-loop automation in terms of potential system downtime, slower improvement iterations and unmet system requirements. We know that even the best local human operators often solve technical problems through information exchange and reaching out to others when in doubt. A human-in-the-loop system recognizes and embraces this process, streamlining exception handling. For a fraction of the cost, it generates value for everyone involved with a wealth of data points. This is why we will continue to bet on human-in-the-loop autonomy.

References

[1] https://rodneybrooks.com/rodney-brooks-three-laws-of-artificial-intelligence/

[2] https://contoro.com/technology/

[3] https://waymo.com/blog/2024/05/fleet-response/

[4] https://job-boards.greenhouse.io/figureai/jobs/4289513006

[5] https://www.physicalintelligence.company/blog/pi0?blog

[6] Robot Learning from Human Teachers, Chernova and Thomaz, 2022

[7] Remote Robotic Laboratories for Learning from Demonstration, Osentoski et al., 2012

[8] Effect of human guidance and state space size on Interactive Reinforcement Learning, Suay, Chernova, 2011

[9] https://robotics-transformer-x.github.io/

[10] Robot web tools: Efficient messaging for cloud robotics, Toris et al., 2015

[11] https://www.plusonerobotics.com/human-in-the-loop

[12] https://www.tesla.com/careers/search/?query=Tesla%20bot&site=US

[13] Reinforcement Learning, An Introduction, Second Edition, Sutton and Barto, 2018