Multi agent reinforcement learning based warehouse task assignment
Date
2024
Abstract
The growth of e-commerce and the demand for rapid fulfillment place considerable pressure on modern warehouses to operate efficiently. Central to this problem is the dynamic allocation of different tasks to a workforce composed of both humans and robots. The task allocation system must prioritize time-sensitive picking tasks and ensure the timely completion of crucial supporting tasks such as replenishment, all while considering the availability of the specialized devices required for different task types. Conventional rule-based or heuristic systems struggle to adapt and optimize decision-making within such a complex operating environment.
Recent literature shows interest in applying reinforcement learning for warehouse management, including task allocation. Reinforcement Learning algorithms are well-suited to environments with clear rewards and observable states. However, most existing work focuses on single-agent scenarios or limited aspects of warehouse operations. In contrast, real-world warehouse task allocation inherently involves a cooperative multi-agent setting.
This thesis adopts a Multi-Agent Reinforcement Learning (MARL) framework to address the challenge of optimizing task assignment in a warehouse whose workforce comprises both human workers and autonomous robots. A central challenge addressed in this framework is the need to prioritize and allocate tasks with varying levels of urgency while respecting the distinct constraints and capabilities associated with different worker types.

To capture the dynamic nature and interdependencies of warehouse tasks, a comprehensive representation is used. Each task is defined by a set of attributes: its type (e.g., 'pick', 'replenish', 'load', or 'put'), the specific product or Stock Keeping Unit (SKU) involved, quantity, source and destination locations, timestamp, and order association. Notably, pick tasks are directly linked to specific customer orders for prioritization, while replenishment tasks may be indirectly related when they stock locations used by picking operations. Additionally, a 'status' attribute indicates whether the task is 'available', 'active', or 'completed'. Devices essential for task completion are modeled with attributes indicating their type (e.g., forklifts and pallet jacks), availability status, and any currently assigned task.

The MARL system leverages agents that learn effective decision-making from the current state of the warehouse. This state information comprises two elements provided as input to each agent's Deep Q-Network (DQN). The first is the task state: the current catalog of available tasks along with their detailed attributes. The second is the device state: the availability of each device and its current task assignment, if any. By considering both task information and device availability, the agents are equipped to make informed decisions for optimal task allocation.
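The task and device attributes described above can be sketched as simple data structures. This is an illustrative reconstruction: the field names (`sku`, `order_id`, `assigned_task`, etc.) are assumptions for readability, not the thesis's published schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class TaskType(Enum):
    PICK = "pick"
    REPLENISH = "replenish"
    LOAD = "load"
    PUT = "put"

class TaskStatus(Enum):
    AVAILABLE = "available"
    ACTIVE = "active"
    COMPLETED = "completed"

@dataclass
class Task:
    task_type: TaskType
    sku: str                        # Stock Keeping Unit involved
    quantity: int
    source: str                     # source location
    destination: str                # destination location
    timestamp: float
    order_id: Optional[str] = None  # pick tasks link to a customer order
    status: TaskStatus = TaskStatus.AVAILABLE

@dataclass
class Device:
    device_type: str                     # e.g. "forklift", "pallet_jack"
    available: bool = True
    assigned_task: Optional[Task] = None # current assignment, if any
```

The task state and device state fed to each agent's DQN would then be encodings of collections of these two records.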
A carefully designed reward mechanism is crucial to guide agents towards optimal task selection and resource utilization. The reward structure reinforces the timely completion of customer orders by prioritizing pick tasks. To keep picking workflows uninterrupted, replenishment tasks that facilitate pick operations receive a secondary level of reward, and overall warehouse efficiency is encouraged through a base-level reward for the completion of other tasks. Importantly, negative rewards act as a deterrent against actions that waste resources or cause unnecessary delays, such as selecting tasks that are already in progress, attempting to use unavailable devices, or remaining idle.
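The tiered reward scheme above can be sketched as a small function. The specific magnitudes are illustrative assumptions; the abstract specifies only the ordering (pick > replenish > other, with penalties for invalid actions and idling), not the values.

```python
from typing import Optional

# Illustrative reward magnitudes (assumed, not from the thesis).
PICK_REWARD = 10.0       # highest priority: time-sensitive outbound picks
REPLENISH_REWARD = 5.0   # secondary: keeps pick locations stocked
BASE_REWARD = 1.0        # other task types, e.g. load, put
INVALID_PENALTY = -5.0   # task already active, or device unavailable
IDLE_PENALTY = -1.0      # agent chose to remain idle

def reward(action_valid: bool, idle: bool,
           task_type: Optional[str] = None) -> float:
    """Reward for one agent step under the tiered scheme described above."""
    if idle:
        return IDLE_PENALTY
    if not action_valid:
        return INVALID_PENALTY
    if task_type == "pick":
        return PICK_REWARD
    if task_type == "replenish":
        return REPLENISH_REWARD
    return BASE_REWARD
```

In practice the relative spacing of these values, not their absolute scale, is what steers the learned prioritization.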
At the core of the proposed framework is a Multi-Agent Reinforcement Learning approach that uses Deep Q-Networks (DQNs) as the decision-making policy for each agent. DQNs, a powerful form of reinforcement learning, allow agents to learn the value of different actions within a given state. Over time, as agents interact with the environment and observe the outcomes of their decisions, they learn to make optimal choices. The use of an individual DQN for each agent, both human and robotic, is critical: it allows learning tailored to the specific capabilities of each agent type, making the overall task allocation system more effective and adaptable.
The framework is evaluated through a series of simulation experiments. Scenarios assess the MARL-based system against a baseline random policy, first measuring the system's impact with a human-only workforce and limited devices, and then examining its performance and adaptability once robots are introduced alongside human workers.
Simulation results demonstrate a consistent improvement in key efficiency metrics when more agents employ the learned DQN policy. Notably, pick task completion efficiency shows substantial improvements, indicating the framework's success in prioritizing time-critical outbound tasks. Furthermore, the MARL system demonstrates effective device utilization, with a decrease in idle resources. This thesis confirms MARL's potential for optimizing task allocation in warehouse environments with heterogeneous workforces. Future extensions could explore more sophisticated coordination techniques across agents and incorporate warehouse layout into the decision-making process.
Citation
Priyadarshana, K.A.A. (2024). Multi agent reinforcement learning based warehouse task assignment [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/24232
