Maximizing diversity within and among teams in a large-scale project

This work aims to improve an earlier methodology for assigning personnel to diverse three-member teams. Notably, the original algorithm focused only on diversity within teams, to ensure that conflicting interests are represented in each team. While this indeed created diverse teams, in many cases different teams featured the same combination of conflicting interests. The client for the original project, a government agency, asked for a methodology that produces more combinations. Hence, the current study presents an approach for boosting diversity among teams. That is, the new method maximizes the differences among teams, not only differences within teams. This was achieved by limiting the number of times each combination appears, while maintaining maximal diversity within teams as well. The upgraded algorithm is scalable and fast converging.
• We suggest an integer linear programming algorithm for creating thousands of diverse three-member teams that represent the conflicting interests of different groups.
• The algorithm imposes penalty costs on potential assignments based on their deviations from the project's requirements and sets upper bound constraints on the frequency of different assignments.
• The algorithm is efficient, scalable, and converges to maximal diversity within seconds.


Introduction
Diversity plays an important role in many aspects of modern life, including politics, business, and education [1, 2]. In some cases such diversity is essential, for example in politics, when different causes and interests compete for limited resources, or when mutual monitoring is required to prevent corrupt deals behind closed doors. Diversity is also necessary when staffing positions in committees and boards of public sector organizations.
The maximally diverse grouping problem (MDGP) is well-known in the literature. MDGP aims to assign a given set of elements (e.g., employees) to a fixed number of mutually disjoint subsets (e.g., teams) to maximize overall diversity (i.e., among members of the same team). The approaches for solving this problem include cross entropy [3], variable neighborhood search [4], an iterated maxima search heuristic algorithm [5], a three-phase search [6], an iterated tabu search algorithm [7], a hybrid genetic algorithm [8], and many more. A variation of MDGP refers to cases where diverse teams need to be created from several homogeneous groups. This sub-problem is often formulated as a multisided game [9][10][11] or as a k-partite graph [12, 13]. This variation of MDGP has numerous real-world applications, so many case-specific algorithms and heuristics have been developed for it. Examples include staffing hospital shifts [14][15][16]; assigning flights to airport gates [17, 18]; matching teachers to schools [19, 20]; integrating experts from several fields in teams to increase creativity and innovation within organizations [21]; creating heterogeneous teams of students [22, 23]; nominating students for exchange programs; and even assigning multiple targets for a swarm of unmanned aerial vehicles [24].
Besides Yeoh and Mohamad [23], whose approach is based on randomization, all the other abovementioned studies suggest binary variable models (i.e., each (element, team) pair is represented by a variable with a value of 1 if the element participates in the team and 0 otherwise). Heuristic algorithms are then developed because the run time of these models increases rapidly with problem size. Neither approach, randomization or heuristic algorithms, guarantees optimal outcomes. Moreover, under the randomization approach, the gap between the optimal solution and the converged result is not necessarily bounded.
In a previous study, we suggested a different approach to maximizing the diversity of teams created from homogeneous groups [25]. The novelty of our updated model lies in integrating two concepts: recognizing and prioritizing two forms of diversity, both within and among teams, and focusing on groups rather than on the individuals within groups. The first concept is integrated by defining diversity rules and setting a price tag for each rule so that its value is proportional to its priority. Thus, the requirement for maximal diversity is equivalent to the objective of minimizing the penalties. The second concept enables us to efficiently solve very large problems (i.e., creating a massive number of diverse teams). The current study improves the original algorithm by adding an iterative process that enables us to maximize diversity among teams as well. Both versions of the algorithm are efficient, scalable, and converge within seconds.
In Section 2, we present the problem that was the trigger for developing the algorithm. In Section 3, we present the original algorithm and apply it to an illustrative example. In Sections 4 and 5, we present the updated algorithm, apply it to the illustrative example and two case studies, and compare the solutions to demonstrate the improvement.

Framing the Problem
The challenge we faced was to create thousands of diverse teams for a special project. Each team included three members: chair, deputy, and associate, selected from various interest groups. The number of groups could potentially range from 10 to 20, their size was proportional to their political power, and the total number of members in all groups was sufficient to fully staff all the required teams. There were three levels of shared interests among groups: high, medium, and low. High refers to groups whose interests match strongly; Medium applies to groups that share similar but not the same interests; Low applies to groups that share very limited or no interests. Members of the same group would naturally be designated as High.
The project's managers preferred that team members share as few common interests as possible to facilitate effective monitoring and increase the odds of optimal performance, which depended on minimizing improper cooperation between team members. In the past, such teams were created by a lottery combined with a greedy approach: in each round, three delegates were randomly selected. If this trio shared a low level of interests, the team was approved, the size of each of their groups was reduced by 1, and the next lottery was executed. Otherwise, the groups' sizes remained the same and another lottery trial was executed. This process was exhausting, time-consuming, and did not guarantee maximal diversity. Furthermore, there was no diversity benchmark for evaluating the results.

The original algorithm
The basic algorithm we suggested to resolve the problem included three stages. First, all possible combinations for setting up three-member teams are listed. Second, a penalty function is calculated for each combination. Third, the mix of combinations that minimizes the total penalty is determined using integer linear programming (ILP).

Stage 1 -Creating the list of combinations
The list of combinations is created using the existing groups: each combination is an ordered triple of groups, whose first, second, and third entries supply the delegates for the roles of chair, deputy, and associate, respectively. At this stage, we do not restrict the team's makeup (i.e., several delegates from the same group may be chosen). Thus, given that there are M groups, the number of possible combinations is $M^3$.
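The enumeration in Stage 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the original implementation; the function name `list_combinations` and the group labels are assumptions chosen for the example.

```python
from itertools import product


def list_combinations(groups):
    """Return every ordered (chair, deputy, associate) triple of groups.

    Repetition is allowed, so a list of M groups yields M**3 triples.
    """
    return list(product(groups, repeat=3))


# Four groups, as in the illustrative example used later in the text.
groups = ["A", "B", "C", "D"]
combinations = list_combinations(groups)

print(len(combinations))                  # number of combinations, M**3
print(combinations[0], combinations[-1])  # first and last triple
```

Because the list grows with the cube of the number of groups, 10 to 20 groups give between 1,000 and 8,000 combinations.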

Stage 2 -Defining the penalty function
First, based on management's preferences, we set price tags as follows: a pair of delegates from groups that share a high level of interests costs 111 points; a pair of delegates from groups that share a medium level of interests costs 11 points; a pair of delegates from groups that share a low level of interests costs 1 point. These price tags were chosen to reflect the fact that each level of interests includes the levels below it as well, and is thus penalized for it.
Second, we implemented this rule to define the price tag for each pair of groups according to their level of shared interests.
Third, we calculated the penalty incurred by each combination created in the first stage as the sum of the price tags of its pairs.
For example, suppose that there are four groups: A, B, C, D. Thus, there are 64 ($= 4^3$) combinations in the list, starting with (A, A, A) and ending with (D, D, D). Also, assume that the map of interests is as follows: only pairs of delegates from the same group, such as (A, A), (B, B), etc., share a high level of interests, so their price tag is 111; groups A and B share a medium level of interests, so the price tag of (A, B) and (B, A) is 11; groups C and D share a low level of interests, and therefore the price tag of (C, D) and (D, C) is 1.
Referring to the rules defined above, the penalties assigned to combinations in the list created in the first stage range from 333, for teams where all delegates belong to the same group, to 1, for teams comprising delegates from groups (A, C, D) or from groups (B, C, D). The minimum of 1 implies that pairs not listed in the map of interests, such as (A, C), carry no price tag. The complete penalty function and its frequency (i.e., the number of combinations assigned each penalty) are shown in Table 1.
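The penalty calculation for the illustrative example can be reproduced with a short Python sketch. It assumes that pairs not listed in the map of interests (A-C, A-D, B-C, B-D) cost 0, which is consistent with the minimum penalty of 1 reported for (A, C, D); the names `pair_price` and `penalty` are illustrative and not taken from the original code.

```python
from collections import Counter
from itertools import combinations as pairs, product

HIGH, MEDIUM, LOW = 111, 11, 1

# Map of interests of the illustrative example.
# ASSUMPTION: pairs that are not listed here cost 0.
PAIR_PRICE = {frozenset("AB"): MEDIUM, frozenset("CD"): LOW}


def pair_price(g1, g2):
    """Price tag of a pair of delegates, given their groups."""
    if g1 == g2:
        return HIGH  # members of the same group share a high level of interests
    return PAIR_PRICE.get(frozenset((g1, g2)), 0)


def penalty(team):
    """Penalty of a combination: the sum of the price tags of its three pairs."""
    return sum(pair_price(a, b) for a, b in pairs(team, 2))


groups = "ABCD"
penalties = {team: penalty(team) for team in product(groups, repeat=3)}
frequency = Counter(penalties.values())  # how many combinations get each penalty

print(penalties[("A", "A", "A")], penalties[("A", "A", "B")])
print(sorted(frequency.items()))
```

Under this assumption, the first two and last two entries of the resulting cost vector are 333, 133, 113, and 333, matching the values quoted for the ILP formulation of the example.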

Stage 3 -Integer linear programming
Next, integer linear programming is used to find the best mix of combinations, which maximizes the diversity of each team. This goal is equivalent to the objective of minimizing the total penalty imposed on all teams.
Because the penalties are derived from the teams' combinations, we defined the decision variables as follows: each decision variable represents the number of times a specific combination is assigned. The "price tag" of a team is based on the level of common interests among its members: high penalty for high level; medium penalty for medium level; and low penalty for low level.
This approach enables us to substantially reduce the size of the problem because the focus is not on the number of matches required, which may be tens of thousands, but on the number of possible combinations (which is the number of groups to the power of 3: hundreds or a few thousand at most).
The problem's constraints include sets of equations -a set of three equations for each group. Each equation represents a role. The sum of decision variables that represent combinations in which this group holds this role is equal to its number of delegates for this role. The total number of constraints is three times the number of groups.
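To make the structure of these equations concrete, the following Python sketch sums the decision variables per role and group and checks them against the number of delegates. It is only an illustration: the delegate counts and the candidate mix below are hypothetical numbers, not those of Table 2, and the names `role_totals` and `is_feasible` are assumptions.

```python
from collections import Counter

ROLES = ("chair", "deputy", "associate")


def role_totals(mix):
    """Sum the decision variables per (role, group).

    `mix` maps a combination (chair_group, deputy_group, associate_group)
    to the number of teams assigned with that combination.
    """
    totals = Counter()
    for combination, count in mix.items():
        for role, group in zip(ROLES, combination):
            totals[(role, group)] += count
    return totals


def is_feasible(mix, delegates):
    """True if every (role, group) equation holds with equality."""
    totals = role_totals(mix)
    keys = set(totals) | set(delegates)
    return all(totals.get(k, 0) == delegates.get(k, 0) for k in keys)


# HYPOTHETICAL data: six teams in total.
delegates = {
    ("chair", "A"): 3, ("chair", "B"): 2, ("chair", "C"): 1,
    ("deputy", "B"): 3, ("deputy", "A"): 2, ("deputy", "D"): 1,
    ("associate", "C"): 3, ("associate", "D"): 2, ("associate", "A"): 1,
}
mix = {("A", "B", "C"): 3, ("B", "A", "D"): 2, ("C", "D", "A"): 1}

print(is_feasible(mix, delegates))
```

With M groups and three roles, the check covers exactly the 3M equations described above.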
Referring to the illustrative example (section 3.2), assume that 1,000 teams should be assigned, and that the number of delegates for each role and group is as given in Table 2. The ILP formulation is as follows in section 3.4.

The ILP formulation of the illustrative example
S The vector of combinations: $S = (s_1, s_2, \ldots, s_{64}) = ((A, A, A), (A, A, B), \ldots, (D, D, C), (D, D, D))$, where $s_j = (s_{chair}(j), s_{deputy}(j), s_{associate}(j))$.
C The penalty costs of vector S; $c_j$ is the cost for assigning one team by the $s_j$ combination, following the penalty function as shown in Table 1: $C = (333, 133, \ldots, 113, 333)$.
Decision variables
X A vector in which each element represents the number of teams in the j-th element of S; i.e., $x_1$ represents the number of teams where the combination is (A, A, A); $x_2$ represents the number of teams where the combination is (A, A, B); … $x_{64}$ represents the number of teams where the combination is (D, D, D).

Constraints
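Each role contributes one equation per group, as described above: constraints for chair assignments, constraints for deputy assignments, and constraints for associate assignments. The number of delegates on the right-hand side of each equation is taken from Table 2.
Table 3 Mix of combinations in the optimal solution produced by the original algorithm: illustrative example (maximizing diversity within teams).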

Value range: every decision variable $x_j$ is a non-negative integer. The general mathematical formulation is given in the appendix.
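The verbal description of the objective, the constraints, and the value range in this section can be summarized compactly. The following is a sketch assembled from that description, not a copy of the appendix formulation; the symbol $n_g^{role}$, denoting the number of delegates of group $g$ available for a given role, is notation assumed here.

```latex
\begin{align*}
\min_{x}\quad & \sum_{j} c_j \, x_j \\
\text{s.t.}\quad & \sum_{j \,:\, s_{role}(j) = g} x_j = n_{g}^{role}
  && \text{for every group } g \text{ and every }
     role \in \{\mathrm{chair}, \mathrm{deputy}, \mathrm{associate}\}, \\
& x_j \in \{0, 1, 2, \ldots\}
  && \text{for every combination } j .
\end{align*}
```

The objective is the total penalty over all assigned teams, and each equality states that the combinations in which group $g$ holds a role use up exactly that group's delegates for the role.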

Results
The algorithm was coded in Python 3.7.2 and run on a conventional laptop. We applied it to the illustrative example and obtained the optimal solution of 10,000 penalty points within 0.68 seconds, distributed as follows: 900 teams with a price tag of 11, and 100 teams with a price tag of 1.
When zooming into the overall mix of teams, we noticed that only 9 combinations were used out of the possible 64. Furthermore, there was no balance among these 9 combinations: one combination was assigned 300 times whereas others were used only 50 times (see Table 3). Such an outcome is justified when it is the only mix that minimizes the total penalty, but was that really the case? It is more likely that the model terminates as soon as the minimum total penalty is achieved, because no limit was imposed on the frequency of combinations. This means that only one aspect of diversity (that is, within teams) is addressed in the original algorithm, and not a second aspect (diversity among teams). This drawback is discussed and solved in Section 4.

The Updated Algorithm
As noted, the concept of diversity can be applied within teams (assigning diverse delegates to a given team), but also among teams (producing teams featuring diverse combinations of delegates).
Although the original algorithm is efficient, scalable, and converges quickly, it only addresses the first concept of diversity mentioned above. As a result, a specific combination with a minimal penalty may be used to produce many teams, while other combinations with minimal penalty scores may be used only a few times or not at all, even though the optimal values of the objective function in both cases are the same. In the long run, interest groups may exploit this drawback to promote improper cooperation. To prevent or limit this possibility, we suggest a solution that limits the number of teams featuring the same combination of delegates in a way that ensures diversity both within and among teams. This goal is achieved by adding an iterative process to stage 3 based on a binary search, as follows:
Initialize:
a. Perform stage 1, stage 2, and stage 3 of the basic algorithm to determine Z and X. Z is the optimal value of the objective function, and X is the vector of instances (i.e., the number of times each combination is used).
b. Set: Q is the largest element in X, i.e., the number of times the most popular combination is used. L is the minimal limit on the number of times each combination is used; to guarantee a solution, it should be at least the number of required teams divided by the number of potential combinations.
Step 1 Determine: The optimal solution is X, the number of combinations used is R, and the value of the objective function is Z.
Step 3 Calculate an upper bound, UB.
Step 4 Add a set of inequalities to the original ILP model, bounding the number of times each combination is used by UB, to create an updated model (hereafter U-ILP).
Step 5 Run the U-ILP model and get new Z1 and X1 (Z1 is the optimal value of the U-ILP objective function, and X1 is the vector of instances).
Step 6 If Z1 = Z, set X := X1, Q := UB, and go to step 1; else set L := Q1 and go to step 2.
The performance of the upgraded algorithm follows that of binary search, $\log_2 n$, where in our context n is the value of $(Q - L)$, calculated in step 1b.
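The iterative process can be illustrated with a generic binary search over the frequency limit. The Python sketch below is written under stated assumptions and is not the authors' code: `solve` is a hypothetical callable standing in for the ILP with the added inequalities, the midpoint rule for the upper bound is assumed, and the update of the lower end follows the textbook binary search rather than the exact rule of step 6. A toy solver replaces the ILP so that the example runs on its own.

```python
import math


def tighten_cap(solve, n_teams, n_combinations):
    """Binary-search the smallest per-combination cap that keeps Z optimal.

    `solve(cap)` is a HYPOTHETICAL callable: it solves the model with the
    extra constraints x_j <= cap (cap=None means no cap) and returns (Z, X),
    or None when the capped model is infeasible.
    """
    z_best, x_best = solve(None)                 # initialization: original model
    upper = max(x_best)                          # Q: count of the most used combination
    lower = math.ceil(n_teams / n_combinations)  # L: smallest cap that fits all teams
    while upper > lower:
        cap = (upper + lower) // 2               # ASSUMED midpoint rule for UB
        result = solve(cap)
        if result is not None and result[0] == z_best:
            x_best, upper = result[1], cap       # same penalty: keep the tighter cap
        else:
            lower = cap + 1                      # penalty rose: the cap was too tight
    return z_best, x_best, upper


def toy_solve(cap, costs=(1, 1, 1, 5), n_teams=12):
    """Toy stand-in for the ILP: fill the cheapest combinations first."""
    limit = n_teams if cap is None else cap
    if limit * len(costs) < n_teams:
        return None                              # not enough room for all teams
    x = [0] * len(costs)
    remaining = n_teams
    for j in sorted(range(len(costs)), key=lambda j: costs[j]):
        x[j] = min(limit, remaining)
        remaining -= x[j]
    z = sum(c * v for c, v in zip(costs, x))
    return z, x


z, x, cap = tighten_cap(toy_solve, n_teams=12, n_combinations=4)
print(z, x, cap)  # for this toy solver: 12 [4, 4, 4, 0] 4
```

In the toy run, the uncapped solution places all 12 teams in a single combination; the search ends with a cap of 4, which spreads the teams over three combinations without raising the total penalty.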
Applying the updated algorithm to the illustrative example increased the number of used combinations to 14 and balanced their frequency, as presented in Table 4 . Whereas the difference between the most used and the least used combinations was 250 (300-50) in the original algorithm, it was only 88 (112-24) in the updated algorithm.

Implementation
The updated algorithm was applied to the two cases presented in Talmor [25] . The first case refers to creating 10,250 teams using delegates from 12 groups. The second case refers to creating 10,600 teams using delegates from 11 groups. The entire input data of these two cases is presented in Table 5 . Table 6 and Table 7 detail the process of the updated algorithm in cases 1 and 2, respectively. In case study 1, the maximal diversity among teams was achieved in the 6th iteration. Note that the number of different combinations increased from 35 to 75, while the total penalty remained minimal. The process ended after approximately 20 seconds.
In case study 2, the maximal diversity among teams was achieved in the 5th iteration, where the gap between L and Q was bridged. The increase in the number of different combinations from 31 to 80 did not affect the total penalty value. The process ended after approximately 25 seconds.
When comparing the distributions of the price tags of the original algorithm solution vs. the updated algorithm solution, no substantial difference was observed. Fig. 1 demonstrates this finding.
Table 4 Mix of combinations in the optimal solution produced by the updated algorithm: illustrative example (maximizing diversity within and among teams).

Discussion and Summary
This paper presents a practical and effective approach for creating thousands of diverse teams based on two notions: the first is the ranking of diversity preferences and setting penalties according to this rank; the second is referring to a group (and not to an element within it) as the basic entity when formulating the algorithm.
Thus, we set the stages of the algorithm as follows: first, identifying all possible team combinations; next, assigning penalty costs to each option, based on our diversity ranking; last, running an integer linear programming model formulated to minimize the total penalties and achieve an optimal set of assignments. An updated algorithm, based on the original one, was developed using an iterative binary search, so that diversity was maximized both within and among teams.
Although the idea of a penalty may seem similar to previous studies that integrated multi-criteria analysis in the assignment process [20, 26], the preferences for the ranking methods and mathematical formulation are different. Moreover, both the original and updated algorithms can be applied to problems involving very large numbers of assignments without increasing the convergence time to an optimal solution, as demonstrated in the case studies. The scalability and rapid convergence are rooted in the fact that the size of the problem depends only on the number of groups, rather than on the total number of group members. These benefits make our approach highly relevant in cases where heterogeneous teams are required (e.g., hospital shifts, educational activities, hackathons). Our model is also suitable for cases where team members must monitor each other and the number of teams with the same combination of members needs to be limited. Examples of such cases include counting election ballots or preventing security breaches at sensitive facilities.
Table 6 The iterative process of case study 1. Applying the updated algorithm to calculate maximal diversity within and among teams.
Table 7 The iterative process of case study 2. Applying the updated algorithm to calculate maximal diversity within and among teams.
As mentioned above, the number of groups impacts the scope of the problem: the larger this number, the more decision variables are required, increasing the run-time of the ILP model. In practice, the problems we coped with featured no more than 20 groups, and the optimal solution was achieved in under 30 seconds for the updated algorithm. However, applying the algorithm to a problem involving a larger number of groups should be analyzed separately. Further potential directions for future research include applying this approach to larger teams or to teams of different sizes.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Data will be made available on request.

Acknowledgments
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
The author would like to thank "CEC" for the opportunity to develop the algorithm presented.

where:
$s_j = (s_{chair}(j), s_{deputy}(j), s_{associate}(j))$, for $s_{role}(j) \in M$, role = chair, deputy, associate.
C The penalty costs of vector S, meaning that $c_j$ is the cost for operating one team by the $s_j$ combination.
X The vector of instances, i.e., a vector in which each element represents the number of times the $s_j$ combination is used.