Stochastic Friendship Maximization Algorithm

Problem Statement

Given N potential friends, determine optimal pairings.

Overview

The algorithm has two steps, (a) obtain pairwise numerical friendship scores and (b) identify pairings that maximize friendship.

Stage 1: Obtaining Rankings

The procedure for obtaining rankings is:

– Gather N potential friends and the same number of conversation prompts.
– Pair participants, and have them talk casually for roughly five minutes on one of the topics. After the time is up, have each person secretly rate their perceived friendship potential on a 0-10 scale.
– Repeat using a round robin approach to measure all possible pairings.

The desired outcome of this stage is a matrix representing perceived friendship possibility. The diagonal should be set equal to zero (self-friendship is lonely, and so assumed to be suboptimal). The matrix should be complete and asymmetric, as each pairing was ranked by both members.

Discussion of Stage 1: Obtaining Rankings

Ideally the prompts will have the potential to be either divisive or spark connection – smalltalk is to be avoided. A future writeup about topic selection is being written.

It is worth noting that some data validation and cleanup may be required. Initial empirical studies exhibited missing values, numbers outside the scoring range (e.g. “one million”), and non-numerical results (e.g. a picture of a cat wearing a party hat).

Stage 2: Friendship Maximization

First, the full asymmetric friendship matrix is transformed into a symmetric matrix by taking the element-wise geometric mean of the matrix and its transform.

Next a stochastic optimization is performed. Random pairings are generated and the friendship score is calculated. This is repeated for a fixed number of iterations. The optimality criteria is the mean of the indices of the symmetric matrix corresponding to the proposed pairings. The pairing with the highest score is declared the friendship maximizing paring.

Discussion of Stage 2: Friendship Maximization

The first version of this algorithm used the algebraic mean rather than the geometric mean to calculate symmetric friendship scores from the asymmetric matrix. Using the geometric mean is recommended because highly dissimilar rankings are unlikely to produce a good friendship. Specifically, a (5,5) friendship ranking and a (9,1) friendship ranking have the same algebraic mean, but the (5,5) pairing is much more likely to produce a mutually satisfactory friendship and has a higher geometric mean.

Multiple cost functions were examined for the optimization stage. Most produced unsatisfactory results. For example, using the median produced an unacceptably high variance in pairwise friendship scores – simply put, there is no cost in making a few really bad pairings.

Finally, the choice of a fully stochastic search was made for three reasons:


Future Work

There are three improvements I would like to see:

1. A Bayesian extension, where social web mining can be used to generate prior distributions.
2. Using simulated annealing rather than a full stochastic search.
3. Additional empirical testing may improve conversation prompts.

References