If your submission is valid, we compute a score that consists of two components: \(1 - F_1\) and the mean squared error (\(MSE\)).
The \(F_1\)-score is a popular statistic that takes both precision and recall into account. It is computed by
$$ F_1 = 2 \cdot \frac{precision \cdot recall}{precision + recall} $$
but we will formalize this later. On the leaderboard, \(1 - F_1\) is used to rank your submissions, while the \(MSE\) serves as a tie-breaker. True to the motto of Kelvins, "Reach the absolute zero!", a score of (0, 0) would mean that you predicted all objects within the given pixel tolerances without introducing false positives. Generally speaking, your score gets worse if you miss true objects (false negatives), predict objects where there are none (false positives), or place your predictions imprecisely (which increases the regression error).
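For intuition, the leaderboard ordering can be thought of as a sort by the pair \((1 - F_1, MSE)\). The following is a minimal sketch of that idea; the `Entry` structure and all numbers in it are hypothetical and only serve as an illustration, not as actual scoring code.

```python
# Hypothetical illustration of the ranking rule: sort by 1 - F1, break ties by MSE.
from collections import namedtuple

Entry = namedtuple("Entry", ["team", "f1", "mse"])  # made-up structure
entries = [Entry("A", 0.90, 1.2), Entry("B", 0.90, 0.8), Entry("C", 0.95, 2.0)]

leaderboard = sorted(entries, key=lambda e: (1.0 - e.f1, e.mse))
# -> C first (highest F1), then B before A (equal F1, lower MSE breaks the tie)
```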
The total score of your submission combines the scores of all frames in the test set. The exact details can be found in the following document: spotGEO_metric.pdf. Furthermore, the starter-kit (hosted on Zenodo) provides the code that is run on the Kelvins server to compute the score.
To understand how the score is defined, we explain its computation for a fixed frame \(f\) in the following.
Assume that for a frame \(f\) we have a set of \(N\) ground-truth coordinates \(Y^f = (y_1^f, \ldots, y_N^f)\) and a submitted prediction with \(M_f\) coordinates \(X^f= (x_1^f, \ldots, x_{M_f}^f)\). In a first step, we compute a one-to-one matching between \(Y^f\) and \(X^f\) to determine which predictions are likely meant to indicate which object. There are multiple ways to define such a matching, but for the scoring we decided to compute the matching which minimizes the sum of all truncated Euclidean distances of matched points (while maximizing the number of matched pairs). Formally, this constitutes the solution to a minimum weighted unbalanced assignment problem, which can easily be solved with the Hungarian algorithm, linear programming, or similar methods.
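The sketch below illustrates how such a matching could be computed with SciPy's `linear_sum_assignment`, which handles rectangular (i.e. unbalanced) cost matrices in recent versions. This is not the official scoring code: the helper name `match_frame` and the choice to truncate distances at \(\tau\) are assumptions made for illustration; the authoritative description is spotGEO_metric.pdf.

```python
# A minimal sketch of the matching step, not the official scoring code.
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_frame(Y, X, tau):
    """Match ground-truth coordinates Y to predicted coordinates X.

    Y and X are sequences of (column, row) pixel coordinates.
    Returns a list of (truth_index, prediction_index, distance) tuples.
    """
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    X = np.atleast_2d(np.asarray(X, dtype=float))
    if Y.size == 0 or X.size == 0:
        return []
    # Pairwise Euclidean distances between objects and predictions.
    d = np.linalg.norm(Y[:, None, :] - X[None, :, :], axis=-1)
    # Truncate at tau so that very distant predictions do not distort the matching.
    cost = np.minimum(d, tau)
    # Rectangular (unbalanced) assignment: matches min(N, M) pairs while
    # minimizing the sum of truncated distances.
    rows, cols = linear_sum_assignment(cost)
    return [(i, j, d[i, j]) for i, j in zip(rows, cols)]
```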
Computing the matching like this will (in most cases) result in each object being matched with its nearest neighbor prediction. In cases where one prediction would be the nearest neighbor of two objects (with no other candidates), the prediction is associated with the closer of the two objects. In general, this matching minimizes the distances between objects and predictions, which is important because there exists a border radius \(\tau\) around each true object into which your prediction has to fall to (potentially) be counted as a true positive. A second, smaller tolerance radius \(\varepsilon < \tau\) determines the contribution of your prediction to the regression error. If \(y_i\) is matched with \(x_j\) and \(d = d(x_j, y_i) = || x_j - y_i ||_2\) is the Euclidean distance between the two, then the prediction counts as a true positive if \(d \leq \tau\).
The additive contribution of this matched pair to the regression error depends on the distance \(d\): if \(d < \varepsilon\), the pair contributes nothing; if \(\varepsilon \leq d \leq \tau\), it contributes \(d^2\).
For cases in which \(d > \tau\), the prediction is not counted as a true positive; instead, the match introduces one false negative (the object) and one false positive (the prediction), each of which contributes a fixed \(\tau^2\) to the regression error. Together with unmatched objects (each counting as a false negative) and unmatched predictions (each counting as a false positive), we end up with three index sets: \(TP^f\), containing the matched pairs \((i, j)\) within tolerance; \(FN^f\), containing the false negatives; and \(FP^f\), containing the false positives.
With these, we define the sum squared error \((SSE^f)\) of frame \(f\) by:
$$ SSE^f = \sum\limits_{(i,j) \in TP^f} \pi(x_i, y_j) + \sum\limits_{j \in FN^f} \tau^2 + \sum\limits_{i \in FP^f} \tau^2 $$
where \(\pi(x_i, y_j) = 0\) if \(||x_i - y_j||_2 < \varepsilon\) and \(\pi(x_i, y_j) = ||x_i - y_j||_2^2\) otherwise.
In the special case that all \(TP^f, FP^f\) and \(FN^f\) are \(\emptyset\), we define \(SSE^f\) to be 0.
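Putting the pieces together, a per-frame scorer might look like the sketch below. It builds on the `match_frame` helper from the previous snippet; `eps` and `tau` stand for \(\varepsilon\) and \(\tau\), whose actual values are defined in spotGEO_metric.pdf, not here.

```python
def score_frame(Y, X, eps, tau):
    """Return (TP, FN, FP, SSE) for one frame, following the definitions above."""
    matches = match_frame(Y, X, tau)
    tp, sse = 0, 0.0
    matched_truth, matched_pred = set(), set()
    for i, j, d in matches:
        if d <= tau:                 # within the border radius: true positive
            matched_truth.add(i)
            matched_pred.add(j)
            tp += 1
            if d >= eps:             # outside eps the pair contributes d^2, else 0
                sse += d ** 2
        # matched pairs with d > tau fall through and are counted below
        # as one false negative plus one false positive
    fn = len(Y) - len(matched_truth)   # objects without a valid match
    fp = len(X) - len(matched_pred)    # predictions without a valid match
    sse += (fn + fp) * tau ** 2        # each FN and FP adds a fixed tau^2
    return tp, fn, fp, sse             # if everything is empty, SSE stays 0
```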
To the right, you can see an example prediction \(\lbrace x_1, x_2, x_3, x_4\rbrace\) for the true objects \(\lbrace y_1, y_2, y_3 \rbrace\); we omit the frame index \(f\) for clarity. Solving the assignment problem, the matching that minimizes the sum over all distances pairs \((y_1, x_1), (y_2, x_2)\) and \((y_3, x_3)\), leaving \(x_4\) unmatched. In this example, \(d(x_1, y_1) < \varepsilon\), \(\varepsilon \leq d(x_2, y_2) \leq \tau\), and \(d(x_3, y_3) > \tau\).
To summarize, we have:
\(TP = \lbrace (1,1), (2,2) \rbrace\), \(FP = \lbrace 3, 4 \rbrace\), \(FN = \lbrace 3 \rbrace\)
and a regression error of
\(SSE = 0 + d(x_2, y_2)^2 + 3 \tau^2\).
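To make this concrete, the snippet below feeds hypothetical coordinates and tolerances (all numbers are invented for illustration) that reproduce exactly this situation into the `score_frame` sketch from above.

```python
# Hypothetical tolerances and coordinates reproducing the example:
# d(x1, y1) < eps, eps <= d(x2, y2) <= tau, d(x3, y3) > tau, x4 unmatched.
eps, tau = 1.0, 4.0
Y = [(10.0, 10.0), (30.0, 10.0), (50.0, 10.0)]                  # y1, y2, y3
X = [(10.5, 10.0), (32.0, 10.0), (56.0, 10.0), (80.0, 80.0)]    # x1, x2, x3, x4

tp, fn, fp, sse = score_frame(Y, X, eps, tau)
print(tp, fn, fp, sse)   # -> 2 1 2 52.0  (SSE = d(x2, y2)^2 + 3 * tau^2 = 4 + 48)
```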
For a given sequence of 5 frames, we have
\(TP = \sum\limits_{f = 1}^{5} |TP^f|, FP = \sum\limits_{f = 1}^{5} |FP^f|, FN = \sum\limits_{f = 1}^{5} |FN^f|\) and \(SSE = \sum\limits_{f = 1}^{5} SSE^f\).
We further define
$$ MSE = \frac{SSE}{TP + FN + FP}. $$
(again, \(MSE = 0\) if \(SSE = 0\).)
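Continuing the sketch, the per-frame results can be pooled over the 5 frames of a sequence as follows; `frames_truth` and `frames_pred` are hypothetical containers holding the ground truth and the predictions for each frame of one sequence.

```python
def score_sequence(frames_truth, frames_pred, eps, tau):
    """Pool TP, FN, FP and SSE over the frames of one sequence."""
    TP = FN = FP = 0
    SSE = 0.0
    for Y, X in zip(frames_truth, frames_pred):
        tp, fn, fp, sse = score_frame(Y, X, eps, tau)
        TP, FN, FP, SSE = TP + tp, FN + fn, FP + fp, SSE + sse
    denom = TP + FN + FP
    MSE = SSE / denom if denom else 0.0   # MSE = 0 in the all-empty case
    return TP, FN, FP, SSE, MSE
```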
For a whole submission consisting of \(K\) sequences, where we denote by \(TP_k\) the number of true positives of the \(k\)-th sequence (and similarly \(FP_k\), \(FN_k\), \(SSE_k\)), we compute precision \(P\), recall \(R\) and \(MSE\) as follows:
$$ P = \frac{\sum\limits_{k=1}^{K}TP_k}{\sum\limits_{k=1}^{K}\left(TP_k + FP_k\right)} \qquad R = \frac{\sum\limits_{k=1}^{K}TP_k}{\sum\limits_{k=1}^{K}\left(TP_k + FN_k\right)} \qquad MSE = \frac{\sum\limits_{k=1}^{K}SSE_k}{\sum\limits_{k=1}^{K}\left(TP_k + FN_k + FP_k\right)}. $$
With this we can compute \(F_1 = \frac{2 PR}{P + R}\). Our ranking is determined by \(1 - F_1\), with the smallest value winning; \(MSE\) is used as a tie-breaker.
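Finally, a sketch of the submission-level aggregation under the same assumptions; `sequence_scores` is a hypothetical list holding one `(TP_k, FN_k, FP_k, SSE_k)` tuple per sequence, e.g. the first four values returned by `score_sequence` above.

```python
def score_submission(sequence_scores):
    """Compute (1 - F1, MSE) for a whole submission from per-sequence counts."""
    TP = sum(s[0] for s in sequence_scores)
    FN = sum(s[1] for s in sequence_scores)
    FP = sum(s[2] for s in sequence_scores)
    SSE = sum(s[3] for s in sequence_scores)
    precision = TP / (TP + FP) if TP + FP else 0.0
    recall = TP / (TP + FN) if TP + FN else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    mse = SSE / (TP + FN + FP) if TP + FN + FP else 0.0
    return 1.0 - f1, mse   # smaller is better; MSE breaks ties
```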