
*Deep Learning Applications*

#### *Application of Deep Learning Methods for Detection and Tracking of Players*
*DOI: http://dx.doi.org/10.5772/intechopen.96308*

exercise to be performed. The periods of higher activity, when the players perform the exercises, are characterized by a higher magnitude of the motion features extracted from the video, while the periods when the players queue or wait for instructions (**Figure 12**) are characterized by a lower magnitude of motion features [29].

**Figure 12.**
*A typical training situation. Two players on the right are performing the current task, while the rest are queuing.*

To mark the periods of inactivity and segment the videos into sections where a single exercise is repeatedly practiced, an optical flow threshold is used. First, the optical flow field is calculated between two consecutive frames sampled every N frames (here, N = 50). Then, the mean optical flow magnitude is calculated for each field, resulting in a single value for each sampled time point in the video. The video is cut at the time points where the mean magnitude of the optical flow is lower than an experimentally determined threshold value. An example of the mean optical flow magnitude calculated for a video sequence with short pauses of 10–20 seconds between active repetitions of an exercise is shown in **Figure 13**. It can be seen that a normalized flow threshold of about 0.07 clearly separates the periods of inactivity from the parts of the video showing exercise.

**Figure 13.**
*An example of segmentation of a video sequence using optical flow magnitude.*

**8.2 Active player determination**

In typical footage of a handball game or training session, at a given time only one player or a small proportion of the players present on the scene participate in the action that is currently in focus (e.g., a jump shot or passing), while the others may perform actions that are not currently relevant for interpreting the situation, e.g., moving into their positions. To train the action recognition models, the actions of the players that perform the action of interest should be annotated. The annotation process is at least partly manual, so it is time-consuming and tedious given the large amounts of video data to process. To assist with annotation, a method is proposed that selects, among the detected and tracked players, the ones that are most likely to be performing the action of interest, here called active players. Thus, instead of reviewing every single player's activity at all times, the manual annotation is reduced to the verification of only the proposed active players' tracks.
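The optical-flow-based segmentation described earlier can be sketched in a few lines. This is a minimal sketch, assuming the per-sample flow fields are already computed (e.g., with a Farnebäck-type estimator run every N = 50 frames); the normalization by the maximum mean magnitude and the function name are illustrative assumptions, while the 0.07 threshold follows the text:

```python
import numpy as np

def cut_points(flow_fields, threshold=0.07):
    """Indices of sampled time points whose mean normalized optical-flow
    magnitude falls below the threshold (candidate cut points)."""
    means = []
    for flow in flow_fields:                      # one (H, W, 2) field per sample
        mag = np.linalg.norm(flow, axis=-1)       # per-pixel flow magnitude
        means.append(mag.mean())
    # normalize so the threshold can be given on a [0, 1] scale (assumption)
    means = np.array(means) / max(np.max(means), 1e-9)
    return np.flatnonzero(means < threshold)

# Toy example: two "active" samples around one near-still sample
fields = [np.full((2, 2, 2), 1.0),
          np.full((2, 2, 2), 0.01),
          np.full((2, 2, 2), 1.0)]
print(cut_points(fields))                         # [1]
```

The video would then be cut at the returned sample indices (multiplied back by N to get frame numbers).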

First, the players are detected and tracked as described in previous chapters. Then, the information about player positions, i.e., the detected bounding boxes, is combined with the low-level movement features, such as optical flow or spatiotemporal interest points, to obtain a measure of each player's activity in the considered time interval.

The optical flow-based measure of a player's activity, $A_b^{\mathrm{OF}}$, is calculated over the bounding box $B_b$ of each detected player as the maximum optical flow magnitude within that box:

$$A_b^{\mathrm{OF}} = \max \left| V_{x,y} \right|;\; x, y \text{ within } B_b. \tag{5}$$
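Given a dense flow field for a frame, equation (5) reduces to a masked maximum. A minimal NumPy sketch, where the flow field is assumed precomputed and the helper name is illustrative:

```python
import numpy as np

def of_activity(flow: np.ndarray, bbox: tuple) -> float:
    """Eq. (5): maximum optical-flow magnitude inside a bounding box.

    flow -- dense flow field of shape (H, W, 2), (dx, dy) per pixel
    bbox -- (x0, y0, x1, y1) in pixel coordinates, end-exclusive
    """
    x0, y0, x1, y1 = bbox
    patch = flow[y0:y1, x0:x1]               # flow vectors within B_b
    mag = np.linalg.norm(patch, axis=-1)     # |V_xy| per pixel
    return float(mag.max())

# Toy example: a 4x4 field that is zero except one strong vector in the box
flow = np.zeros((4, 4, 2))
flow[2, 2] = (3.0, 4.0)                      # magnitude 5
print(of_activity(flow, (1, 1, 4, 4)))       # 5.0
```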

An alternative feature to optical flow for defining the activity measure is spatiotemporal interest points, or STIPs. STIPs extend the notion of interest points in images, i.e., points with a significant local variation of image intensities, from the spatial domain into the spatiotemporal domain. STIPs are thus points in the image with a large variation of values in both the spatial and temporal directions around them.

As with optical flow, there are several algorithms that can be used to detect STIPs. For example, the method presented in [30] is based on the Harris corner operator extended into the spatiotemporal domain (Harris3D); the detector of [31] uses a Gaussian filter in the spatial domain and a Gabor band-pass filter in the temporal domain; the algorithm presented in [32] is based on Hessian3D, derived from SURF; and the selective STIPs detector [33] is designed to detect specifically those STIPs that likely belong to persons and not to the background.

Given that movement during the performance of various sports actions causes significant variation of appearance in the corresponding part of the image, it is expected that more STIPs will be detected in image regions with more intense player activity [34]. An activity measure based on the density of STIPs in the area near the detected player, $A_b^{\mathrm{STIP}}$, can be calculated for a player with bounding box $B_b$ and box area $P_b$ as:

$$A_b^{\mathrm{STIP}} = \frac{\# STIP}{P_b};\; STIP \text{ within } B_b. \tag{6}$$
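Equation (6) only needs the STIP coordinates and the box geometry. A small sketch, where the detector (e.g., Harris3D) is assumed to have already produced the (x, y) point locations and the function name is illustrative:

```python
import numpy as np

def stip_activity(stips: np.ndarray, bbox: tuple) -> float:
    """Eq. (6): density of STIPs inside a player's bounding box.

    stips -- array of shape (K, 2) with (x, y) STIP locations
    bbox  -- (x0, y0, x1, y1); P_b is the box area in pixels
    """
    x0, y0, x1, y1 = bbox
    area = (x1 - x0) * (y1 - y0)             # P_b
    inside = ((stips[:, 0] >= x0) & (stips[:, 0] < x1) &
              (stips[:, 1] >= y0) & (stips[:, 1] < y1))
    return int(inside.sum()) / area          # #STIP within B_b, over P_b

stips = np.array([[2, 2], [3, 3], [9, 9]])
print(stip_activity(stips, (0, 0, 5, 5)))    # 2 points in a 25-pixel box: 0.08
```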

In the experiment here, the Harris3D detector with the default parameters was used to extract the STIPs. **Figure 14** shows an example of the detected player bounding boxes and STIPs in a video frame.

**Figure 14.** *Detected players and spatiotemporal interest points.*

**Figure 15.**

*Detected leading player (white box) and his trajectory through the whole sequence (yellow line).*

A threshold on the activity measure can be used to separate active from inactive players, since the players performing sports actions should make more sudden movements, corresponding to higher activity measures, than the other players.

Looking at the activity measures in a sequence of frames, the ranking of the players' activity can change from frame to frame. Therefore, to choose the players that are active throughout the sequence, the active player score is calculated as the average of the player's activity measure along the trajectory of that player's bounding boxes. The result is a set of player trajectories with corresponding player activity scores (**Figure 15**).
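The per-track averaging can be sketched as follows; the per-frame activity values (from eq. (5) or (6)) are assumed to be already computed for each track, and the names are illustrative:

```python
def active_player_scores(tracks: dict) -> dict:
    """Average each player's per-frame activity measure over its track."""
    return {pid: sum(vals) / len(vals) for pid, vals in tracks.items()}

# Per-frame activity measures along two hypothetical trajectories
tracks = {"p1": [0.9, 1.1, 1.0],   # performs the exercise
          "p2": [0.1, 0.2, 0.0]}   # waits in the queue
scores = active_player_scores(tracks)
active = max(scores, key=scores.get)
print(active)                      # p1
```

Ranking the resulting scores (or thresholding them, as above) yields the proposed active players whose tracks are then passed to the annotator for verification.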
