
MediaPipe Use in a Clinical Brain Computer Interface Setting

Implementing and optimizing a machine learning tool, and informing future experimental design.

Figure 1: MediaPipe output example

For children living with cerebral palsy, fatigue is an ever-present concern, particularly during rehabilitation. Brain-computer interfaces (BCIs) may help complement traditional rehabilitation, but evaluating fatigue still relies on time-consuming video annotation or the expensive equipment of clinical motion analysis. MediaPipe, an open-source machine learning framework developed by Google, can detect a variety of body landmark coordinates from a single camera angle. The original objective of my study was to assess the efficacy of using MediaPipe to analyze physical fatigue in videos of children before and after BCI use in a clinical setting. However, after further work processing the videos with MediaPipe, I pivoted the project to assessing the efficacy of MediaPipe itself in a clinical BCI setting.

 

MediaPipe models were used to analyze 200 videos from 35 children performing the Box and Blocks Test (BBT) while wearing an electroencephalogram (EEG) headset. In the BBT, participants grasp a block from a box, transport it over a barrier, and release it as many times as possible in one minute. Three MediaPipe models were run (one newly released model, Hand, and two legacy models, Hand Legacy and Holistic), each across a range of confidence-threshold values for the MediaPipe machine learning hyperparameters.

 

Model outputs were compared and combined, and factors in the experimental methods that inhibited model performance were identified. My work is being included in a pending publication in the IEEE Transactions on Neural Systems and Rehabilitation Engineering.


The sections below summarize the key steps I took throughout the project.

Preprocessing

First, I preprocessed the videos so that each was at the same rotation relative to the Box and Blocks setup. To do this efficiently, I wrote a script that lets the user select the corners of the Box and Blocks setup in each video consecutively; the corner coordinates are saved automatically to a CSV file, and a second script uses them to rotate each video accordingly.
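
A minimal sketch of what such a two-step pipeline can look like with OpenCV. The file names, and the assumption that the edge between the first two clicked corners defines the rotation angle, are illustrative rather than my exact script:

```python
import csv
import math
import cv2

def select_corners(video_path):
    """Show the first frame and collect four clicked corners of the box setup."""
    corners = []
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    cap.release()

    def on_click(event, x, y, flags, param):
        if event == cv2.EVENT_LBUTTONDOWN:
            corners.append((x, y))

    cv2.imshow("select corners", frame)
    cv2.setMouseCallback("select corners", on_click)
    while len(corners) < 4:          # pump GUI events until four clicks
        cv2.waitKey(50)
    cv2.destroyAllWindows()
    return corners

def rotate_video(video_path, out_path, corners):
    """Rotate the video so the edge between the first two corners is horizontal."""
    (x0, y0), (x1, y1) = corners[0], corners[1]
    angle = math.degrees(math.atan2(y1 - y0, x1 - x0))

    cap = cv2.VideoCapture(video_path)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    out = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        out.write(cv2.warpAffine(frame, rot, (w, h)))
    cap.release()
    out.release()

if __name__ == "__main__":
    video = "participant01.mp4"                      # hypothetical file name
    corners = select_corners(video)
    with open("corners.csv", "a", newline="") as f:  # log corners for reuse
        csv.writer(f).writerow([video] + [v for xy in corners for v in xy])
    rotate_video(video, "rotated_" + video, corners)
```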

MediaPipe Processing

I wrote a script to process each video with three different MediaPipe models. Using pandas DataFrames, I saved 260 outputs per frame to a CSV file. To increase efficiency when running MediaPipe on multiple videos, I implemented parallelization to utilize all logical processors on the computer. This reduced the run time to about one-fifth of its initial duration, meaning MediaPipe processed each video in approximately real time.
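
A simplified sketch of this stage, assuming the legacy MediaPipe Hands model, one detected hand per frame, and hypothetical file paths; my actual script ran three models and saved many more columns per frame:

```python
import glob
from multiprocessing import Pool, cpu_count

import cv2
import mediapipe as mp
import pandas as pd

def process_video(video_path):
    """Run the legacy MediaPipe Hands model on one video; save landmarks to CSV."""
    hands = mp.solutions.hands.Hands(
        min_detection_confidence=0.5, min_tracking_confidence=0.5)
    rows, frame_idx = [], 0
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        row = {"frame": frame_idx}
        if result.multi_hand_landmarks:          # None when no hand is detected
            for i, lm in enumerate(result.multi_hand_landmarks[0].landmark):
                row[f"x_{i}"], row[f"y_{i}"], row[f"z_{i}"] = lm.x, lm.y, lm.z
        rows.append(row)
        frame_idx += 1
    cap.release()
    hands.close()
    pd.DataFrame(rows).to_csv(video_path + "_hand.csv", index=False)

if __name__ == "__main__":
    videos = glob.glob("rotated/*.mp4")          # assumed directory layout
    with Pool(processes=cpu_count()) as pool:    # one worker per logical CPU
        pool.map(process_video, videos)
```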

Hyperparameter Optimization

For each model, I then optimized the confidence-threshold hyperparameters. I learned about and implemented various minimization methods, including differential evolution, BFGS, Nelder-Mead, and Powell. While attempting the different optimization algorithms, I ran a brute-force optimization in the background; since the brute-force search finished before any of the optimization algorithms reached a conclusive answer, I moved forward with the brute-force parameters.
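
A sketch of the brute-force approach, assuming the tunable hyperparameters are the Hands model's detection and tracking confidence thresholds, the objective is the fraction of frames with a detected hand, and a single representative video is used for scoring (all illustrative choices):

```python
import itertools
import cv2
import mediapipe as mp

def detection_rate(video_path, det_conf, track_conf):
    """Fraction of frames with a detected hand at the given confidence thresholds."""
    hands = mp.solutions.hands.Hands(
        min_detection_confidence=det_conf, min_tracking_confidence=track_conf)
    detected = total = 0
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        total += 1
        if hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).multi_hand_landmarks:
            detected += 1
    cap.release()
    hands.close()
    return detected / total if total else 0.0

# brute-force search over a coarse grid of both thresholds,
# scored on one representative video to keep the runtime manageable
grid = [round(0.1 * k, 1) for k in range(1, 10)]
best = max(itertools.product(grid, grid),
           key=lambda p: detection_rate("rotated/sample.mp4", *p))
print("best (detection, tracking) thresholds:", best)
```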

Comparing MediaPipe Models

Next, I compared the performance of the MediaPipe models, using the percentage of frames with a detected hand landmark as the performance metric. While each model had a similar average performance across all 200 videos, statistical analysis showed that the Hand and Holistic models performed significantly differently on individual videos. This finding led to my idea of combining the outputs of different models to increase the amount of usable data.
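
A sketch of how such a per-video comparison could be run; the CSV layout and the choice of the Wilcoxon signed-rank test as the paired test are assumptions for illustration, not necessarily my exact analysis:

```python
import pandas as pd
from scipy import stats

# per-video percentage of frames with a detected hand, one column per model
# ("detection_rates.csv" and its column names are assumed for this sketch)
df = pd.read_csv("detection_rates.csv")   # columns: video, hand, holistic

# paired test across the same 200 videos: do the two models detect
# landmarks at different rates on a per-video basis?
stat, p = stats.wilcoxon(df["hand"], df["holistic"])
print(f"Wilcoxon signed-rank: W={stat:.1f}, p={p:.4f}")

# videos where the models disagree most, motivating the combination strategy
df["diff"] = (df["hand"] - df["holistic"]).abs()
print(df.nlargest(5, "diff")[["video", "hand", "holistic"]])
```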

Increasing Usable Data

Combining the outputs of multiple models resulted in a 17% increase in detected landmarks and usable data. I recommended that future researchers implement this strategy when using MediaPipe.
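
The combination can be as simple as a per-frame fallback: keep one model's landmarks where they exist and fill the gaps from another model. A minimal sketch with assumed file and column names:

```python
import pandas as pd

# per-frame landmark tables from two models, indexed by frame number
# (file and column names follow the naming assumed in the sketches above)
hand = pd.read_csv("video01_hand.csv", index_col="frame")
holistic = pd.read_csv("video01_holistic.csv", index_col="frame")

# keep the Hand model's landmarks where present, and fall back to the
# Holistic model's landmarks on frames where Hand detected nothing
combined = hand.combine_first(holistic)

usable = combined["x_0"].notna().mean()   # x_0: wrist x-coordinate column
print(f"frames with a usable landmark after combining: {usable:.1%}")
```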

Model Inhibitors

To improve MediaPipe's success in future experimental methods, I identified factors in the experimental design that inhibited MediaPipe performance.

 

  1. Occlusion of key points on the hand used by the machine learning algorithm. For the Hand models, this point is the base of the hand, which is used to determine the region of interest for the whole hand. In the Box and Blocks data, the base of the hand is occluded during the middle part of the motion, and frames with this occlusion often had no detected landmark. Future methods should adjust the camera angle to avoid occluding this part of the hand.

  2. Exclusion of the whole body from the frame. The Holistic model first uses the pose model to detect the whole body and the region of interest for the hand, so including the whole body in the frame may improve this model's performance. The clinical Box and Blocks videos cut off the participant from the lower torso down, in contrast to the hand-waving validation data, in which the whole body was visible below the desk. In outputs for the clinical Box and Blocks data, the estimate for the partially cut-off torso often did not align with the participant's true torso location.

  3. Using lower-quality cameras with high-speed motions. In the Box and Blocks Test, the participant is asked to move the blocks as quickly as possible, which resulted in some blurring of the frames. As determined previously, there is a correlation between movement speed and MediaPipe accuracy. A higher-quality camera should be used in future experiments involving high-speed movements to reduce motion blur.

  4. Inclusion of research-specific objects. In the clinical Box and Blocks Test videos, the participants wore a research-grade EEG headset, and the electrodes were occasionally misdetected as a face by the Holistic model. Transfer learning could be considered in research scenarios where uncommon objects placed directly on the participant appear in the videos to be processed. MediaPipe is in the process of releasing an easy-to-use transfer learning framework, Model Maker, that may be of interest to future researchers using MediaPipe.

Fatigue Analysis

Although the data obtained by MediaPipe were sparse, and the focus of my project became assessing its efficacy in a clinical BCI setting and informing future experimental designs, I attempted a fatigue analysis to validate my initial analysis strategies. This helped me gain additional knowledge of statistical analysis that I can use in future projects.

 

Normalizing the data was more complex than I initially expected. Because each video was taken at a different depth from the participant and at a different translational placement, I had to normalize the data to account for both effects. I developed a script to efficiently select a common object in the experimental setup and to normalize the output points to this common object.
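
A minimal sketch of this normalization, assuming two corners of a common reference object are known in pixel coordinates: dividing by the object's apparent size corrects for depth, and subtracting one corner corrects for translation.

```python
import numpy as np

def normalize_landmarks(points, ref_corner_a, ref_corner_b):
    """Normalize pixel coordinates to a common reference object.

    points:          (n_frames, 2) array of landmark pixel coordinates
    ref_corner_a/b:  pixel coordinates of two corners of the reference object
    """
    a = np.asarray(ref_corner_a, dtype=float)
    b = np.asarray(ref_corner_b, dtype=float)
    scale = np.linalg.norm(b - a)          # apparent object size corrects for depth
    return (np.asarray(points, dtype=float) - a) / scale  # origin shift corrects for translation

# example: a two-frame wrist trajectory normalized to the box's front edge
traj = normalize_landmarks([[320, 240], [340, 230]], [100, 400], [500, 400])
print(traj)
```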

 

Next, I used peak-finding functions to determine the maximum amplitudes of the motion. After tuning these functions to my data set, I performed statistical analysis on the amplitudes before and after BCI use.
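
A sketch of this step, assuming a normalized vertical wrist trajectory stored in hypothetical .npy files; the find_peaks parameters and the choice of a t-test are illustrative, not my exact analysis:

```python
import numpy as np
from scipy.signal import find_peaks
from scipy import stats

def reach_amplitudes(y, fps):
    """Peak heights of a normalized vertical wrist trajectory.

    prominence filters out small jitter, and distance enforces a minimum
    spacing between reaches; both would be tuned to the data set.
    """
    peaks, _ = find_peaks(y, prominence=0.1, distance=int(0.5 * fps))
    return y[peaks]

# hypothetical pre/post-BCI trajectories for one participant
pre = reach_amplitudes(np.load("pre_bci_wrist_y.npy"), fps=30)
post = reach_amplitudes(np.load("post_bci_wrist_y.npy"), fps=30)

# compare reach amplitudes before vs. after BCI use
t, p = stats.ttest_ind(pre, post)
print(f"t = {t:.2f}, p = {p:.4f}")
```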

Present Findings

Lastly, I presented my findings in an oral presentation at the University of Calgary Biomedical Engineering Student Symposium.

Figure 2: Presenting Findings