Anam Zahra, Manuel Bohn, Pierre-Etienne Martin, Daniel Haun
Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
One core assumption in developmental psychology is that children’s cognition develops in interaction with their social and physical environment. One way to study this relation is to code interactions from video recordings of children’s daily activities. However, this coding is usually done by hand and is therefore very labor-intensive. Modern Computer Vision (CV) techniques – such as automatic people and object detection – can significantly reduce this effort and thereby facilitate the study of cognitive development. We want to use these techniques to evaluate a dataset of children’s daily activities. That is, we want to automatically quantify children’s interactions with people and objects. For this, we are collecting video recordings at home and in kindergartens using small, lightweight bodycams. So far, we have recorded ten hours of video from six children. To evaluate the accuracy of various models, we hand-coded a subset of the videos. As a first step, we compared 11 state-of-the-art CV people and object detectors against this hand-coded subset. Detection accuracy ranges from 30% to 35%, leaving room for improvement. We identified key limitations of the state-of-the-art models by specifying systematic detection errors (i.e., conditions under which a model fails to detect a person). This is the basis for improving our processing pipeline. Our next step is to improve the state-of-the-art models by fine-tuning them or changing the model architecture. Once detection is sufficiently accurate, we want to use these models to study the effect of children’s daily activities on their cognitive development at scale.
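
The abstract does not state how detection accuracy was computed or how systematic errors were tallied. As a minimal sketch only, assuming frame-level labels (person present or not) and a hypothetical per-frame condition tag, the comparison of a detector against hand-coded frames could look like the following. The function names, the label format, and the toy data are all illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter


def frame_accuracy(hand_coded, predicted):
    """Fraction of hand-coded frames on which the model agrees with the human label.

    Frames missing from `predicted` are treated as "no person detected" (assumption).
    """
    if not hand_coded:
        raise ValueError("no hand-coded frames to compare against")
    agree = sum(
        predicted.get(frame, False) == label for frame, label in hand_coded.items()
    )
    return agree / len(hand_coded)


def missed_by_condition(hand_coded, predicted, conditions):
    """Count missed detections (human: person, model: none) per condition tag."""
    misses = Counter()
    for frame, label in hand_coded.items():
        if label and not predicted.get(frame, False):
            misses[conditions.get(frame, "unknown")] += 1
    return misses


# Toy, made-up data purely for illustration (not from the study).
hand_coded = {0: True, 1: True, 2: False, 3: True}   # human: person visible?
predicted = {0: True, 1: False, 2: False, 3: False}  # model: person detected?
conditions = {1: "occluded", 3: "close-up"}          # hypothetical condition tags

print(frame_accuracy(hand_coded, predicted))
print(dict(missed_by_condition(hand_coded, predicted, conditions)))
```

Grouping misses by condition is one simple way to surface the kind of systematic detection errors the abstract mentions. A box-level evaluation (for example, matching detections to annotations by overlap) would need a different metric than this frame-level agreement.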