Featured Research

 

Dissecting Query-Key Interaction in ViTs [link]

Token interactions in ViTs can be better understood by applying singular value decomposition to the query-key interaction matrix. The resulting singular modes show surprising semantic properties. Further, in early layers self-attention prefers to group similar tokens, while in late layers it prefers dissimilar tokens, indicating a contextualization effect.
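As a rough illustration of the idea (not the code used in the study), the sketch below builds a query-key interaction matrix for a single attention head and decomposes it into singular modes. It assumes random stand-in weights, ignores biases and the usual softmax scaling, and uses hypothetical names (`qk_singular_modes`, `W_q`, `W_k`, `X`). Under these assumptions, the attention logit between two tokens is a sum over modes, each pairing a query-side direction with a key-side direction.

```python
import numpy as np

def qk_singular_modes(W_q, W_k):
    """Decompose the query-key interaction matrix W_q @ W_k.T with SVD.

    W_q, W_k: (d_model, d_head) projection weights of one head
    (hypothetical stand-ins; biases are ignored in this sketch).
    Returns U (d_model, r), S (r,), Vt (r, d_model) with r = d_head,
    since the interaction matrix has rank at most d_head.
    """
    W_qk = W_q @ W_k.T                 # (d_model, d_model) bilinear form
    U, S, Vt = np.linalg.svd(W_qk)     # singular values sorted descending
    r = W_q.shape[1]
    return U[:, :r], S[:r], Vt[:r, :]

rng = np.random.default_rng(0)
d_model, d_head, n_tokens = 16, 4, 5
W_q = rng.standard_normal((d_model, d_head))
W_k = rng.standard_normal((d_model, d_head))
X = rng.standard_normal((n_tokens, d_model))   # token embeddings

U, S, Vt = qk_singular_modes(W_q, W_k)

# Unnormalized attention logits computed the usual way ...
logits_direct = (X @ W_q) @ (X @ W_k).T
# ... and as a sum over singular modes: sum_n S[n] * (x_i . u_n) * (x_j . v_n)
logits_modes = ((X @ U) * S) @ (Vt @ X.T)
```

Each mode `n` contributes only when the query token aligns with `U[:, n]` and the key token aligns with `Vt[n]`, so comparing the two directions of a mode is one way to ask whether a head tends to pair similar or dissimilar tokens.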

 

Contextual Surround Effects in CNNs [link]

Visual perception at a given point is influenced by its surroundings (an example). Many contextual surround effects boil down to surround suppression, which is usually best captured by a divisive computation.
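A minimal sketch of such a divisive computation is given below. It assumes the standard textbook form, in which the center response is divided by a constant plus a weighted sum of surround responses. The function name `divisive_surround`, the weights, and the constant `sigma` are illustrative assumptions, not the model from the study.

```python
import numpy as np

def divisive_surround(center, surround, weights, sigma=1.0):
    """Suppress a center response by a weighted pool of surround responses.

    center:   scalar (or array) drive at the center location
    surround: (n,) non-negative drives at n surround locations
    weights:  (n,) non-negative pooling weights (hypothetical values)
    sigma:    positive constant that keeps the denominator away from zero
    """
    surround = np.asarray(surround, dtype=float)
    weights = np.asarray(weights, dtype=float)
    pool = np.dot(weights, surround)       # weighted surround drive
    return np.asarray(center, dtype=float) / (sigma + pool)

weights = np.array([0.5, 0.5])
no_surround = divisive_surround(2.0, [0.0, 0.0], weights)    # 2 / (1 + 0)
weak_surround = divisive_surround(2.0, [1.0, 1.0], weights)  # 2 / (1 + 1)
strong_surround = divisive_surround(2.0, [3.0, 3.0], weights)  # 2 / (1 + 3)
```

With the center drive held fixed, the response falls as the surround drive grows, which is the signature of surround suppression.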

This study aims to identify the role of surround effects in deep neural networks (DNNs) and to use DNNs to predict the form of surround effects in higher visual areas of the brain. CNNs show surprising alignment with many of the classic surround effects. The surround effects can be visualized using a modified feature visualization technique. The most suppressive surround pattern appears to be similar to the center pattern. We see this as a generalized surround effect that extends beyond the classic findings with grating stimuli.

In another study [link], we improve CNN classification accuracy by incorporating a brain-inspired weighted normalization.