Vocal communication is a quintessential form of social interaction. Humans and other animals coordinate their behavior at a distance by producing and perceiving distinct vocalizations at different times. Brain networks for vocal communication should therefore include areas at the intersection of the social behavior network and the vocal production-perception network. Nevertheless, little is known about how these networks interact. In this talk, I will describe our attempt to fill this knowledge gap. We study vocal communication in the marmoset, a highly vocal New World monkey. We use functional ultrasound imaging of the brain to achieve large spatial coverage (16 × 20 mm²) together with high spatial (~125 × 130 × 400 μm³) and temporal (500 ms) resolution in a behaving animal. Furthermore, we built a stochastic dynamical systems model of vocal behavior that interacts with the marmoset in closed loop, which lets us fully control the vocal interaction and make quantitative predictions about the brain dynamics underlying it. First, we show the existence of a medial brain system whose activity is related to social context, the social-vocal network (SVN). Second, we use the behaviorally validated computational model to predict that activity in the marmoset SVN entrains to activity in the model SVN.