Unlike the visual grammar of Kress & van Leeuwen, the visual "grammar" in China consists of methods of linguistic structure, rules of formal beauty, and semiotic relationships, and it goes beyond the three categories of visual meaning in multi-modal discourse: representation, interaction, and the composition of pictures. This visual "grammar" is ingrained with epistemological elements of embodiment. It departs not only from representation but also paraphrases in a variety of ways, which leads to new interpretations of the concepts of visual grammar and multi-modality. This paper attempts to illustrate this through an analysis of some humorous incidents.