Research Brief | Computing+ AI Professor Cewu Lu: A Knowledge Engine Foundation HAKE for Human Activity Understanding

Source: Shanghai Institute for Advanced Study English website

Human activity understanding is of widespread interest in artificial intelligence and spans diverse applications such as health care and behavior analysis. Despite advances in deep learning, it remains challenging. Solutions modeled on object recognition usually try to map pixels to semantics directly, but activity patterns differ greatly from object patterns, which has kept that approach from repeating its success.

Recently, Professor Cewu Lu proposed a novel paradigm that reformulates this task in two stages: first mapping pixels to an intermediate space spanned by atomic activity primitives, then programming the detected primitives with interpretable logic rules to infer semantics. The intermediate primitive space embeds the activity information in images using a limited set of representative primitives. The authors built a comprehensive knowledge base by crowd-sourcing. Because the primitive space is sparse, the knowledge base can cover most primitives from daily activities, so labeling is done once and transfers to new tasks. The Human Activity Knowledge Engine (HAKE) was trained on the authors' knowledge base and detects primitives well. Because it focuses only on primitives instead of the whole image, it effectively achieves a higher SCR, a novel measurement of image semantic "density" introduced by the authors in this study.

HAKE Overview. a. We cast activity understanding into: a(1) Knowledge base construction: annotating large-scale activity-primitive labels to afford accurate primitive detection. a(2) Reasoning: given detected primitives, adopting neuro-symbolic reasoning to program them into semantics. b. Detailed pipeline. b(1) Primitive detection and Activity2Vec. Given an image, we use detectors to locate the human, the object, and the human body parts. Then we use a simple CNN model together with BERT to extract the visual and linguistic features of the detected primitives. b(2) Primitive-based logical reasoning. With the two kinds of representations from Activity2Vec, we perform logical reasoning in a neuro-symbolic paradigm following the prior and auto-discovered logic rules. The NOT() and OR(,) modules are shared by all events but drawn separately here for clarity.

A reasoning engine programs detected primitives into semantics with explicit logic rules and updates the rules during reasoning. Diverse activities can therefore be composed from a finite set of primitives via logical reasoning, with compositional generalization.
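The idea of composing activities from primitives with logic rules can be illustrated with a minimal sketch. This is not HAKE's implementation: HAKE's modules are described as neuro-symbolic and operate on learned representations, whereas the sketch below substitutes fixed fuzzy-logic operators on scalar confidence scores. The primitive names, activity names, scores, and rules are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# A literal is (primitive_name, negated). A rule is a conjunction of
# literals implying one activity. All names here are hypothetical.
Literal = Tuple[str, bool]
Rule = Tuple[List[Literal], str]


def f_not(p: float) -> float:
    """Fuzzy negation of a confidence in [0, 1]."""
    return 1.0 - p


def f_or(a: float, b: float) -> float:
    """Fuzzy disjunction (probabilistic sum)."""
    return a + b - a * b


def f_and(a: float, b: float) -> float:
    """Conjunction built only from NOT and OR via De Morgan's law."""
    return f_not(f_or(f_not(a), f_not(b)))


def rule_score(literals: List[Literal], primitives: Dict[str, float]) -> float:
    """Score one rule body; primitives that were not detected count as 0."""
    score = 1.0
    for name, negated in literals:
        p = primitives.get(name, 0.0)
        score = f_and(score, f_not(p) if negated else p)
    return score


def infer(primitives: Dict[str, float], rules: List[Rule]) -> Dict[str, float]:
    """Combine all rules that imply the same activity with OR."""
    scores: Dict[str, float] = {}
    for literals, activity in rules:
        s = rule_score(literals, primitives)
        scores[activity] = f_or(scores.get(activity, 0.0), s)
    return scores


# Hypothetical stage-one output: confidence per detected primitive.
detected = {
    "hand:hold": 0.9,
    "head:close_to": 0.8,
    "object:cup": 0.95,
    "object:phone": 0.05,
}

# Hypothetical rule base: two small conjunctive rules.
rule_base: List[Rule] = [
    ([("hand:hold", False), ("head:close_to", False), ("object:cup", False)],
     "drink_with_cup"),
    ([("hand:hold", False), ("object:phone", False), ("object:cup", True)],
     "talk_on_phone"),
]

activity_scores = infer(detected, rule_base)
for activity, score in sorted(activity_scores.items()):
    print(f"{activity}: {score:.4f}")
```

Because the same three operators serve every rule, adding a new activity only requires a new rule over existing primitives, which is the sense in which a finite primitive set can compose many activities.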

Nature of activity perception: atomic primitives and logical reasoning. a. Primitives such as body part states and objects exist. Here, some common hand states are shown. b. Activities can be inferred by programming primitives following logical rules.

The authors conducted experiments on large-scale benchmarks including HICO-DET, AVA, V-COCO, and Ambiguous-HOI. Based on results for detection, transfer ability, upper bound, effectiveness of the logical modules, and an ablation study, they concluded that HAKE yields a large performance boost for few-shot learning with conceptual and logical descriptions, given a scalable and easily adapted primitive dictionary and logic rule base. In addition, the SCR test showed that HAKE can locate key semantics via primitive detection and reasoning, indicating that the primitive space faithfully embeds the visual activity information and that the reasoning engine bridges the primitive space and the semantic space well.

Masking results from HAKE and humans in the SCR Test. The verbs are given at the top. The source of each masking is marked below the sub-image. The two sets of maskings are very similar and hard to tell apart even for human participants (59.55% accuracy).

Drawing on a knowledge base with more than 26 million primitive labels and on logic rules from human priors or automatic discovery, HAKE exhibited superior concept learning with compositional generalization and outperformed canonical methods on challenging benchmarks. It could also serve as a good platform, based on real-world data, for cognition analysis and causal inference.

The work was published as 'HAKE: A Knowledge Engine Foundation for Human Activity Understanding' in IEEE Transactions on Pattern Analysis and Machine Intelligence and can be accessed at https://ieeexplore.ieee.org/document/10002711. The code is available at http://hake-mvig.cn/.

 

About Professor Lu

Dr Cewu Lu is a Professor at Shanghai Jiao Tong University (SJTU). Before joining SJTU, he was a research fellow at Stanford University working under Prof. Fei-Fei Li and Prof. Leonidas J. Guibas. He was also a Research Assistant Professor at Hong Kong University of Science and Technology with Prof. Chi Keung Tang. He received his PhD degree from The Chinese University of Hong Kong, supervised by Prof. Jiaya Jia.

Dr Lu is mainly engaged in computer vision and robotics research, and has achieved several breakthrough research results. He has released internationally leading open source AI frameworks and datasets, such as AlphaPose (a real-time human pose estimation system, 5000+ GitHub stars), HAKE (Human Activity Knowledge Engine), and GraspNet (a high-performance robot grasping system).

About SIAS

Shanghai Institute for Advanced Study of Zhejiang University (SIAS) is a new research and development institution jointly launched by the Shanghai Municipal Government and Zhejiang University in June 2020. The platform represents an intersection of technology and economic development, serving as a market-leading trailblazer to cultivate a novel community for innovation among enterprises.

SIAS is seeking top talent working on the frontiers of computational sciences who can envision and realize a research program that will bring new solutions to areas including, but not limited to, Artificial Intelligence, Computational Biology, Computational Engineering, and Fintech.