Research Brief: PGT, a Progressive Method for Training Models on Long Videos

Source: Shanghai Institute for Advanced Study English website

Convolutional video models are an order of magnitude more computationally expensive than their image-level counterparts. Because of this resource constraint, no existing model or training method can train on long video sequences end-to-end. The mainstream approach is to split a raw video into clips, which leaves the temporal information flow fragmented and incomplete.

Inspired by natural language processing techniques for handling long sentences, a research group led by Dr. Cewu Lu proposes to treat a video as a series of fragments satisfying the Markov property, and to train on it as a whole by progressively propagating information along the temporal dimension in multiple steps.

Passing information forward along the time dimension over multiple steps is what makes end-to-end training possible. In particular, decomposing the entire computation into sequential, progressive steps that satisfy one-way dependencies reduces resource requirements while preserving temporal semantic integrity.

The main purpose of the approach is to decompose one integrated computation into several parts that can be computed serially, which reduces the computational resources required. To ensure that the decomposition does not violate the temporal semantics, the best choice is an equivalent decomposition, i.e., one in which the computation before and after the decomposition is identical. To achieve an equivalent decomposition, two principles should be followed: one-way dependency and no backward feedback (i.e., each fragment must be able to update its parameters without feedback from the next fragment).
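The notion of an equivalent decomposition under one-way dependency can be illustrated with a small sketch. It is not the authors' implementation: it assumes a one-dimensional causal temporal convolution over scalar "frames", and the function names `causal_conv` and `progressive_conv` are hypothetical. Because each output depends only on the current and earlier frames, processing the sequence fragment by fragment while carrying forward the last few frames should give the same result as processing the whole sequence at once.

```python
def causal_conv(frames, kernel, history=None):
    """Causal temporal convolution: output[t] uses only frames[t-k+1 .. t]."""
    k = len(kernel)
    # Frames before the start are zero padding unless carried-over history is given.
    pad = [0.0] * (k - 1) if history is None else list(history)
    padded = pad + list(frames)
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(frames))
    ]


def progressive_conv(frames, kernel, fragment_len):
    """Process the sequence one fragment at a time (hypothetical helper).

    Only the last k-1 frames seen so far are carried forward, so each step
    depends on the past but never on a later fragment.
    """
    k = len(kernel)
    history = [0.0] * (k - 1)
    outputs = []
    for start in range(0, len(frames), fragment_len):
        fragment = list(frames[start:start + fragment_len])
        outputs.extend(causal_conv(fragment, kernel, history))
        combined = history + fragment
        # Keep exactly k-1 trailing frames (an empty list when k == 1).
        history = combined[len(combined) - (k - 1):]
    return outputs


frames = [float(i) for i in range(1, 11)]
kernel = [0.25, 0.5, 0.25]
# The fragment-wise result matches the whole-sequence result.
assert progressive_conv(frames, kernel, 4) == causal_conv(frames, kernel)
```

In this toy setting only one fragment plus a short history is held at any step, which is the sense in which a serial decomposition can lower the resource requirement without changing the result.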

The figure in the original article compares the method in this work with the original convolution operation. The method uses the normal convolution operation in the interior of each fragment and the Markov convolution operation at the boundaries between adjacent fragments.
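A rough sketch of this hybrid scheme follows. The boundary rule is an assumption made for illustration, since the brief does not spell out the Markov convolution: here the first frame of a fragment may read the cached last frame of the previous fragment, while the last frame of a fragment sees zero padding instead of the next fragment. The basic, cascade and parallel operators in the paper may differ, and the names `conv3` and `fragmentwise_conv3` are hypothetical.

```python
def conv3(frames, kernel, left=0.0, right=0.0):
    """Ordinary size-3 temporal convolution with explicit boundary values."""
    padded = [left] + list(frames) + [right]
    return [
        kernel[0] * padded[t] + kernel[1] * padded[t + 1] + kernel[2] * padded[t + 2]
        for t in range(len(frames))
    ]


def fragmentwise_conv3(frames, kernel, fragment_len):
    """Normal convolution inside each fragment, one-way rule at boundaries.

    Assumed boundary rule (not taken from the paper): information flows
    forward through a cached frame, and never backward from a later fragment.
    """
    outputs = []
    cached = 0.0  # zero padding before the video starts
    for start in range(0, len(frames), fragment_len):
        fragment = list(frames[start:start + fragment_len])
        # Left edge reads the cached frame; right edge has no look-ahead.
        outputs.extend(conv3(fragment, kernel, left=cached, right=0.0))
        cached = fragment[-1]
    return outputs


frames = [float(i) for i in range(1, 9)]
kernel = [1.0, 2.0, 4.0]
full = conv3(frames, kernel)                  # whole video at once
frag = fragmentwise_conv3(frames, kernel, 4)  # two fragments of four frames
# Positions where the two computations disagree.
differing = [t for t in range(len(frames)) if full[t] != frag[t]]
assert differing == [3]
```

Under this assumed rule the fragment-wise output differs from the whole-video convolution only at the last frame of each non-final fragment, which is the one position that would otherwise need to look ahead into the next fragment.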

The group proposes basic, cascade and parallel convolutional operators, evaluates the progressive training method on the large-scale action recognition benchmark Kinetics-400, and provides ablations on its subset Mini-Kinetics-200.

The group also demonstrates empirically that the proposed PGT method provides a consistent performance boost, improving the SlowOnly network by 3.7 mAP on Charades and by 1.9 points of top-1 accuracy on Kinetics with negligible parameter and computation overhead.

The work was accepted to CVPR 2021 and is available on arXiv as "PGT: A Progressive Method for Training Models on Long Videos" (arXiv:2103.11313v1). The code is available at https://github.com/BoPang1996/PGT


About Professor Lu

Dr. Cewu Lu is a Professor at Shanghai Jiao Tong University (SJTU). Before joining SJTU, he was a research fellow at Stanford University, working under Prof. Fei-Fei Li and Prof. Leonidas J. Guibas. He was also a Research Assistant Professor at Hong Kong University of Science and Technology, working with Prof. Chi Keung Tang. He received his PhD from The Chinese University of Hong Kong, supervised by Prof. Jiaya Jia.

He was selected as one of the Overseas High-Level Young Introduced Talents in 2016 and named one of China's Top 35 Under 35 Science and Technology Elite (MIT TR35) by MIT Technology Review in 2018. He received the Quyi Outstanding Young Scholar Award in 2019 and the Shanghai Science and Technology Progress Special Award (ranked third) in 2020. He has published more than 100 articles, as corresponding or first author, in Nature, Nature Machine Intelligence, TPAMI, CVPR and other high-ranking journals and conferences. He was the program chair of CVM 2018 and division chair of CVPR 2020, ICCV 2021, and IROS 2021.

Dr. Lu works mainly on computer vision and robotics, and has achieved several breakthrough research results. He has released internationally leading open-source AI frameworks and datasets, such as AlphaPose (a real-time human pose estimation system with 5,000+ GitHub stars), HAKE (Human Behavior Engine), and GraspNet (a high-performance robot grasping system).

 

About SIAS

Shanghai Institute for Advanced Study of Zhejiang University (SIAS) is a new research and development institution jointly launched by the Shanghai Municipal Government and Zhejiang University in June 2020. The platform sits at the intersection of technology and economic development, serving as a market-leading trailblazer that cultivates a new community for innovation among enterprises.

SIAS is seeking top talent working at the frontiers of the computational sciences who can envision and realize research programs that bring new solutions to areas including, but not limited to, artificial intelligence, computational biology, computational engineering and fintech.