Beijing Jiaotong University Faculty Directory

Basic Information

Office phone:	Email: xwanru@bjtu.edu.cn
Mailing address: Room 424, Civil Engineering Building (土建424)	Postal code: 100044

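Research Interests

Research interests include computer vision, deep learning, video understanding, action recognition, and video captioning.

Admissions: students interested in the areas above are welcome to join.

Currently recruiting master's students for the 2026 intake (both recommended-admission and entrance-exam applicants), as well as doctoral students.

Programs open for admission: Computer Science and Technology, Computer Technology, Artificial Intelligence, Big Data Technology and Engineering, New-Generation Electronic Information Technology, Software Engineering.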

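Work Experience

2022/01 - present, Institute of Information Science, School of Computer and Information Technology, Beijing Jiaotong University, Associate Professor

2020/07 - 2021/12, Institute of Information Science, School of Computer and Information Technology, Beijing Jiaotong University, Lecturer

2018/07 - 2020/07, School of Computer and Information Technology, Beijing Jiaotong University, Postdoctoral Researcher, co-advisor: Jian Yu (于剑)

2016/10 - 2017/10, Rensselaer Polytechnic Institute, USA, Visiting Scholar, co-advisor: Qiang Ji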

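Education

2011/09 - 2018/07, Beijing Jiaotong University, Signal and Information Processing, Ph.D.

2007/09 - 2011/07, Beijing Jiaotong University, Biomedical Engineering, Bachelor's degree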

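Research Areas

Software Engineering
New-Generation Electronic Information Technology
Artificial Intelligence
Big Data Technology and Engineering
Machine Learning and Cognitive Computing
Intelligent Perception and Embodied Intelligence
Computer Technology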

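Research Projects

Industry-Sponsored Project (Natural Sciences): Intelligent Question-Answering Model Based on Knowledge Graphs and LLMs, 2025-2026
National Natural Science Foundation of China (General Program): Intelligent Bidirectional Generation and Evaluation Methods for "Dance Notation and Dance", 2026-2029
Research on High-Quality Restoration and Detection of Dynamic Fault Images of Electric Multiple Units under Few-Shot Conditions
Industry-Sponsored Project (Natural Sciences): Artificial Intelligence Analysis and Computer Modeling of Drilling Data, 2025-2025
Industry-Sponsored Project (Natural Sciences): Intelligent Sports Video Semantic Understanding System Based on Large Models, 2025-2026
Industry-Sponsored Project (Natural Sciences): Survey of AI Applications and Research on Video Restoration Methods Based on Large AI Models, 2025-2028
Industry-Sponsored Project (Natural Sciences): Outsourced Services for Artificial Intelligence Analysis and Computer Modeling of Drilling Data for Geothermal Power Generation, 2024-2024
Hongguoyuan National-Level "Four Headquarters" Project: Cross-Domain Few-Shot Complex Time-Series Anomaly Detection in Dynamic Aviation Scenarios, 2024-2025
Industry-Sponsored Project (Natural Sciences): Computational Testing Services for Smart Teaching Analysis and Evaluation Models for Primary and Secondary Schools, 2024-2026
Industry-Sponsored Project (Natural Sciences): Multi-Dimensional Risk Early-Warning Model Based on Key Content and Accounts, 2024-2024
Industry-Sponsored Project (Natural Sciences): Development of Multi-Source Multimodal Data Fusion Algorithms, 2024-2027
Basic Research Project: Intelligent Dance Notation-Dance Generation and Action Quality Assessment Methods, 2024-2026
Beijing Natural Science Foundation (General Program): Cross-Domain Few-Shot Video Captioning, 2024-2026
Hongguoyuan National-Level "Four Headquarters" Project: Pilot Functional State Analysis Based on Collaborative Perception across Multiple Vision Tasks, 2023-2025
Major Funded Project: Adaptive Perception for Unmanned Vehicles in Open Environments, 2023-2025
Industry-Sponsored Project (Natural Sciences): Smart Data Collection and Analysis Services Procurement Project, 2022-2024
National Key R&D Program of China (Project): Wide-Field-of-View High-Resolution Video Content Perception and Efficient Coding, 2022-2025
Basic Research Project: Personalized Gaze Estimation for Assisted Assessment of Autism Spectrum Disorder in Children, 2022-2024
Other (Office of Science and Technology): Automatic Generation and Evaluation of Labanotation Based on Multi-Scale Graph Convolution and Reinforcement Learning, 2022-2026
Other (Office of Science and Technology): Occlusion-Robust 6D Object Pose Estimation from Monocular RGB Images, 2021-2023
Industry-Sponsored Project (Natural Sciences): Research and Practice of Frame Restoration Techniques for Cultural and Artistic Videos, 2021-2026
Talent Fund Project (Natural Sciences): Video Caption Correction with Supplementary Multimodal Information, 2021-2024
National Natural Science Foundation of China (General Program): Fine-Grained Video Captioning Enhanced by Heterogeneous Knowledge, 2021-2024
National Natural Science Foundation of China (Young Scientists Fund): Key Technologies for Cognition-Inspired Video Captioning and Correction, 2021-2023
Industry-Sponsored Project (Natural Sciences): Development of a Labanotation-Based Interactive Platform for Rescuing Ethnic and Folk Dynamic Cultural Resources, 2019-2024
Other: Human Activity Localization in Video Surveillance Based on Deep Reinforcement Learning, 2019-2020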

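Admission Programs

Computer Science and Technology, Master's
Computer Science and Technology, Ph.D.
Computer Technology, Master's
Artificial Intelligence, Master's
New-Generation Electronic Information Technology (including quantum technology, etc.), Master's
Big Data Technology and Engineering, Master's
Software Engineering, Master's
New-Generation Electronic Information Technology (including quantum technology, etc.), Ph.D.
Computer Technology, Ph.D.
Artificial Intelligence, Ph.D.
Software Engineering, Ph.D.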

Teaching

Papers/Journals

He W, Xu W, Guo P, et al. “InstructStep: Fine-Grained Localization of Step Content and Relation in Instructional Video”. Proceedings of the 33rd ACM International Conference on Multimedia (ACM MM) (2025). （CCF A）
Chen X, Cen Y, Xu W, et al. “Hierarchical Meta-prototypes Network for Few-shot Action Recognition”. Proceedings of the 33rd ACM International Conference on Multimedia (ACM MM) (2025). （CCF A）
Zhang L, Tian Y, Wang X, Xu W, et al. “Differential Contrastive Training for Gaze Estimation”. Proceedings of the 33rd ACM International Conference on Multimedia (ACM MM) (2025). （CCF A）
Zhang G, Kan S, Zhang F, Xu W, et al. “Noise-Guided Predicate Representation Extraction and Diffusion-Enhanced Discretization for Scene Graph Generation”. International Conference on Machine Learning (ICML) (2025). （CCF A）
Liu Y, Xu W, Miao Z, et al. Natural Cognizing Video: A Decoupling and Integration Network for General Event Boundary Captioning, IEEE Transactions on Multimedia (TMM), 2025
Chen X, Xu W, Kan S, Cen Y, et al. Vision-Semantics-Label: A New Two-step Paradigm for Action Recognition with Large Language Model, IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2025
Tian Y, Wang X, Zhang S, Xu W, et al. ‘Disengage AND Integrate’: Personalized Causal Network for Gaze Estimation[J]. IEEE Transactions on Image Processing (TIP), 2025
Xu W, Xu Y, Miao Z, et al. "CroCaps: A CLIP-assisted cross-domain video captioner." Expert Systems with Applications 268 (2025): 126296.
Xu Y, Xu W, and Miao Z. "Counterfactual contrastive learning for weakly supervised temporal sentence grounding." Neurocomputing 624 (2025): 129508.

Xu W, Miao Z, Tian Y,et al. Probabilistic Distillation Transformer: Modelling Uncertainties for Visual Abductive Reasoning[C]//Proceedings of the 32nd ACM International Conference on Multimedia. 2024: 8865-8873. （CCF A）

Xu W, Miao Z, Yu J, et al. Boosting Semi-Supervised Video Captioning via Learning Candidates Adjusters[J]. ACM Transactions on Multimedia Computing, Communications and Applications, 2024.

Xu W, Miao Z, Yu J, et al. Bridging Video and Text: A Two-Step Polishing Transformer for Video Captioning[J]. IEEE Transactions on Circuits and Systems for Video Technology(TCSVT), 32 (2022): 6293-6307.

Tian Y., Zhang Y., Huang Y., Xu W., & Ding, Z. (2022). Differential Refinement Network for Zero-Shot Learning. IEEE Transactions on Neural Networks and Learning Systems.

Xie N., Miao Z., Zhang X., Xu W., Li M, & Wang, J. (2022). Sequential Gesture Learning for Continuous Labanotation Generation Based on the Fusion of Graph Neural Networks. IEEE Transactions on Circuits and Systems for Video Technology(TCSVT), 32, 3722-3734.

Li M, Miao Z, Zhang X, Xu W, Ma C., & Xie N. (2022). Rhythm-Aware Sequence-to-Sequence Learning for Labanotation Generation With Gesture-Sensitive Graph Convolutional Encoding. IEEE Transactions on Multimedia(TMM), 24, 1488-1502.

Xu W, Yu J, Miao Z, et al. Deep Reinforcement Polishing Network for Video Captioning[J]. IEEE Transactions on Multimedia(TMM), 2021,23: 1772-1784.

Xu W, Miao Z, Yu J, et al. Deep reinforcement learning for weak human activity localization[J]. IEEE Transactions on Image Processing(TIP), 2020, 29: 1522-1535.

Xu W, Yu J, Miao Z, et al. Spatio-temporal deep Q-networks for human activity localization[J]. IEEE Transactions on Circuits and Systems for Video Technology(TCSVT), 2020,30(9):2984–2999.

Xu W, Yu J, Miao Z, et al. Prediction-CGAN: Human Action Prediction with Conditional Generative Adversarial Networks[C], In Proceedings of the 27th ACM International Conference on Multimedia (ACM MM), 2019, 611–619. （CCF A）

Rui Zhao, Xu W, Hui Su, Qiang Ji, Bayesian Hierarchical Dynamic Model for Human Action Recognition, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, California, USA,2019.6.16-2019.6.20. （CCF A）

Xu W, Miao Z, Yu J, et al. Action recognition and localization with spatial and temporal contexts[J]. Neurocomputing, 2019, 333: 351-363.

Xu W, Miao Z, Zhang X P, et al. A Hierarchical Spatio-Temporal Model for Human Activity Recognition[J]. IEEE Transactions on Multimedia(TMM), 2017, 19(7): 1494-1509.

Xu W, Miao Z, Tian Y. A Novel Mid-Level Distinctive Feature Learning for Action Recognition via Diffusion Map[J]. Neurocomputing, 2016, 218:185-196.

Xu W, Miao Z, Zhang Q. Projection transform on spatio-temporal context for action recognition[J]. Multimedia Tools & Applications, 2015, 74(18):7711-7728.

Xu W, Miao Z, Zhang X P, et al. Learning a hierarchical spatio-temporal model for human activity recognition[C], 24th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, USA, 2017.3.05-2017.3.09.

Xu W, Miao Z, Zhang X P. Structured feature-graph model for human activity recognition[C], IEEE International Conference on Image Processing (ICIP), Quebec City, Canada, 2015.9.27-2015.9.30.

Xu W, Miao Z, Zhang J, et al. Learning Spatio-Temporal Features for Action Recognition with Modified Hidden Conditional Random Field[C], 13th European Conference on Computer Vision (ECCV workshop), Zurich, Switzerland, 2014.9.06-2014.9.12.

Xu W, Miao Z, Zhang J, et al. Spatial-Temporal Context for Action Recognition Combined with Confidence and Contribution Weight[C], 2nd IAPR Asian Conference on Pattern Recognition (ACPR2013), Okinawa, Japan, 2013.11.05-2013.11.08.

Tian Y, Ruan Q, An G, Xu W. Context and locality constrained linear coding for human action recognition[J]. Neurocomputing, 2015, 167:359-370.

Hao S, Miao Z, Wang J, Xu W, et al. Labanotation generation based on bidirectional gated recurrent units with joint and line features[C]//2019 IEEE International Conference on Image Processing (ICIP). IEEE, 2019: 4265-4269.

Yang J, Wan L, Xu W, et al. 3D human pose estimation from a single image via exemplar augmentation[J]. Journal of Visual Communication and Image Representation, 2019, 59: 371-379.

An R, Miao Z, Li Q, Xu W, et al. Spatiotemporal visual-semantic embedding network for zero-shot action recognition[J]. Journal of Electronic Imaging, 2019, 28(2): 023007.

Monographs/Translations

Patents

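Miao Zhenjiang (苗振江), Xu Wanru (许万茹), Zhang Qiang (张强), Liu Rujie (刘汝杰), Human action recognition method based on structured feature graphs, 2018.08.31, China, ZL201510126019.5

Miao Zhenjiang (苗振江), Zhang Qiang (张强), Xu Wanru (许万茹), Subspace clustering method based on latent-space smooth self-representation, 2018.11.13, China, ZL201410828113.0

Miao Zhenjiang (苗振江), Hu Biying (胡碧莹), Zhang Qiang (张强), Xu Wanru (许万茹), Liu Rujie (刘汝杰), Vehicle speed detection method based on road surveillance video, 2017.07.18, China, ZL201310503592.4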

Xu Wanru (许万茹)