In this paper, we address a central challenge in tabular missing‑data imputation: explicitly identifying and exploiting interdependencies among features to improve reconstruction quality. Most current state‑of‑the‑art methods instead model similarity or interdependence between samples. However, our experiments on real‑world tabular datasets show that, when samples are truly independent, building such observation‑level graphs yields only marginal, dataset‑specific performance gains rather than consistent and generalisable benefits. We therefore introduce the Bipartite and Complete Directed Graph Neural Network (BCGNN). In BCGNN, observations and features are treated as two distinct node types, and each observed cell value is converted into an attributed edge connecting its observation node to its feature node. The bipartite component inductively learns node embeddings by fully leveraging the information encoded in these attributed edges, while the complete directed graph component explicitly describes and propagates intricate feature–feature dependencies. Together, the two components furnish a robust inductive framework for representation learning while explicitly parameterising higher‑order dependencies among features. Across diverse missingness mechanisms, BCGNN outperforms leading imputation baselines, achieving an average 15% reduction in mean absolute error. Extensive experiments confirm that a deeper understanding of feature interdependence markedly enhances embedding quality. BCGNN also delivers superior performance on downstream label‑prediction tasks with missing inputs and generalises robustly to unseen data.
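To make the two graph structures concrete, here is a minimal sketch, written under our own assumptions, of how a table with missing entries could be turned into the inputs the abstract describes: an attributed bipartite edge list with one edge (observation, feature, value) per observed cell, and a complete directed graph over features with an edge for every ordered pair of distinct features. The function names are hypothetical, missing cells are assumed to be encoded as `None` or `NaN`, and the sketch covers only graph construction, not BCGNN's message passing, training, or imputation; it is not the authors' implementation.

```python
import math
from itertools import permutations


def is_missing(v):
    """Assumed encoding: a cell is missing if it is None or a float NaN."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def build_bipartite_edges(X):
    """Hypothetical helper: one attributed edge (obs_idx, feat_idx, value)
    per observed cell; missing cells produce no edge."""
    edges = []
    for i, row in enumerate(X):
        for j, v in enumerate(row):
            if not is_missing(v):
                edges.append((i, j, v))
    return edges


def missing_cells(X):
    """Hypothetical helper: the (obs_idx, feat_idx) pairs an imputer must predict."""
    return [(i, j) for i, row in enumerate(X) for j, v in enumerate(row) if is_missing(v)]


def build_complete_feature_digraph(n_features):
    """Hypothetical helper: all ordered pairs (j, k) with j != k, i.e. a
    complete directed graph with n_features * (n_features - 1) edges."""
    return list(permutations(range(n_features), 2))


X = [[1.0, None, 3.0],
     [float("nan"), 2.0, 0.5]]
print(build_bipartite_edges(X))            # [(0, 0, 1.0), (0, 2, 3.0), (1, 1, 2.0), (1, 2, 0.5)]
print(missing_cells(X))                    # [(0, 1), (1, 0)]
print(build_complete_feature_digraph(3))   # [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
```

In this framing, imputing a missing cell amounts to predicting the attribute of an absent bipartite edge, while the directed feature graph gives each feature a separate, direction-specific connection to every other feature. That is one way to leave room for asymmetric feature–feature dependencies.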
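Ziqi Chen (谌自奇) is a Research Professor at the School of Statistics, East China Normal University, a Zijiang Young Scholar, and a doctoral supervisor. Chen received a PhD from Northeast Normal University and subsequently conducted postdoctoral research in the Department of Biostatistics at the MD Anderson Cancer Center in the United States. Research interests include high‑dimensional statistics, causal structure learning, machine learning, and biomedical statistics. As first or corresponding author, Chen has published more than 20 papers in leading international statistics and computer science journals and conferences, including JASA, Biometrics, NeurIPS, KDD, AAAI, and Neurocomputing. Chen has led two General Program projects of the National Natural Science Foundation of China (NSFC), one subproject of an NSFC Key Program project, and one NSFC Young Scientists Fund project, among others, and has been a core participant in a National Key R&D Program project and in a key applied‑mathematics project in the basic research area of Shanghai's "Science and Technology Innovation Action Plan".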