Abstract: Objective The digital clock drawing test (dCDT) is a key tool for early screening of cognitive impairment. However, existing automated scoring models based on image data generally have limited accuracy, and their "black-box" nature leaves the decision process without clinical interpretability. Analyzing drawings directly with large language models can enrich semantic representation, but their inherent "factual hallucinations" make it difficult to guarantee the reliability and factual consistency of clinical assessments. To address these challenges, this study proposes MHC-CDT, an integrated multimodal framework with hallucination description correction for fine-grained intelligent scoring of the clock drawing test, which performs intelligent scoring and auxiliary report generation simultaneously. Methods First, a clinically supervised semantic correction mechanism maps the initial text generated by large language models to high-confidence, reliable semantic features, thereby eliminating the adverse effects of factual hallucinations. Second, a fine-grained cross-modal attention mechanism captures the correlations between image features and clinical scoring criteria, improving accuracy on the multi-dimensional scoring task while making decisions transparent. Finally, a report generation scheme with three-dimensional joint supervision (syntax, semantics, and factuality) is constructed, and auxiliary diagnostic reports consistent with clinical logic are generated via multimodal prefix injection. Results In experiments on a real-world dCDT dataset, the framework achieved an accuracy of 0.74 and a Macro-F1 score of 0.71, significantly outperforming the baseline models. The generated reports reached a factual consistency rate of 93.65%; combined with visualized attention heatmaps, this helps to overcome the "black-box" dilemma of scoring models. Conclusion The framework establishes a closed loop from raw signals to multimodal scoring and interpretable reports, providing a reliable and interpretable new paradigm for automated dCDT assessment.
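The Results are reported as accuracy and Macro-F1. As a reading aid, the sketch below shows how these two metrics are conventionally computed for a multi-class scoring task. It is a minimal illustration in pure Python, not the authors' evaluation code; the function names and the toy labels are assumptions made for this example and do not come from the study's dataset.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[t] += 1  # t was the true class but was missed
    f1_scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical score labels for six drawings (illustrative only).
reference = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
print(round(accuracy(reference, predicted), 4))
print(round(macro_f1(reference, predicted), 4))
```

Because Macro-F1 weights every class equally regardless of its frequency, it can fall below accuracy when minority score levels are predicted less reliably, which is one plausible reading of the gap between 0.74 and 0.71.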
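Basic information:
CLC number: R741; TP391.1; TP18
Citation:
[1] LIU Dingyi, YU Lei. MHC-CDT: a multimodal framework with hallucination description correction for fine-grained intelligent scoring of the digital clock drawing test[J]. Space Medicine & Medical Engineering (航天医学与医学工程)().
Funding:
Key Medical Research Project of Shanxi Province, Key Research Special Program (2022XM39)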
2026-06-23