Li, Y. et al. Prevalence and trends in diagnosed ADHD among US children and adolescents, 2017–2022. JAMA Netw. Open 6, e2336872 (2023).
American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders 5th edn (American Psychiatric Publishing, 2013).
Shaw, M. et al. A systematic review and analysis of long-term outcomes in attention deficit hyperactivity disorder: effects of treatment and non-treatment. BMC Med. 10, 99 (2012).
Hamed, A. M., Kauer, A. J. & Stevens, H. E. Why the diagnosis of attention deficit hyperactivity disorder matters. Front. Psychiatry 6, 167576 (2015).
McGoey, K. E., Eckert, T. L. & DuPaul, G. J. Early intervention for preschool-age children with ADHD: a literature review. J. Emot. Behav. Disord. 10, 14–28 (2002).
DuPaul, G. J., Kern, L., Gormley, M. J. & Volpe, R. J. Early intervention for young children with ADHD: academic outcomes for responders to behavioral treatment. School Ment. Health 3, 117–126 (2011).
Sonuga-Barke, E. J., Koerting, J., Smith, E., McCann, D. C. & Thompson, M. Early detection and intervention for attention-deficit/hyperactivity disorder. Expert Rev. Neurother. 11, 557–563 (2011).
Long, N. & Coats, H. The need for earlier recognition of attention deficit hyperactivity disorder in primary care: a qualitative meta-synthesis of the experience of receiving a diagnosis of ADHD in adulthood. Fam. Pract. 39, 1144–1155 (2022).
Shephard, E. et al. Systematic review and meta-analysis: the science of early-life precursors and interventions for attention-deficit/hyperactivity disorder. J. Am. Acad. Child Adolesc. Psychiatry 61, 187–226 (2022).
Foy, J. M. & Earls, M. F. A process for developing community consensus regarding the diagnosis and management of attention-deficit/hyperactivity disorder. Pediatrics 115, e97–e104 (2005).
Klein, R. G. et al. Clinical and functional outcome of childhood attention-deficit/hyperactivity disorder 33 years later. Arch. Gen. Psychiatry 69, 1295–1303 (2012).
Du Rietz, E. et al. Trajectories of healthcare utilization and costs of psychiatric and somatic multimorbidity in adults with childhood ADHD: a prospective register-based study. J. Child Psychol. Psychiatry 61, 959–968 (2020).
Rocco, I., Corso, B., Bonati, M. & Minicuci, N. Time of onset and/or diagnosis of ADHD in European children: a systematic review. BMC Psychiatry 21, 575 (2021).
Boulton, K. A. et al. Diagnostic delay in children with neurodevelopmental conditions attending a publicly funded developmental assessment service: findings from the Sydney Child Neurodevelopment Research Registry. BMJ Open 13, e069500 (2023).
Knott, R. et al. Age at diagnosis and diagnostic delay across attention-deficit hyperactivity and autism spectrums. Aust. N. Z. J. Psychiatry 58, 142–151 (2024).
Visser, S. N. et al. Trends in the parent-report of health care provider-diagnosed and medicated attention-deficit/hyperactivity disorder: United States, 2003–2011. J. Am. Acad. Child Adolesc. Psychiatry 53, 34–46.e2 (2014).
Murray, A. L. et al. Sex differences in ADHD trajectories across childhood and adolescence. Dev. Sci. 22, e12721 (2019).
Morgan, P. L., Hillemeier, M. M., Farkas, G. & Maczuga, S. Racial/ethnic disparities in ADHD diagnosis by kindergarten entry. J. Child Psychol. Psychiatry 55, 905–913 (2014).
Holland, J. & Sayal, K. Relative age and ADHD symptoms, diagnosis and medication: a systematic review. Eur. Child Adolesc. Psychiatry 28, 1417–1429 (2019).
Stevens, T., Peng, L. & Barnard-Brak, L. The comorbidity of ADHD in children diagnosed with autism spectrum disorder. Res. Autism Spectr. Disord. 31, 11–18 (2016).
Morgan, P. L., Staff, J., Hillemeier, M. M., Farkas, G. & Maczuga, S. Racial and ethnic disparities in ADHD diagnosis from kindergarten to eighth grade. Pediatrics 132, 85–93 (2013).
Hill, E. D. et al. Prediction of mental health risk in adolescents. Nat. Med. 31, 1840–1846 (2025).
Alam, S., Raja, P. & Gulzar, Y. Investigation of machine learning methods for early prediction of neurodevelopmental disorders in children. Wirel. Commun. Mob. Comput. 2022, 5766386 (2022).
de Lacy, N. et al. Predicting individual cases of major adolescent psychiatric conditions with artificial intelligence. Transl. Psychiatry 13, 314 (2023).
Birkhead, G. S., Klompas, M. & Shah, N. R. Uses of electronic health records for public health surveillance to advance public health. Annu. Rev. Public Health 36, 345–359 (2015).
Yang, S., Varghese, P., Stephenson, E., Tu, K. & Gronsbell, J. Machine learning approaches for electronic health records phenotyping: a methodical review. J. Am. Med. Inform. Assoc. 30, 367–381 (2022).
Solares, J. R. A. et al. Deep learning for electronic health records: a comparative review of multiple deep neural architectures. J. Biomed. Inform. 101, 103337 (2020).
Engelhard, M. M. et al. Predictive value of early autism detection models based on electronic health record data collected before age 1 year. JAMA Netw. Open 6, e2254303 (2023).
Chen, J. et al. Enhancing early autism prediction based on electronic records using clinical narratives. J. Biomed. Inform. 144, 104390 (2023).
Wang, B. et al. Prediction of early-onset bipolar using electronic health records. J. Child Psychol. Psychiatry 66, 1141–1154 (2025).
Roche, D., Mora, T. & Cid, J. Identifying non-adult attention-deficit/hyperactivity disorder individuals using a stacked machine learning algorithm using administrative data population registers in a universal healthcare system. JCPP Adv. 4, e12193 (2024).
Garcia-Argibay, M. et al. Predicting childhood and adolescent attention-deficit/hyperactivity disorder onset: a nationwide deep learning approach. Mol. Psychiatry 28, 1232–1239 (2023).
Steinberg, E. et al. Language models are an effective representation learning technique for electronic health record data. J. Biomed. Inform. 113, 103637 (2021).
Li, Y. et al. BEHRT: transformer for electronic health records. Sci. Rep. 10, 7155 (2020).
Goldstein, B. A., Navar, A. M., Pencina, M. J. & Ioannidis, J. P. A. Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J. Am. Med. Inform. Assoc. 24, 198–208 (2017).
Dey, T. et al. Survival analysis—time-to-event data and censoring. Nat. Methods 19, 906–908 (2022).
Shi, Y. et al. Racial disparities in diagnosis of attention-deficit/hyperactivity disorder in a US national birth cohort. JAMA Netw. Open 4, e210321 (2021).
Schober, P. & Vetter, T. R. Survival analysis and interpretation of time-to-event data: the tortoise and the hare. Anesth. Analg. 127, 792–798 (2018).
Loh, D. R., Hill, E. D., Liu, N., Dawson, G. & Engelhard, M. M. Limitations of binary classification for long-horizon diagnosis prediction and advantages of a discrete-time time-to-event approach: empirical analysis. JMIR AI 4, e62985 (2025).
Engelhard, M. & Henao, R. Disentangling whether from when in a neural mixture cure model for failure time data. Proc. Mach. Learn. Res. 151, 9571–9581 (2022).
World Health Organization. International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Vol. 1 (World Health Organization, 1992).
Geifman, Y. & El-Yaniv, R. Selective classification for deep neural networks. In Proc. 31st Int. Conf. NeurIPS 4885–4894 (2017).
Pessach, D. & Shmueli, E. A review on fairness in machine learning. ACM Comput. Surv. 55, 51:1–51:44 (2022).
Korrel, H., Mueller, K. L., Silk, T., Anderson, V. & Sciberras, E. Research review: language problems in children with attention-deficit hyperactivity disorder–a systematic meta-analytic review. J. Child Psychol. Psychiatry 58, 640–654 (2017).
Antshel, K. M. & Russo, N. Autism spectrum disorders and ADHD: overlapping phenomenology, diagnostic issues, and treatment considerations. Curr. Psychiatry Rep. 21, 34 (2019).
D’Agati, E., Curatolo, P. & Mazzone, L. Comorbidity between ADHD and anxiety disorders across the lifespan. Int. J. Psychiatry Clin. Pract. 23, 238–244 (2019).
Frazier, T. W., Youngstrom, E. A., Glutting, J. J. & Watkins, M. W. ADHD and achievement: meta-analysis of the child, adolescent, and adult literatures and a concomitant study with college students. J. Learn. Disabil. 40, 49–65 (2007).
Engelhard, M. M. et al. Health system utilization before age 1 among children later diagnosed with autism or ADHD. Sci. Rep. 10, 17677 (2020).
Gruschow, S. M., Yerys, B. E., Power, T. J., Durbin, D. R. & Curry, A. E. Validation of the use of electronic health records for classification of ADHD status. J. Atten. Disord. 23, 1647–1655 (2019).
Bannett, Y. et al. ADHD diagnosis and timing of medication initiation among children aged 3 to 5 years. JAMA Netw. Open 8, e2529610 (2025).
Huang, K.-L. et al. Factors affecting delayed initiation and continuation of medication use for attention-deficit/hyperactivity disorder: a nationwide study. J. Child Adolesc. Psychopharmacol. 31, 197–204 (2021).
Sibley, M. H. et al. Variable patterns of remission from ADHD in the multimodal treatment study of ADHD. Am. J. Psychiatry 179, 142–151 (2022).
Prasad, V. et al. Use of healthcare services before diagnosis of attention-deficit/hyperactivity disorder: a population-based matched case-control study. Arch. Dis. Child. 109, 46–51 (2024).
Stolte, A. et al. Using electronic health records to understand the population of local children captured in a large health system in Durham County, NC, USA, and implications for population health research. Soc. Sci. Med. 296, 114759 (2022).
Hurst, J. H. et al. Development of an electronic health records datamart to support clinical and population health research. J. Clin. Transl. Sci. https://doi.org/10.1017/cts.2020.499 (2020).
Shi, Y. et al. Utility of medical record diagnostic codes to ascertain attention-deficit/hyperactivity disorder and learning disabilities in populations of children. BMC Pediatr. 20, 510 (2020).
Richesson, R. L. et al. Electronic health records based phenotyping in next-generation clinical trials: a perspective from the NIH Health Care Systems Collaboratory. J. Am. Med. Inform. Assoc. 20, e226–e231 (2013).
RxNorm (US National Library of Medicine, 2023); https://www.nlm.nih.gov/research/umls/rxnorm/index.html
LOINC (Regenstrief Institute, 2023); https://loinc.org/
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems (eds Guyon, I. et al.) 5998–6008 (Curran Associates, 2017).
Hoffmann, J. et al. Training compute-optimal large language models. In Proc. Adv. NeurIPS 35 (2022).
Xiong, R. et al. On layer normalization in the transformer architecture. In Proc. 37th International Conference on Machine Learning 10524–10533 (2020).
Shazeer, N. GLU variants improve transformer. Preprint at https://doi.org/10.48550/arXiv.2002.05202 (2020).
Su, J. et al. RoFormer: enhanced transformer with rotary position embedding. Neurocomputing 568, 127063 (2024).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1 (Long and Short Papers) (eds Burstein, J. et al.) 4171–4186 (2019).
Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In International Conference on Learning Representations (2019).
Pascanu, R., Mikolov, T. & Bengio, Y. On the difficulty of training recurrent neural networks. Proc. Mach. Learn. Res. 28, 1310–1318 (2013).
Lee, C., Zame, W., Yoon, J. & Van Der Schaar, M. DeepHit: a deep learning approach to survival analysis with competing risks. Proc. AAAI Conf. Artif. Intell. 32, 11842 (2018).
Kvamme, H. & Borgan, Ø. Continuous and discrete-time survival prediction with neural networks. Lifetime Data Anal. 27, 710–736 (2021).
Cox, D. R. Regression models and life-tables. J. R. Stat. Soc. B 34, 187–202 (1972).
Wei, L.-J. The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Stat. Med. 11, 1871–1879 (1992).
Suresh, K., Severn, C. & Ghosh, D. Survival prediction models: an introduction to discrete-time modeling. BMC Med. Res. Methodol. 22, 207 (2022).
Liu, S.-Y. et al. DoRA: weight-decomposed low-rank adaptation. In Proc. 41st International Conference on Machine Learning 235, 32100–32121 (2024).
Bergstra, J. & Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012).
Antolini, L., Boracchi, P. & Biganzoli, E. A time-dependent discrimination index for survival data. Stat. Med. 24, 3927–3944 (2005).
Haider, H., Hoehn, B., Davis, S. & Greiner, R. Effective ways to build and evaluate individual survival distributions. J. Mach. Learn. Res. 21, 1–63 (2020).
Austin, P. C. & Steyerberg, E. W. The Integrated Calibration Index (ICI) and related metrics for quantifying the calibration of logistic regression models. Stat. Med. 38, 4051–4065 (2019).
Graf, E., Schmoor, C., Sauerbrei, W. & Schumacher, M. Assessment and comparison of prognostic classification schemes for survival data. Stat. Med. 18, 2529–2545 (1999).
Altman, D. G. Practical Statistics for Medical Research (Chapman and Hall/CRC, 1990).
Efron, B. in Breakthroughs in Statistics (eds Kotz, S. & Johnson, N. L.) 569–593 (Springer, 1992); https://doi.org/10.1007/978-1-4612-4380-9_41
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. Adv. NeurIPS 30, 4766–4777 (2017).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Proc. Adv. NeurIPS 32, 8024–8035 (2019).
Akiba, T., Sano, S., Yanase, T., Ohta, T. & Koyama, M. Optuna: a next-generation hyperparameter optimization framework. In Proc. KDD 2623–2631 (2019).
Kokhlikyan, N. et al. Captum: a unified and generic model interpretability library for PyTorch. Preprint at https://doi.org/10.48550/arXiv.2009.07896 (2020).
