This methodological research focused on the development and psychometric evaluation of questionnaires. The research phases were conducted in accordance with the framework proposed by Boateng et al. (2018), which includes the phases of item development, questionnaire development, and evaluation [19]. The study was conducted in Kashan from 2024 to 2025.
Phase 1: item development
Identification of domain and item generation
A systematic review of the scientific literature was conducted using an operational definition of first aid health literacy related to older adults, focusing on common physical traumas such as falls, burns, injuries, and poisoning. Resources were searched across international databases (PubMed, Web of Science, Scopus, Cochrane, and Google Scholar) and Persian databases (Magiran, SID, and IranDoc) using keywords, including “first aid”, “older adults”, “health literacy”, and “questionnaire”, and their synonyms. After analyzing the findings, an initial set of items and an item blueprint were developed.
Content validity was assessed both qualitatively and quantitatively through the participation of 10 experts in nursing, geriatrics, and trauma, all of whom were familiar with psychometric processes. To qualitatively evaluate content validity, experts provided feedback on the clarity, simplicity, grammar, scoring, and relevance of the content of each item [20]. To quantitatively evaluate content validity, the Content Validity Ratio (CVR) was calculated and interpreted according to Lawshe’s table (>0.62 for 10 experts) [21], and the Content Validity Index (CVI) was computed following Waltz and Bausell’s criteria (>0.79) [22]. The Modified Kappa Statistic was calculated and interpreted based on Polit and Beck’s guidelines (>0.6) [23].
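The three content validity indices can be illustrated with a minimal sketch. It assumes the conventional formulas (Lawshe's CVR, the item-level CVI, and the chance-adjusted modified kappa); the function names, the 4-point relevance scale, and the example counts are illustrative assumptions, not details reported in the study.

```python
from math import comb


def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2), where n_e experts rate the item 'essential'."""
    half = n_experts / 2
    return (n_essential - half) / half


def item_cvi(n_relevant: int, n_experts: int) -> float:
    """I-CVI: share of experts rating the item as relevant
    (assumed here: a rating of 3 or 4 on a 4-point relevance scale)."""
    return n_relevant / n_experts


def modified_kappa(n_relevant: int, n_experts: int) -> float:
    """Modified kappa K* = (I-CVI - Pc) / (1 - Pc),
    with Pc = C(N, A) * 0.5**N the probability of chance agreement."""
    p_chance = comb(n_experts, n_relevant) * 0.5 ** n_experts
    return (item_cvi(n_relevant, n_experts) - p_chance) / (1 - p_chance)


# Hypothetical item: 9 of 10 experts rate it essential and relevant.
print(content_validity_ratio(9, 10))      # 0.8
print(item_cvi(9, 10))                    # 0.9
print(round(modified_kappa(9, 10), 3))    # 0.899
```

With a panel of 10, a single dissenting expert already lowers the CVR to 0.8, which shows how sensitive these indices are to panel size.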
Phase 2: questionnaire development
Pre-testing of questions (face validity)
Face validity was evaluated both qualitatively and quantitatively with the participation of 10 older adults of varying ages, genders, and education levels. To qualitatively evaluate face validity, participants provided feedback on the clarity and comprehensibility of the items [24]. To quantitatively evaluate face validity, the item impact score was calculated from ratings on a 5-point Likert scale. Item impact scores above 1.5 were considered acceptable [25].
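As an illustration, the item impact score is commonly computed as frequency multiplied by importance. The sketch below assumes that convention (frequency = proportion of raters scoring the item 4 or 5; importance = mean rating); the function name and the ratings are hypothetical.

```python
def item_impact(ratings: list[int]) -> float:
    """Impact score = frequency x importance for one item.

    frequency  : proportion of raters who scored the item 4 or 5
    importance : mean rating on the 5-point scale
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    frequency = sum(1 for r in ratings if r >= 4) / len(ratings)
    importance = sum(ratings) / len(ratings)
    return frequency * importance


# Hypothetical ratings from 10 older adults for a single item.
ratings = [5, 5, 4, 4, 4, 3, 5, 4, 2, 5]
score = item_impact(ratings)          # 0.8 * 4.1, about 3.28
print(round(score, 2), score > 1.5)   # item retained if the score exceeds 1.5
```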
Sampling and survey administration
The target population consisted of all older adults aged 65 and older who were registered with the comprehensive urban health centers in Kashan, Iran. Expert recommendations indicate that a minimum sample size of 400 is sufficient for ensuring validity in psychometric studies [24]. Accordingly, a multi-stage random sample of 400 older adults was obtained from comprehensive urban health centers. Inclusion criteria were age 65 years or older, Iranian citizenship, residency in Kashan, no history of mental illness (confirmed via medical records and self-report), no cognitive impairment (score of 8 or higher on the Abbreviated Mental Test, AMT), ability to communicate verbally, and informed consent to participate. The exclusion criterion was an incomplete questionnaire.
Following the initial development and validity assessment of the questionnaire, an official introduction letter for sampling was obtained from the Vice Chancellor for Research and Technology at Kashan University of Medical Sciences. The first author conducted the multi-stage sampling between August 2024 and January 2025. Kashan was divided into three socioeconomic strata (upper, middle, and lower). Four health centers were randomly selected from each stratum.
Based on the sample size and the older population covered by each health center, eligible older adults were randomly selected from the Integrated Health System (SIB). If an older adult declined participation, another was randomly selected from the same stratum until the desired sample size was reached. Data collection questionnaires included a background information questionnaire, the Persian version of the Abbreviated Mental Test (AMT), and the Health Literacy Instrument for Adults-Short Form (HELIA-SF). These data were collected from the records of older adults in the SIB and through direct interviews conducted at the health centers.
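One way to divide a fixed sample across centers according to the population each covers is proportional allocation. The sketch below is an assumption about how such an allocation could be done, not the procedure documented in the study; the center names and population figures are hypothetical.

```python
def proportional_allocation(populations: dict[str, int], total_sample: int) -> dict[str, int]:
    """Split total_sample across centers in proportion to covered population.

    Uses the largest-remainder rule so the allocations sum exactly to total_sample.
    """
    total_pop = sum(populations.values())
    exact = {c: total_sample * p / total_pop for c, p in populations.items()}
    alloc = {c: int(q) for c, q in exact.items()}       # round down first
    shortfall = total_sample - sum(alloc.values())
    # Give the remaining units to the centers with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda c: exact[c] - alloc[c], reverse=True)
    for center in by_remainder[:shortfall]:
        alloc[center] += 1
    return alloc


# Hypothetical older-adult populations covered by three centers.
centers = {"center_A": 1000, "center_B": 2000, "center_C": 3000}
print(proportional_allocation(centers, 400))
# {'center_A': 67, 'center_B': 133, 'center_C': 200}
```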
The background questionnaire collected information such as age, gender, education level, marital status, job, economic status, living arrangement, and presence of comorbidities. The qualitative content validity of this questionnaire was confirmed by 10 faculty members of Kashan University of Medical Sciences.
The Persian version of the AMT was introduced in Iran by Bakhtiyari et al. (2014). This test consists of 10 items that assess key areas, including important dates, time and year, notable personalities, occupation, place names, counting, and addressing. Each item is scored either zero or one, with a total score of 8 or higher indicating the absence of cognitive disorders. The Persian version of the test demonstrated a Cronbach’s alpha of 0.76 and an intra-class correlation coefficient of 0.89 [26]. In our study, this test was used to screen for cognitive impairment as one of the inclusion criteria.
The HELIA-SF was also adapted in Iran by Tavousi et al. (2022) to measure adult health literacy. It consists of two domains: Basic Skills (5 items) and Decision-Making Skills (4 items). Each item is rated on a 5-point Likert scale, with higher scores indicating greater health literacy. The questionnaire’s face, content, and construct validity have been confirmed. Its reliability was reported with a Cronbach’s alpha of 0.91 and an intra-class correlation of 0.81 [27]. In our study, the HELIA-SF was used to examine the convergent validity of the first aid health literacy assessment questionnaire for older adults, with a reported reliability of 0.79.
Item reduction
The correlation coefficients between each item and the total questionnaire, as well as between items, were calculated. Items with correlation coefficients below 0.30 or above 0.70 were considered candidates for deletion [24, 28].
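The item-total screening rule can be sketched as follows. This is a minimal illustration that assumes an uncorrected Pearson item-total correlation on 0/1 item scores; the function names and the toy response matrix are hypothetical, and the same `pearson` helper could be applied to pairs of items for the inter-item check.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def flag_items(responses: list[list[int]], low: float = 0.30, high: float = 0.70):
    """Return (item index, item-total r) for items outside the [low, high] band.

    responses: one row per respondent, one 0/1 column per item.
    """
    totals = [sum(row) for row in responses]
    flagged = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        r = pearson(item, totals)
        if r < low or r > high:
            flagged.append((j, round(r, 2)))
    return flagged


# Toy data: 4 respondents x 3 items; item 2 is answered correctly by everyone.
toy = [[1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 1]]
print(flag_items(toy))
```

Flagged items are only candidates for deletion; the final decision would also weigh content coverage.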
Extraction of factors
The adequacy of the sample was evaluated using the Kaiser–Meyer–Olkin (KMO) measure and Bartlett’s test of sphericity. Due to the binary nature of the response options (true/false), a polychoric correlation matrix was utilized. Principal component analysis (PCA) served as a preliminary item-reduction procedure to summarize the total variance, employing equamax rotation. The number of components was determined based on eigenvalues greater than one, the scree plot, and parallel analysis. A minimum acceptable component loading of 0.40 was established.
Phase 3: questionnaire evaluation
Test of dimensionality
This assessment was conducted using convergent validity [24]. All participants completed the HELIA-SF [27] concurrently to serve as a comparative measure against the first aid health literacy assessment questionnaire for older adults, which focuses on a more specific domain. The Pearson correlation coefficient was calculated to examine convergent validity with HELIA-SF.
Tests of reliability
The reliability of the questionnaire was evaluated through assessments of internal consistency and stability. Internal consistency was determined using the Kuder-Richardson coefficient (KR-21), with coefficients exceeding 0.70 deemed acceptable, a standard that aligns with initial validation processes for brief dichotomous questionnaires [29]. Because KR-21 assumes roughly equal item difficulty, it tends to underestimate reliability when item difficulties vary, so the true reliability may be higher than the reported value. To assess test-retest reliability, 20 older adults completed the questionnaire on two separate occasions, with a two-week interval between tests. The absolute-agreement intraclass correlation coefficient (ICC) was calculated using a two-way random model, with ICC values of 0.60 or greater considered satisfactory [30]. Absolute stability was evaluated using the standard error of measurement (SEM) and the minimal detectable change (MDC), taking into account the number of items and the response format.
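The reliability indices above can be illustrated with a short sketch. It assumes the textbook formulas KR-21 = (k/(k-1)) * (1 - M(k-M)/(k*s^2)), SEM = SD * sqrt(1 - ICC), and MDC95 = 1.96 * sqrt(2) * SEM; the use of the population variance, the function names, and all numbers are illustrative assumptions rather than values from the study.

```python
from math import sqrt
from statistics import mean, pvariance


def kr21(total_scores: list[float], n_items: int) -> float:
    """KR-21 from total scores only: needs the mean, variance, and item count."""
    m = mean(total_scores)
    var = pvariance(total_scores)      # population variance (an assumption)
    if var == 0:
        raise ValueError("total scores have zero variance")
    k = n_items
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var))


def sem(sd: float, icc: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - ICC)."""
    return sd * sqrt(1 - icc)


def mdc95(sem_value: float) -> float:
    """Minimal detectable change at 95% confidence: 1.96 * sqrt(2) * SEM."""
    return 1.96 * sqrt(2) * sem_value


# Hypothetical total scores on a 10-item true/false questionnaire.
totals = [2, 4, 6, 8, 10]
print(round(kr21(totals, 10), 3))                  # 0.778
print(round(mdc95(sem(sd=2.0, icc=0.75)), 3))      # 2.772
```

In this toy case, a change of roughly 3 points between two administrations would be needed before it exceeds measurement error.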
Tests of validity
Ceiling and floor effects were examined by analyzing the distribution of scores. If more than 15% of participants obtained the maximum or minimum possible score, a ceiling or floor effect, respectively, was considered present [31].
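The 15% rule can be expressed as a small check. The following is a minimal sketch; the function name, score range, and scores are hypothetical.

```python
def floor_ceiling(scores: list[int], min_score: int, max_score: int,
                  threshold: float = 0.15) -> dict[str, bool]:
    """Flag a floor or ceiling effect when the share of respondents at the
    minimum or maximum possible score exceeds the threshold."""
    n = len(scores)
    floor_share = sum(1 for s in scores if s == min_score) / n
    ceiling_share = sum(1 for s in scores if s == max_score) / n
    return {"floor": floor_share > threshold, "ceiling": ceiling_share > threshold}


# Hypothetical total scores on a 0-10 scale: 10% at the floor, 20% at the ceiling.
scores = [0, 10, 10, 5, 6, 7, 8, 9, 3, 4]
print(floor_ceiling(scores, min_score=0, max_score=10))
# {'floor': False, 'ceiling': True}
```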
Data analysis
All analyses were performed using JASP version 0.19.3. The normality of the data distribution was assessed using skewness and kurtosis indices, with values within the range of ±2 indicating a normal distribution. A significance level of p < 0.05 was used in all statistical tests.
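The ±2 screening rule can be sketched with simple moment-based estimators. This assumes the uncorrected definitions (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2^2 - 3); statistical packages such as JASP may apply small-sample corrections, so values can differ slightly. The function names and data are hypothetical.

```python
def skewness(data: list[float]) -> float:
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(data)
    mu = sum(data) / n
    m2 = sum((x - mu) ** 2 for x in data) / n
    m3 = sum((x - mu) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(data: list[float]) -> float:
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal curve)."""
    n = len(data)
    mu = sum(data) / n
    m2 = sum((x - mu) ** 2 for x in data) / n
    m4 = sum((x - mu) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3


def roughly_normal(data: list[float], limit: float = 2.0) -> bool:
    """Apply the rule: both indices must lie within +/- limit."""
    return abs(skewness(data)) <= limit and abs(excess_kurtosis(data)) <= limit


sample = [1, 2, 3, 4, 5]    # hypothetical scores
print(skewness(sample), round(excess_kurtosis(sample), 2), roughly_normal(sample))
```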
Ethical considerations
The present study was approved by the Ethics Committee (Code of Ethics IR.KAUMS.NUHEPM.REC.1403.011, dated May 5, 2024) and the Research Council (Research Code 403016) of Kashan University of Medical Sciences. The researchers adhered to the ethical principles of the Declaration of Helsinki. All participants were informed of the research objectives, the confidentiality of information, and the voluntary nature of their participation, and written informed consent was obtained from each of them.
