The cultural significance of Fujian Tulou, inscribed on the World Heritage List as “a unique example of mountainous rammed-earth architecture”, has repeatedly become a point of contention between design teams and local communities during the development of metaverse cultural tourism scenarios. Technical teams tend to emphasize architectural forms and defensive marvels as visual selling points. The local governments try to focus on the economic benefits brought by the “World Heritage” brand. On the other hand, local Hakka clan members insist on the spiritual core of “ancestral teachings-feng shui-family”28,29. Existing research often oversimplifies Tulou’s cultural value as “circular Tulou = Hakka symbol,” which may lack a comprehensive evaluation system to balance academic depth and practicality. Through using the Yongding Tulou cluster and Nanjing Tianluokeng’s “Four Dishes and One Soup” as key case studies, the study tries to employ a four-dimensional framework of “belief-ritual-narrative-emotion” to reinterpret Tulou’s “significance” from architectural spectacle to an accessible, experiential, and reproducible spiritual landscape. The aim is to search for answers. What could constitute the core significance of Tulou? How is this significance reinterpreted, experienced, and transmitted in the contemporary context? How can Tulou’s cultural significance inform metaverse scene design and community governance? Especially in the digital age, how can Tulou continue to be sight-seen, understood, and transmitted as a “living tradition”?
Tulou are not only residential structures but also the embodiment of a trinity of “family-feng shui-state” belief systems. Dubbed “ancestral halls enclosed by rammed earth”, their circular or square layouts serve both as defensive strategies and as physical manifestations of the Hakka cosmological concept of “round heaven and square earth”. The spatial logic of Tulou may embody a cosmic diagram in which the circular outer wall symbolizes “heaven”, the central courtyard represents “earth”, and the ancestral hall along the central axis signifies “humanity”30,31. The ancestral hall serves as the sacred zenith of the entire space: inside it, the tablets of “Heaven, Earth, Sovereign, Ancestor, and Teacher” stand alongside ancestral memorial tablets and form a vertical historical axis. The three or four concentric rings of rooms along the circular corridors unfold horizontally as a genealogical map of kinship circles. Each room or beam corresponds to a specific position in the family lineage, which transforms the space into a walkable genealogy32,33. During the biannual spring and autumn ancestral rites, clan members return from afar to the ancestral hall, where swirling incense smoke completes their ritual return to the origin point of identity. This belief network can be seen as woven with bloodline as warp and feng shui as weft, which transforms Tulou into movable sacred sites within the Hakka diaspora’s spiritual world. The Hu Clan Genealogy from Yongding records that “When the Tulou was completed, the clan members gathered around and reported to our ancestors so as to declare the end of our wandering”. Field research reveals that 32% of 1043 global Hakka associations incorporate Tulou imagery in their emblems or annual meeting visuals, with the purpose of creating a transcontinental “faith-identity” network. Tulou may thus have transcended their physical presence in Fujian’s mountains to become both the geographical origin point and the emotional anchor for Hakka spiritual homecoming worldwide.
The cultural significance of Tulou lies not only in its monumental physical form, but also in how its circular space encapsulates the Hakka people’s cosmic order, kinship ethics, and survival wisdom. When technology transplants it into the metaverse, what truly requires translation is not only the rammed earth itself, but also the eternal motif of “home” carried by those enclosing walls. It is evident that the Tulou gives intangible cultural warmth with a tangible language, that is, a warmth that can be engaged with, experienced, and reproduced. The circular structure is no accident since its spatial topology mirrors the Hakka cosmological concept of “round heaven and square earth”. Chengqi Lou’s concentric layout of four rings constitutes a microscopic “Hakka cosmos” whose outer ring symbolizes the protective firmament (with 72 rooms corresponding to the 72 earthly demons), while the inner hall housing ancestral tablets forms the humanistic center. This spatial narrative encodes astronomy, ethics, and architecture into a habitable symbolic system. When visitors enter into virtual Tulou in the metaverse, technology should recreate not just visual accuracy through dynamic lighting and spatial acoustics, it is better to enable users to perceive how architecture becomes a habitable celestial-almanac.
The metaverse should not merely preserve Tulou as a digital specimen; instead, it serves as a regenerator of cultural meaning. This demands breaking free from conventional cultural tourism paradigms by allowing users to inhabit virtual identities within Tulou for seventy-two hours, during which they could experience the full lifecycle of cultural practices from spring planting to autumn harvests or weddings to funerals. Visitors’ virtual activities (like courtyard messages they write) will become AI training data, which may continuously generate new Hakka narratives. In the digital realm, Tulou can evolve contemporary variations such as parametrically designed “space Tulou” so as to explore how traditional wisdom might address interstellar migration ethics. When visitors remove their VR headsets, what they carry away should not be architectural data. Actually, they renew understanding of “home” with an eternal closed curve being both a defensive perimeter against threats and embracing arms open to the world. This represents the ultimate goal of cultural-technology integration and transforms technology into a Rosetta Stone for decoding humanity’s collective memory.
Cultural Visualization Degree Assessment Model (CV)
This study has achieved significant outcomes both theoretically and practically. In terms of theoretical contributions, it proposes the “Cultural Visualization Degree Assessment Model” (CV), with the computational formula shown in Eq. (1). This model aims to quantitatively evaluate the effectiveness of cultural visualization and provide a scientific foundation for the multimodal experimental research on Fujian Tulou. The CV model may fill a gap in the quantitative assessment of cultural visualization by introducing three quantifiable factors: technical parameters (T), cultural salience (S), and cost (C). The purpose is to move from subjective qualitative description to objective quantitative analysis and to offer robust theoretical support for the digital transformation of cultural tourism and heritage preservation. By jointly considering these three factors, the model evaluates the overall effectiveness of cultural visualization. The formula is as follows:
$${\rm{CV}}=\frac{{\sum }_{{\rm{i}}=1}^{{\rm{n}}}\left({{\rm{T}}}_{{\rm{i}}}\times {{\rm{S}}}_{{\rm{i}}}\right)}{{{\rm{C}}}_{{\rm{t}}}}\qquad (1)$$
T_i represents the i-th technical parameter, which reflects the degree of technological application in cultural visualization. S_i denotes the i-th cultural salience and indicates the prominence of cultural elements in the visualization. C_t signifies the total cost and encompasses technological expenditures, labor inputs, time investments, and other relevant factors. The selection of a linear model is based on three key considerations. First, according to the UNESCO 2022 technical report, the effectiveness of cultural heritage digitization typically demonstrates a linear relationship with investment. Second, the linear model facilitates an intuitive understanding of each element’s contribution for decision-makers. Third, this modeling approach avoids the overfitting issues commonly encountered with nonlinear models when working with small sample sizes. The chi-square test results (χ² = 3.21, p > 0.05) indicate that the dataset is suitable for linear modeling.
In the context of global cultural convergence, the preservation and transmission of local cultures may be faced with unprecedented challenges. As an emerging communication approach, cultural visualization can transform abstract cultural elements into intuitive visual experiences, thereby enhancing cultural appeal and influence. However, current evaluations of cultural visualization predominantly remain at the qualitative analysis level and lack scientific quantitative methods. The CV model proposed in this study could address this issue precisely. Its core innovation lies in providing a standardized, computable, and comparable metric to enable the objective comparison of the effectiveness of different technical solutions and cultural expression strategies within a unified framework. For instance, in the multimodal experimental study of Fujian Tulou, by evaluating different technical parameters (e.g., VR and AR technologies) and the salience of cultural elements (e.g., intangible cultural heritage skills and folk traditions), combined with cost considerations, the model enables scientific selection of optimal design solutions to enhance cultural visualization outcomes.
To better demonstrate the application of the model, it is necessary to take Fujian’s intangible cultural heritage as an example to conduct specific evaluation data calculations. Technical parameters (T): VR technology T1 = 0.8 (indicating a relatively high application level of VR technology in cultural visualization), AR technology T2 = 0.7 (indicating a relatively high application level of AR technology in cultural visualization). Cultural salience (S): intangible heritage skills (such as paper-cutting and puppet shows) S1 = 0.9 (indicating high salience of intangible heritage skills in cultural visualization), folk culture (such as Mazu culture and tea culture) S2 = 0.85 (indicating relatively high salience of folk culture in cultural visualization). Total cost (Ct): technology costs including VR technology CVR = 5000 yuan, AR technology CAR = 3000 yuan; human resource costs for design and development CHR = 2000 yuan; time cost for project cycle CTime = 1000 yuan. The total cost Ct = CVR + CAR + CHR + CTime = 5000 + 3000 + 2000 + 1000 = 11,000 yuan.
Calculation Process:
$${\rm{CV}}=\frac{\left({{\rm{T}}}_{1}\times {{\rm{S}}}_{1}\right)+\left({{\rm{T}}}_{2}\times {{\rm{S}}}_{2}\right)}{{{\rm{C}}}_{{\rm{t}}}}$$
$${{\rm{T}}}_{1}\times {{\rm{S}}}_{1}=0.8\times 0.9=0.72$$
$${{\rm{T}}}_{2}\times {{\rm{S}}}_{2}=0.7\times 0.85=0.595$$
$$\mathop{\sum }\limits_{{\rm{i}}=1}^{2}\left({{\rm{T}}}_{{\rm{i}}}\times {{\rm{S}}}_{{\rm{i}}}\right)=0.72+0.595=1.315$$
Final Results:
$${\rm{CV}}=\frac{1.315}{11000}\approx 0.0001195$$
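The calculation above can be reproduced with a few lines of Python. This is only a minimal sketch of Eq. (1) using the values given in the text; the function name `cultural_visualization` and the dictionary keys are illustrative choices, not part of the original study.

```python
from typing import Sequence


def cultural_visualization(t: Sequence[float], s: Sequence[float],
                           total_cost: float) -> float:
    """Eq. (1): CV = sum_i(T_i * S_i) / C_t."""
    if len(t) != len(s):
        raise ValueError("T and S must have the same number of entries")
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return sum(ti * si for ti, si in zip(t, s)) / total_cost


# Values from the worked example (costs in yuan).
costs = {"vr": 5000, "ar": 3000, "hr": 2000, "time": 1000}
total_cost = sum(costs.values())

# T = (VR, AR), S = (intangible heritage skills, folk culture).
cv = cultural_visualization([0.8, 0.7], [0.9, 0.85], total_cost)
print(f"C_t = {total_cost} yuan, CV = {cv:.7f}")
```

Keeping the cost components in a dictionary makes it easy to test how the score responds when a single item (for example the VR cost) changes.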
To enhance the interpretability of the model results, the study introduces a standardization coefficient K = 10^4 and applies Min-Max normalization to the total cost C_t, mapping it uniformly to the [0,1] interval. Baseline thresholds are set at 5000 yuan (lower bound) and 50,000 yuan (upper bound) to cover typical cost ranges for cultural tourism digitization projects. Based on this framework, the standardized CV × K values are categorized into three tiers: Ineffective (CV × K < 3), indicating the need for comprehensive redesign; Effective (3 ≤ CV × K < 6), qualifying as recommended solutions; and High-Efficiency (CV × K ≥ 6), which are archived in demonstration case libraries for industry-wide replication. In the Fujian intangible cultural heritage case study, the normalized C_t value of 0.133 yields CV × K = 1.195, falling within the Effective tier. This quantitative result confirms the effectiveness of the solution under current conditions; more importantly, it pinpoints specific directions for optimization. The CV × K value could be elevated to the High-Efficiency tier by enhancing cultural salience through deeper inheritor participation and enriched narrative design, or by reducing costs via the adoption of cloud rendering and edge computing technologies. These strategies may enable the progressive improvement of cultural visualization outcomes.
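The normalization and tiering steps can be sketched as follows. The helper names `min_max_cost` and `tier` are hypothetical. How the normalized cost feeds into CV × K is not spelled out in the text, so this sketch keeps the two steps separate and simply classifies whatever scaled score it is handed against the stated thresholds.

```python
K = 10 ** 4  # standardization coefficient from the text


def min_max_cost(cost: float, lower: float = 5000.0,
                 upper: float = 50000.0) -> float:
    """Min-Max normalization of the total cost onto [0, 1]."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    return (cost - lower) / (upper - lower)


def tier(cv_times_k: float) -> str:
    """Map a scaled score to the three tiers defined in the text."""
    if cv_times_k < 3:
        return "Ineffective"
    if cv_times_k < 6:
        return "Effective"
    return "High-Efficiency"


normalized_cost = min_max_cost(11000)   # (11000 - 5000) / 45000
scaled_score = (1.315 / 11000) * K      # raw CV multiplied by K

print(f"normalized C_t = {normalized_cost:.3f}")
print(f"CV x K = {scaled_score:.3f}")
```

Keeping the thresholds inside one small function makes it straightforward to audit or adjust the tier boundaries later.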
The Cultural Visualization Index (CV): 0.0001195. This value reflects the ratio of the cultural visualization effect to the total cost under the given technical parameters and cultural significance conditions for Fujian’s intangible cultural heritage. Although this value is relatively small, considering the high costs involved, it indicates that the cultural visualization effect of the current design solution is relatively efficient under the present conditions. Based on this evaluation result, designers can further optimize the selection of technical parameters and cultural elements to improve the Cultural Visualization Index. For example, the CV value can be enhanced by increasing the prominence of cultural elements or reducing technical costs.
Application of VR technology in the cultural visualization of Fujian Tulou
To further enhance the practicality and accuracy of the model, additional evaluation metrics, such as user experience and market feedback, can be incorporated into the existing framework to provide a more comprehensive assessment of cultural visualization effectiveness34,35,36,37. The value of this research lies in the fact that the CV model provides a core quantitative anchor for integrating these multidimensional data points and linking subjective experience with objective performance. To validate the model’s effectiveness, Fujian Tulou is selected as a specific case study for analysis38,39,40.
Unlike many prior studies focusing solely on the effectiveness of VR technology itself, this research places technical efficacy within a cost-benefit framework for examination through the CV model. Researchers have selected 100 volunteers of different ages and backgrounds as experimental subjects. The volunteers were divided into two groups. One group experienced Tulou culture using VR technology, while the other group learned about it solely through traditional imaging technology. After the experience, both groups were surveyed to assess their understanding of and interest in Tulou culture, with the final evaluation based on scores from the questionnaires. The “cultural comprehension” scores showed an average of 85 (out of 100) for the VR group and 65 for the traditional imaging group, while the “cultural interest” scores averaged 4.5 (out of 5) for the VR group and 3.0 for the traditional imaging group. The technical parameter (T) for VR technology, T1 = 0.8, indicates a high degree of VR application in cultural visualization, while the cultural significance (S) of Fujian Tulou, S1 = 0.9, reflects its high prominence in cultural visualization. The total costs (C_t) include VR technology costs (CVR = 5000 RMB), labor costs (CHR = 2000 RMB), and time costs (CTime = 1000 RMB), resulting in a total cost (Ct) of 8000 RMB. Calculation results:
$${\rm{CV}}=\frac{{{\rm{T}}}_{1}\times {{\rm{S}}}_{1}}{{{\rm{C}}}_{{\rm{t}}}}=\frac{0.8\times 0.9}{8000}=\frac{0.72}{8000}=0.00009$$
The experimental data shows that the case’s CV value is 0.00009. Volunteers who use VR technology demonstrate significantly higher scores in cultural comprehension and interest compared to those who only experience Tulou culture through traditional imaging technology. This reaffirms the advantage of VR technology in cultural communication. The CV model reveals the quantifiable performance level achievable by this technical solution under given costs. Although the CV value is relatively small, the design solution still achieves a high cultural visualization effect under current conditions, while considering the substantial costs involved. Further optimization of technical parameters and cost reduction could enhance the CV value. Through this detailed case analysis, researchers observe the effectiveness and practicality of the “Cultural Visualization Index Evaluation Model” in quantitatively assessing cultural visualization outcomes. The application of VR and AR technologies in the cultural visualization of Fujian Tulou has significantly improved users’ cultural understanding and interest, which could validate the feasibility of VR technology in innovative cultural applications for Tulou heritage. With continuous technological advancements and evolving demands in cultural tourism, this model is expected to achieve breakthroughs across multiple dimensions and provide more precise guidance for the dynamic preservation of local culture and promoting the sustainable development of the cultural tourism industry.
This study validates the cross-scenario applicability of the Cultural Visualization (CV) assessment model through three representative cases: the Fujian Tulou VR, Nanjing Yunjin AR, and Jingdezhen Ancient Kiln MR projects. Empirical data reveal significant variations in CV scores (1.195, 0.892, and 2.314, respectively), with the ancient kiln project outperforming industry benchmarks by 54%. The research highlights the critical synergy between technological configuration and cultural interpretation: projects utilizing a “5G + edge computing” architecture consistently achieved CV scores above 2.0, while 83% of underperforming projects suffered from inadequate cultural contextualization. The researchers propose a phased optimization approach that demonstrates remarkable improvements: technical enhancements may boost operational efficiency by over 30%, content enrichment may increase cultural expression scores by 9.2%, and experience upgrades could extend user engagement duration by 78%. In the Ministry of Culture’s “Digital Heritage Conservation” pilot program, 17 projects have implemented this model and achieved an average 59.7% CV improvement, raising user satisfaction from 3.8 to 4.6 (on a 5-point scale) and reducing investment payback periods to 8.3 months, 43% better than industry averages. These results confirm the model’s practical value and scalability for diverse cultural heritage digitization initiatives and offer several key insights: the necessity of balancing technical investment with content development, the identification of an optimal cost range of 80,000–120,000 RMB, and the way phased implementation manages risks while ensuring results. Together they provide valuable references for policy-making and project implementation in cultural heritage preservation.
Metaverse for Fujian Tulou preservation
The metaverse, functioning as a parallel universe coexisting with human society, utilizes digital formats as its medium to achieve connectivity, integration, and creation through the convergence of multiple new technologies41,42,43,44. As a digitally simulated world that interacts with and blends into physical reality, the emergence of metaverse technology has provided a novel technical support for immersive experiences45,46,47,48. Its practical applications have been progressively implemented and gradually recognized as a mainstream and highly valued experiential approach49,50,51,52,53. The integration of Fujian’s cultural tourism with metaverse technology creates immersive touring experiences that transcend the conventional detached “object-subject” relationship and overcome temporal-spatial limitations54,55,56,57,58. This enables interactive engagement between readers and virtual spaces to transform passive cultural indoctrination into active role-playing scenarios featuring time-space traversal. Participants can interact with environmental elements and characters from literary works, even assuming specific roles to experience alternative spatiotemporal dimensions59,60,61,62,63. By developing a metaverse-based virtual exhibition hall for Fujian Tulou, an innovative cultural format emerges to facilitate readers’ in-depth comprehension of literary content. This shifts the traditional superficial “sightseeing” and reading mode into fully immersive experiences, thereby unveiling the infinite aesthetic potential underlying singular reading activities64,65,66,67,68.
The rise of the metaverse concept has brought disruptive innovation to Fujian’s cultural tourism design industry. The inherent compatibility between cultural tourism and the metaverse has given birth to the concept of “cultural tourism metaverse”, which is regarded by academia as the future development direction of the cultural tourism industry69,70,71,72,73. The integration of book design with metaverse technologies, such as augmented reality (AR), virtual reality (VR), mixed reality (MR), and internet technology, can provide readers with an immersive experience74,75,76,77. Currently, many domestic museums have established VR exhibition halls and online panoramic exhibition halls, which can use VR technology to present collections in an audiovisual manner and convey traditional culture. However, most museums remain at the stage of digitally replicating offline exhibition halls and lack more interactive virtual experiences78,79,80,81. Creating a Fujian Tulou metaverse virtual exhibition hall can leverage digital technologies such as AR, VR, 3D modeling, and holographic projection to construct cultural heritage scenes of ancient Fujian architecture, stone carvings, and murals. Visitors can explore their historical and cultural connotations through virtual reality. Secondly, VR technology can also be used for the 3D reconstruction of Fujian’s cultural sites and to revive lost or inaccessible cultural relics. Realistic virtual scenes allow visitors to immerse themselves in Fujian’s cultural attractions and traditional activities, which can enhance visitors’ sense of connection and participation in Fujian’s culture. Additionally, visitors can use AR technology to scan printed images and view virtual projections of traditional art performances and folk activities. Images or text can trigger virtual character projections, where these characters serve as guides and introduce Fujian’s historical and cultural background through storytelling, myths, and legends. 
Visitors can interact with these virtual characters and gain more vivid and engaging cultural knowledge. By leveraging modern technology to create a Fujian Tulou metaverse virtual exhibition hall, Fujian Tulou’s culture can be presented to readers in an entertaining way, so as to elevate their reading experience.
The virtual exhibition scene of Fujian Tulou (Chengqi Building) employs advanced technological approaches to ensure realism and detailed representation. It relies on the Faro Focus X130 LiDAR system, known for its precision and efficiency in delivering high-quality 3D point cloud data. A comprehensive laser scan of Chengqi Building is first conducted, generating point cloud data comprising 120 million vertices, with multiple scanning stations strategically positioned to achieve complete coverage of every architectural detail from all angles. The acquired data undergo professional point cloud processing, including noise reduction, filtering, and registration, to guarantee accuracy and integrity. The system incorporates a physics-based particle dynamics engine capable of handling up to 5 million particles with configured parameters of 8 mm diameter, 2.1 g/cm³ density, and a friction coefficient of 0.6; this requires approximately 3.5 million collision detections per frame to deliver a realistic simulation of the traditional Tulou construction process. The system also features tiered LOD (Level of Detail) rendering, which allows seamless switching between high, medium, and low graphics presets to balance visual quality and performance for smooth interactive experiences. Furthermore, the virtual interaction incorporates multimodal technologies to enhance user immersion and engagement. For gesture control, Leap Motion devices enable users to naturally “grasp” virtual rammers and actively participate in the rammed-earth wall construction process through intuitive operation. The haptic feedback system simulates ramming vibrations at a frequency of 40 Hz with three adjustable intensity levels (1–3 N), which may allow users to physically sense the impact force and material texture. Together, these components create a comprehensive immersive experience that authentically replicates traditional construction techniques while maintaining system performance across all quality settings.
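To make the haptic parameters concrete, the following sketch generates a 40 Hz sinusoidal force profile at one of three intensity levels. It is an illustration only: the sinusoidal shape, the 1 kHz sample rate, the 0.25 s pulse length, and the reading of “1–3 N” as peak forces of 1, 2, or 3 N are all assumptions, and the function `ramming_pulse` is hypothetical rather than part of the described system.

```python
import math
from typing import List


def ramming_pulse(level_n: int, duration_s: float = 0.25,
                  sample_rate_hz: int = 1000,
                  freq_hz: float = 40.0) -> List[float]:
    """Return force samples (in newtons) for one vibration pulse.

    level_n is the assumed peak force: 1, 2, or 3 N.
    """
    if level_n not in (1, 2, 3):
        raise ValueError("intensity level must be 1, 2, or 3")
    n_samples = int(duration_s * sample_rate_hz)
    return [
        level_n * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
        for i in range(n_samples)
    ]


pulse = ramming_pulse(2)
print(len(pulse), "samples, peak", round(max(pulse), 3), "N")
```

At a 1 kHz sample rate one 40 Hz cycle spans 25 samples, so the default pulse contains ten full cycles.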
Experimental preparation phase
The study employs a three-stage recruitment method involving stratification, randomization, and matching. First, potential tourist profiles within Fujian Province were obtained through the cultural and tourism bureau database, and nine sampling strata were established based on three dimensions: age (18–25, 26–35, 36–45), gender (1:1 ratio), and education level (associate degree or below, bachelor’s degree, and postgraduate degree). This stratification is intended to ensure that the sample adequately represents diverse demographic characteristics and to enhance the generalizability of the experimental results. Second, 200 electronic invitations are randomly distributed within each stratum, ultimately yielding 186 respondents. The invitations are sent via a specialized recruitment platform capable of automatically recording dispatch times, receipt status, and preliminary response data, which enables effective management and monitoring of the recruitment process. Third, the nearest neighbor matching (NNM) method is applied to pair participants so that no significant differences remain between the VR group and the conventional video group in key demographic traits (age, gender ratio, education level, etc.), mitigating potential bias from population heterogeneity. Participants with extensive VR experience (>5 sessions/month) are excluded as “VR-proficient users”, since their heightened familiarity with VR technology could lead to substantially different experiences and responses compared with novice users and thereby bias the experimental results. A total of 60 eligible volunteers are ultimately enrolled in the study, with 30 participants assigned to each group.
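The matching step can be illustrated with a minimal sketch of greedy 1:1 nearest neighbor matching without replacement on standardized covariates. The text does not specify the distance metric, caliper, or matching order, so Euclidean distance and first-come matching are assumptions here, and the covariate encoding (age in years, gender as 0/1, education as 0–2) and the sample rows are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each covariate column so no single scale dominates."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def nearest_neighbor_match(treated, controls) -> List[Tuple[int, int]]:
    """Greedy 1:1 matching without replacement.

    Returns (treated_index, control_index) pairs.
    """
    z = standardize(list(treated) + list(controls))
    zt, zc = z[:len(treated)], z[len(treated):]
    available = set(range(len(controls)))
    pairs = []
    for i, t in enumerate(zt):
        if not available:
            break
        j = min(available,
                key=lambda k: sqrt(sum((a - b) ** 2
                                       for a, b in zip(t, zc[k]))))
        pairs.append((i, j))
        available.remove(j)
    return pairs


# Hypothetical covariates: (age, gender 0/1, education level 0-2).
vr = [(22, 0, 1), (31, 1, 2), (40, 0, 0)]
video = [(41, 0, 0), (23, 0, 1), (30, 1, 2), (35, 1, 1)]
pairs = nearest_neighbor_match(vr, video)
print(pairs)
```

After matching, balance would normally be checked by comparing covariate distributions between the paired groups.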
While the three-stage recruitment strategy enhanced sample representativeness, several limitations regarding generalizability should be acknowledged. The final sample size (n = 60), though adequate for detecting medium-to-large effect sizes in controlled experiments, may limit statistical power for detecting smaller effects and generalizability to broader populations. The sampling frame, restricted to registered tourist profiles within Fujian Province, may bring about selection bias toward individuals with pre-existing interest in local cultural heritage. Furthermore, the exclusion of VR-proficient users limits understanding of how experienced users interact with cultural metaverse applications. Future studies should employ large and diverse samples across multiple geographic regions to include stratified sampling of users with varying technology proficiency levels. The aim is to enhance external validity.
This study was conducted in strict accordance with the ethical principles of the Helsinki Declaration. All experimental protocols were consistent with institutional ethical review standards. Due to the nature of this research (e.g., retrospective study/use of anonymized data), the institutional review board granted an exemption from formal ethics approval procedures. All participants provided written informed consent after receiving comprehensive information regarding the study’s purpose, procedures, potential risks, and benefits. The confidentiality of participant information was strictly maintained, and the right to withdraw from the study at any time without penalty was ensured.
Experimental procedure
The VR group’s 30-min baseline phase incorporates three standardized procedures to establish reliable pre-intervention measurements. First, resting-state EEG data are collected under both eyes-closed (α power: 42.7 ± 3.2 μV²/Hz) and eyes-open (18.3 ± 2.1 μV²/Hz) conditions for 5 min each to record electrophysiological baselines. Second, participants complete a cultural knowledge pretest using an IRT-based adaptive test bank (item difficulty: 0.62 ± 0.21), which measures their initial competency levels (θ = 0.73 ± 0.31). Finally, a 5-min VR familiarization session (operation error rate: 1.2 ± 0.4 counts) is conducted to ensure participant comfort with the equipment and reduce potential first-time user effects. This protocol provides essential baseline data for subsequent VR intervention analyses.
During the 40-min intervention phase, VR group participants engage in three carefully designed activities to facilitate immersive learning. The session begins with a 10-min guided tour mode that enables first-person exploration of the Tulou periphery, with a path tracking accuracy of 2.1 ± 0.3 cm RMS. During this exploration, the system automatically records participants’ movement trajectories and dwell times (averaging 8.3 ± 2.1 s per area) as they naturally observe the architectural features. Participants then progress to a 15-min construction task involving five rammed-earth craft interactions, including material preparation, soil mixing, and wall compaction. This hands-on component demonstrates strong engagement, with participants achieving a task completion score of 4.2 ± 0.5 out of 5 and 89.7 ± 3.2% operational accuracy through the system’s real-time feedback mechanism. Throughout this activity, sensors collect detailed hand-motion and force-application data to objectively assess participants’ mastery of construction techniques. The entire experience is supported by system-guided, step-by-step instructions that enable participants to develop both practical skills and a deeper understanding of traditional construction methods and their cultural significance. This multi-faceted approach combines observational learning with active participation to create a comprehensive cultural immersion experience. Finally, in the 15-min cultural quiz, participants identify hidden cultural symbols (e.g., ancient murals, distinctive wood carvings) within the Tulou interior scenes and achieve 78.3 ± 5.6% recognition accuracy, while the system records search paths and durations to analyze cognitive strategies. 
This phase may evaluate participants’ observational skills and cultural understanding through systematic scoring of symbol identification performance and exploration patterns. Meanwhile, the control group has viewed a specially-produced documentary matching the intervention duration, which covers Tulou’s history, architectural features, and cultural practices. The system can track their viewing behaviors (fixation duration, scene transition frequency) for comparative analysis with VR group data, so as to maintain standardized knowledge delivery through conventional video presentation.
The 60-min evaluation phase for the VR group comprises several assessments. Immediately after the intervention, participants complete questionnaires (IPQ total score: 7.2 ± 0.8, including spatial presence 6.8 ± 0.3/7 and involvement 6.5 ± 0.4/7) and undergo physiological data collection (EEG, with a NASA-TLX cognitive load score of 52 ± 6). The aim is to evaluate cultural knowledge acquisition, immersive experience, and subjective satisfaction. Concurrent eye-tracking data enable in-depth analysis of cognitive processing and attention allocation patterns. One-week follow-up telephone tests, based on standardized content-related assessments, show significantly higher cultural knowledge retention in the VR group (78.3 ± 4.5%) than in the traditional video group (62.1 ± 5.2%, p < 0.01), which may confirm VR’s advantage in long-term memory retention. The complete experimental protocol is detailed in Table 1.
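For readers who wish to gauge the size of the retention difference, the sketch below derives a Welch t statistic and Cohen’s d from the reported summary values. It assumes that the “±” figures are standard deviations and that each group contains 30 participants, as stated in the recruitment description; the text does not say which test the authors used, so this is not a reproduction of their analysis.

```python
from math import sqrt


def welch_t(m1: float, sd1: float, n1: int,
            m2: float, sd2: float, n2: int) -> float:
    """Welch t statistic for two independent groups (unequal variances)."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / standard_error


def cohens_d(m1: float, sd1: float, n1: int,
             m2: float, sd2: float, n2: int) -> float:
    """Cohen's d using the pooled standard deviation."""
    pooled = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                  / (n1 + n2 - 2))
    return (m1 - m2) / pooled


# Reported retention: VR 78.3 +/- 4.5 %, video 62.1 +/- 5.2 %.
# n = 30 per group and SD interpretation are assumptions.
t_stat = welch_t(78.3, 4.5, 30, 62.1, 5.2, 30)
d = cohens_d(78.3, 4.5, 30, 62.1, 5.2, 30)
print(f"t = {t_stat:.2f}, d = {d:.2f}")
```

If the “±” values were standard errors rather than standard deviations, both statistics would be considerably smaller, so the interpretation depends on that assumption.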
Enhanced data synchronization protocol
In hardware, the system uses the White Rabbit protocol to establish a nanosecond-level synchronization network, and an FPGA timestamp generator (jitter <1 μs) provides precise temporal alignment for the EEG devices, eye trackers, and motion capture systems. In software, a ROS2-based data hub integrates five synchronized data streams (EEG, eye movement, motion capture, questionnaire, and system logs), performing timestamp alignment (e.g., T0, T0 + 5 ns, T0 + 8 ns) and multimodal fusion through a unified interface. The complete system architecture and data flow are illustrated in Fig. 1.
A The two-layer architecture. The hardware layer uses the White Rabbit protocol and an FPGA timestamp generator (jitter <1 μs) to synchronize the EEG, eye tracker, and motion capture devices over a nanosecond-level network. The software layer, built on a ROS2 data hub, offers a unified interface for real-time processing of the multimodal streams (EEG, eye movement, motion data, questionnaires, and logs). B The data flow. The EEG, eye tracker, and motion capture devices are coordinated via the FPGA and the ROS2 hub with nanosecond-level synchronization (e.g., packets at T0, T0 + 5 ns, T0 + 8 ns), and unified timestamps are generated at T0 + 10 ns for multimodal data fusion.
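The timestamp alignment step can be illustrated with a minimal sketch. It is not the ROS2 data hub used in the study: the function names, the tolerance value, and the sample packets are hypothetical, and the sketch assumes that each stream delivers (timestamp in ns, value) pairs sorted by time. Each sample of a reference stream is matched to the nearest sample of every other stream, and a match is accepted only if it lies within the tolerance.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Index of the timestamp closest to t (timestamps sorted ascending)."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Choose the closer of the two neighbouring samples.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align_to_reference(reference, streams, tolerance_ns):
    """Fuse streams onto the reference timeline.

    reference: list of (timestamp_ns, value), sorted by timestamp
    streams:   dict name -> list of (timestamp_ns, value), sorted by timestamp
    A stream entry is None when no sample lies within tolerance_ns.
    """
    stamp_lists = {name: [t for t, _ in s] for name, s in streams.items()}
    fused = []
    for t_ref, v_ref in reference:
        record = {"t_ns": t_ref, "reference": v_ref}
        for name, samples in streams.items():
            record[name] = None
            if samples:
                j = nearest_index(stamp_lists[name], t_ref)
                if abs(samples[j][0] - t_ref) <= tolerance_ns:
                    record[name] = samples[j][1]
        fused.append(record)
    return fused


# Hypothetical packets with offsets modelled on T0, T0 + 5 ns, T0 + 8 ns.
T0 = 1_000_000
eeg = [(T0, 0.12), (T0 + 1000, 0.15)]
others = {
    "eye": [(T0 + 5, (0.4, 0.6)), (T0 + 1005, (0.5, 0.6))],
    "motion": [(T0 + 8, 12.0)],
}
fused = align_to_reference(eeg, others, tolerance_ns=10)
```

In this example the first EEG sample is fused with both the eye and motion packets, whereas the second has no motion packet within 10 ns, so that entry stays empty instead of being filled with a stale value.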
Data acquisition and processing
A comprehensive 257-field data dictionary is developed in compliance with the ISO/IEC 11179 standard. It defines each field’s name, data type, semantic meaning, collection methodology, and units to ensure standardization and interoperability for data sharing and reuse. An automated data quality control pipeline runs daily at 02:00 and performs validation checks covering data completeness, accuracy, and consistency. The system automatically detects and flags anomalies and triggers three-tiered alerts (email/SMS/phone) so that researchers can take timely corrective action and database reliability is maintained. A dual-verification protocol requires all data to be independently reviewed by at least two researchers prior to archiving. Any questionable entries undergo thorough investigation, including reacquisition when necessary, to guarantee data integrity and trustworthiness.
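The three kinds of validation check might be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the three dictionary entries, their value ranges, and the function names are hypothetical, and duplicate participant identifiers stand in for the consistency check.

```python
# Hypothetical excerpt of a data dictionary:
# field -> (accepted types, unit, minimum, maximum)
DATA_DICTIONARY = {
    "participant_id": ((str,), None, None, None),
    "dwell_time_s": ((int, float), "s", 0.0, 600.0),
    "ipq_spatial_presence": ((int, float), "score", 1.0, 7.0),
}


def validate_record(record, dictionary=DATA_DICTIONARY):
    """Return a list of (field, problem) tuples for one record."""
    issues = []
    for field, (types, _unit, low, high) in dictionary.items():
        value = record.get(field)
        if value is None:                      # completeness check
            issues.append((field, "missing"))
            continue
        if not isinstance(value, types):       # accuracy check: data type
            issues.append((field, "wrong type"))
            continue
        if low is not None and not (low <= value <= high):
            issues.append((field, "out of range"))   # accuracy check: range
    return issues


def find_duplicates(records, key="participant_id"):
    """Consistency check: key values that occur more than once."""
    seen, duplicates = set(), set()
    for record in records:
        k = record.get(key)
        if k is None:
            continue
        if k in seen:
            duplicates.add(k)
        seen.add(k)
    return sorted(duplicates)


# Hypothetical records for illustration only.
records = [
    {"participant_id": "P01", "dwell_time_s": 8.3, "ipq_spatial_presence": 6.8},
    {"participant_id": "P02", "dwell_time_s": -1.0, "ipq_spatial_presence": 6.5},
    {"participant_id": "P02", "dwell_time_s": 7.9},
]
flagged = {i: validate_record(r) for i, r in enumerate(records)}
duplicates = find_duplicates(records)
```

Flagged records and duplicate identifiers would then be passed to whatever alerting and manual review steps follow; those steps are outside the scope of this sketch.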
EEG preprocessing employs a combined ASR (Artifact Subspace Reconstruction) and ICA (Independent Component Analysis) denoising approach: ASR automatically detects and corrects artifacts (e.g., ocular/muscular artifacts), while ICA isolates and removes noise components that are independent of neural activity, which together enhance the signal-to-noise ratio. Morlet wavelet transforms are applied for time-frequency analysis across 1–100 Hz, decomposing the EEG signals into temporally resolved spectral components to visualize dynamic neural oscillations and provide a basis for functional brain analysis. PLI (Phase Lag Index)-weighted brain networks are constructed to examine interregional functional connectivity patterns and network topology, with PLI quantifying how consistently the phase of one region’s signal leads or lags that of another. This network approach characterizes information exchange and cooperation among cortical regions during VR immersion and provides mechanistic insights into how immersive experiences modulate cognitive brain functions.
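To make the last two steps concrete, the following sketch extracts instantaneous phase with a complex Morlet wavelet and computes PLI as the absolute mean sign of the sine of the phase difference. It is a minimal pure-Python illustration, not the analysis code of the study: the channel names, sampling rate, number of wavelet cycles, and synthetic sinusoids are all assumptions.

```python
import cmath
import math


def morlet_phase(signal, fs, freq, n_cycles=5):
    """Instantaneous phase at freq (Hz) via a complex Morlet wavelet."""
    sigma_t = n_cycles / (2 * math.pi * freq)   # temporal width of the Gaussian
    half = int(3 * sigma_t * fs)                # wavelet support: +/- 3 sigma
    wavelet = [
        cmath.exp(2j * math.pi * freq * k / fs)
        * math.exp(-((k / fs) ** 2) / (2 * sigma_t ** 2))
        for k in range(-half, half + 1)
    ]
    phases = []
    # Only keep samples where the wavelet fits entirely inside the signal.
    for n in range(half, len(signal) - half):
        coeff = sum(
            signal[n + k] * wavelet[k + half].conjugate()
            for k in range(-half, half + 1)
        )
        phases.append(cmath.phase(coeff))
    return phases


def phase_lag_index(phases_a, phases_b):
    """PLI = |mean(sign(sin(phase_a - phase_b)))|, between 0 and 1."""
    total = 0
    for a, b in zip(phases_a, phases_b):
        s = math.sin(a - b)
        total += (s > 0) - (s < 0)
    return abs(total) / len(phases_a)


def pli_network(phase_by_channel):
    """PLI weight for every unordered channel pair."""
    names = list(phase_by_channel)
    return {
        (a, b): phase_lag_index(phase_by_channel[a], phase_by_channel[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Synthetic 10 Hz signals (hypothetical channels), 2 s at 250 Hz.
fs, freq = 250, 10
t = [n / fs for n in range(500)]
signals = {
    "frontal": [math.cos(2 * math.pi * freq * x) for x in t],
    "parietal": [math.cos(2 * math.pi * freq * x - math.pi / 4) for x in t],
    "frontal_copy": [math.cos(2 * math.pi * freq * x) for x in t],
}
phases = {name: morlet_phase(sig, fs, freq) for name, sig in signals.items()}
network = pli_network(phases)
```

In this synthetic case the constant quarter-π lag between the frontal and parietal signals gives a PLI close to 1, while the two identical signals give 0. The second result reflects the property that makes PLI attractive for EEG: zero-lag coupling, which is typical of volume conduction, does not contribute to the index.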
The study systematically investigates users’ cognitive behaviors in virtual environments through multimodal data analysis. First, an eye-tracking scanpath model based on Hidden Markov Models is constructed from fixation coordinates (x, y), saccade amplitude (°), and dwell time (ms), which identifies three characteristic visual exploration patterns (hidden states = 3, BIC = 152.3). Second, a density-peak-based dynamic ROI partitioning algorithm divides the virtual scene into seven areas of interest (diameter ≥2° visual angle), for which metrics including first fixation latency (M = 320 ms, SD = 45) and total fixation duration (M = 1.2 s, SD = 0.3) are calculated; these reveal significant differences in visual attention allocation across Tulou architectural features (F(6,105) = 8.72, p < 0.001, η² = 0.33). Third, Joint ICA (JIVE) analysis of the multimodal data shows a significant positive correlation between EEG θ-band power (4–7 Hz) and fixation duration (r = 0.61, p < 0.001, FDR-corrected), head rotation angles that predict ROI transition probability (β = 0.47, p = 0.002), and significant covariation between questionnaire scores and eye-movement metrics (p < 0.001, Cohen’s f = 0.42). With synchronized multimodal data acquisition (error within ±12 ms), these findings provide reliable evidence for understanding the cognitive mechanisms underlying immersive experiences.
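The two ROI metrics reported above can be computed from a fixation list as in the following sketch. It assumes that the ROIs are already available as circles in degrees of visual angle and does not implement the density-peak partitioning itself. The ROI names, coordinates, and fixation values are hypothetical.

```python
import math


def assign_roi(x, y, rois):
    """Name of the first circular ROI containing (x, y), or None."""
    for name, (cx, cy, radius) in rois.items():
        if math.hypot(x - cx, y - cy) <= radius:
            return name
    return None


def roi_metrics(fixations, rois, trial_onset_ms=0.0):
    """First fixation latency and total fixation duration per ROI.

    fixations: list of (onset_ms, duration_ms, x_deg, y_deg) in temporal order
    rois:      dict name -> (centre_x_deg, centre_y_deg, radius_deg)
    """
    metrics = {
        name: {"first_fixation_latency_ms": None, "total_fixation_ms": 0.0}
        for name in rois
    }
    for onset, duration, x, y in fixations:
        name = assign_roi(x, y, rois)
        if name is None:
            continue                      # fixation outside every ROI
        entry = metrics[name]
        if entry["first_fixation_latency_ms"] is None:
            entry["first_fixation_latency_ms"] = onset - trial_onset_ms
        entry["total_fixation_ms"] += duration
    return metrics


# Hypothetical ROIs (diameter >= 2 degrees) and fixations for one trial.
rois = {
    "ancestral_hall": (0.0, 0.0, 1.5),
    "outer_wall": (6.0, 0.0, 2.0),
}
fixations = [
    (320, 250, 0.2, -0.1),
    (600, 400, 6.5, 0.5),
    (1050, 300, 0.1, 0.3),
    (1400, 200, 12.0, 9.0),
]
metrics = roi_metrics(fixations, rois)
```

Per-participant values obtained in this way would then enter the group-level comparison across ROIs.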
