Article

Structured Life Narratives: Building Life Story Hierarchies with Graph-Enhanced Event Feature Refinement

1
Key Laboratory of Knowledge Engineering with Big Data of the Ministry of Education, Hefei University of Technology, Hefei 230002, China
2
School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230002, China
3
National Smart Eldercare International S&T Cooperation Base, Hefei University of Technology, Hefei 230002, China
4
Department of Psychiatry, Harvard Medical School, Boston, MA 02115, USA
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(2), 918; https://doi.org/10.3390/app14020918
Submission received: 30 October 2023 / Revised: 17 December 2023 / Accepted: 12 January 2024 / Published: 22 January 2024
(This article belongs to the Special Issue Fuzzy Control Systems: Latest Advances and Prospects)

Abstract

The life stories of older adults encapsulate an array of personal experiences that reflect their care needs. However, because these narratives are inherently fuzzy, fragmented, repetitive, and redundant, caregivers find them difficult to acquire and comprehend when applying the life story approach in practice. To address this challenge, our study introduces a novel approach called Life Story Hierarchies with Graph-Enhanced Event Feature Refinement (LSH-GEFR). LSH-GEFR constructs a bilayer graph. Firstly, the event element map leverages the relationships between event elements to extract environmental features, providing a detailed context for understanding each event element. Secondly, the event map explores the relationships between the events themselves, allowing LSH-GEFR to build a comprehensive understanding of each event and enhance its representation. We conducted experiments on different datasets and found that the proposed LSH-GEFR method outperformed four advanced event tree generation methods in terms of path coherence, branch reasonableness, and overall readability when generating life story hierarchies. Over 84.91% of the structured life narratives achieved readability, 5.96% higher than the best-performing baseline approach.

1. Introduction

The global population aged over 60 is expected to triple, reaching 2 billion by 2050 [1]. By 2025–2026, an estimated 2 million individuals aged over 50 are expected to encounter social isolation and loneliness, signifying a 49% increase within a decade [2].
People have used life stories to break social isolation and alleviate loneliness among older adults [3]. However, life stories are often obtained through frequent brief interviews or digital storytelling and are seldom recounted in their entirety, resulting in fuzziness, fragmentation, and redundancy [4]. Moreover, the disordered aggregation of life stories results in ambiguous relationships and indistinct themes. This complexity and uncertainty make it difficult for users to perform information mining and make analytical decisions based on life stories [5].
Extracting crucial events from life stories and organizing them into a hierarchical tree-like structure is an effective way to address these issues. Currently, research on structured event organization predominantly focuses on news and social media. Li et al. employed Twitter data to construct a storyline of personal events by discerning important occurrences based on user mentions [6]. They prioritized events mentioned once as significant and overlooked frequently discussed events. This approach is not suitable for organizing the life stories of older adults, because the events that older people discuss often are important for establishing links in elderly care services. Ansah et al. designed an event tree-building algorithm that identifies events through user network communities and models their evolution based on temporal proximity and semantic context [7]. However, it did not address the issues of event repetition and fragmentation. Liu et al. introduced a dual-layer graph clustering technique to analyze event streams, segmenting documents into topic sets for event tree creation [8]. Nonetheless, their method assessed event relevance solely on individual event text features, overlooking the interconnections between events.
The uniqueness of older adults’ life narratives, distinct from news or social media data, compounds three challenges in structuring them. Firstly, older adults often recall specific life experiences in varying ways, leading to fragmented and repetitive narratives. This poses challenges for caregivers, who face the pressure of reading and understanding these stories. Thus, in structuring life narratives, it is essential to identify and merge fragmented life stories depicting the same event. Secondly, older adults narrate their life stories without adhering to clear topics, encompassing various topics within their daily conversations. Hence, we need to cluster related events revolving around the same topics. Finally, the representation of event features in life stories needs to consider multidimensional information. Elements such as characters, time, and location within events, along with potential relationships between events, all influence the representation of event features. Presently, deep learning models predominantly focus on enhancing event features based on textual content and trigger words, paying relatively little attention to the information regarding event elements and the potential connections between events. We compress the text features of life stories into two-dimensional space by linear embedding to visualize the problem. Figure 1a illustrates the original feature distribution of a person’s life stories, in which each point represents a life story and different story topics are distinguished with different colors. As shown in Figure 1a, the related life stories are scattered. Our goal is shown in Figure 1b, where relevant event features become more similar, while irrelevant event features become more distinct.
To address the above questions, this paper proposes a novel structured life narrative construction method, LSH-GEFR, which builds life story hierarchies with Graph-Enhanced Event Feature Refinement. The contributions can be summarized as follows:
  • LSH-GEFR builds life story hierarchies that amalgamate identical events into one event node and classify related events into one topic node. The root node, the topic node, and the event node are three different types of nodes that LSH-GEFR uses to organize life stories into hierarchies. These nodes signify the beginning of the tree, the topic of the events on the same branch, and the events under one topic branch, respectively.
  • We propose a bilayer graph to enhance event feature refinement. The first layer is the event element map, and the second layer is the event map established on the event element map. The two maps combine to capture the potential correlation between events to extract additional event features and optimize event feature representation. This leads to an increased similarity among relevant event features while distinguishing irrelevant event features. Consequently, it enhances the identification of event relationships when dealing with limited sample sizes.
  • We validated LSH-GEFR’s performance on three distinct datasets. The experimental results show that LSH-GEFR excels in path coherence, branch reasonableness, and overall readability when generating structured life narratives. Over 84.91% of the structured life narratives achieve readability, 5.96% higher than the best-performing baseline approach.

2. Related Works

2.1. Life Narratives

Life narratives involve revisiting one’s life experiences through communication with others, serving as an intervention to break social barriers and alleviate the sense of loneliness among older adults [9]. Traditionally, the life experiences of older adults were often documented in physical formats, like storybooks, collages, or memory boxes [10]. Organizing life narratives typically combines textual descriptions with visual elements, such as significant individuals or event photographs. These life narratives are organized around different topics, such as birth, educational pursuits, family life, career endeavors, and travel experiences [11].
With the evolution of technology, digital storytelling aided by computational means has been widely embraced. The life narrative approach, as a form of human intervention, has had applications within eldercare services [12]. Nursing homes routinely engage in extensive communication with older adults and their families upon admission. This communication aims to gather a substantial amount of information about the older adults’ past life experiences, habits, and interests, among other aspects, forming the basis of a “personal biography”. These biographies serve as the foundation for crafting electronic health records and facilitating tailored caregiving services for older adults [3]. In the American eldercare market, a business model merging “memoirs” with loneliness intervention has emerged [13]. This model not only revitalizes traditional memoir products but also renders depression therapy more approachable and readily accepted by older adults.
However, the current application of the life narrative approach heavily relies on manual collection, organization, and information mining, lacking automated means of processing life stories. Life narratives gathered through daily interactions and face-to-face interviews suffer from fragmentation, disorder, and redundancy, imposing burdens on caregivers for story structuring and information extraction.

2.2. Event Feature Refinement with Graph-Enhanced Methodologies

The integration of contextual information stands as one of the methods for optimizing event feature representations within life stories [14]. Among the information types, temporal contextual information holds pivotal significance [15]. Gottschalk et al. employed an event-centric temporal knowledge graph to exploit temporal stamp information on events [16]. On another front, spatial information also plays a crucial role in optimizing event feature representations, particularly in the automated organization research of life stories [17].
Leveraging graph-enhanced methods to optimize event feature representations constitutes the application of a multidimensional information fusion strategy. Beyond temporal and spatial contextual information, graph-enhanced methods can introduce additional contextual cues, aiding models in better comprehending relationships among events. The directed event graph is a method of modeling dependencies among events as a directed graph [18]. Each event is denoted as a node within the graph, while the dependencies among events are represented as edges between nodes. This methodology assists models in grasping the interdependence among events, thereby generating more precise feature representations. Franklin et al. employed advanced techniques such as geographic embeddings, which organically merge location information with event feature representations to enhance the capability of capturing spatial correlations among events [19]. Event graphs treat events as nodes within a graph, and the relationships among events as edges within the graph. By utilizing graph neural networks (GNNs) to model event graphs, it becomes possible to capture complex relationships among events, consequently generating richer feature representations [20].

2.3. Structured Organization of Events

The modern Internet era has brought an information explosion: news stories, blog posts, social network updates, and other textual information stream onto the Internet constantly [21]. Hierarchical structures represent relationships between events clearly, making information easy to understand and navigate [22]. The tree is a typical hierarchical structure: each event serves as a node with parent and child nodes, creating an ordered hierarchy [23]. Through the connections between parent and child nodes, event relationships can be explicitly expressed, enabling users to better comprehend causality or temporal order among events [24]. Organizing events as a tree allows users to efficiently categorize and classify data, which is crucial for analyzing and managing large event datasets [25,26]. Furthermore, tree structures scale well, since new events can be added continuously to support dynamic data expansion [27]. This is particularly valuable for handling accumulating event data and maintaining flexibility in data organization [28]. Most importantly, tree structures can be employed in decision support systems [29], helping users gain a deeper understanding of data, identify trends and patterns, and make decisions based on this information [30].
Ansah et al. leveraged user community, event text, and time information to model event evolution relationships and presented the event summary as a tree structure [7]. To improve clustering accuracy, several multi-stage clustering models have been proposed. Shahaf et al. proposed a two-stage clustering process that extracts events from a large number of news documents to generate a connection graph that explicitly captures the story’s development [31]. Yan et al. sought to identify the patterns of theme evolution inside events using time-dependent linkages and prior knowledge. By creating efficient sentence selection techniques, this method extended the basic hierarchical Dirichlet process and detected the evolution of events [32]. Liu et al. proposed a bilayer graph clustering method based on keyword graphs and document graphs that first divides numerous documents into different sets of topics and then divides the documents under each topic into different sets of events, organizing a large amount of breaking news data into an easy-to-read event story tree [8].
This paper aims to utilize the tree structure to organize the events from life stories to achieve better logical consistency, which could help users quickly extract effective information.

3. Method

In this section, we first give a standardized definition of this work and then introduce the LSH-GEFR architecture.

3.1. Problem Definition

To give readers an insight into structured life narratives, we use Wu’s life stories as an illustrative example. As Figure 2 shows, the input is Mr. Wu’s life stories, and the output is the corresponding structured life narrative. The event tree contains 15 nodes, where the $ROOT$ node represents the start of the event tree, nodes $T_1$–$T_4$ are topic nodes of the form “time period: topic one, topic two”, and nodes $E_1$–$E_{10}$ are event nodes under different topics of the form “time: summary”. Each event node is accompanied by the corresponding life stories. The link between topic nodes indicates a temporal topic evolution relationship. The link between event nodes indicates a temporal or logical event evolution relationship. As illustrated in Figure 2, there are four branches within Wu’s life story hierarchies, where the branch path $E_1$–$E_2$ is about Wu’s education experience, the branch path $E_3$–$E_4$ is about Wu’s soldier experience, the branch path $E_5$–$E_7$ is about Wu’s work experience in North Korea, and the branch path $E_8$–$E_{10}$ is about Wu’s life after his wife died. In addition, we mark each event node with a different color for different frequencies so that users can quickly distinguish them.
This study aims to structure the life narratives of an older adult into an event tree format that aligns with the logical evolution of events. The event themes constitute the trunk of the event tree, and related events within the same theme are ordered chronologically and logically as branches. Hence, the formal problem description is as follows:
The input consists of the life story narratives of an older adult, and the output is the corresponding event tree. Let $d$ denote one life story document, and $D = \{d_1, d_2, \ldots, d_n\}$ denote all life stories of one older adult. $e_i$ is the event extracted from life story $d_i$. We first compute event relevancy and use it to determine whether events in different story documents express the same event. If they do, they are consolidated into a unified event node, denoted $E$. Next, we determine whether the merged event nodes $E$ are thematically cohesive, that is, related to the same topic $T$. If they are, all relevant event nodes are clustered under a single topic. Finally, the topic nodes $T$ and event nodes $E$ are organized into a hierarchical event tree based on chronological and logical order, represented by $Tr = \{T, E, E_d, T_d\}$, where $T$ represents the topic (theme) nodes, $T_d$ represents the edges connecting the topic nodes, $E$ represents the event nodes, and $E_d$ represents the edges connecting the event nodes.
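The tree defined above can be sketched as a small data structure. This is only an illustrative sketch: the class and field names (`EventNode`, `TopicNode`, `EventTree`, `stories`, and so on) are hypothetical and do not come from the paper, and edges are derived from the node order rather than stored explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class EventNode:
    """A merged event E together with the life stories that describe it."""
    time: Optional[int]            # e.g. a year; None if the time element is missing
    summary: str
    stories: List[str] = field(default_factory=list)

    def label(self) -> str:
        # Event nodes take the form "time: summary".
        return f"{self.time}: {self.summary}" if self.time is not None else self.summary


@dataclass
class TopicNode:
    """A topic T that owns one ordered branch of event nodes."""
    period: Tuple[int, int]
    topic_words: List[str]
    events: List[EventNode] = field(default_factory=list)

    def label(self) -> str:
        # Topic nodes take the form "time period: topic one, topic two".
        return f"{self.period[0]}–{self.period[1]}: {', '.join(self.topic_words)}"

    def event_edges(self) -> List[Tuple[str, str]]:
        # E_d: edges between consecutive events on this branch.
        labels = [e.label() for e in self.events]
        return list(zip(labels, labels[1:]))


@dataclass
class EventTree:
    """Tr = {T, E, E_d, T_d}; the ROOT node is implicit."""
    topics: List[TopicNode] = field(default_factory=list)

    def topic_edges(self) -> List[Tuple[str, str]]:
        # T_d: edges between consecutive topic nodes.
        labels = [t.label() for t in self.topics]
        return list(zip(labels, labels[1:]))

    def node_count(self) -> int:
        # ROOT + topic nodes + event nodes.
        return 1 + len(self.topics) + sum(len(t.events) for t in self.topics)
```

With four topic nodes holding 2, 2, 3, and 3 events, `node_count()` gives the 15 nodes of the Figure 2 example.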
The hierarchical structure of life narratives, which consists of themes, events, and detailed life stories, supports caregivers in grasping the life trajectory of older adults. It also enables them to delve into specific events, understanding the event details and developmental processes. This structure furnishes caregivers with the necessary data support and decision-making foundation to devise personalized care plans for older adults [33].

3.2. LSH-GEFR Architecture

The LSH-GEFR architecture is shown in Figure 3. The architecture achieves the hierarchical construction of life narratives through three modules: event extraction, graph-enhanced structure for event fusion and clustering, and event tree generation.
(1) The event extraction module extracts the core events from the life stories, which are the basis of the structured life narratives. (2) The graph-enhanced event fusion and clustering module constructs a bilayer graph to refine event representations by capturing potential connections between events. The first layer, the event element map, uses the relationships between event elements to extract environmental features for each event element. This level of granularity captures details that are not explicit in the text of a single life story and provides a richer context for understanding events. Building on this foundation, the second layer, the event map, models the relationships between the events themselves. This allows LSH-GEFR to uncover otherwise hidden connections and correlations and to enhance the representation of each event. We accomplish event fusion for identical events and clustering for related events using the D2E-EC model. (3) Finally, the event tree generation module constructs and updates the structured life narratives.

3.3. Event Extraction

The core of the life narratives lies in events, emphasizing the actions or transitions in a person’s life occurring at a particular time and place. Therefore, in the event extraction module, we complete the tasks of event detection and event element extraction.
An event is defined as the incident extracted from a life story that represents the central life experience during that episode. From each life story document, we extract only one event. Firstly, we adapt BERT to process life narrative texts, obtaining features that encompass contextual information [34]. Then, we locate the event by identifying the features of the event trigger words. We randomly selected 15% of the dataset’s life stories and used the ODEE method to extract events from each life story and label the event trigger words [35]. Following this, two graduate students in the research team manually corrected the extracted event trigger words. When their opinions diverged, the authors Gui and Yang joined the discussion to reach a joint decision. The criteria for manual correction are as follows:
  • The trigger word exists within the original life story text.
  • The trigger word is a verb or a gerund.
  • The trigger word directly prompts the occurrence of an event.
Finally, we use the 15% manually corrected data as the training set to train a classifier that can recognize event trigger words and perform accurate event detection.
Following event detection, we proceed to identify event elements. As depicted in Figure 4, ei, tm, loc, pc, and em refer to the trigger word, time, location, participant, and event mention, respectively. To accomplish this, we employ the Language Technology Platform (LTP) to extract named entities from life story documents [36]. Next, we use an encoder-decoder network to integrate the named entity features, the original word features, and the trigger word features to determine the appropriate element for each named entity.

3.4. Graph-Enhanced Structure for Event Fusion and Clustering

We need to determine whether the core events depicted in different life stories are coincident, relevant, or irrelevant events. A person’s life experience can be seen as a coherent organization of past events, where relevance plays a crucial role in encoding [37]. Life events are associated with each other [38]. Therefore, we propose a bilayer graph structure to capture the correlations between events to optimize the event representation. The first layer is the event element map, and the second layer is the event map established on the event element map. We use these connected events to define environment features for a given event. In the following, we will introduce the bilayer graph and describe the clustering procedures.

3.4.1. Bilayer Graph-Enhanced Structure

The event element map, abbreviated as $G_m$, contains the components of all the events that involve the same person. Each node in $G_m$ represents an event element that has been extracted by the scheme described in Section 3.3. The elements are connected by undirected edges based on the correlation calculation of event element features.
Let $m$ denote an event element and $\mathbf{m}$ denote the corresponding event element feature vector. Then, $\mathbf{m}$ can be represented as
$$\mathbf{m} = (\mathbf{f}, \rho, \tau, \mu)$$
$\mathbf{f}$ is the vector feature, representing the textual feature of the event element, generated by the BERT pre-trained model. $\rho$, $\tau$, and $\mu$ are numerical features. $\rho$ is the part of speech obtained in a lexical analysis of event elements, including nouns, adverbs, verbs, adjectives, and others, which are coded 0, 1, 2, 3, and 4. $\tau$ represents the named entity type of the event element, including the name of the person, the name of the organization, the place name, and others, coded 0, 1, 2, and 3. $\mu$ is the semantic role of each event element, including the time, place, causer, patient, trigger word, and event description, coded 0, 1, 2, 3, 4, and 5.
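Given two event elements $\mathbf{m}_i$ and $\mathbf{m}_j$, the correlation $\varphi_m(\mathbf{m}_i, \mathbf{m}_j)$ is calculated as below.
$$\varphi_m(\mathbf{m}_i, \mathbf{m}_j) = \omega_\rho \, Dif(\rho_i, \rho_j) + \omega_\tau \, Dif(\tau_i, \tau_j) + \omega_\mu \, Dif(\mu_i, \mu_j) + \omega_f \, S_f(\mathbf{f}_i, \mathbf{f}_j)$$
$Dif(x, y)$ is used to calculate the similarity for categorical features, including $\rho$, $\tau$, and $\mu$:
$$Dif(x, y) = \begin{cases} 1, & x = y \\ 0, & x \neq y \end{cases}$$
$S_f(\mathbf{f}_i, \mathbf{f}_j)$ is used to calculate the similarity for a numerical feature vector $\mathbf{f}$ with length $n$:
$$S_f(\mathbf{f}_i, \mathbf{f}_j) = \sum_{k=1}^{n} \left( f_{i,k} - f_{j,k} \right)^2$$
$\varphi_m(\mathbf{m}_i, \mathbf{m}_j)$ is the combined correlation of $\mathbf{f}$, $\rho$, $\tau$, and $\mu$ with weights $\omega_f$, $\omega_\rho$, $\omega_\tau$, and $\omega_\mu$, respectively. The event element map is constructed based on $\varphi_m(\mathbf{m}_i, \mathbf{m}_j)$.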
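A minimal sketch of the element correlation follows. Two things are assumptions rather than details from the text: the weight values are placeholders (no values are given for $\omega_\rho$, $\omega_\tau$, $\omega_\mu$, $\omega_f$), and $S_f$ is taken to be the sum of squared component differences, which is one reading of the formula. The `Element` class is a hypothetical container.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Element:
    """Event element feature vector m = (f, rho, tau, mu)."""
    f: Sequence[float]   # text feature vector (e.g. from BERT)
    rho: int             # part-of-speech code
    tau: int             # named-entity type code
    mu: int              # semantic role code


def dif(x: int, y: int) -> int:
    # Categorical agreement: 1 when the codes match, 0 otherwise.
    return 1 if x == y else 0


def s_f(fi: Sequence[float], fj: Sequence[float]) -> float:
    # ASSUMPTION: sum of squared differences over the n components.
    return sum((a - b) ** 2 for a, b in zip(fi, fj))


def phi_m(mi: Element, mj: Element,
          w_rho: float = 0.2, w_tau: float = 0.2,
          w_mu: float = 0.2, w_f: float = 0.4) -> float:
    # Weighted combination of the four terms; the weights are placeholders.
    return (w_rho * dif(mi.rho, mj.rho)
            + w_tau * dif(mi.tau, mj.tau)
            + w_mu * dif(mi.mu, mj.mu)
            + w_f * s_f(mi.f, mj.f))
```

Keeping `dif` and `s_f` as separate functions makes it easy to swap in a different text-feature measure without touching the categorical terms.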
No event element exists independently; each is influenced by other elements. Therefore, we further define an environmental feature $\mathbf{mc}_i$ for each element $m_i$ and extend its feature vector $\mathbf{m}_i$ to $\bar{\mathbf{m}}_i$ to capture the influence of related elements on $m_i$. $\bar{\mathbf{m}}_i$ is represented as
$$\bar{\mathbf{m}}_i = (\mathbf{f}, \rho, \tau, \mu, \mathbf{mc}_i)$$
where $\mathbf{mc}_i$ is calculated as below.
$$\mathbf{mc}_i = \sum_{\mathbf{m}_j} \varphi_m(\mathbf{m}_i, \mathbf{m}_j) \times \mathbf{m}_j, \quad \mathbf{m}_j \in \left\{ \mathbf{m}_j \mid \varphi_m(\mathbf{m}_i, \mathbf{m}_j) \in top\left( sort\left( \varphi_m(\mathbf{m}_i, \mathbf{m}_j) \right), 5 \right) \right\}$$
The calculation procedure is as follows. We first sort all the event elements that are connected with $m_i$ according to $\varphi_m(\mathbf{m}_i, \mathbf{m}_j)$ and take the top five elements. For these five elements, we convert $\rho$, $\tau$, and $\mu$ to 0 or 1 according to Equation (3), the $Dif$ function, by comparing each feature with that of $m_i$. Finally, we fuse the features of these five elements to obtain $\mathbf{mc}_i$ by a weighted summation with the correlation $\varphi_m(\mathbf{m}_i, \mathbf{m}_j)$ as the weight.
Figure 5 illustrates an event element map example. If $\varphi_m(\mathbf{m}_i, \mathbf{m}_j)$ reaches the fixed threshold, 0.65, the elements $m_i$ and $m_j$ are connected by an undirected edge; otherwise, the two elements are not connected. In Figure 5, $m_2$, $m_3$, $m_4$, $m_5$, and $m_6$ are the five elements most correlated with $m_1$; therefore, they are fused as the environmental feature of $m_1$ to obtain the extended feature $\bar{\mathbf{m}}_1$.
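The environmental feature computation can be sketched as follows, assuming the correlations to the candidate neighbours have already been computed. The function name, argument layout, and the choice to concatenate the text feature with the three 0/1 agreement flags are illustrative assumptions; only the 0.65 edge threshold, the top-five selection, and the correlation-weighted sum come from the description above.

```python
from typing import List, Sequence, Tuple

# One neighbour: (correlation to the target, text feature f, (rho, tau, mu) codes).
Neighbour = Tuple[float, Sequence[float], Tuple[int, int, int]]


def environment_feature(target_codes: Tuple[int, int, int],
                        neighbours: List[Neighbour],
                        threshold: float = 0.65,
                        top_k: int = 5) -> List[float]:
    """Correlation-weighted sum over the top-k connected neighbours."""
    # Keep only neighbours connected to the target (edge exists at >= threshold).
    connected = [n for n in neighbours if n[0] >= threshold]
    # Sort by correlation, strongest first, and keep the top k.
    connected.sort(key=lambda n: n[0], reverse=True)
    top = connected[:top_k]
    if not top:
        return []
    dim = len(top[0][1]) + 3
    mc = [0.0] * dim
    for corr, f, codes in top:
        # Categorical codes become 0/1 agreement flags against the target.
        flags = [1 if c == t else 0 for c, t in zip(codes, target_codes)]
        for k, value in enumerate(list(f) + flags):
            mc[k] += corr * value
    return mc
```

The same routine applies to the event-level environmental feature if the neighbours carry event text features instead of element features.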
The event map $G_e$ is constructed based on the event element map $G_m$. We use $e$ to denote an event and $\mathbf{e}$ to denote the corresponding feature vector. $\mathbf{e}$ is represented as
$$\mathbf{e} = (\mathbf{em}, \mathbf{et})$$
where $\mathbf{em}$ is the combination of the extended features of the corresponding event elements, and $\mathbf{et}$ denotes the original event text.
$$\mathbf{em} = (\bar{\mathbf{m}}_{ei}, \bar{\mathbf{m}}_{tm}, \bar{\mathbf{m}}_{loc}, \bar{\mathbf{m}}_{ag}, \bar{\mathbf{m}}_{pa})$$
For two event feature vectors $\mathbf{e}_i$ and $\mathbf{e}_j$, the correlation of these two vectors is defined below.
$$\varphi_e(\mathbf{e}_i, \mathbf{e}_j) = \omega_{em} \, S_{em}(\mathbf{em}_i, \mathbf{em}_j) + \omega_{et} \, S_{et}(\mathbf{et}_i, \mathbf{et}_j)$$
where $S_{em}(\mathbf{em}_i, \mathbf{em}_j)$ and $S_{et}(\mathbf{et}_i, \mathbf{et}_j)$ denote the similarity of the $\mathbf{em}$ component and the $\mathbf{et}$ component of these two event feature vectors, respectively.
As $\mathbf{em}$ consists of five event element feature vectors, $S_{em}(\mathbf{em}_i, \mathbf{em}_j)$ is calculated based on the calculation of event element correlation, which is defined as follows:
$$S_{em}(\mathbf{em}_i, \mathbf{em}_j) = \sum_{k \in \{ei, tm, loc, ag, pa\}} \left( \omega_m \times \varphi_m(\mathbf{m}_{i,k}, \mathbf{m}_{j,k}) + \omega_{mc} \times \varphi_m(\mathbf{mc}_{i,k}, \mathbf{mc}_{j,k}) \right)$$
$\mathbf{m}_{i,k}$ and $\mathbf{mc}_{i,k}$ form $\bar{\mathbf{m}}_{i,k}$, where $k \in \{ei, tm, loc, ag, pa\}$. This equation represents the weighted summation of the correlation of these five event element feature vectors, where each event element feature vector has two components, the original feature vector and the environmental feature vector, i.e., $\bar{\mathbf{m}}_{i,k} = (\mathbf{m}_{i,k}, \mathbf{mc}_{i,k})$.
As $\mathbf{et}$ is the text that describes the event, we use Rouge-L to calculate the similarity $S_{et}(\mathbf{et}_i, \mathbf{et}_j)$.
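As a rough illustration of the text term, the sketch below computes a Rouge-L style score from the longest common subsequence (LCS) of two token sequences and combines it with a precomputed element similarity. Several choices are assumptions not stated in the text: whitespace tokenization, the balanced F1 form of Rouge-L, and the placeholder weights `w_em` and `w_et`.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(text_i: str, text_j: str) -> float:
    """Rouge-L F1 between two texts (ASSUMPTION: whitespace tokens, beta = 1)."""
    a, b = text_i.split(), text_j.split()
    if not a or not b:
        return 0.0
    lcs = lcs_length(a, b)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(a), lcs / len(b)
    return 2 * precision * recall / (precision + recall)


def phi_e(s_em: float, text_i: str, text_j: str,
          w_em: float = 0.5, w_et: float = 0.5) -> float:
    # Event correlation: weighted element similarity plus weighted text similarity.
    return w_em * s_em + w_et * rouge_l(text_i, text_j)
```

For texts in languages without whitespace word boundaries, the `split()` calls would need to be replaced by a suitable tokenizer.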
Similar to the event element map, we also define the environmental feature $\mathbf{ec}_i$ for each event $e_i$ to capture the influence of related events on $e_i$. Therefore, $\mathbf{e}_i$ is extended to $\bar{\mathbf{e}}_i = (\mathbf{em}_i, \mathbf{et}_i, \mathbf{ec}_i)$. $\mathbf{ec}_i$ is calculated as below.
$$\mathbf{ec}_i = \sum_{\mathbf{e}_j} \varphi_e(\mathbf{e}_i, \mathbf{e}_j) \times \mathbf{e}_j, \quad \mathbf{e}_j \in \left\{ \mathbf{e}_j \mid \varphi_e(\mathbf{e}_i, \mathbf{e}_j) \in top\left( sort\left( \varphi_e(\mathbf{e}_i, \mathbf{e}_j) \right), 5 \right) \right\}$$
The calculation procedure is as follows. We first sort all the events that are connected with $e_i$ according to $\varphi_e(\mathbf{e}_i, \mathbf{e}_j)$ and take the top five events. For these five events, we convert their original event text $\mathbf{et}$ to text features generated by the pre-trained BERT model. We then fuse the features of these five events to obtain $\mathbf{ec}_i$ by a weighted summation with the correlation $\varphi_e(\mathbf{e}_i, \mathbf{e}_j)$ as the weight.
Figure 6 illustrates an event map example. $e_2$, $e_3$, $e_4$, $e_5$, and $e_6$ are the five events most related to $e_1$, which are fused as the environmental feature of $e_1$ to obtain the extended feature $\bar{\mathbf{e}}_1$.

3.4.2. DBN-Enhanced DBSCAN for Event Clustering

Before clustering events, we first merge identical events, which are determined based on the event correlation $\varphi_e$. We assume that when the correlation of two events is greater than a fixed threshold, i.e., 0.88, the two events describe the same life story. We merge identical events to form a new event node $E$ and represent its feature as $\mathbf{E} = (\mathbf{Em}, \mathbf{Et}, \mathbf{Ec})$, where $\mathbf{Em}$, $\mathbf{Et}$, and $\mathbf{Ec}$ represent the merged event element feature, event text feature, and event environmental feature, respectively.
For the event element feature merging, we apply the majority voting rule to determine the final element value for each type of element. If two element values appear the same number of times, we choose the value with finer granularity as the final element value. Therefore, we can obtain the merged event element feature vector as
$$\mathbf{Em} = (\bar{\mathbf{m}}_{ei}, \bar{\mathbf{m}}_{tm}, \bar{\mathbf{m}}_{loc}, \bar{\mathbf{m}}_{ag}, \bar{\mathbf{m}}_{pa})$$
For event text feature merging, we take the event with the greatest number of elements appearing in $\mathbf{Em}$ to be the most reliable event. We then apply BERT to convert this event text to the event text feature. For event environmental feature merging, we directly use the environmental feature of the most reliable event determined before as the merged feature.
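The merging rules can be sketched on raw element values as follows. This sketch works on strings rather than feature vectors, and it uses string length as a stand-in for "finer granularity"; both are simplifying assumptions, and the function names are hypothetical. The role keys follow the subscripts ei, tm, loc, ag, and pa used in the element feature vector.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Event = Dict[str, Optional[str]]
ROLES = ("ei", "tm", "loc", "ag", "pa")


def merge_element(values: Sequence[Optional[str]],
                  granularity: Callable[[str], int] = len) -> Optional[str]:
    """Majority vote; ties go to the value with the finer granularity."""
    present = [v for v in values if v]
    if not present:
        return None
    counts = Counter(present)
    best = max(counts.values())
    tied = [v for v, c in counts.items() if c == best]
    # ASSUMPTION: a longer string is treated as the finer-grained value.
    return max(tied, key=granularity)


def merge_events(events: List[Event]) -> Tuple[Event, Event]:
    """Return the merged element values and the most reliable source event."""
    merged = {r: merge_element([e.get(r) for e in events]) for r in ROLES}

    def agreement(e: Event) -> int:
        # Number of this event's elements that survived into the merged node.
        return sum(1 for r in ROLES if e.get(r) is not None and e.get(r) == merged[r])

    reliable = max(events, key=agreement)
    return merged, reliable
```

The text and environmental features of the merged node would then be taken from `reliable`, as described above.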
Most clustering algorithms handle high-dimensional data poorly [39,40]. Moreover, graph-enhanced event feature optimization further increases the event feature dimension, which makes clustering related events harder. Therefore, after obtaining the merged event feature vector $\mathbf{E} = (\mathbf{Em}, \mathbf{Et}, \mathbf{Ec})$, we integrate the deep belief network (DBN) [41] and DBSCAN [42] to build a deep clustering model, D2E-EC. Figure 7 illustrates the framework. The feature vector is first fed into the DBN to reduce its dimensionality [43]. The reduced feature vector is then used as input for the DBSCAN clustering algorithm.
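To illustrate only the second stage, the following is a compact pure-Python DBSCAN. It is not the D2E-EC model: the DBN reduction step is omitted, and the input points are assumed to be already-reduced feature vectors. The `eps` and `min_pts` values in any call are placeholders, since no parameter values are given in the text.

```python
import math
from typing import List, Optional, Sequence


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n   # None means "not visited yet"

    def neighbours(i: int) -> List[int]:
        # All points within eps of point i (including i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1                      # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster             # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:       # j is a core point: keep expanding
                queue.extend(reachable)
    return [label if label is not None else -1 for label in labels]
```

Points labelled -1 belong to no dense region; in the setting above, these would be events that do not fit any topic cluster.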

3.5. Event Tree Generation

We generate the life event tree through offline construction and dynamic updating. The former constructs a life event tree by forming event nodes, event branches, and topic nodes from the events already available. The latter updates the life event tree when a new life story appears.

3.5.1. Event Tree Construction

The event node is used to illustrate the merged events and takes the form of “time: summary”, such as “1953: Wu met his wife in 1953 in North Korea”. The time component is from the time element of the merged event, and the summary component is the event summary extracted from the most reliable event determined in Section 3.4.2. We adopted the approach proposed in paper [33] to implement event summary generation. Furthermore, it is worth noting that oft-mentioned life stories and infrequently mentioned life stories hold distinct meanings for an individual. As a result, we have employed a color-coded scheme to differentiate event nodes of varying frequencies. The oft-mentioned life stories can serve as a bridge between caregivers and older adults [44].
We aim to form the event branch by creating an event timeline for events in the same cluster based on their time distance and similarity. The following procedures are followed. Firstly, we arrange the events with a non-null time element in chronological order, denoted as $S_t$. For each event node $E_i$ with a null time element, we determine its similarity with each event in $S_t$ based on $\varphi_E$ to identify the most relevant event node $E_j^t$ in $S_t$. Here, we use the features obtained by dimensionality reduction in the DBN to compute the similarity between event nodes. We represent the n-dimensional feature vector of event node $E$ after dimensionality reduction as $\tilde{E}$. The cosine similarity $\varphi_E$ can be computed as follows:
$$\varphi_E(\tilde{E}_i, \tilde{E}_j) = \frac{\sum_{k=1}^{n} \tilde{E}_{i,k} \times \tilde{E}_{j,k}}{\sqrt{\sum_{k=1}^{n} (\tilde{E}_{i,k})^2}\,\sqrt{\sum_{k=1}^{n} (\tilde{E}_{j,k})^2}}$$
Then, we compare $\varphi_E(\tilde{E}_i, \tilde{E}_{j-1}^t)$ and $\varphi_E(\tilde{E}_i, \tilde{E}_{j+1}^t)$ to determine whether $E_i$ should be inserted before or after $E_j^t$. If $\varphi_E(\tilde{E}_i, \tilde{E}_{j-1}^t)$ is larger than $\varphi_E(\tilde{E}_i, \tilde{E}_{j+1}^t)$, we insert $E_i$ before $E_j^t$; otherwise, we insert $E_i$ after $E_j^t$. The final event sequence $S_t$ forms an event branch in the life event tree.
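The insertion rule for events without a time element can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names `cosine` and `insertion_index` are hypothetical, the timeline is assumed to be non-empty, and treating a missing neighbor at either end of the timeline as having similarity negative infinity is an assumption, since the text does not specify the boundary cases.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def insertion_index(timeline, new_vec):
    """Position at which an undated event should be inserted.

    timeline: reduced feature vectors of dated events, in chronological order.
    new_vec:  reduced feature vector of the undated event.
    """
    sims = [cosine(new_vec, v) for v in timeline]
    j = max(range(len(timeline)), key=sims.__getitem__)  # most relevant dated event
    # Assumption: a neighbor that does not exist never wins the comparison.
    prev_sim = sims[j - 1] if j > 0 else float("-inf")
    next_sim = sims[j + 1] if j + 1 < len(timeline) else float("-inf")
    # Closer to the predecessor: insert before E_j; otherwise insert after it.
    return j if prev_sim > next_sim else j + 1


timeline = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(insertion_index(timeline, (0.5, 1.0, 0.1)))  # 1: placed before the middle event
print(insertion_index(timeline, (0.1, 1.0, 0.5)))  # 2: placed after the middle event
```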
The topic node illustrates the central theme of each event cluster and serves as the basis for the life event tree. The topic node is represented as “time period: topic word one, topic word two, …, topic word n”, for instance, “1946–1955: Soldier, Shandong”. The time period spans the events in the same branch. Hence, we take the start time from the first event with a non-null time element in the branch and the end time from the last event with a non-null time element. The topic words are obtained using the TextRank algorithm [45], and we select the two highest-ranked topic words for the topic node.
We can construct the life event tree by utilizing the event nodes, event branches, and topic nodes. To begin with, a root node is added as the starting point of the event tree. The topic nodes are organized in chronological order based on the start time of the time period to form the core of the event tree. Subsequently, each event branch is linked to the corresponding topic node to create the branch.
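The assembly steps above can be sketched with plain Python data structures. This is an illustrative sketch under several assumptions: the dictionary layout and the names `topic_label` and `build_tree` are hypothetical, time elements are simplified to integer years, the topic words are assumed to be already ranked (TextRank itself is not reimplemented here), and the sample events are invented for the example.

```python
def topic_label(events, topic_words):
    """Format a topic node as "start–end: word one, word two".

    events: list of (year_or_None, summary) tuples in branch order.
    topic_words: topic words ranked from most to least important.
    """
    years = [year for year, _ in events if year is not None]
    period = f"{years[0]}–{years[-1]}" if years else "unknown"
    return f"{period}: {', '.join(topic_words[:2])}"


def build_tree(branches):
    """Attach topic nodes to a root, ordered by the start time of each branch.

    branches: list of (events, topic_words) pairs.
    """
    def start_year(branch):
        years = [year for year, _ in branch[0] if year is not None]
        # Assumption: branches without any dated event are placed last.
        return years[0] if years else float("inf")

    ordered = sorted(branches, key=start_year)
    return {
        "root": [
            {
                "topic": topic_label(events, words),
                # Event nodes use the "time: summary" form when a time is known.
                "events": [f"{y}: {s}" if y is not None else s for y, s in events],
            }
            for events, words in ordered
        ]
    }


branches = [
    ([(1960, "Started work at a factory"), (None, "Met coworkers"), (1975, "Was promoted")],
     ["Factory", "Hefei", "Work"]),
    ([(1946, "Joined the army"), (1955, "Left the army")],
     ["Soldier", "Shandong"]),
]
tree = build_tree(branches)
for topic in tree["root"]:
    print(topic["topic"])
```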

3.5.2. Event Tree Updating

When obtaining a new life story document $d_{new}$, we perform one of three operations to insert $d_{new}$ into an existing event tree: merge, extend, or create a new branch [30]. The pseudocode for the event tree updating process is shown in Algorithm 1.
Algorithm 1: Event Tree Update Process
Input: $d_{new}$ represents the newly inserted life story; $T_n = \{B_1(E_{11}, \ldots, E_{1n}), B_2(E_{21}, \ldots, E_{2n}), \ldots, B_m(E_{m1}, \ldots, E_{mn})\}$ represents the current event tree, which consists of multiple events $\{E_{11}, \ldots, E_{mn}\}$ and multiple branches $\{B_1, \ldots, B_m\}$. The threshold for event similarity is 0.88, and the threshold for events belonging to the same cluster is 0.65.
Output: $T_{new}$ represents the latest event tree after inserting a new life story.
Initialize $e_{new} \leftarrow E(d_{new})$
Initialize $\varphi_E \leftarrow 0$
Initialize $\varphi_{max} \leftarrow 0$
Initialize $e_\varphi \leftarrow Nil$
Initialize $B_\varphi \leftarrow Nil$
for $B_i$ in $T_n$ do
    for $E_{ij}$ in $B_i$ do
        $\varphi_E \leftarrow \varphi_E(\tilde{E}_{ij}, \tilde{e}_{new})$
        if $\varphi_E > \varphi_{max}$ then
            $\varphi_{max} \leftarrow \varphi_E$
            $e_\varphi \leftarrow E_{ij}$
            $B_\varphi \leftarrow B_i$
        end if
    end for
end for
if $\varphi_{max} < 0.65$ then
    $B_{new} \leftarrow \{e_{new}\}$
    $T_{new} = \{B_1(E_{11}, \ldots, E_{1n}), B_2(E_{21}, \ldots, E_{2n}), \ldots, B_m(E_{m1}, \ldots, E_{mn}), B_{new}\}$
else if $\varphi_{max} < 0.88$ and $\varphi_{max} > 0.65$ then
    $B_\varphi \leftarrow AppendEvent(B_\varphi, e_{new})$
    $T_{new} = \{B_1(E_{11}, \ldots, E_{1n}), \ldots, B_\varphi(E_{\varphi 1}, \ldots, E_{\varphi n}, e_{new}), \ldots, B_m(E_{m1}, \ldots, E_{mn})\}$
else
    $e_\varphi \leftarrow MergeEvent(e_\varphi, e_{new})$
    $T_{new} = \{B_1(E_{11}, \ldots, E_{1n}), B_2(E_{21}, \ldots, E_{2n}), \ldots, B_m(E_{m1}, \ldots, E_{mn})\}$
end if
The merge operation merges the new life story document $d_{new}$ into an existing event node $E_i$ in the tree. We first extract the event $e_{new}$ from $d_{new}$ and generate its feature based on the bilayer graph. Then, we compare it with each event node $E_i$ in the given event tree. If one or more events have a correlation greater than the given threshold, i.e., 0.88, we merge this life story with the event node with the highest correlation by adding the document to the node.
The extend operation generates a new event node for the new life story document and inserts it into the branch it belongs to. This operation happens when we cannot find an event node that describes the same event as $d_{new}$ but can find an event group to which $d_{new}$ belongs. We compare the new event with existing events and find the event node $E_i$ with the largest correlation. If this correlation is greater than the given threshold, i.e., 0.65, we use the cluster that $E_i$ belongs to as the new event’s cluster. We then create an event node based on the generated event features and insert it into the corresponding event branch according to the time distance or event similarity, following the procedures mentioned in Section 3.5.1. The topic node of this branch is also updated following the topic node generation procedures mentioned in Section 3.5.1.
The create-a-new-branch operation creates a new topic node and generates a new event branch with the new life story document. The event extracted from this story becomes the first event in this new branch. The topic node and the event branch generation also follow the procedures mentioned in Section 3.5.1.
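The three update operations can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the data layout (a tree as a list of branches, each a list of event dictionaries) and the names `update_tree`, `MERGE_THRESHOLD`, and `CLUSTER_THRESHOLD` are hypothetical. Two further assumptions are labeled in the code: a similarity of exactly 0.65 or 0.88 is assigned to the extend and merge cases, respectively, and the extend case simply appends to the branch instead of positioning the event by time or similarity as Section 3.5.1 describes.

```python
import math

MERGE_THRESHOLD = 0.88    # same-event threshold stated in Algorithm 1
CLUSTER_THRESHOLD = 0.65  # same-cluster threshold stated in Algorithm 1


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def update_tree(tree, new_vec, new_doc):
    """Insert a new life story into the tree in place; return the operation used.

    tree: list of branches; each branch is a list of event dicts
          {"vec": feature vector, "docs": list of life story documents}.
    """
    best_sim, best_event, best_branch = 0.0, None, None
    for branch in tree:
        for event in branch:
            sim = cosine(event["vec"], new_vec)
            if sim > best_sim:
                best_sim, best_event, best_branch = sim, event, branch

    if best_event is None or best_sim < CLUSTER_THRESHOLD:
        # Create a new branch whose first event is the new story.
        tree.append([{"vec": list(new_vec), "docs": [new_doc]}])
        return "create"
    if best_sim < MERGE_THRESHOLD:
        # Extend: new event node in the most similar event's branch.
        # Assumption: appended at the end; ordering within the branch is omitted.
        best_branch.append({"vec": list(new_vec), "docs": [new_doc]})
        return "extend"
    # Merge: the story describes an existing event, so attach the document to it.
    best_event["docs"].append(new_doc)
    return "merge"


tree = [[{"vec": [1.0, 0.0], "docs": ["story a"]}]]
print(update_tree(tree, [1.0, 0.05], "story b"))  # merge
print(update_tree(tree, [1.0, 1.0], "story c"))   # extend
print(update_tree(tree, [0.0, -1.0], "story d"))  # create
```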

4. Experiments

In this section, we first illustrate the experimental settings used in the experiments. Next, we briefly describe the three datasets and baselines used in the experiment. After that, we evaluate the performance of the LSH-GEFR method and compare it with several baselines in terms of event path coherence, branch reasonableness, and overall comprehensibility of the event trees. We also investigate the effects of various hyper-parameters. To ensure a fair comparison, we will adopt the same pre-processing and event extraction procedures for all methods before constructing the life story hierarchies.
We use the PyTorch deep learning framework to implement all the models and train them using the Adam optimizer. The hidden size of BERT is set to 768. During training, the batch size is 8, and the learning rate is $5 \times 10^{-5}$. The computing infrastructure used is Windows Server 2016 Standard with NVIDIA T4 GPUs.

4.1. Datasets

We conducted a validation of LSH-GEFR’s performance across three distinct datasets, with the data volume for each dataset outlined in Table 1.
The OALS2.0 dataset. The OALS2.0 dataset contains 15,890 life stories from 195 older adults. We collected life stories from various nursing homes, like Laoximen and Waitan in Shanghai and Jing’an in Hefei. Each older adult was interviewed four to five times, following an interview outline. These interviews aimed to explore their education, work experiences, family life, needs for elderly care services, and significant life events. We recruited volunteers aged 65 and above with good communication skills to conduct these interviews. Our dataset consists of voluntarily shared life stories. Following the wishes of older adults to make their life stories public, only 16 stories from nursing home residents met our criteria. To broaden our dataset, we extended the life stories to online sources, looking for stories from older adults aged 65 and above.
The Twitter dataset. The dataset comprises 17,420 events collected from Twitter. We selected five topics from the US Trending list on Twitter, removed duplicates, and added them to the topic list. These topics were then used as keywords to crawl 500 tweets related to each topic. We filtered out tweets containing fewer than 20 words. As the majority of the tweets did not have a time element, we saved the posting time of the tweet as the time element of the tweet event.
The CNN news dataset. We used trending topics as keywords to simulate browser behavior and search on CNN. From each search, we crawled 40 news items related to the topic, with only articles being retrieved and sorted by their relevance to the topic. The posting time of the news article was saved as the time element of the news event. The resulting dataset contains 46,855 English news documents spanning 61 topics.

4.2. Baselines

To evaluate the performance of the LSH-GEFR method, we compared it to the following models:
  • LDA + Temporal Ordering (LDA + TO) [46]: This method builds a single LDA topic model over the datasets and temporally orders the events under the same topic into a timeline chain. This method exemplifies the naive approach to solving the timeline structure summarization problem.
  • Story Forest [8]: Story Forest adopts a bilayer keyword graph to capture event relationships. The goal is to extract and cluster events from breaking news and generate event lines based on clustering results.
  • EventKG [16]: EventKG is a multilingual event-centric temporal knowledge graph. The effectiveness of the biographical timeline generation is demonstrated based on the EventKG.
  • EventNET [47]: EventNET is an automatic event tree generation method based on a single-layer event network.
  • LSH-GEFR (No GE): We remove the graph enhancement part of the method and directly input the original BERT-based event feature into the integrated D2E-EC clustering framework.
  • LSH-GEFR (No D2E-EC): After obtaining the optimized event feature vector based on the graph enhancement model, we group the events based on the event similarity. If the similarity of two events is greater than a threshold, they are grouped into a topic. The first cluster centroid is randomly selected; then, the event most dissimilar to the current cluster centroids is iteratively selected as the next cluster centroid.

4.3. Path Coherence Evaluation

We adopted the approach proposed in [7], evaluating the path coherence of the structured life narratives by computing the distance between the word vectors of the event node pairs within one topic. The distance between two events can be computed by $dis(E_i, E_j) = 1 - \frac{E_i \cdot E_j}{|E_i|^2 + |E_j|^2 - E_i \cdot E_j}$. This metric also indirectly measures the validity of an edge between two event nodes in a topic path. The coherence of a topic path is represented by the average distance over all consecutive event node pairs:
$$T_{dis}(E_1, E_2, \ldots, E_k) = \frac{1}{k-1} \sum_{i=1}^{k-1} dis(E_i, E_{i+1})$$
The goal is to find out which structure presents the life stories most coherently and logically. We present the results of the evaluation metrics in Table 2. Our LSH-GEFR achieves the best performance as it adopts a bilayer graph to obtain the potential relationships among events. This makes the associated events more similar in terms of feature representation to the extent that the relevant event nodes are also closer together.
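The path distance used for this evaluation can be computed with a few lines of Python. This is a minimal sketch of the two formulas given above; the function names `event_distance` and `path_distance` are hypothetical, and returning 0.0 for a pair of all-zero vectors is an assumption made to avoid division by zero. Under this definition, a smaller value means that adjacent events on a path are closer to each other.

```python
def event_distance(a, b):
    """dis(E_i, E_j) = 1 - (a . b) / (|a|^2 + |b|^2 - a . b)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    if denom == 0:
        return 0.0  # assumption: two all-zero vectors are treated as identical
    return 1.0 - dot / denom


def path_distance(path):
    """Average distance between consecutive event vectors on one topic path."""
    if len(path) < 2:
        raise ValueError("a path needs at least two events")
    pairs = list(zip(path, path[1:]))
    return sum(event_distance(a, b) for a, b in pairs) / len(pairs)


# Identical neighbors contribute 0, orthogonal neighbors contribute 1.
print(path_distance([(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]))  # 0.5
```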
Path coherence here refers to the rationality of the ordering of events under the same topic. In the LSH-GEFR algorithm, we consider three aspects to rank events temporally or logically: the time of the event, the position of the event trigger word in the life story, and the correlation between the events. Given a known event node $E_i$, the event with the highest integrated correlation in the event pool is selected as its previous or next node. Combining the time, trigger word position, and the correlation between events, the integrated correlation between events is calculated as follows: $Rel(E_i, E_j) = tmDif(tm_i, tm_j) \times posDif(pos_i, pos_j) \times \varphi_e(e_i, e_j)$, where $tmDif(tm_i, tm_j)$ denotes the distance between the timestamps of the two events and $posDif(pos_i, pos_j)$ denotes the distance between the positions of their event trigger words. By integrating these three aspects of two events (temporal distance, trigger word distance, and event relevance), the algorithm produces a more reasonable ordering of events under the same topic and more coherent paths.
The LDA + TO algorithm employs simple chronological sorting based solely on temporal cues. While the Story Forest and EventNET algorithms integrate both temporal cues and event relevance into event sorting, their event features rely solely on event text vectors, disregarding the influence of event elements. In contrast, LSH-GEFR’s event features are built on BERT encoding, encompassing relevant features from event elements and event networks. This incorporation strengthens the correlation between similar events, fostering high coherence in paths. Additionally, LSH-GEFR also draws inspiration from EventKG by constructing a timeline of big national events. It transforms blurry time elements into clear event timespans, such as converting “post-Reform and Opening Up” in a life narrative to “after 1979”, enhancing the precision of event time representations. Coupled with the optimization of event features within LSH-GEFR, it achieves superior path coherence compared to EventKG.

4.4. Branch Reasonableness Evaluation

To improve the consistency of the identified events, we group them into branches based on a cohesive theme that connects the events within each branch. The logical grouping of interrelated events under a thematic node plays a significant role in the overall rationality of the event tree structure. To evaluate the rationality of each branch, we use clustering evaluation metrics such as the silhouette coefficient (SC) [48], Davies–Bouldin index (DB) [49], and Calinski–Harabasz index (CH) [50].
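As an illustration of the first of these metrics, the silhouette coefficient can be computed directly from its standard definition. The sketch below is not the evaluation code used in the paper: the function name `silhouette`, the use of Euclidean distance, and the toy points are assumptions made for this example, and the Davies–Bouldin and Calinski–Harabasz indices are not covered.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    For each point, a is the mean distance to the other members of its own
    cluster and b is the smallest mean distance to any other cluster; the
    point's score is (b - a) / max(a, b). Singleton clusters score 0.
    """
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # The distance from p to itself is 0, so dividing by len(own) - 1
        # yields the mean over the other members of the cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(silhouette(points, [0, 0, 1, 1]))  # close to 1: compact, well-separated
print(silhouette(points, [0, 1, 0, 1]))  # negative: points grouped with the wrong neighbors
```

Values near 1 indicate compact, well-separated clusters, while values below 0 indicate that many points lie closer to another cluster than to their own.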
We adopt the above three metrics to evaluate the relevant event clustering performance of LSH-GEFR. Table 3 shows the performance of different methods on the interview data and Internet data. Table 4 and Table 5 show the performance of different methods on the Twitter dataset and the CNN news dataset. The experimental results show that LSH-GEFR achieves the best performance. We also find that, upon removing the bilayer graph or the D2E-EC structure, the clustering performance decreases.
Whether events are distributed under appropriate themes will affect the rationality of the event tree structure. Story Forest employs a bilayer approach to event recognition. The first layer is based on identifying keywords within the event documents, which are then grouped into clusters based on their semantic similarity. Subsequently, a document graph is constructed based on the keyword clusters, and an SVM classifier is trained to identify events with similar semantics and relevance. Similarly, EventKG utilizes an entity distance-based approach for event clustering, followed by training an SVM classifier to determine whether a given entity and its knowledge relation belong to the same class. EventNET, on the other hand, focuses on improving event feature extraction and clustering functions to enhance clustering performance. These methods are more effective than the simple LDA approach. However, these methods do not consider that events have certain relationships with each other, which can affect their thematic relevance. Additionally, there is a significant difference in the number of events under each theme, resulting in an uneven distribution.
The LSH-GEFR algorithm proposed in this paper first utilizes the bilayer graph to capture potential connections between events and optimize event feature representation. Subsequently, the D2E-EC framework is employed to complete the event clustering task. DBSCAN is a density-based clustering method that performs well when the number of events is unevenly distributed under each theme. Furthermore, most clustering methods do not perform well on high-dimensional features; hence, we utilized DBN to compress the features and reduce them to low-dimensional representations that can still represent the original features. Clustering with these compressed features leads to significant improvements in clustering performance.

4.5. Readability Evaluation

The purpose of this application, which deals with life stories, is to free caregivers from the manual task of organizing life stories and reduce the difficulty they face in obtaining useful information from these stories. Therefore, the overall readability of the life story event tree is crucial as it affects whether the caregiver can gain some understanding of the older adult through this event tree. Figure 8 displays a partially structured life narrative generated by the LSH-GEFR method. This structured life narrative systematically organizes multiple fragmented life stories in a “topic-event-life stories” structure. Within Figure 8, fragmented stories 1, 2, 3, and 4 describe the same event, amalgamated as event 5, and presented in a “time + abstract” format. Simultaneously, interrelated events 1, 2, 3, 4, and 5 cluster into topic 2 and are presented in a “time + keywords” format.
In addition to the above experiments, we conducted a user evaluation to measure the quality of the event trees in terms of logical coherence and overall readability. We invited twelve graduate students from the Gerontechnology Lab of Hefei University of Technology to participate in a survey to evaluate the event tree quality. All twelve participants reviewed all event trees and all event branches in each tree generated by all of the methods to evaluate the logical coherence and overall readability through the following questions [27]. Examples of the event tree for each method used in this evaluation are given in the Supplementary Material Table S1.
  • Question 1: Do all the events in each topic truly talk about the same topic (yes or no)?
  • Question 2: Do all the life stories in each event node truly talk about the same event (yes or no)?
  • Question 3: Do you agree that the event branches are logically coherent for each event tree generated by different methods?
  • Question 4: Do you agree that the event tree is overall comprehensible for each event tree generated by different methods?
We first report the effectiveness of our system for event grouping and event merging on the OALS2.0 dataset. In 76.02% of topics, the events truly talk about the same topic (yes to question 1), and the accuracy of merging life stories about the same events is 75.37% (yes to question 2).
Question 3 evaluates the logical coherence of event branches; therefore, each event branch has a score from each participant. Question 4 evaluates the overall readability of each event tree; therefore, each event tree has a score from each participant. Since a seven-point Likert scale is utilized, we count the percentage of each score for each question. A higher percentage of the higher score denotes better performance. The results and descriptive statistics of each participant’s evaluation are shown in the Supplementary Material Table S2.
Figure 9 illustrates the cumulative distribution functions (CDFs) of the percentages of event branch consistency scores for different algorithms. As we can see from Figure 9, the CDF curve of LSH-GEFR is above the curves of other methods. This indicates that the event branches generated by LSH-GEFR have the best logical coherence. Specifically, more than 84.91% of event branches of LSH-GEFR achieve scores higher than four. LSH-GEFR generated significantly more coherent event branches.
The aim of automatically organizing older adults’ life stories into an event tree structure is to reduce the difficulty for caregivers of manually organizing the life stories and deriving useful information from them. The event tree reflects a synopsis of an older adult’s life experience. Therefore, the overall readability of the event tree influences the caregiver’s insight into older adults. Table 6 shows the overall readability score percentages for different event trees. The event tree generated by LSH-GEFR has the best overall comprehensibility. In terms of average score, LSH-GEFR is 0.47, 0.86, 0.87, and 0.14 higher than LDA + TO, Story Forest, EventKG, and EventNET, respectively. According to the seven-point Likert scale, a score of four or higher indicates that the event tree is readable. The results show that 84.91% of the event trees generated by the LSH-GEFR algorithm have a score of four or above.
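The score percentages and the readability share reported here are simple aggregates over the participants' ratings. The sketch below shows one way to compute them; the function names `score_distribution` and `readable_share` are hypothetical, and the ratings in the example are made up for illustration and are not the study's data.

```python
from collections import Counter


def score_distribution(scores, scale=7):
    """Percentage of ratings at each point of a Likert scale (1..scale)."""
    counts = Counter(scores)
    total = len(scores)
    return {point: 100.0 * counts.get(point, 0) / total for point in range(1, scale + 1)}


def readable_share(scores, cutoff=4):
    """Percentage of ratings at or above the cutoff (four or higher counts as readable)."""
    return 100.0 * sum(1 for s in scores if s >= cutoff) / len(scores)


ratings = [7, 6, 5, 4, 3, 2, 4, 5]  # invented ratings, for illustration only
print(score_distribution(ratings)[4])  # 25.0
print(readable_share(ratings))         # 75.0
```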
LSH-GEFR surpasses both Story Forest and EventKG in terms of path coherence and rational branching, consequently earning higher scores in overall comprehensibility. Moreover, unlike EventKG’s simplistic network establishment to describe events, we draw inspiration from Story Forest’s approach. We not only create summaries for each event node but also generate summaries for branches.

4.6. Influence of Parameters

Several important parameters will influence the final clustering performance of LSH-GEFR.
When building the bilayer graph, the threshold $\delta_e$ is used to identify whether two events are related. Therefore, $\delta_e$ determines how many events are connected with a given event $e_i$. Figure 10 shows the event clustering performance when $\delta_e$ changes in steps of 0.05 from 0 to 1. As we can see, when $\delta_e$ is equal to 0.80, the clustering achieves the best performance in terms of the silhouette coefficient evaluation metric. The reason for this is that when $\delta_e$ approaches 0, the event $e_i$ becomes connected with too many other events, resulting in the extended event features containing too many noise features in the environmental feature. On the other hand, when $\delta_e$ approaches 1, the event $e_i$ is influenced by only one or two events, resulting in $e_i$ losing its valuable environmental features. The other relevancy thresholds are set similarly to $\delta_e$: each parameter is tested over the interval from 0 to 1 with a step of 0.05, and we choose the best-performing value.
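The effect of the threshold and the grid search over it can be sketched as follows. This is an illustrative sketch only: the names `threshold_grid`, `related_events`, and `best_threshold` are hypothetical, treating two events as related when their similarity is strictly greater than the threshold is an assumption, and the scoring function passed to `best_threshold` is a stand-in for the full cluster-then-score pipeline, which is not reproduced here.

```python
def threshold_grid(steps=20):
    """Candidate thresholds 0.00, 0.05, ..., 1.00 (integer steps avoid float drift)."""
    return [round(i / steps, 2) for i in range(steps + 1)]


def related_events(similarity, i, delta):
    """Indices of events linked to event i in the event graph for threshold delta.

    similarity: square matrix of pairwise event similarities.
    Assumption: two events are related when their similarity exceeds delta.
    """
    return [j for j, s in enumerate(similarity[i]) if j != i and s > delta]


def best_threshold(candidates, score_fn):
    """Return the candidate with the highest score, e.g. a silhouette coefficient."""
    return max(candidates, key=score_fn)


similarity = [
    [1.0, 0.9, 0.3],
    [0.9, 1.0, 0.6],
    [0.3, 0.6, 1.0],
]
# A low threshold links event 1 to everything; a high one isolates it.
for delta in (0.0, 0.8, 0.95):
    print(delta, related_events(similarity, 1, delta))
```

In practice, `score_fn` would rebuild the graph-enhanced features for each candidate threshold, cluster the events, and return the clustering score.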
The fusion threshold for life stories is determined after experiments. Figure 11 shows the life story fusion performance when the threshold changes in a step of 0.01 from 0.70 to 1. When the threshold is 0.88, the fusion achieves the best performance in terms of the evaluation metric of the silhouette coefficient.
The density radius $eps$ is an important parameter of the DBSCAN algorithm and represents the neighborhood radius used when density is defined. Figure 12 shows the event clustering performance when $eps$ changes in steps of 0.05 from 0 to 1. When $eps$ is 0.70, the clustering achieves the best performance in terms of the evaluation metric of the silhouette coefficient.

5. Discussion

This paper introduces a novel framework, LSH-GEFR, for the hierarchical structuring of life stories. Within this framework, there is a graph-enhanced module that is aimed at optimizing event representations, thereby enhancing the effectiveness of hierarchical construction in life narratives. The experimental results show that the structured life narratives constructed by LSH-GEFR outperformed the baselines in path coherence, branch reasonableness, and overall readability.
Compared with LDA + TO:
  • LDA + TO builds a single LDA topic model over the datasets and temporally orders the events into a tree structure based on time clues. This method exemplifies the naive approach to solving the topic structured events problem. Compared with LDA + TO, LSH-GEFR is optimized in framework design, event representation, and event relationship handling.
Compared with Story Forest:
  • The Story Forest algorithm selects event features based on word features, TF-IDF-derived structural traits, and semantic features derived from LDA. LSH-GEFR relies on BERT-processed textual attributes as its foundation. Utilizing a dual-layered graph structure, it amalgamates multidimensional information such as event elements, element relations, and event connections to represent event features, enhancing the similarity of relevant event features.
  • Both LSH-GEFR and Story Forest adopt graph-enhancement technology. Compared with Story Forest, which focuses on word co-occurrence in event text, the LSH-GEFR method concentrates on elements like time, location, people, and types within the event element graph. It is more advantageous in the structured event representation that emphasizes “people”, “time”, “location”, and “what happened”.
Compared with EventKG and EventNET:
  • LSH-GEFR, akin to EventKG’s approach, employs a strategy involving big national event knowledge to optimize and supplement ambiguous temporal elements within events. Within the hierarchical construction of life narratives, this methodology enhances the accuracy of path branching.
  • The LSH-GEFR method combines the relationship between event elements and events to build a two-layer graph enhancement structure and optimize event features. The EventNET algorithm focuses on event relationships using a one-layer event graph, but overlooks the importance of event elements.
  • In the event clustering task, EventKG and EventNET group events solely based on relevance, resulting in weaker performance in organizing events thematically. The LSH-GEFR method uses a specifically designed D2E-EC event clustering framework based on DBN to reduce the influence of high-dimensional features on the clustering effect, which demonstrates superior performance in event clustering tasks.
In the application of elderly care services, the hierarchical structure of life narratives supports caregivers in swiftly grasping the life trajectory of older adults. It also enables them to delve into specific events, understanding event details and developmental processes. These structured life narratives furnish caregivers with the necessary data support and decision-making foundation to devise personalized care plans for older adults [51].

6. Conclusions

We devised the LSH-GEFR framework, which facilitates the fusion of life narratives depicting the same events and the clustering of events depicting the same topic, to address issues such as fragmentation and disorder that are prevalent in life narrative methods within elderly care services. Experimental results indicate that, compared to current methods for automated event organization, the hierarchical structure of life narratives generated by the LSH-GEFR algorithm possesses enhanced comprehensibility. Notably, over 84.91% of the structured life narratives achieve commendable readability, demonstrating overall structural coherence.
In the graph-enhanced module of the LSH-GEFR algorithm framework, we implemented a dual-layered graph comprising event element graphs and event graphs. This enhancement, in comparison to existing methodologies, focuses on augmenting the influence of event elements and their connections among events beyond the event textual context features and trigger word features. By strengthening these connections, the algorithm improves the fusion of similar events and the clustering of related events, particularly in scenarios with limited life narratives. The experiments demonstrate that structured life narratives generated by LSH-GEFR exhibit good coherence in paths and rationality in branching.
This study has several limitations that motivate future work. Firstly, the insufficient number of samples in the life story dataset limits the effect of life story fusion and clustering. We need to gather more extensive and diverse datasets, with a particular focus on life stories from real communication. This effort will contribute to ensuring the robustness and applicability of the model across various demographic groups. Secondly, the summarization of event nodes within structured life narratives requires enhancement in accuracy. Future directions could involve exploring the utilization of generative large language models (LLMs) to improve this accuracy, thereby enhancing the readability of structured life narratives. It is noteworthy that the structured format of life narratives facilitates computer processing, enabling the development of AI-generated content (AIGC) for interactions with older adults. Leveraging the structured narratives from the LSH-GEFR method, upcoming research aims to explore how this information can be utilized in digital narrative therapy, potentially enhancing the mental and emotional well-being of older adults.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app14020918/s1, Table S1: Examples of structured life narratives generated by the LSH-GEFR method as well as four other comparative methods. Table S2: The statistical results of each participant’s scores on Question 3 and Question 4.

Author Contributions

Conceptualization, F.G. and J.Y.; methodology, F.G.; software, Y.T.; formal analysis, Y.T.; investigation, N.A.; data curation, F.G.; writing—original draft preparation, F.G.; writing—review and editing, J.Y.; supervision, H.C.; project administration, N.A.; funding acquisition, J.Y. and N.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 62072153), the Anhui Provincial Key Technologies R&D Program (No. 2022h11020015), and the Program of Introducing Talents of Discipline to Universities (111 Program) (No. B14025).

Institutional Review Board Statement

Approval was obtained from the research ethics board of the Hefei University of Technology (HFUT20220921001).

Informed Consent Statement

Participants provided informed consent. The information and pictures of older adults mentioned in this paper are all privacy-free.

Data Availability Statement

The OALS2.0 and Twitter datasets are available at https://github.com/gerontech-hfut/StoryWell (accessed on 13 December 2023).

Acknowledgments

We acknowledge the use of the facilities and equipment provided by the Hefei University of Technology. We are grateful to Gao Qinglin and others from Shanghai Tianyu Senior Living Service Co., Ltd. (Shanghai, China) for providing resources and validation for this research. We would like to thank every party stated above for providing help and assistance in this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Béjot, Y.; Yaffe, K. Ageing population: A neurological challenge. Neuroepidemiology 2019, 52, 76–77.
  2. Paradis, S.; Roussel, J.; Bosson, J.L.; Kern, J.B. Use of smartphone health apps among patients aged 18 to 69 years in primary care: Population-based cross-sectional survey. JMIR Form. Res. 2022, 6, e34882.
  3. Stargatt, J.; Bhar, S.; Bhowmik, J.; Al Mahmud, A. Digital storytelling for health-related outcomes in older adults: Systematic review. J. Med. Internet Res. 2022, 24, e28113.
  4. Köber, C.; Habermas, T. Development of temporal macrostructure in life narratives across the lifespan. Discourse Process. 2017, 54, 143–162.
  5. Sun, W.; Wang, Y.; Gao, Y.; Li, Z.; Sang, J.; Yu, J. Comprehensive event storyline generation from microblogs. In Proceedings of the ACM Multimedia Asia, Beijing, China, 15–18 December 2019; pp. 1–7.
  6. Li, J.; Cardie, C. Timeline generation: Tracking individuals on twitter. In Proceedings of the 23rd International Conference on World Wide Web, Seoul, Republic of Korea, 7–11 April 2014; pp. 643–652.
  7. Ansah, J.; Liu, L.; Kang, W.; Kwashie, S.; Li, J.; Li, J. A graph is worth a thousand words: Telling event stories using timeline summarization graphs. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 2565–2571.
  8. Liu, B.; Niu, D.; Lai, K.; Kong, L.; Xu, Y. Growing story forest online from massive breaking news. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 777–785.
  9. Browne-Yung, K.; Walker, R.B.; Luszcz, M.A. An examination of resilience and coping in the oldest old using life narrative method. Gerontologist 2017, 57, 282–291.
  10. Thompson, R. Using life story work to enhance care. Nurs. Older People 2011, 23, 16–21.
  11. Subramaniam, P.; Woods, B. Digital life storybooks for people with dementia living in care homes: An evaluation. Clin. Interv. Aging 2016, 11, 1263–1276.
  12. Fiddian-Green, A.; Kim, S.; Gubrium, A.C.; Larkey, L.K.; Peterson, J.C. Restor(y)ing health: A conceptual model of the effects of digital storytelling. Health Promot. Pract. 2019, 20, 502–512.
  13. Keown, K.; Tkatch, R.; Martin, D.; Duffy, M.; Wu, L.; Schaeffer, J.; Wicker, E. Lifebio: Life stories of older adults to reduce loneliness and improve social connectedness. Innov. Aging 2018, 2, 241.
  14. Liu, J.; Zhang, R.; Liu, W.; Zhang, Y.; Gu, D.; Tong, M.; Wang, X.; Xue, J.; Wang, H. Context2Vector: Accelerating security event triage via context representation learning. Inf. Softw. Technol. 2022, 146, 106856.
  15. Yang, Z.; Li, Q.; Xie, H.; Wang, Q.; Liu, W. Learning representation from multiple media domains for enhanced event discovery. Pattern Recognit. 2021, 110, 107640.
  16. Gottschalk, S.; Demidova, E. EventKG: A multilingual event-centric temporal knowledge graph. In Proceedings of the European Semantic Web Conference, Heraklion, Greece, 3–7 June 2018; Springer: Cham, Switzerland, 2018; pp. 272–287.
  17. Martz, C.J.; Powell, R.L.; Wee, B.S.C. Engaging children to voice their sense of place through location-based story making with photo-story maps. Child. Geogr. 2020, 18, 148–161.
  18. Yang, C.C.; Shi, X.; Wei, C.P. Discovering event evolution graphs from news corpora. IEEE Trans. Syst. Man Cybern.-Part A Syst. Humans 2009, 39, 850–863.
  19. Franklin, N.T.; Norman, K.A.; Ranganath, C.; Zacks, J.M.; Gershman, S.J. Structured Event Memory: A neuro-symbolic model of event cognition. Psychol. Rev. 2020, 127, 327.
  20. Keith Norambuena, B.F.; Mitra, T. Narrative maps: An algorithmic approach to represent and extract information narratives. Proc. ACM Hum.-Comput. Interact. 2021, 4, 1–33.
  21. El-Kassas, W.S.; Salama, C.R.; Rafea, A.A.; Mohamed, H.K. Automatic text summarization: A comprehensive survey. Expert Syst. Appl. 2021, 165, 113679.
  22. Hua, T.; Zhang, X.; Wang, W.; Lu, C.T.; Ramakrishnan, N. Automatical storyline generation with help from twitter. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, Indianapolis, IN, USA, 24–28 October 2016; pp. 2383–2388.
  23. Lin, C.; Lin, C.; Li, J.; Wang, D.; Chen, Y.; Li, T. Generating event storylines from microblogs. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, Maui, HI, USA, 29 October–2 November 2012; pp. 175–184. [Google Scholar]
  24. Yan, Z.; Tang, X. Hierarchical storyline generation based on event-centric temporal knowledge graph. In Proceedings of the International Symposium on Knowledge and Systems Sciences, Beijing, China, 11–12 June 2022; Springer: Singapore, 2022; pp. 149–159. [Google Scholar]
  25. Li, D.; Yan, L.; Zhang, X.; Jia, W.; Ma, Z. EventKGE: Event knowledge graph embedding with event causal transfer. Knowl.-Based Syst. 2023, 278, 110917. [Google Scholar] [CrossRef]
  26. Keith Norambuena, B.F.; Mitra, T.; North, C. A survey on event-based news narrative extraction. ACM Comput. Surv. 2023, 55, 1–39. [Google Scholar] [CrossRef]
  27. Liu, B.; Han, F.X.; Niu, D.; Kong, L.; Lai, K.; Xu, Y. Story forest: Extracting events and telling stories from breaking news. ACM Trans. Knowl. Discov. Data 2020, 14, 1–28. [Google Scholar] [CrossRef]
  28. Yan, Z.; Tang, X. Narrative Graph: Telling Evolving Stories Based on Event-centric Temporal Knowledge Graph. J. Syst. Sci. Syst. Eng. 2023, 32, 206–221. [Google Scholar] [CrossRef] [PubMed]
  29. Kunimitsu, T.; Pacchetti, M.B.; Ciullo, A.; Sillmann, J.; Shepherd, T.G.; Taner, M.Ü.; van den Hurk, B. Representing storylines with causal networks to support decision making: Framework and example. Clim. Risk Manag. 2023, 40, 100496. [Google Scholar] [CrossRef]
  30. Zhang, C.; Lyu, J.; Xu, K. A storytree-based model for inter-document causal relation extraction from news articles. Knowl. Inf. Syst. 2023, 65, 827–853. [Google Scholar] [CrossRef] [PubMed]
  31. Shahaf, D.; Yang, J.; Suen, C.; Jacobs, J.; Wang, H.; Leskovec, J. Information cartography: Creating zoomable, large-scale maps of information. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11–14 August 2013; pp. 1097–1105. [Google Scholar]
  32. Yan, R.; Wan, X.; Otterbacher, J.; Kong, L.; Li, X.; Zhang, Y. Evolutionary timeline summarization: A balanced optimization framework via iterative substitution. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, China, 24–28 July 2011; pp. 745–754. [Google Scholar]
  33. An, N.; Gui, F.; Jin, L.; Ming, H.; Yang, J. Toward better understanding older adults: A biography brief timeline extraction approach. Int. J. Hum.-Interact. 2023, 39, 1084–1095. [Google Scholar] [CrossRef]
  34. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
  35. Liu, X.; Huang, H.Y.; Zhang, Y. Open Domain Event Extraction Using Neural Latent Variable Models. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 2860–2871. [Google Scholar]
  36. Che, W.; Li, Z.; Liu, T. Ltp: A chinese language technology platform. In Proceedings of the Coling 2010: Demonstrations, Beijing, China, 23–27 August 2010; pp. 13–16. [Google Scholar]
  37. Nusser, L.; Wolf, T.; Zimprich, D. How do we recall the story of our lives? Evidence for a temporal order in the recall of important life story events. Memory 2022, 30, 806–822. [Google Scholar] [CrossRef]
  38. Bluck, S.; Habermas, T. The life story schema. Motiv. Emot. 2000, 24, 121–147. [Google Scholar] [CrossRef]
  39. Tang, Y.; Huang, J.; Pedrycz, W.; Li, B.; Ren, F. A Fuzzy Clustering Validity Index Induced by Triple Center Relation. IEEE Trans. Cybern. 2023, 53, 5024–5036. [Google Scholar] [CrossRef] [PubMed]
  40. Tang, Y.; Chen, R.; Xia, B. VSFCM: A Novel Viewpoint-Driven Subspace Fuzzy C-Means Algorithm. Appl. Sci. 2023, 13, 6342. [Google Scholar] [CrossRef]
  41. Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554. [Google Scholar] [CrossRef]
  42. Schubert, E.; Sander, J.; Ester, M.; Kriegel, H.P.; Xu, X. DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Trans. Database Syst. 2017, 42, 1–21. [Google Scholar] [CrossRef]
  43. Hua, Y.; Guo, J.; Zhao, H. Deep belief networks and deep learning. In Proceedings of the 2015 International Conference on Intelligent Computing and Internet of Things, Harbin, China, 17–18 January 2015; pp. 1–4. [Google Scholar]
  44. Atkinson, R. The life story interview as a bridge in narrative inquiry. In Handbook of Narrative Inquiry: Mapping a Methodology; Sage: New York, NY, USA, 2007; pp. 224–245. [Google Scholar]
  45. Mihalcea, R.; Tarau, P. Textrank: Bringing order into text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, 25–26 July 2004; pp. 404–411. [Google Scholar]
  46. Chen, L.C. An effective LDA-based time topic model to improve blog search performance. Inf. Process. Manag. 2017, 53, 1299–1319. [Google Scholar] [CrossRef]
  47. Gui, F.; Wu, X.; Hu, M.; Yang, J. Automatic Life Event Tree Generation for Older Adults. In Proceedings of the International Conference on Human-Computer Interaction, Virtual, 26 June–1 July 2022; Springer: Cham, Switzerland, 2022; pp. 366–377. [Google Scholar]
  48. Řezanková, H. Different approaches to the silhouette coefficient calculation in cluster evaluation. In Proceedings of the 21st International Scientific Conference AMSE Applications of Mathematics and Statistics in Economics, Kutná Hora, Czech Republic, 29 August–2 September 2018; pp. 1–10. [Google Scholar]
  49. Singh, A.K.; Mittal, S.; Malhotra, P.; Srivastava, Y.V. Clustering Evaluation by Davies-Bouldin Index (DBI) in Cereal data using K-Means. In Proceedings of the 2020 Fourth International Conference on Computing Methodologies and Communication, Erode, India, 11–13 March 2020; pp. 306–310. [Google Scholar]
  50. Wang, X.; Xu, Y. An improved index for clustering validation based on Silhouette index and Calinski-Harabasz index. In IOP Conference Series: Materials Science and Engineering; IOP Publishing: Beijing, China, 2019; Volume 569, p. 052024. [Google Scholar]
  51. Zhou, H.; Xiong, F.; Chen, H. A Comprehensive Survey of Recommender Systems Based on Deep Learning. Appl. Sci. 2023, 13, 11378. [Google Scholar] [CrossRef]
Figure 1. Illustration of life story distribution by applying a linear embedding, where each point represents a life story and different story topics are distinguished with different colors. (a) Feature representation without the LSH-GEFR method. (b) Feature representation using the LSH-GEFR method.
Figure 2. An illustration of the input and output of the algorithm LSH-GEFR, where the input is life stories from the same person and the output is the corresponding life event tree. The event tree contains three types of nodes: the root node, the topic node, and the event node. Events are marked with different colors according to frequency to denote various meanings.
Figure 3. An overview of the LSH-GEFR architecture. The inputs are disordered life stories from the same person, and the output is the corresponding event hierarchies. The LSH-GEFR architecture includes event extraction, graph-enhanced structure for event fusion and clustering, and event tree generation.
Figure 4. An illustration of the extracted event elements. Each event contains five elements: the trigger word, time, location, participant, and event mention, denoted by ei, tm, loc, pa, and em, respectively.
Figure 5. An example of the event element map. Each node is an event element, and two nodes are connected if the correlation $\varphi_m(m_i, m_j)$ between $m_i$ and $m_j$ reaches a fixed threshold. For simplicity, only the edges connected to $m_1$ are shown. In addition, $m_2$, $m_3$, $m_4$, $m_5$, and $m_6$ are the five elements most correlated with $m_1$; therefore, they are fused as the environmental feature of $m_1$ to obtain the extended feature $\bar{m}_1$.
Figure 6. An example of the event map. Each node denotes an event, and nodes are connected according to $\varphi_e(e_i, e_j)$. For simplicity, only the edges connected to $e_1$ are shown. In addition, $e_2$, $e_3$, $e_4$, $e_5$, and $e_6$ are the five events most closely related to $e_1$; therefore, they are fused as the environmental feature of $e_1$ to obtain the extended feature $\bar{e}_1$.
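The neighbor selection described in Figures 5 and 6 can be sketched as follows. This is a minimal illustration and not the authors' implementation: cosine similarity stands in for the correlation $\varphi$, the fusion step is assumed to be a plain average of the selected neighbors concatenated to the node's own vector, and the names `cosine`, `extended_feature`, `threshold`, and `k` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity, used here only as a stand-in for the correlation phi."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extended_feature(index, features, threshold=0.5, k=5):
    """Fuse one node with its k most correlated neighbors (assumed: mean + concat)."""
    target = features[index]
    # An edge exists only if the correlation reaches the threshold.
    edges = [
        (cosine(target, other), j)
        for j, other in enumerate(features)
        if j != index and cosine(target, other) >= threshold
    ]
    # Keep the k strongest edges, strongest first.
    top = sorted(edges, reverse=True)[:k]
    neighbor_ids = [j for _, j in top]
    if not neighbor_ids:
        # No neighbor passes the threshold: pad with zeros (an assumption).
        return list(target) + [0.0] * len(target), neighbor_ids
    environment = [
        sum(features[j][d] for j in neighbor_ids) / len(neighbor_ids)
        for d in range(len(target))
    ]
    return list(target) + environment, neighbor_ids


# Toy vectors; real element or event features would come from the encoder.
features = [[1, 0], [1, 0.1], [0, 1], [0.9, 0.2], [-1, 0]]
vector, neighbors = extended_feature(0, features, threshold=0.5, k=2)
```

The same routine applies to both layers of the bilayer graph under these assumptions: event elements in the event element map and whole events in the event map, with `k=5` matching the five neighbors shown in the figures.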
Figure 7. The D2E-EC clustering framework. Each event feature vector $E = (Em, Et, Ec)$ is first input into the DBN to reduce its dimensionality. The resulting low-dimensional vector is then used as the input to the DBSCAN algorithm for event clustering.
Figure 8. The structured life narrative generated by LSH-GEFR is organized in the form of a “topic-event-life story”.
Figure 9. CDF of event branch consistency scores for different methods.
Figure 10. Event clustering performance with different $\delta_e$.
Figure 11. Life stories fusion performance with different thresholds.
Figure 12. Event clustering performance with different $eps$.
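The clustering stage in Figure 7 ends with DBSCAN, and Figure 12 varies what appears to be its neighborhood radius $eps$. The sketch below is a minimal, textbook-style DBSCAN in plain Python, offered only to make the role of that parameter concrete. It assumes the inputs are already low-dimensional vectors (the DBN reduction step is not reproduced here), and the names `dbscan`, `min_pts`, and the toy points are hypothetical.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point, -1 for noise."""
    labels = [None] * len(points)

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbors = region(j)
            if len(neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(neighbors)
    return labels


# Toy 2-D "reduced" event vectors: two tight groups and one outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

A smaller `eps` splits events into more, tighter clusters and marks more points as noise, while a larger one merges clusters. That trade-off is presumably what the sweep in Figure 12 explores.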
Table 1. The size of the datasets.

| Dataset | The Volume of Data | The Volume of Topic |
|---|---|---|
| OALS2.0 | 15,890 | 195 |
| Twitter dataset | 17,420 | 80 |
| CNN dataset | 46,855 | 61 |
Table 2. Evaluation results for path consistency.

| Method | Interview Data | Internet Data | Twitter Data | CNN Data |
|---|---|---|---|---|
| LDA + TO | 39.87% | 44.26% | 54.09% | 53.72% |
| Story Forest | 54.42% | 57.62% | 62.07% | 61.90% |
| EventKG | 58.87% | 60.93% | 68.07% | 67.61% |
| EventNET | 63.26% | 65.46% | 71.86% | 70.45% |
| LSH-GEFR | 70.14% | 73.27% | 79.75% | 78.34% |
Table 3. Comparing different methods on OALS2.0 dataset.

| Method | SC (Interview Data) | DB (Interview Data) | CH (Interview Data) | SC (Internet Data) | DB (Internet Data) | CH (Internet Data) |
|---|---|---|---|---|---|---|
| LDA + TO | 0.3056 | 0.4014 | 1.9256 | 0.3793 | 0.3873 | 1.9902 |
| Story Forest | 0.3323 | 0.3616 | 2.5717 | 0.4881 | 0.3566 | 2.6262 |
| EventKG | 0.3386 | 0.3590 | 2.7419 | 0.4930 | 0.3401 | 2.2414 |
| EventNET | 0.3392 | 0.3596 | 2.6082 | 0.4980 | 0.3487 | 2.2796 |
| LSH-GEFR (No GE) | 0.4189 | 0.3115 | 3.2829 | 0.5277 | 0.3634 | 3.4698 |
| LSH-GEFR (No D2E-EC) | 0.3586 | 0.3501 | 2.7924 | 0.5100 | 0.3375 | 2.3340 |
| LSH-GEFR | 0.4497 | 0.3096 | 3.5551 | 0.5754 | 0.3120 | 3.7159 |
Table 4. Comparing different methods on Twitter dataset.

| Method | SC | DB | CH |
|---|---|---|---|
| LDA + TO | 0.4377 | 0.3537 | 3.3915 |
| Story Forest | 0.5276 | 0.3112 | 4.2580 |
| EventKG | 0.5540 | 0.3273 | 3.8780 |
| EventNET | 0.5559 | 0.3132 | 3.7226 |
| LSH-GEFR (No GE) | 0.5458 | 0.3346 | 4.5008 |
| LSH-GEFR (No D2E-EC) | 0.5392 | 0.3177 | 3.8639 |
| LSH-GEFR | 0.5937 | 0.2911 | 4.8479 |
Table 5. Comparing different methods on CNN dataset.

| Method | SC | DB | CH |
|---|---|---|---|
| LDA + TO | 0.4373 | 0.3511 | 3.2814 |
| Story Forest | 0.5178 | 0.2915 | 4.1373 |
| EventKG | 0.5464 | 0.3172 | 3.7431 |
| EventNET | 0.5534 | 0.3052 | 3.5871 |
| LSH-GEFR (No GE) | 0.5440 | 0.3199 | 4.3913 |
| LSH-GEFR (No D2E-EC) | 0.5325 | 0.3048 | 3.7700 |
| LSH-GEFR | 0.5962 | 0.2810 | 4.7417 |
Table 6. Readability score percentages for different methods.

| Method | 1 (%) | 2 (%) | 3 (%) | 4 (%) | 5 (%) | 6 (%) | 7 (%) | M | SD |
|---|---|---|---|---|---|---|---|---|---|
| LDA + TO | 9.33 | 11.69 | 12.32 | 24.47 | 19.19 | 15.28 | 7.72 | 4.41 | 1.43 |
| Story Forest | 7.68 | 10.71 | 11.27 | 19.68 | 20.08 | 17.15 | 13.43 | 4.02 | 1.77 |
| EventKG | 3.12 | 6.67 | 13.54 | 27.33 | 27.38 | 14.57 | 7.39 | 4.01 | 1.72 |
| EventNET | 4.33 | 6.67 | 10.05 | 20.31 | 21.64 | 23.12 | 13.88 | 4.74 | 1.79 |
| LSH-GEFR | 3.35 | 3.51 | 8.23 | 16.41 | 23.65 | 27.59 | 17.26 | 4.88 | 1.65 |
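The headline readability figures in the abstract (84.91% readable, 5.96 above the best baseline) can be recovered from Table 6 with a short calculation. The sketch below assumes that "readable" means a score of 4 or higher on the 7-point scale. That cutoff is an inference from the numbers rather than something stated in this section, and the names `READABILITY` and `share_at_or_above` are hypothetical.

```python
# Percentage of narratives receiving each score 1..7, copied from Table 6.
READABILITY = {
    "LDA + TO": [9.33, 11.69, 12.32, 24.47, 19.19, 15.28, 7.72],
    "Story Forest": [7.68, 10.71, 11.27, 19.68, 20.08, 17.15, 13.43],
    "EventKG": [3.12, 6.67, 13.54, 27.33, 27.38, 14.57, 7.39],
    "EventNET": [4.33, 6.67, 10.05, 20.31, 21.64, 23.12, 13.88],
    "LSH-GEFR": [3.35, 3.51, 8.23, 16.41, 23.65, 27.59, 17.26],
}


def share_at_or_above(distribution, cutoff=4):
    """Percentage of narratives scored at or above the cutoff (scores start at 1)."""
    return round(sum(distribution[cutoff - 1:]), 2)


# Assumed cutoff: a narrative counts as readable when its score is >= 4.
shares = {method: share_at_or_above(dist) for method, dist in READABILITY.items()}
best_baseline = max(value for method, value in shares.items() if method != "LSH-GEFR")
gain = round(shares["LSH-GEFR"] - best_baseline, 2)
```

Under this assumption, `shares["LSH-GEFR"]` is 84.91 and `gain` is 5.96, with EventNET (78.95) as the strongest baseline. The gain is a difference in percentage points, not a relative increase.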
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Gui, F.; Yang, J.; Tang, Y.; Chen, H.; An, N. Structured Life Narratives: Building Life Story Hierarchies with Graph-Enhanced Event Feature Refinement. Appl. Sci. 2024, 14, 918. https://doi.org/10.3390/app14020918