Hierarchical Clause Annotation: Building a Clause-Level Corpus for Semantic Parsing with Complex Sentences

Fan, Yunlong; Li, Bin; Sataer, Yikemaiti; Gao, Miao; Shi, Chuanqi; Cao, Siyi; Gao, Zhiqiang

doi:10.3390/app13169412

¹ School of Computer Science and Engineering, Southeast University, Nanjing 211189, China

² Key Laboratory of Computer Network and Information Integration, Southeast University, Ministry of Education, Nanjing 211189, China

³ School of Foreign Languages, Southeast University, Nanjing 211189, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(16), 9412; https://doi.org/10.3390/app13169412

Submission received: 3 June 2023 / Revised: 8 August 2023 / Accepted: 16 August 2023 / Published: 19 August 2023

(This article belongs to the Special Issue Natural Language Processing: Novel Methods and Applications)

Featured Application

Hierarchical clause annotation could be applied in many downstream tasks of natural language processing, including abstract meaning representation parsing, semantic dependency parsing, text summarization, argument mining, information extraction, question answering, machine translation, etc.

Abstract

Most natural-language-processing (NLP) tasks, such as semantic parsing, syntactic parsing, machine translation, and text summarization, suffer performance degradation when encountering long complex sentences. Previous works addressed the issue with the intuition of decomposing complex sentences and linking simple ones, as in rhetorical-structure-theory (RST)-style discourse parsing, split-and-rephrase (SPRP), text simplification (TS), and simple sentence decomposition (SSD). However, these works are not applicable to semantic parsing tasks such as abstract meaning representation (AMR) parsing and semantic dependency parsing, because their outputs are misaligned with semantic relations and fail to preserve the original semantics. Following the same intuition while avoiding these deficiencies, we propose a novel framework, hierarchical clause annotation (HCA), for capturing the clausal structures of complex sentences, based on the linguistic research of clause hierarchy. With the HCA framework, we annotated a large HCA corpus to explore the potential of integrating HCA structural features into semantic parsing with complex sentences. Moreover, we decomposed HCA into two subtasks, i.e., clause segmentation and clause parsing, and provide neural baseline models for producing more silver annotations. Evaluated on our manually annotated HCA dataset, the proposed models achieved a 91.3% F1-score on clause segmentation and an 88.5% Parseval score on clause parsing. Because the same model architectures were employed, the performance differences between the clause and discourse versions of the segmentation and parsing subtasks reflect differences between our HCA corpus and the compared discourse corpora: our sentences contained more segment units and fewer interrelations than those in the compared corpora.

Keywords: clause hierarchy; hierarchical clause annotation; complex sentences; syntactic parsing; semantic parsing; RST corpus

1. Introduction

Most natural-language-processing (NLP) tasks, such as abstract meaning representation (AMR) parsing [1], semantic dependency parsing [2], constituency parsing [3], semantic role labeling [4], machine translation [5], and text summarization [6], suffer performance degradation when encountering long complex sentences. The intuition to address this issue is first to decompose complex sentences and then re-link the simple ones, an idea shared by tasks such as rhetorical-structure-theory (RST)-style discourse parsing (RST parsing) [7], split-and-rephrase (SPRP) [8], text simplification (TS) [9], and simple sentence decomposition (SSD) [10]. (Gao et al. have yet to name their task; for convenience, we summarize their main idea as simple sentence decomposition.)

However, previous works that process complex sentences with this intuition are not practical for semantic parsing tasks, such as AMR parsing [11] and semantic dependency parsing [12]. RST parsing, which aims to extract the rhetorical relations among elementary discourse units (EDUs) [13] at the document level, is still an open problem: the state-of-the-art model only achieves 55.4 and 80.4 Parseval-Full scores for multi- and intra-sentential parsing, respectively. Besides, the blurry definitions of EDUs and the misalignments between rhetorical relations and semantic relations make RST parsing unsuitable for semantic parsing. The SPRP task splits at a coarse granularity, so its outputs may still be complex sentences. The TS and SSD tasks, which decompose complex sentences into simple ones, cannot preserve the original semantics because they rephrase the input with simpler syntax (TS) or drop discourse connectives (SSD).

To avoid the deficiencies of the previous works summarized above, we propose a novel task, hierarchical clause annotation (HCA), based on the linguistic research of clause hierarchy [14], where clauses are fundamental text units centering on a verb phrase and sentences with multiple clauses form a complex hierarchy. HCA is a more lightweight task at the sentence level, has explicit definitions of clauses and appropriate mappings between inter-clause and semantic relations (vs. RST parsing), and aims to annotate complex sentences into a clause hierarchy (vs. SPRP) without changing or dropping any semantics (vs. TS and SSD).

To show the potentialities of HCA to facilitate semantic parsing with complex sentences, we demonstrate the HCA tree, AMR graph, and semantic dependency graph (SDG) of a complex sentence from the AMR 2.0 dataset (https://catalog.ldc.upenn.edu/LDC2017T10, accessed on 15 June 2021) in Figure 1:

[If I do not check,]$_{C_{1}}$ [I get very anxious,]$_{C_{2}}$ [which does sort of go away after 15-30 mins,]$_{C_{3}}$ [but often the anxiety is so much]$_{C_{4}}$ [that I cannot wait that long.]$_{C_{5}}$

The above sentence is segmented into five clauses $C_{i}$, where:

$C_{2}$ and $C_{4}$ are coordinate and contrastive;
$C_{1}$ and $C_{3}$ are conditional adverbial and relative clauses of $C_{2}$ , respectively;
$C_{5}$ is a resultative adverbial clause of $C_{4}$ .

In the HCA tree, the coordinate relation But and the clauses $C_{i}$ are nodes, and subordinate relations are directed edges. As demonstrated in Figure 1, the HCA tree shares the same hierarchy with the two semantic parsing representations, indicating the possibility of incorporating HCA’s structural information into semantic parsing with complex sentences.

Inspired by the similarities among the HCA tree, AMR graph, and SDG, we annotated the first HCA corpus with the guidance in previous works of crowdsourcing annotation [15,16]. Furthermore, we adapted the state-of-the-art models [17,18] of the discourse segmentation and parsing tasks for training the baseline models to generate HCA annotations automatically.

Our main contributions are as follows:

We propose a novel framework, hierarchical clause annotation (HCA), to segment complex sentences into clauses and capture their interrelations based on the linguistic research of clause hierarchy, aiming to provide clause-level structural features to facilitate semantic parsing tasks.
We elaborate on our experience developing a large HCA corpus, including determining an annotation framework, creating a silver-to-gold manual annotation tool, and ensuring annotation quality. The resulting HCA corpus contains 19,376 English sentences from AMR 2.0, each including at least two clauses.
We decomposed HCA into two subtasks, i.e., clause segmentation and parsing, and adapted discourse segmentation and parsing models for the HCA subtasks. The experimental results showed that the adapted models achieved satisfactory performances in providing reliable silver HCA data.

The rest of this paper is organized as follows: First, the related works are summarized in Section 2, and the proposed HCA framework, along with the manual annotation details of a large HCA corpus from scratch, are detailed in Section 3. Then, the neural end-to-end models for the clause segmentation and clause parsing subtasks are proposed in Section 4. Next, the experimental details and results of evaluating the proposed models are presented in Section 5, and the potentialities of utilizing HCA features in AMR parsing and semantic dependency parsing are discussed in Section 6. Finally, our work is concluded in Section 7.

2. Related Work

2.1. RST Parsing

In a document, the clauses, sentences, and paragraphs are logically connected together to form a coherent discourse. RST [13] provides a general way to describe the relations among parts in a text and postulates a hierarchical discourse structure called the discourse tree (DT). The leaves of a DT can be a clause or a phrase without strict definitions, known as elementary discourse units (EDUs). Adjacent EDUs and higher-order spans are non-overlapping and connected hierarchically through coherence relations. Thus, coherence in discourse can be analyzed in terms of how a nucleus interacts with its surrounding satellites to communicate the relationships between the main ideas and related ideas.

The RST-parsing task generally requires breaking the text into EDUs (i.e., the discourse segmentation task) and linking the EDUs into a DT (i.e., the discourse parsing task). For discourse segmentation, Gessler et al. [17] proposed a Transformer-based neural classifier that enhances contextualized word embeddings with hand-crafted features and achieved the current state-of-the-art performance in the DISRPT 2021 Shared Task on Discourse Unit Segmentation (https://sites.google.com/georgetown.edu/disrpt2021?pli=1, accessed on 1 April 2023). For discourse parsing, Kobayashi et al. [18] explored a strong baseline by integrating previous simple parsing strategies, top-down and bottom-up, with various Transformer-based pretrained language models (PLMs).

Table 1 demonstrates the comparison between our HCA and RST parsing with exemplified sentences from the RST Discourse Tagging Reference Manual (https://www.isi.edu/~marcu/discourse/, accessed on 20 March 2023). Although both tasks aim to extract a tree structure from input texts, their definitions of elementary units and target interrelations vary, leading to the following differences:

The elementary units of HCA are clauses, while those of RST parsing are EDUs, including clauses and phrases. The blurry definitions of an EDU may cause obstacles in RST parsing. For example, the phrases “as a result of margin calls” in (1) and “Despite some their considerable incomes and assets” in (2) are segmented as EDUs, but they are not clauses because they lack a verb. Moreover, although the clause “that they have made it” functions as a predicative of the verb “feel”, it cannot be annotated as an EDU.
The rhetorical relations in RST parsing characterize the coherence among EDUs, and some cannot map to semantic relations. For example, the semantic relation between “feel” and “that they have made it” in (2) is not captured in RST parsing.

In summary, the blurry definitions of EDUs and the misalignments between rhetorical and semantic relations make RST parsing unsuitable for semantic parsing compared with our HCA.

2.2. Other Similar Tasks

Some similar tasks share the idea of decomposing complex sentences into simpler parts without capturing their interrelations.

2.2.1. Clause Identification

For the CoNLL-2001 shared task on clause identification, Reference [19] provides a dataset with gold-standard clauses derived from the Penn Treebank II [15], where clause-level tags (i.e., S, SBAR, SBARQ, INV, and SQ) indicate target clauses and clausal conjunctions. The clauses identified in the shared task comprise tensed clauses, non-tensed verb phrases, coordinators, and subordinators.

2.2.2. Split-and-Rephrase

The split-and-rephrase (SPRP) task [8] aims to split a complex input sentence into shorter sentences while preserving meaning. In that task, the emphasis is on sentence splitting and rephrasing. There is no deletion and no lexical or phrasal simplification, but the systems must learn to split complex sentences into shorter ones and to make the syntactic transformations required by the split (e.g., turn a relative clause into a main clause).

2.2.3. Text Simplification

The text simplification (TS) task [20] is the process of reducing the linguistic complexity of a text to improve its understandability and readability while maintaining its original information content and meaning. Typically, it rephrases complex sentences with simpler vocabulary and syntax and ignores trivial clauses from the source.

2.2.4. Simple-Sentence-Decomposition

The simple sentence decomposition (SSD) task [10] converts complex sentences into a covering set of simple sentences derived from the tensed clauses in the source sentence, where shared nouns or pronouns are copied and discourse connectives (e.g., and, but, although, etc.) are dropped.

2.2.5. Summary

Table 2 compares our HCA task and similar tasks above with the exemplified sentence in Section 1:

For clause identification, coordinator “but” and subordinators “If”, “which”, and “that” are segmented out. Besides, non-tensed verb phrases that function as a subject, object, or postmodifier are also target clauses in the task. These cases are out of the definition of annotated clauses in our HCA framework, as redundant hierarchies occur in capturing inter-clause relations.
For split-and-rephrase, the granularity of decomposing is larger than clauses due to the consideration of preserving the original meaning of the input sentence. The outputs (1) and (3) are still complex sentences with two clauses. Additionally, the output (2) is segmented from the relative clause that modifies the “anxious” in the matrix clause, leading to a syntax transformation.
For text simplification, dropping the subordinator “If” and the coordinator “but” leads to the uncertainties of discourse relations between output sentences. Moreover, as replacements for simpler syntax in (3) and (4) bring misalignments between substitute and substituted tokens, text simplification is unsuitable to serve as a preprocess for semantic dependency parsing, which is a token-level task.
For simple sentence decomposition, clausal connectives are dropped as in text simplification, which leaves uncertain some of the discourse relations that semantic parsing is expected to capture.

2.3. Clause Hierarchy

Clause hierarchy can be described as a cline along which clauses distribute according to their different levels of grammatical integration [14,21,22,23,24,25]. These works propose that clause combinations in many languages can be described as a set of tighter or looser clauses. A tight clause means that a clausal constituent has, in comparison to a loose clause, more dependence on the clause with which it combines, typically a main clause. Table 3 shows three main versions of clause hierarchy and ordered linguistic phenomena according to their clause integration tightness degree.

Matthiessen [22] presented a type of clause hierarchy that extends from syntactic clause combination to cohesion and coherence at the discourse and text level. He considered clause combination to range from tight syntactic “embedding” (e.g., infinitival clauses as a complement to a main verb) to the looser relations of “hypotaxis” (e.g., a finite adverbial) to “parataxis” (e.g., coordination).

Hopper and Traugott [23] also offered a model that shares much with Matthiessen’s, where “parataxis” is the syntactic independence of clauses and “hypotaxis” is a more integrated clause that is syntactically dependent within another clause’s predicate. Tighter still is “subordination”, which, like “embedding” in Matthiessen’s type, covers all clauses that function as a constituent essential to grammaticality, e.g., verbal arguments.

Payne’s clause hierarchy [14] extended from “compound verbs” (tightest) through to separate “sentences” (loosest), where clause combination becomes more or less a single verbal element in his type. Different from the above two types, Payne argues that compound verbs (e.g., go get the book), though uncommon in English, are considered the tightest clause combination as they have two verbal elements placed adjacently in a verb phrase, one of which lacks full finiteness.

In summary, the linguistic research of clause hierarchy provides solid theoretical support for our HCA framework.

3. Hierarchical Clause Annotation

In this section, we elaborate on the manual annotation criteria for the proposed HCA task, the representation of an HCA tree, and the process of building the first HCA corpus.

3.1. Annotation Framework

To present a framework for annotating the clause hierarchy of complex sentences, we referenced and modified Payne’s version [14] of clause hierarchy because of its clear and comprehensive definitions of clause combination cases. As demonstrated in Figure 2, we did not consider compound verbs as a clause combination, as these cases are uncommon and produce one-verb clauses after annotation.

With the above version of clause hierarchy, we synthesized the HCA framework and built a dataset under the guidance of the framework. The annotation work consisted of a preprocessing stage with silver annotations transformed from existing schemas (constituent parsing and syntactic dependency parsing) and a manual proofreading phase with gold annotations on an elaborate browser-based annotator.

We list major concepts in the HCA framework.

3.1.1. Sentence and Clause

Sentences, typically starting with a capitalized word and ending with a full stop, are principally units of written grammar and the annotation inputs in HCA. A sentence must consist of at least one clause.

Clauses, considered core units of grammar, center around a verb phrase that largely determines what else must or may occur [26]. Clauses can be categorized by the inner verb type:

Finite: clauses that contain tensed verbs;
Non-finite: clauses that only contain non-tensed verbs such as ing-participles, ed-participles, and to-infinitives.

In the HCA framework, finite clauses should be annotated, while non-finite clauses that are separated by a comma are also segmented out.

3.1.2. Clause Combination

Clauses combine to form sentences in two main ways: by joining clauses of equal syntactic status (coordination) or by attaching one clause to another in a subordinate relation (subordination):

(1): Coordination and coordinator:

Coordination is an interrelation between clauses that share the same syntactic status and are typically connected by a coordinator such as and, or, but, etc. In addition, coordinators can be correlative structures (e.g., either… or… and not only… but also…) or just substituted by comma punctuation.

(2): Subordination, subordinator, and antecedent:

Subordination holds between a subordinate clause and the matrix clause that is superordinate to it. Subordination can be categorized as follows:

Nominative: Function as clausal arguments or noun phrases in the matrix clause and can be subdivided into Subjective, Objective, Predicative, and Appositive.
Relative: Define or describe a preceding noun head in the matrix clause.
Adverbial: Function as a Condition, Concession, Reason, and such for the matrix clause.

Subordinators are the words that introduce a subordinate clause and indicate a semantic relation between the subordinate clause and its matrix clause, including subordinate conjunctions, relative pronouns, and relative adverbs. Simple subordinators contain a single word, e.g., that, wh-words, if, etc., while complex ones consist of more than one word, e.g., as if, so that, even though, etc. Antecedents are the nouns or pronouns that precede and are modified by relative clauses, and the nouns that precede and are explained by appositive clauses.

To better explain these HCA definitions, we demonstrate some example sentences, which are segmented into multiple clauses in Table 4.

3.2. HCA Representation

As illustrated in Figure 3, we modeled the two basic hierarchical schemas with the concepts defined above, characterizing inter-clause relations with the same nucleus–satellite pattern in RST. To be specific, coordination is a multinuclear relation that involves two or more clauses (denoted as nucleus nodes $C_{i}$) dominated by the coordination node $co$, while subordination is a mononuclear relation (denoted as a directed edge $sub$) pointing from the matrix clause $C_{1}$ (nucleus) to its subordinate clause $C_{2}$ (satellite).

When a sentence consists of more clauses, its HCA representation becomes a tree structure, where each node is a clause or an inter-clause coordination, and each directed edge is an inter-clause subordination. A three-layer HCA tree of a complex sentence involving five clauses and four interrelations is demonstrated in Figure 1a.
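The tree representation described here can be sketched as a small data structure. The following is a minimal illustration, not the corpus format: the class names `Clause` and `Coordination` and the helper functions are hypothetical, and the relation labels are taken from the description of the example sentence in Section 1.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Clause:
    """A clause node; each subordinate edge is a (relation, child) pair."""
    label: str
    text: str
    subordinates: List[Tuple[str, "Node"]] = field(default_factory=list)


@dataclass
class Coordination:
    """A coordination node that dominates two or more nucleus nodes."""
    relation: str
    members: List["Node"]


Node = Union[Clause, Coordination]


def children(node: Node) -> List[Node]:
    if isinstance(node, Coordination):
        return list(node.members)
    return [child for _, child in node.subordinates]


def clauses(node: Node) -> List[Clause]:
    """Collect all clause nodes in the tree."""
    own = [node] if isinstance(node, Clause) else []
    return own + [c for child in children(node) for c in clauses(child)]


def count_relations(node: Node) -> int:
    """One relation per coordination node plus one per subordinate edge."""
    own = 1 if isinstance(node, Coordination) else len(node.subordinates)
    return own + sum(count_relations(child) for child in children(node))


def depth(node: Node) -> int:
    return 1 + max((depth(child) for child in children(node)), default=0)


# The example sentence from Section 1, encoded by hand.
c1 = Clause("C1", "If I do not check,")
c3 = Clause("C3", "which does sort of go away after 15-30 mins,")
c5 = Clause("C5", "that I cannot wait that long.")
c2 = Clause("C2", "I get very anxious,",
            [("conditional adverbial", c1), ("relative", c3)])
c4 = Clause("C4", "but often the anxiety is so much",
            [("resultative adverbial", c5)])
root = Coordination("But", [c2, c4])
```

Under this encoding, the example yields five clauses, four interrelations (one coordination and three subordinations), and three layers, matching the description of Figure 1a.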

3.3. HCA Corpus

With the annotation framework discussed above, we aimed to build an HCA corpus for further research on the possibilities of applying clausal structure features to semantic parsing tasks. We chose the AMR 2.0 dataset as our corpus base, whose 39,260 sentences were collected from the DARPA BOLT and DEFT programs, various newswire data, and weblog data.

The annotation work was conducted in two phases. First, two existing syntactic features, i.e., constituent and syntactic dependency parse trees, were employed to produce silver HCA annotations with transformation rules. Second, human annotators with prior English grammar research experience and extensive hands-on annotation training reviewed and modified silver annotations in a browser-based annotation tool.

3.3.1. Silver Data from Existing Schemas

Previous researchers [27,28,29,30] utilized constituent-based and syntactic dependency parse trees to extract clauses from sentences with some manual rules. Following the experience from these works, we employed Stanza [31] as our constituent parser and syntactic dependency parser to obtain silver HCA data:

Constituency parse tree:

The constituency parse tree (CPT) represents the syntactic structure of a sentence using a tree, where the nodes are sub-phrases that belong to a specific category in the grammar and the edges are unlabeled. The transformation from the CPT to the silver HCA data consists of three phases:

Traverse non-leaf nodes in the CPT and find the clause-type nodes: S, SBAR, SBARQ, INV, and SQ.
Identify the tokens dominated by a clause-type node as a clause.
When a clause-type node dominates another one, an inter-clause relation between them is determined without an exact relation type.

As demonstrated in Figure 4, the first two clauses of the sentence exemplified in Section 1 are identified through their constituent parse tree. The SBAR node and its child node S are combined as a single clause, as no VP is dominated by the other child constituent IN of SBAR. Moreover, the S node on the top dominates the SBAR node, indicating subordination between the two clauses in dashed boxes.
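The three transformation phases and the SBAR/S merge can be sketched as follows. This is a simplified illustration under stated assumptions rather than the rules used to build the corpus: the parse tree is a hand-written nested tuple (punctuation omitted) instead of real Stanza output, the function names are hypothetical, and each token is assigned to its innermost clause-type node so that a matrix clause does not repeat the tokens of a clause it dominates.

```python
CLAUSE_TAGS = {"S", "SBAR", "SBARQ", "INV", "SQ"}


def contains_label(node, label):
    """True if `node` or any of its descendants carries `label`."""
    if isinstance(node, str):
        return False
    return node[0] == label or any(contains_label(c, label) for c in node[1:])


def extract_clauses(tree):
    """Return (clauses, relations) from a tree of (label, child, ...) tuples.

    clauses[i] is the token list of clause i; relations holds unlabeled
    (dominating clause index, dominated clause index) pairs.
    """
    clauses, relations = [], []

    def visit(node, owner, fold):
        if isinstance(node, str):              # a token
            if owner is not None:
                clauses[owner].append(node)
            return
        label, kids = node[0], node[1:]
        if label in CLAUSE_TAGS and not fold:  # phases 1 and 2: open a clause
            clauses.append([])
            new = len(clauses) - 1
            if owner is not None:              # phase 3: unlabeled relation
                relations.append((owner, new))
            owner = new
        for i, kid in enumerate(kids):
            siblings = kids[:i] + kids[i + 1:]
            # Merge an S child into its SBAR parent when no sibling has a VP.
            fold_kid = (label == "SBAR" and not isinstance(kid, str)
                        and kid[0] == "S"
                        and not any(contains_label(s, "VP") for s in siblings))
            visit(kid, owner, fold_kid)

    visit(tree, None, False)
    return clauses, relations


# Hypothetical parse of "If I do not check I get very anxious".
TOY_TREE = (
    "S",
    ("SBAR",
     ("IN", "If"),
     ("S", ("NP", ("PRP", "I")),
      ("VP", ("VBP", "do"), ("RB", "not"), ("VP", ("VB", "check"))))),
    ("NP", ("PRP", "I")),
    ("VP", ("VBP", "get"), ("ADJP", ("RB", "very"), ("JJ", "anxious"))),
)

toy_clauses, toy_relations = extract_clauses(TOY_TREE)
```

On the toy tree, the sketch yields the matrix clause "I get very anxious", the subordinate clause "If I do not check", and one unlabeled relation from the former to the latter.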

Syntactic dependency parse tree:

The syntactic dependency parse tree (SDPT) consists of a set of directed syntactic relations between the words in the sentence whose root is either a non-copular verb or the subject complement of a copular verb. The transformation from SDPT to silver HCA consists of three phases:

Use a mapping of dependency relations to clause constituents: subjects (S) and the governor, i.e., a non-copular verb (V), via relation nsubj and such; objects (O) and complements (C) in V’s dependents via relations dobj, iobj, xcomp, ccomp, and such; adverbials (A) in V’s dependents via relations advmod, advcl, prep_in, and such.
When a verb is detected in the sentence, a corresponding clause, consisting of the verb and its dependent constituents, can be identified. (Note that, in a clause with a copular verb, the copular verb and the other constituents are dependents of the complement.)
If a clause governs another clause via a dependency relation, the interrelation between them can be determined by the relation label:
- Coordination: conj:and, conj:or, conj:but;
- Subjective: nsubj;
- Objective: dobj, iobj;
- Predicative: xcomp, ccomp;
- Appositive: appos;
- Relative: ccomp, acl:relcl, rcmod;
- Adverbial: advcl.

As demonstrated in Figure 5, the first two clauses of the sentence exemplified in Section 1 are identified through their syntactic dependency tree. Moreover, the inter-clause relation can be inferred as adverbial: conditional with the dependency relation advcl and the subordinator “If”.
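The label-to-relation lookup in the third phase can be written as a small table. The sketch below copies the mapping from the list above; the function names are hypothetical, and the subordinator table contains only the single "If" case mentioned in the text, as an illustrative placeholder rather than a full inventory.

```python
# Dependency labels per inter-clause relation, as listed above.
RELATION_MAP = {
    "Coordination": {"conj:and", "conj:or", "conj:but"},
    "Subjective": {"nsubj"},
    "Objective": {"dobj", "iobj"},
    "Predicative": {"xcomp", "ccomp"},
    "Appositive": {"appos"},
    "Relative": {"ccomp", "acl:relcl", "rcmod"},
    "Adverbial": {"advcl"},
}

# Illustrative only: subordinator -> adverbial subtype.
SUBORDINATOR_HINTS = {"if": "Condition"}


def candidate_relations(dep_label):
    """All inter-clause relations that a dependency label may map to."""
    return sorted(rel for rel, labels in RELATION_MAP.items()
                  if dep_label in labels)


def refine(dep_label, subordinator=None):
    """Add an adverbial subtype when the subordinator is in the hint table."""
    refined = []
    for rel in candidate_relations(dep_label):
        subtype = SUBORDINATOR_HINTS.get((subordinator or "").lower())
        if rel == "Adverbial" and subtype:
            rel = f"Adverbial: {subtype}"
        refined.append(rel)
    return refined
```

A lookup of `ccomp` returns two candidates (Predicative and Relative), which shows why a label-only mapping cannot always decide the relation type on its own.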

3.3.2. Gold Data from Manual Annotator

As discussed above, the syntactic structures of CPT and SDPT can be transformed into clauses and inter-clause relations. However, these silver annotations are still unable to fulfill the need to build an HCA corpus for the following reasons:

Specific inter-clause relations cannot be obtained via the two syntactic structures, where CPT can only provide the existence of a relation without a label, and the dependency relations in SDPT have multiple mappings (e.g., ccomp to Predicative or Relative) or no mapping (e.g., advcl to no exact adverbial subtype such as conditional).
Pre-set transformation rules identify additional clauses that fall outside the HCA definitions. For example, the extracted non-finite clauses (e.g., to-infinitives) embedded in their matrix clauses are too short and lead to hierarchical redundancies in the HCA tree.
The performances of two syntactic parsers degrade when encountering long and complex sentences, which are the core concerns of our HCA corpus.

Therefore, we recruited a group of human annotators with prior English grammar research experience to proofread these silver HCAs on a browser-based software ClausAnn 1.0 created for the annotation work. The Java Web application ClausAnn provides convenient operations and efficient keyboard shortcuts for annotators, and we open sourced it on our GitHub repository (https://github.com/MetroVancloud/ClausAnn, accessed on 15 May 2023). A typical annotation trial on ClausAnn consists of the following steps:

Review annotations from CPT, SDPT, or other annotators by switching the name tags in Figure 6a.
Choose an existing annotation to proofread or just start from the original sentence.
Segment a text span into two by double-clicking the blank space of a split point and select the relation between them in Figure 6b.
(a) If the two spans are coordinated, select a specific coordination and label coordinators that indicate the interrelation in Figure 6c.
(b) If the two spans are subordinated, designate the superordinate one, select a specific subordination, and label subordinators that indicate the interrelation in Figure 6d.
Remerge two text spans into one by clicking the two spans successively.
Repeat Steps 3 and 4 until all text spans are segmented into a set of clauses organized in a correct HCA tree.

3.3.3. Quality Assurance

There were mainly two steps taken jointly to ensure the quality of the final HCA corpus, i.e., multi-round annotation and consistency measurement.

The total annotation work covered 39,260 sentences. Three rounds of annotation were arranged, covering 5%, 5%, and 90% of the total sentences, respectively, and were conducted with a progressive and negotiable strategy. Before the first round, every annotator thoroughly understood the HCA framework and used the tool ClausAnn proficiently after adequate hands-on training. During the first two rounds, the lead annotator, who majors in English grammar, organized discussions on complex or abnormal cases with the other annotators.

For consistency measurement, we tracked inter-annotator agreement (IAA) after each round of the annotation work. As discussed in Section 2.1, the HCA and RST parsing tasks aim to extract the clause/EDU hierarchy from texts. Thus, the evaluation metrics of the discourse segmentation [17] and discourse parsing [18] subtasks in RST parsing were adopted as the IAA metrics in evaluating the annotation quality of the HCA corpus:

P/R/F $_{1}$ on clauses: precision, recall, and F $_{1}$ -score on the segmented clauses, where a positive match means that both segmented clauses from two annotators have the same start and end boundaries.
RST-Parseval [32] on interrelations: consisting of span, nuclearity, relation, and full used to evaluate unlabeled, nuclearity-, relation-, and fully labeled interrelations, respectively, between the matched clauses from two annotators.
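The clause-level P/R/F $_{1}$ metric can be sketched as below, assuming each annotator's segmentation is given as a list of (start, end) token indices and that one annotator is treated as the reference. The function name and the span encoding are illustrative choices, not the evaluation script used for the corpus.

```python
def clause_prf(reference, candidate):
    """Precision, recall, and F1 over exactly matching clause spans.

    A candidate clause counts as a positive match only if a reference
    clause has the same start and end boundaries.
    """
    ref, cand = set(reference), set(candidate)
    matched = len(ref & cand)
    precision = matched / len(cand) if cand else 0.0
    recall = matched / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Two hypothetical annotators disagree on where the second clause ends.
annotator_a = [(0, 4), (5, 8), (9, 17)]
annotator_b = [(0, 4), (5, 17)]
p, r, f = clause_prf(annotator_a, annotator_b)
```

Here only the first clause matches, so precision is 1/2, recall is 1/3, and F $_{1}$ is 0.4.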

In the first two rounds of annotation, 10% of the sentences in total were double-annotated, and the ratio was 16% in the last round, higher than the 13.8% in the RST-DT corpus [7]. According to the statistics, the IAA measured by the above two metrics grew as the annotation rounds increased, indicating that the two steps of multi-round annotation and consistency measurement played a significant role in ensuring annotation quality.

As shown in Table 5, the final IAA achieved high consistencies, where:

F $_{1}$ -scores on clauses ranged from 98.4 to 100;
RST-Parseval scores on interrelations ranged from 97.3, 97.0, 93.6, and 93.4 to 98.1, 97.8, 94.2, and 94.1 for span, nuclearity, relation, and full, respectively.

Compared with the RST-DT corpus, whose IAA score on EDUs ranged from 95.1 to 100 and whose IAA scores on rhetorical relations ranged from 77.8, 69.5, and 59.7 to 92.9, 88.2, and 79.2 for span, nuclearity, and relation, respectively, our HCA corpus reached better consistency, as the HCA framework has more-restricted definitions of the elementary unit (i.e., clauses) and fewer types of interrelation.

3.3.4. Dataset Detail

The resulting HCA-AMR2.0 dataset was based on AMR 2.0, which contains 39,260 sentences, and 19,376 (49.4%) sentences were paired with an HCA tree, while the rest were simple sentences with only one clause. The train, dev, and test set split followed the original split in AMR 2.0. Detailed statistics are listed in Table 6.

4. Model

In this paper, we modeled hierarchical clause annotation (HCA) as a two-stage task, i.e., clause segmentation and clause parsing, and provide auto-annotation baselines for each subtask. Clause segmentation segments a complex sentence into several clauses, while clause parsing links the clauses with interrelations into a clause tree.

4.1. Clause Segmentation

We modeled clause segmentation as a sequence-labeling task, where the input sentence is a sequence $X = (x_{1}, \dots, x_{i}, \dots, x_{n})$ with n tokens and the output is a label sequence $Y = (y_{1}, \dots, y_{i}, \dots, y_{n})$. Note that $y_{i}$ is binary, i.e., $y_{i} = 1$ if $x_{i}$ is the head word token for a clause, and $y_{i} = 0$ otherwise. Therefore, we approached the clause segmentation task with a sequence-tagging model, DisCoDisCo [17], used in the discourse segmentation task: embed the input sentence; encode with a single Bi-LSTM; decode with a linear projection layer; indicate the first token of each clause in the output tag sequence.
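The binary labeling scheme can be illustrated with a short conversion between clause lists and tag sequences. This sketch assumes that the tagged token is the clause-initial one, as the last pipeline step suggests, and that tokens are whitespace-separated; both function names are hypothetical.

```python
def clauses_to_tags(clauses):
    """Tag the first token of every non-empty clause with 1, the rest with 0."""
    tags = []
    for clause in clauses:
        if clause:
            tags.extend([1] + [0] * (len(clause) - 1))
    return tags


def tags_to_clauses(tokens, tags):
    """Rebuild clauses by opening a new one at every token tagged 1."""
    clauses = []
    for token, tag in zip(tokens, tags):
        if tag == 1 or not clauses:
            clauses.append([])
        clauses[-1].append(token)
    return clauses


# First two clauses of the example sentence in Section 1.
gold = [["If", "I", "do", "not", "check,"], ["I", "get", "very", "anxious,"]]
tokens = [token for clause in gold for token in clause]
tags = clauses_to_tags(gold)
```

Because each clause is a contiguous span, the tag sequence is enough to recover the segmentation, which is what makes a plain token classifier sufficient for this subtask.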

In the embedding layer, we relied on three kinds of word embeddings concatenated together:

(1): Bi-LSTM encoded character embeddings;
(2): Static word embeddings from fastText [33];
(3): Fine-tuning word embeddings from a pretrained language model (PLM).

Additionally, we introduced and embedded various kinds of grammatical information, such as lemmas, parts-of-speech (POSs), and syntactic dependencies generated by Stanza [31]. These tokenwise feature embeddings were concatenated with the word embeddings, and the token $x_{i}$ in the input sentence was embedded as:

u_{i} = Concat (u_{i}^{word}, u_{i}^{feat})

(1)

where $u_{i}^{word}$ is the concatenation of the three kinds of word embeddings and $u_{i}^{feat}$ denotes the grammatical feature embeddings.
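As a small illustration of Equation (1), read here as a plain concatenation, the sketch below builds a token vector from toy embeddings. The dictionary keys, the helper names `concat` and `embed_token`, and all vector sizes are assumptions made for the example; the actual dimensions are those of the hyper-parameter table.

```python
from typing import Dict, List


def concat(*vectors: List[float]) -> List[float]:
    """Concatenate vectors end to end."""
    out: List[float] = []
    for vector in vectors:
        out.extend(vector)
    return out


def embed_token(word_parts: Dict[str, List[float]],
                feat_parts: Dict[str, List[float]]) -> List[float]:
    """Build u_i from the word embeddings (u_i^word) and feature embeddings (u_i^feat)."""
    u_word = concat(word_parts["char"], word_parts["fasttext"], word_parts["plm"])
    u_feat = concat(feat_parts["lemma"], feat_parts["pos"], feat_parts["deprel"])
    return concat(u_word, u_feat)


# Toy vectors with made-up sizes (2 + 3 + 4 for words, 2 + 1 + 1 for features).
word_parts = {"char": [0.1, 0.2], "fasttext": [0.3, 0.4, 0.5], "plm": [0.6, 0.7, 0.8, 0.9]}
feat_parts = {"lemma": [1.0, 1.1], "pos": [1.2], "deprel": [1.3]}
u_i = embed_token(word_parts, feat_parts)
print(len(u_i))
```

The dimension of the resulting vector is the sum of the dimensions of its parts, which is what the BiLSTM encoder receives for each token.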

The embedding $u_{1:n}$ of the input sentence with n tokens was fed through a BiLSTM network, and the output $s_{i}$ was then calculated by a linear projection layer to predict the segmentation tag for the i-th token:

s_{i} = Linear (BiLSTM (u_{1 : n}, i)) .

(2)

Given $s_{i}$, we obtained the predicted tag sequence $\hat{t}$ and minimized the cross-entropy loss $L(\hat{t}, t)$ to train the weights of the PLM, the BiLSTM network, and the linear projection layer, where $t$ is the gold tag sequence.
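The training objective can be sketched without any deep-learning library. The example assumes that each $s_{i}$ is a pair of unnormalized scores (one per tag) and that the loss is averaged over tokens; both are assumptions, as are the function names and the toy scores.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token's tag scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_tags(scores: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring tag for every token."""
    return [max(range(len(s)), key=s.__getitem__) for s in scores]


def cross_entropy(scores: Sequence[Sequence[float]], gold: Sequence[int]) -> float:
    """Mean negative log-likelihood of the gold tags."""
    if len(scores) != len(gold):
        raise ValueError("scores and gold tags must have the same length")
    return -sum(math.log(softmax(s)[t]) for s, t in zip(scores, gold)) / len(gold)


# Toy scores for three tokens; column 0 = "inside a clause", column 1 = "starts a clause".
scores = [[0.2, 2.0], [1.5, -0.5], [3.0, 0.0]]
gold = [1, 0, 0]
print(predict_tags(scores), cross_entropy(scores, gold))
```

When the two scores of a token are equal, the model is maximally uncertain and the loss for that token is ln 2.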

4.2. Clause Parsing

Given the segmented clauses $\{c_{1}, \dots, c_{i}, \dots, c_{m}\}$ of the input sentence, we modeled clause parsing as a sequence-to-sequence transduction, where the output is a linearized binary tree consisting of m clauses and $m - 1$ inter-clause relations. Because clause parsing and discourse parsing can be modeled in the same way, we adopted the discourse parsers in [18], namely a span-based parser with a top-down strategy and a shift-reduce transition parser with a bottom-up strategy, for their simple architectures and open-source code. Overviews of the parsers are shown in Figure 7 and Figure 8. Note that the two parsing strategies share the same word embedding layer to represent text spans.
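To make the output structure concrete, the following sketch represents a clause tree as a binary tree and linearizes it. The bracketed string format, the class names, and the plain "co"/"sub" nuclearity strings are assumptions for illustration only and are not the serialization used by the parsers in [18]; the relation names are taken from the case study in Section 6.1.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Clause:
    index: int  # position of the clause in the sentence, starting at 1


@dataclass(frozen=True)
class Relation:
    nuclearity: str  # simplified label, e.g. "co" or "sub"
    label: str       # inter-clause relation, e.g. "but"
    left: "Tree"
    right: "Tree"


Tree = Union[Clause, Relation]


def linearize(tree: Tree) -> str:
    """Serialize the tree as a bracketed string (hypothetical format)."""
    if isinstance(tree, Clause):
        return f"c{tree.index}"
    return f"({tree.nuclearity}:{tree.label} {linearize(tree.left)} {linearize(tree.right)})"


def count_nodes(tree: Tree) -> Tuple[int, int]:
    """Return (number of clauses, number of inter-clause relations)."""
    if isinstance(tree, Clause):
        return 1, 0
    left_clauses, left_rels = count_nodes(tree.left)
    right_clauses, right_rels = count_nodes(tree.right)
    return left_clauses + right_clauses, left_rels + right_rels + 1


tree = Relation("co", "but", Clause(1), Relation("sub", "resultative", Clause(2), Clause(3)))
print(linearize(tree))
print(count_nodes(tree))
```

Because every internal node has exactly two children, a tree over m clauses always contains m - 1 relation nodes, which the second function confirms for the toy tree.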

4.2.1. Text Span Embedding

In the process of clause parsing, a representation of text spans is needed for either “span-to-clause” splitting in the top-down parsing strategy or “clause-to-span” combining in the bottom-up parsing strategy. Therefore, we transformed the input sentence into a subword sequence $\{t_{1}, t_{2}, \dots, t_{n}\}$ and obtained the embeddings $\{w_{1}, w_{2}, \dots, w_{n}\}$ using a PLM. The embedding $u_{i:j}$ for a text span consisting of the i-th clause to the j-th clause is obtained by averaging the vectors of its two edge subwords:

u_{i : j} = (w_{b (i)} + w_{e (j)}) / 2,

(3)

where $b(i)$ returns the index of the first subword in the i-th clause and $e(j)$ returns that of the last subword in the j-th clause.
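Equation (3) can be checked on toy numbers. In the sketch below, the subword vectors stand in for PLM outputs, clauses are indexed from 0, and `clause_bounds` (a hypothetical name) stores the inclusive begin and end subword index of each clause, playing the role of b(i) and e(j).

```python
from typing import List, Sequence, Tuple


def span_embedding(subword_vectors: Sequence[Sequence[float]],
                   clause_bounds: Sequence[Tuple[int, int]],
                   i: int, j: int) -> List[float]:
    """u_{i:j} = (w_{b(i)} + w_{e(j)}) / 2 for the span from clause i to clause j."""
    if not 0 <= i <= j < len(clause_bounds):
        raise ValueError("need 0 <= i <= j < number of clauses")
    begin = clause_bounds[i][0]  # b(i): first subword of clause i
    end = clause_bounds[j][1]    # e(j): last subword of clause j
    return [(x + y) / 2 for x, y in zip(subword_vectors[begin], subword_vectors[end])]


# Five toy 2-dimensional subword vectors and two clauses covering subwords 0-1 and 2-4.
subword_vectors = [[0.0, 0.0], [2.0, 2.0], [4.0, 0.0], [6.0, 2.0], [8.0, 4.0]]
clause_bounds = [(0, 1), (2, 4)]
print(span_embedding(subword_vectors, clause_bounds, 0, 1))
```

Only the two edge subwords contribute, so the representation has a fixed size regardless of how many clauses the span covers.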

4.2.2. Top-Down Strategy

The top-down parser splits each span into smaller ones recursively until the span becomes a single clause. We introduced biaffine networks [34] for span splitting and a loss penalty.

For each position k in a span consisting of the i-th clause to the j-th clause, a scoring function $s_{split}(i, j, k)$ is defined as follows:

s_{split} (i, j, k) = h_{i : k} W h_{k + 1 : j} + v_{left} h_{i : k} + v_{right} h_{k + 1 : j}

(4)

where $W$, $v_{left}$, and $v_{right}$ are the weights of the biaffine layer for splitting a text span. Here, $h_{i:k}$ and $h_{k+1:j}$ are defined as follows:

h_{i : k} = {FFN}_{left} (u_{i : k}),

(5)

h_{k + 1 : j} = {FFN}_{right} (u_{k + 1 : j}) .

(6)

Then, the span is split at the end of the inner clause that maximizes Equation (4):

\hat{k} = arg max_{i \leq k < j} s_{split} (i, j, k) .

(7)

When splitting a span at the end of the $\hat{k}$-th clause, the score of the nuclearity and relation labels for the two spans is defined as follows:

s_{label} (i, j, \hat{k}, ℓ) = h_{i : \hat{k}} W^{ℓ} h_{\hat{k} + 1 : j} + v_{left}^{ℓ} h_{i : \hat{k}} + v_{right}^{ℓ} h_{\hat{k} + 1 : j}

(8)

where $W^{ℓ}$, $v_{left}^{ℓ}$, and $v_{right}^{ℓ}$ are the weights of the biaffine layer for predicting an inter-clause relation. Then, the label that maximizes Equation (8) is assigned to the spans:

\hat{ℓ} = arg max_{ℓ \in L} s_{label} (i, j, \hat{k}, ℓ)

(9)

where $L$ denotes either the set of three nuclearity labels, $\{\overset{sub.}{⟶}, \overset{sub.}{⟵}, \overset{co.}{⟷}\}$, for predicting the nuclearity, or the set of inter-clause relation labels for predicting the exact relation. Note that the parameters of the biaffine layers and the FFNs for nuclearity and relation labeling are learned separately.
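The splitting procedure of Equations (4) and (7) can be sketched in plain Python. The sketch takes the hidden vectors as given (the FFNs of Equations (5) and (6) are omitted), treats the split scorer as a caller-supplied function, and skips label prediction, which would reuse the same biaffine form with label-specific weights. All names and numbers are illustrative assumptions.

```python
from typing import Callable, Sequence, Tuple, Union

Tree = Union[int, Tuple["Tree", "Tree"]]


def biaffine(h_left: Sequence[float], h_right: Sequence[float],
             W: Sequence[Sequence[float]],
             v_left: Sequence[float], v_right: Sequence[float]) -> float:
    """Score h_left W h_right + v_left . h_left + v_right . h_right, as in Equation (4)."""
    bilinear = sum(h_left[a] * W[a][b] * h_right[b]
                   for a in range(len(h_left)) for b in range(len(h_right)))
    linear = (sum(x * y for x, y in zip(v_left, h_left))
              + sum(x * y for x, y in zip(v_right, h_right)))
    return bilinear + linear


def parse_top_down(i: int, j: int, split_score: Callable[[int, int, int], float]) -> Tree:
    """Recursively split the span of clauses i..j at the best-scoring position k (i <= k < j)."""
    if i == j:
        return i  # a single clause cannot be split further
    k_hat = max(range(i, j), key=lambda k: split_score(i, j, k))
    return (parse_top_down(i, k_hat, split_score),
            parse_top_down(k_hat + 1, j, split_score))


# Toy scorer backed by a lookup table instead of a trained biaffine layer.
table = {(1, 3, 1): 0.9, (1, 3, 2): 0.1, (2, 3, 2): 0.5}
tree = parse_top_down(1, 3, lambda i, j, k: table[(i, j, k)])
print(tree)
print(biaffine([1.0, 2.0], [3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0], [0.5, 0.5]))
```

Each recursive call makes one greedy split decision, so a span of m clauses is resolved with m - 1 splits.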

4.2.3. Bottom-Up Strategy

Formally, in a shift-reduce model with a bottom-up strategy, the parsing state is denoted as a tuple $(S, Q)$, where S is a stack that stores processed clauses and Q is a queue that contains incoming clauses. Each element in S can be an unreduced clause $e_{i}$ or a combined composite item $e_{i:j}$. At each step, the parser chooses one of the following actions with an FFN classifier and updates the state $(S, Q)$:

SHIFT: pop the first clause off Q, and push it onto S.
REDUCE: pop two elements from S, and push onto S a single composite item that has the popped subtrees as its children.

We employed three FFN classifiers, $\mathrm{FFN}_{act}$, $\mathrm{FFN}_{nuc}$, and $\mathrm{FFN}_{rel}$, where $\mathrm{FFN}_{act}$ predicts an action, and the remaining two decide the nuclearity and the label of an inter-clause relation after a REDUCE action. Specifically, the output dimension of $\mathrm{FFN}_{act}$ is 2 (SHIFT or REDUCE), that of $\mathrm{FFN}_{nuc}$ is 3 ($\overset{sub.}{⟶}$, $\overset{sub.}{⟵}$, or $\overset{co.}{⟷}$), and that of $\mathrm{FFN}_{rel}$ is the number of inter-clause relations defined in the HCA framework. The three classifier outputs $s_{*}$ are defined as

s_{*} = {FFN}_{*} (Concat (u_{s_{0}}, u_{s_{1}}, u_{q_{0}}))

(10)

where the function Concat concatenates three state vectors: $u_{s_{0}}$ is the representation of the top element of S, $u_{s_{1}}$ is that of the second element of S, and $u_{q_{0}}$ is that of the first clause in Q. The weights of the PLM and each FFN are trained by minimizing the cross-entropy losses of $s_{act}$, $s_{nuc}$, and $s_{rel}$.
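The transition system can be sketched as a short loop over the state (S, Q). In this sketch, `choose_action` stands in for the action classifier and `choose_label` for the nuclearity and relation classifiers; both are hypothetical callables supplied by the caller rather than trained networks, and forcing the only legal action when the stack or queue is too small is an assumption about how invalid actions are handled.

```python
from collections import deque
from typing import Callable, Deque, List


def shift_reduce(num_clauses: int,
                 choose_action: Callable[[List, Deque], str],
                 choose_label: Callable[[object, object], str]):
    """Build a binary clause tree bottom-up with SHIFT and REDUCE actions."""
    if num_clauses < 1:
        raise ValueError("need at least one clause")
    stack: List = []
    queue: Deque = deque(range(1, num_clauses + 1))
    while queue or len(stack) > 1:
        if len(stack) < 2:
            action = "SHIFT"   # REDUCE needs two elements on the stack
        elif not queue:
            action = "REDUCE"  # nothing left to shift
        else:
            action = choose_action(stack, queue)
        if action == "SHIFT":
            stack.append(queue.popleft())
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append((choose_label(left, right), left, right))
    return stack[0]


# Two toy policies: always reduce as early as possible, or always shift first.
eager = shift_reduce(3, lambda s, q: "REDUCE", lambda l, r: "rel")
lazy = shift_reduce(3, lambda s, q: "SHIFT", lambda l, r: "rel")
print(eager)
print(lazy)
```

The eager policy yields a left-branching tree and the lazy one a right-branching tree; in both cases m clauses are combined with exactly m - 1 REDUCE actions.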

5. Experiments

In this section, we elaborate on the experimental details of the proposed baseline models for the two HCA subtasks, clause segmentation and clause parsing, on the novel HCA-AMR2.0 corpus.

5.1. Dataset

In addition to the HCA-AMR2.0 corpus, we used three discourse analysis corpora, GUM [35], STAC [36], and RST-DT [7], to further verify the effectiveness of the proposed models and to expound the distinctive advantages of HCA-AMR2.0 over them:

HCA-AMR2.0: The first HCA corpus was annotated on sentences in the AMR 2.0 dataset. The source data included discussion forums collected for the DARPA BOLT and DEFT programs, transcripts and English translations of Mandarin Chinese broadcast news programming from China Central TV, Wall Street Journal text, translated Xinhua news texts, various newswire data from NIST OpenMT evaluations, and weblog data used in the DARPA GALE program.
GUM: The Georgetown University Multilayer corpus was created as part of the course LING-367 (Computational Corpus Linguistics) at Georgetown University, and its annotation followed the RST-DT segmentation guidelines for English. The text sources consist of 35 documents of news, interviews, instruction books, and travel guides from WikiNews, WikiHow, and WikiVoyage.
STAC: The Strategic Conversation dataset is a corpus of strategic chat conversations in 45 games annotated with negotiation-related information, dialogue acts, and discourse structures in the segmented discourse representation theory (SDRT) framework.
RST-DT: RST Discourse Treebank was developed by researchers at the Information Sciences Institute (University of Southern California), the U.S. Department of Defense, and the Linguistic Data Consortium (LDC). It comprises 385 Wall Street Journal articles from the Penn Treebank annotated with discourse structure in the RST framework.

For the clause segmentation subtask, we selected all three discourse analysis datasets, which have been evaluated on the discourse segmentation task, as a comparison; their data files are all in CoNLL format. For the clause parsing subtask, we selected only the RST-DT dataset, which has been evaluated on the discourse parsing task, as a comparison, because previous discourse parsing experiments on the other two datasets used accuracy rather than Parseval as the evaluation metric. The interrelation data files in HCA and RST-DT were preprocessed by Heilman and Sagae’s system [37]. The important dataset statistics related to the experiments are listed in Table 7.

5.2. Experimental Environments

The information on the main hardware and software used in our experimental environments is listed in Table 8.

5.3. Hyper-Parameters

For the hyper-parameters in the models for the clause segmentation and clause parsing subtasks, we list their layer, name, and value in Table 9. Note that neither model was trained for the maximum number of epochs: with the experimental environments introduced in Section 5.2, training of the clause segmentation model finished after about thirteen epochs (five hours), and training of the clause parsing model finished after about seven epochs (ten hours). For the batch size, we tested several candidates and found no significant performance gaps. For the optimizer, we tested AdamW and SGD in the hyper-parameter-tuning experiments of both the clause segmentation and parsing subtasks. The results showed that AdamW outperformed SGD and converged faster.

5.4. Evaluation Metrics

As described in Section 3.3.3, we introduced two metrics for IAA to evaluate the annotation consistency between two annotators. We used the same metrics here to evaluate the consistency between gold and predicted annotations: P/R/F$_{1}$ on clauses for the clause segmentation subtask and RST-Parseval on inter-clause relations for the clause parsing subtask.
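Both metrics reduce to precision, recall, and F1 over sets of predicted and gold items. The sketch below is a simplified illustration under that assumption: for segmentation the items are clause-start positions, and for parsing they are spans with or without their relation label. The exact RST-Parseval conventions (for example, which spans are counted) are not reproduced here, and all toy values are made up.

```python
from typing import Hashable, Set, Tuple


def prf1(gold: Set[Hashable], pred: Set[Hashable]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 of a predicted set against a gold set."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Segmentation: positions of tokens tagged as clause starts (toy values).
seg_scores = prf1({0, 4, 11}, {0, 4, 9, 11})

# Parsing: (first clause, last clause, relation) triples (toy values).
gold_spans = {(1, 3, "but"), (2, 3, "resultative")}
pred_spans = {(1, 3, "but"), (2, 3, "relative")}
full_scores = prf1(gold_spans, pred_spans)                      # labels must match
span_scores = prf1({s[:2] for s in gold_spans}, {s[:2] for s in pred_spans})  # labels ignored
print(seg_scores, span_scores, full_scores)
```

Dropping the labels before comparison gives a span-only score, which is never lower than the fully labeled score on the same prediction.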

5.5. Baseline Models

We adapted models proposed in previous work on the discourse segmentation and discourse parsing tasks and trained them on our HCA-AMR2.0 corpus to obtain effective baseline models for the novel clause segmentation and parsing subtasks.

5.6. Experimental Results

The experiments of the clause segmentation and parsing subtasks were conducted separately, and the experimental results were compared with those of previous works on discourse segmentation and parsing tasks, respectively.

5.6.1. Results of Clause Segmentation

We adapted the discourse segmenter DisCoDisCo [17] to the clause segmentation task and conducted an ablation study on different features, e.g., lemma, syntactic dependency, parts-of-speech, and the static fastText word embeddings, to explore which contributed most to the performance. The experimental results of the clause segmentation task are reported in Table 10, together with the performances of previous works on the discourse segmentation task for comparison. From the results, we made the following observations:

Compared with the IAA scores of 98.4–100 on clause segmentation reported in Section 3.3.3, the adapted DisCoDisCo model achieved a satisfactory F$_{1}$ score of 91.3.
In the ablation study of different embedded features, the static fastText word embeddings contributed the most among all features, with an improvement of 4.9 F$_{1}$ points, although the other features also had positive impacts on the performance.
In the discourse segmentation task, the DisCoDisCo model outperformed the GumDrop model on the GUM and RST-DT datasets. Although GumDrop performed better than DisCoDisCo on the STAC dataset, its performance declined sharply, by 14.7 F$_{1}$ points, when the extra gold features were removed. Thus, we chose to apply the DisCoDisCo model to the clause segmentation subtask.
With the same DisCoDisCo model, the performance on clause segmentation was about 3–5 F$_{1}$ points lower than on discourse segmentation, indicating that the performance gaps could be attributed to the corpora. From the statistics in Table 6, a sentence in HCA-AMR2.0 consists of 3.1 clauses on average, more than in GUM (2.3 EDUs), STAC (1.1 EDUs), and RST-DT (2.6 EDUs). Besides, only HCA-AMR2.0 contains a certain amount of weblog data, which are less formal in English grammar and may pose more obstacles for the DisCoDisCo model.

5.6.2. Results of Clause Parsing

We employed the bottom-up and top-down discourse parsers in [18] as our clause-parsing models and experimented with the base versions of different pretrained language models (PLMs, i.e., BERT [39], RoBERTa [40], SpanBERT [41], XLNet [42], and DeBERTa [43]) to obtain better performances. Table 11 presents the main results of both the bottom-up and top-down models on clause parsing, as well as those on discourse parsing for comparison. From the results, we made the following observations:

Better performances were obtained in clause parsing than in discourse parsing by both the top-down and bottom-up parsers, regardless of the PLM.
The best performance in clause parsing was obtained by the bottom-up parser with the pretrained DeBERTa, which reached an F$_{1}$ score of 97.0 on Parseval-Span but dropped to 87.7 on Parseval-Full.
All the best performances in either clause parsing or discourse parsing were obtained by parsers with pretrained XLNet and DeBERTa, indicating that these two PLMs are more suitable for relation classification tasks than the other PLMs.
With the same model and the same pretrained language model, the Parseval-Full scores on clause parsing were about 6–9 points higher than those on discourse parsing, indicating that the performance gaps could be attributed to the differences between the corpora. RST-DT contains 18 classes of relations partitioned from 78 types of blurry rhetorical relations, while HCA has 18 types of distinguishable semantic relations, half of which were subdivided from adverbial.

To obtain a clause parser with better performances, we conducted experimental trials with the large versions of the PLMs, and the results are illustrated in Table 12. As can be observed from the results:

All the parsers with the corresponding large PLMs performed better than with the base PLMs.
The bottom-up parser with DeBERTa-large achieved a better performance, with an improvement of 0.8 F$_{1}$ points on Parseval-Full over the parser with DeBERTa-base.

6. Discussion

6.1. Potentialities of HCA

As discussed in Section 1, we demonstrated the same hierarchy of the HCA tree, the AMR graph, and the SDG of a complex sentence, indicating the potentialities of utilizing the structural information of the HCA tree to improve semantic parsing. Thus, we provided two case studies, where simple transformation rules derived from the HCA tree can be applied to these two semantic parsing tasks.

6.1.1. Case Study for AMR Parsing

For AMR parsing, we employed the state-of-the-art AMR parser proposed in [1] to predict the AMR graph of the exemplified sentence in Section 1. As shown in Figure 9, two dotted red edges were missed by the parser when compared with the gold AMR, while two solid red edges were mistakenly predicted.

From the HCA tree given in Figure 1a, the clause “I get very anxious”, which maps to the subgraph $G_{2}$, and the clause “but often the anxiety is so much”, which maps to the subgraph $G_{4}$, are coordinate and contrastive. Therefore, we provide a transformation rule:

1. Transform the inter-clause relation but to an AMR node contrast-01 and two AMR edges pointing to the root nodes of $G_{2}$ and $G_{4}$.

Meanwhile, the clause “that I can not wait that long”, which maps to the subgraph $G_{5}$, is a resultative adverbial clause subordinated to the clause “but often the anxiety is so much”, which maps to the subgraph $G_{4}$. Note that the verb of the matrix clause is the copula “is”, and the subordinate clause modifies the complement “much” in the matrix clause. Therefore, a new transformation rule can be derived:

2. Transform the inter-clause relation resultative to an AMR node cause-01 and two AMR edges pointing to the root node of $G_{5}$ and the node much in $G_{4}$.

With these two transformation rules, we can delete the solid red edges, which were mistakenly predicted, and add the dotted red edges, which were missed.
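The two rules can be written down as a small lookup over graph triples. This is only a sketch: the AMR graph is reduced to a set of (source, role, target) triples, the node identifiers such as `root(G2)` are placeholders for the actual root concepts, and the role labels (`:ARG0`, `:ARG1`, `:ARG2`) are assumptions, since the rules above specify the new node and its two targets but not the edge labels.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# relation -> (AMR concept, role of first target, role of second target); roles are assumed.
RULES: Dict[str, Tuple[str, str, str]] = {
    "but": ("contrast-01", ":ARG1", ":ARG2"),
    "resultative": ("cause-01", ":ARG0", ":ARG1"),
}


def apply_rule(graph: Set[Triple], relation: str, first: str, second: str) -> Set[Triple]:
    """Add the node and the two outgoing edges that a rule derives from an inter-clause relation."""
    concept, role_first, role_second = RULES[relation]
    return graph | {(concept, role_first, first), (concept, role_second, second)}


graph: Set[Triple] = set()
graph = apply_rule(graph, "but", "root(G2)", "root(G4)")          # rule 1
graph = apply_rule(graph, "resultative", "much(G4)", "root(G5)")  # rule 2
for triple in sorted(graph):
    print(triple)
```

Each rule contributes exactly one new concept node with two outgoing edges, so applying both rules adds four triples to the graph.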

6.1.2. Case Study for Semantic Dependency Parsing

For semantic dependency parsing, we employed the state-of-the-art semantic dependency parser proposed in [2] to predict the semantic dependency graph (SDG) of the exemplified sentence in Section 1. As shown in Figure 10, three dotted red edges were missed by the parser when compared with the gold SDG.

As discussed in Section 6.1.1, the inter-clause relations among the three clauses mapping to subgraphs $G_{2}$, $G_{4}$, and $G_{5}$ can be handled by the following transformation rules:

1. Transform the inter-clause relation but to a dependency edge between the root nodes (i.e., anxious and much) of $G_{2}$ and $G_{4}$.
2. Transform the inter-clause relation resultative to a dependency edge between the root nodes (i.e., much and can) of $G_{4}$ and $G_{5}$.

Moreover, the relative subordinate clause “which does sort of go away after 15–30 min”, which maps to the subgraph $G_{3}$, modifies the complement “anxious” in the matrix clause “I get very anxious”, which maps to the subgraph $G_{2}$. Thus, a new transformation rule can be derived:

3. Transform the inter-clause relation relative to a dependency edge between the root node (i.e., go) of $G_{3}$ and the node anxious of $G_{2}$.

With these three transformation rules, we can add the dotted red edges, which were missed by the parser.
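A minimal sketch of how such rules could patch a predicted graph is given below. Dependency edges are modeled as unordered word pairs because the rules name the two endpoints but not the direction or label of the edge; the single predicted edge is a made-up placeholder, and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[str]


def edges_from_hca(relations: Iterable[Tuple[str, str, str]]) -> Set[Edge]:
    """Turn (relation, node_a, node_b) triples from the HCA tree into unordered edges."""
    return {frozenset((node_a, node_b)) for _, node_a, node_b in relations}


def repair(predicted: Set[Edge], relations: Iterable[Tuple[str, str, str]]) -> Set[Edge]:
    """Add every rule-derived edge that the parser did not predict."""
    return predicted | edges_from_hca(relations)


# Endpoints as named in rules 1-3; the predicted edge below is a toy placeholder.
hca_relations = [
    ("but", "anxious", "much"),
    ("resultative", "much", "can"),
    ("relative", "go", "anxious"),
]
predicted: Set[Edge] = {frozenset(("very", "anxious"))}
repaired = repair(predicted, hca_relations)
print(len(repaired) - len(predicted))
```

The difference between the repaired and the predicted edge sets corresponds to the edges recovered by the rules, three in this toy setting.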

To sum up, we provided two case studies demonstrating the potentialities of HCA in semantic parsing, where transformation rules derived from the HCA tree were applied to correct the outputs of two state-of-the-art parsers for the AMR parsing and semantic dependency parsing tasks.

6.2. Future Work

As demonstrated in the experimental results, we adapted a discourse segmenter and a discourse parser to our HCA subtasks (i.e., clause segmentation and parsing) and achieved satisfactory performances. However, there is still much room for improvement compared with the IAA scores of manual annotation, and the existing discourse segmentation and parsing models were adapted to our HCA subtasks only to serve as baselines. Therefore, we aim to design better models for the HCA task in further research.

In the case studies, we derived some transformation rules from the HCA tree to modify the AMR graph and the SDG, an approach that is explainable to humans, but impractical. To address this limitation, we aim to explore better ways of integrating the HCA structure into semantic parsing and more downstream NLP tasks.

7. Conclusions

In this paper, we proposed a novel framework, hierarchical clause annotation (HCA), to segment complex sentences into clauses and capture inter-clause relations with strict definitions from the linguistic research of clause hierarchy. We aimed to explore the potentialities of integrating the HCA structural features in semantic parsing with complex sentences, avoiding the deficiencies of previous works such as RST parsing, SPRP, TS, SSD, etc. Following the HCA framework, we built up a large HCA corpus comprising 19,376 English sentences from AMR 2.0. The annotation consisted of silver data transformed from the constituency and syntactic dependency parse trees and gold data annotated by experienced human annotators using a newly created tool, ClausAnn. Moreover, we decomposed HCA into two subtasks, i.e., clause segmentation and clause parsing, and provided effective baseline models for both subtasks to generate more HCA data.

Author Contributions

Conceptualization, Y.F. and Z.G.; methodology, Y.F. and B.L.; software, Y.F.; annotation, all authors; writing—original draft, Y.F.; writing—review and editing, B.L., Y.S., M.G., C.S., S.C. and Z.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used in our experiments are publicly available: Hierarchical Clause Annotation-AMR2.0 (HCA-AMR2.0) at https://github.com/MetroVancloud/HCASeg (accessed on 5 November 2022) and https://github.com/MetroVancloud/HCAParser (accessed on 5 November 2022), Abstract Meaning Representation (AMR) Annotation Release 2.0 (AMR 2.0) at https://catalog.ldc.upenn.edu/LDC2017T10 (accessed on 15 June 2021), RST Discourse Treebank (RST-DT) at https://catalog.ldc.upenn.edu/LDC2002T07 (accessed on 11 October 2022), The Georgetown University Multilayer corpus (GUM) at https://gucorpling.org/gum (accessed on 15 April 2023), Strategic Conversation (STAC) at https://www.irit.fr/STAC/ (accessed on 5 April 2023).

Acknowledgments

Our code extends two GitHub repositories, https://github.com/gucorpling/DisCoDisCo (accessed on 1 March 2023) and https://github.com/nttcslab-nlp/RSTParser_EMNLP22 (accessed on 10 March 2023), and we thank their authors.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AMR	Abstract meaning representation
BiLSTM	Bidirectional long short-term memory
CPT	Constituency parse tree
FFN	Feed-forward neural network
HCA	Hierarchical clause annotation
IAA	Inter-annotator agreement
PLM	Pretrained language model
POSs	Parts-of-speech
RST	Rhetorical structure theory
RST-DT	Rhetorical Structure Theory Discourse Treebank
SDG	Semantic dependency graph
SDPT	Syntactic dependency parse tree
SPRP	Split-and-rephrase
SSD	Simple sentence decomposition
TS	Text simplification

References

Sataer, Y.; Shi, C.; Gao, M.; Fan, Y.; Li, B.; Gao, Z. Integrating Syntactic and Semantic Knowledge in AMR Parsing with Heterogeneous Graph Attention Network. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5.
Li, B.; Gao, M.; Fan, Y.; Sataer, Y.; Gao, Z.; Gui, Y. DynGL-SDP: Dynamic Graph Learning for Semantic Dependency Parsing. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; International Committee on Computational Linguistics: Prague, Czech Republic, 2022; pp. 3994–4004.
Tian, Y.; Song, Y.; Xia, F.; Zhang, T. Improving Constituency Parsing with Span Attention. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online, 16–20 November 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 1691–1703.
He, L.; Lee, K.; Lewis, M.; Zettlemoyer, L. Deep Semantic Role Labeling: What Works and What’s Next. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017; Association for Computational Linguistics: Stroudsburg, PA, USA, 2017; pp. 473–483.
Tang, G.; Müller, M.; Rios, A.; Sennrich, R. Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 4263–4272.
Xu, J.; Gan, Z.; Cheng, Y.; Liu, J. Discourse-Aware Neural Extractive Text Summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 5021–5031.
Carlson, L.; Marcu, D.; Okurowski, M.E. Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory. In Proceedings of the Second SIGdial Workshop on Discourse and Dialogue, Aalborg, Denmark, 1–2 September 2001.
Narayan, S.; Gardent, C.; Cohen, S.B.; Shimorina, A. Split and Rephrase. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; Association for Computational Linguistics: Stroudsburg, PA, USA, 2017; pp. 606–616.
Zhang, X.; Lapata, M. Sentence Simplification with Deep Reinforcement Learning. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; Association for Computational Linguistics: Stroudsburg, PA, USA, 2017; pp. 584–594.
Gao, Y.; Huang, T.H.; Passonneau, R.J. ABCD: A Graph Framework to Convert Complex Sentences to a Covering Set of Simple Sentences. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 3919–3931.
Banarescu, L.; Bonial, C.; Cai, S.; Georgescu, M.; Griffitt, K.; Hermjakob, U.; Knight, K.; Koehn, P.; Palmer, M.; Schneider, N. Abstract Meaning Representation for Sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, Sofia, Bulgaria, 8–9 August 2013; Association for Computational Linguistics: Stroudsburg, PA, USA, 2013; pp. 178–186.
Oepen, S.; Kuhlmann, M.; Miyao, Y.; Zeman, D.; Cinková, S.; Flickinger, D.; Hajič, J.; Urešová, Z. SemEval 2015 Task 18: Broad-Coverage Semantic Dependency Parsing. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, CO, USA, 4–5 June 2015; Association for Computational Linguistics: Stroudsburg, PA, USA, 2015; pp. 915–926.
Mann, W.C.; Thompson, S.A. Rhetorical structure theory: Toward a functional theory of text organization. Text-Interdiscip. J. Study Discourse 1988, 8, 243–281.
Payne, T.E. Understanding English Grammar: A Linguistic Introduction; Cambridge University Press: Cambridge, UK, 2010.
Marcus, M.P.; Santorini, B.; Marcinkiewicz, M.A. Building a Large Annotated Corpus of English: The Penn Treebank. Comput. Linguist. 1993, 19, 313–330.
Rabani, S.T.; Ud Din Khanday, A.M.; Khan, Q.R.; Hajam, U.A.; Imran, A.S.; Kastrati, Z. Detecting suicidality on social media: Machine learning at rescue. Egypt. Inform. J. 2023, 24, 291–302.
Gessler, L.; Behzad, S.; Liu, Y.J.; Peng, S.; Zhu, Y.; Zeldes, A. DisCoDisCo at the DISRPT2021 Shared Task: A System for Discourse Segmentation, Classification, and Connective Detection. In Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021), Punta Cana, Dominican Republic, 11 November 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 51–62.
Kobayashi, N.; Hirao, T.; Kamigaito, H.; Okumura, M.; Nagata, M. A Simple and Strong Baseline for End-to-End Neural RST-style Discourse Parsing. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, 7–11 December 2022; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 6725–6737.
Tjong Kim Sang, E.F.; Déjean, H. Introduction to the CoNLL-2001 shared task: Clause identification. In Proceedings of the ACL 2001 Workshop on Computational Natural Language Learning (ConLL), Toulouse, France, 6–7 July 2001.
Al-Thanyyan, S.S.; Azmi, A.M. Automated Text Simplification: A Survey. ACM Comput. Surv. 2021, 54, 1–36.
Givón, T. Syntax: An Introduction; John Benjamins: Amsterdam, The Netherlands, 2001; Volume I.
Matthiessen, C.M. Combining clauses into clause complexes: A multi-faceted view. In Complex Sentences in Grammar and Discourse; John Benjamins: Amsterdam, The Netherlands, 2002; pp. 235–319.
Hopper, P.J.; Traugott, E.C. Grammaticalization; Cambridge University Press: Cambridge, UK, 2003.
Aarts, B. Syntactic Gradience: The Nature of Grammatical Indeterminacy; Oxford University Press: Oxford, UK, 2007.
Givón, T. On Understanding Grammar: Revised Edition; John Benjamins: Amsterdam, The Netherlands, 2018; 321p.
Carter, R.; McCarthy, M. Cambridge Grammar of English: A Comprehensive Guide; Spoken and Written English Grammar and Usage; Cambridge University Press: Cambridge, UK, 2006.
Feng, S.; Banerjee, R.; Choi, Y. Characterizing Stylistic Elements in Syntactic Structure. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Republic of Korea, 12–14 July 2012; Association for Computational Linguistics: Stroudsburg, PA, USA, 2012; pp. 1522–1533.
Del Corro, L.; Gemulla, R. ClausIE: Clause-based open information extraction. In Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 13–17 May 2013; pp. 355–366.
Vo, D.T.; Bagheri, E. Self-training on refined clause patterns for relation extraction. Inf. Process. Manag. 2018, 54, 686–706.
Oberländer, L.A.M.; Klinger, R. Token Sequence Labeling vs. Clause Classification for English Emotion Stimulus Detection. In Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics, Barcelona, Spain (Online), 12–13 December 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 58–70.
Qi, P.; Zhang, Y.; Zhang, Y.; Bolton, J.; Manning, C.D. Stanza: A Python Natural Language Processing Toolkit for Many Human Languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Online, 5–10 July 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 101–108.
Morey, M.; Muller, P.; Asher, N. How much progress have we made on RST discourse parsing? A replication study of recent results on the RST-DT. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; Association for Computational Linguistics: Stroudsburg, PA, USA, 2017; pp. 1319–1324.
Bojanowski, P.; Grave, E.; Joulin, A.; Mikolov, T. Enriching Word Vectors with Subword Information. Trans. Assoc. Comput. Linguist. 2017, 5, 135–146.
Dozat, T.; Manning, C.D. Deep Biaffine Attention for Neural Dependency Parsing. In Proceedings of the International Conference on Learning Representations, San Juan, PR, USA, 2–4 May 2016; pp. 1–8.
Zeldes, A. The GUM Corpus: Creating Multilayer Resources in the Classroom. Lang. Resour. Eval. 2017, 51, 581–612.
Asher, N.; Hunter, J.; Morey, M.; Farah, B.; Afantenos, S. Discourse Structure and Dialogue Acts in Multiparty Dialogue: The STAC Corpus. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), Portorož, Slovenia, 23–28 May 2016; European Language Resources Association (ELRA): Paris, France, 2016; pp. 2721–2727.
Heilman, M.; Sagae, K. Fast Rhetorical Structure Theory Discourse Parsing. arXiv 2015, arXiv:1505.02425.
Yu, Y.; Zhu, Y.; Liu, Y.; Liu, Y.; Peng, S.; Gong, M.; Zeldes, A. GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection. In Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019, Minneapolis, MN, USA, 6 June 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 133–143.
Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 6 June 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 4171–4186.
Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
Joshi, M.; Chen, D.; Liu, Y.; Weld, D.S.; Zettlemoyer, L.; Levy, O. SpanBERT: Improving Pre-training by Representing and Predicting Spans. Trans. Assoc. Comput. Linguist. 2020, 8, 64–77.
Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32.
He, P.; Liu, X.; Gao, J.; Chen, W. DeBERTa: Decoding-enhanced BERT with Disentangled Attention. arXiv 2021, arXiv:2006.03654.

Figure 1. Clauses $C_{i}$ in (a) correspond to subgraphs $G_{i}$ in (b,c), respectively. Colored directed edges in (a) are inter-clause relations, mapping the same-colored AMR nodes and edges in (b) and semantic dependencies in (c). Note that reentrant AMR relations in (b) introduced by the pronoun “I” are omitted to save space, as well as semantic dependencies between orphan tokens and the root token “If” in (c).

Figure 2. Modified version of Payne’s clause hierarchy.

Figure 3. Two basic hierarchical schemas in HCA, where node $C_{i}$, node $co$, and edge $sub$ represent a clause, coordination, and subordination, respectively.

Figure 4. Extract clauses and inter-clause relations via the constituency parse tree. Two clauses in dashed boxes are identified by underlined clause-type nodes S and the child node SBAR. Note that child constituent nodes of the left VP and the right ADJP are omitted to save space.

Figure 5. Extract clauses and inter-clause relations via the syntactic dependency parse tree. Two clauses in dashed boxes are identified by the underlined verb and the governed constituents. The inter-clause relation adverbial can be determined by the dependency advcl between the two clauses.
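The mapping described in the Figure 5 caption, from a dependency label between two clause heads to an inter-clause relation, can be sketched as a lookup table. This is a minimal illustration, not the authors' extraction rules: only `advcl` → Adverbial is stated in the caption; the other entries, the `Arc` type, and the hand-written arcs for the Table 4 sentence "He'd need to do his exam before he went." are assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple


class Arc(NamedTuple):
    """One dependency arc: head token index, dependent token index, label."""
    head: int
    dependent: int
    label: str


# Dependency label -> inter-clause relation.
# Only "advcl" -> "Adverbial" comes from the Figure 5 caption;
# the remaining entries are illustrative assumptions.
DEP_TO_RELATION = {
    "advcl": "Adverbial",
    "csubj": "Subjective",    # assumption
    "ccomp": "Objective",     # assumption
    "acl:relcl": "Relative",  # assumption
}


def inter_clause_relations(arcs: List[Arc]) -> List[Tuple[int, str, int]]:
    """Return (matrix verb, relation, subordinate verb) for clause-linking arcs."""
    return [
        (arc.head, DEP_TO_RELATION[arc.label], arc.dependent)
        for arc in arcs
        if arc.label in DEP_TO_RELATION
    ]


# Hand-written, simplified arcs (1-based token indices), not parser output:
# He(1) 'd(2) need(3) to(4) do(5) his(6) exam(7) before(8) he(9) went(10)
example_arcs = [
    Arc(3, 1, "nsubj"),
    Arc(3, 5, "xcomp"),
    Arc(5, 7, "obj"),
    Arc(5, 10, "advcl"),
    Arc(10, 8, "mark"),
    Arc(10, 9, "nsubj"),
]

relations = inter_clause_relations(example_arcs)
print(relations)
```

Arcs whose labels are not in the table (here `nsubj`, `xcomp`, `obj`, `mark`) stay inside a clause and are ignored, so only the `advcl` arc yields an inter-clause relation.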

Figure 6. Operating steps of an annotation trial in the browser-based tool, ClausAnn. (a) Switch annotator labels and review the corresponding annotation. (b) Segment a text span into two and choose a coordination or subordination between them. (c) When choosing coordination, select the exact coordinate relation, i.e., and, or, or but. (d) When choosing subordination, select the superordinate clause and the exact subordinate relation, e.g., Subjective or Objective.

Figure 7. Top-down clause parsing.

Figure 8. Bottom-up clause parsing.

Figure 9. Abstract meaning representation (AMR) graph predicted by the state-of-the-art AMR parser. Red dotted relation edges, which were missed by the parser, can be recovered by transformation rules derived from the HCA tree. Red solid relation edges, which were mistakenly predicted by the parser, can be deleted by transformation rules derived from the HCA tree.

Figure 10. Semantic dependency graph (SDG) predicted by the state-of-the-art semantic dependency parser, DynGL-SDP. Dotted red dependency edges, which were missed by the parser, can be recovered by transformation rules derived from the HCA tree.

Table 1. Comparison between our HCA task and the RST-parsing task. Two exemplified sentences are from the RST Discourse Tagging Reference Manual. Units (i.e., clauses or EDUs) are segmented by square brackets and index marks $_{i}$. Relations between units are represented as arrows directed from a matrix clause or a nucleus EDU to a subordinate clause or a satellite EDU with a specific relation.

Task	Output Description	Output Example: Unit	Output Example: Relation
Hierarchical Clause Annotation	Clause trees built up by clauses and inter-clause relations	(1) [But some big brokerage firms said]$_{1}$ [they don’t expect major problems as a result of margin calls.]$_{2}$	$1 \overset{objective}{\to} 2$
Hierarchical Clause Annotation	Clause trees built up by clauses and inter-clause relations	(2) [Despite their considerable incomes and assets, one-fourth don’t feel]$_{1}$ [that they have made it.]$_{2}$	$1 \overset{predicative}{\to} 2$
RST Parsing	Discourse trees built up by EDUs and rhetorical relations	(1) [But some big brokerage firms said]$_{1}$ [they don’t expect major problems]$_{2}$ [as a result of margin calls.]$_{3}$	$1 \overset{attribution}{\to} 2$; $2 \overset{result}{\to} 3$
RST Parsing	Discourse trees built up by EDUs and rhetorical relations	(2) [Despite their considerable incomes and assets,]$_{1}$ [one-fourth don’t feel that they have made it.]$_{2}$	$2 \overset{concession}{\to} 1$

Table 2. Comparison between our HCA task and similar tasks that decompose complex sentences into parts. The input sentence “If I do not check, I get very anxious, which does sort of go away after 15–30 min, but often the anxiety is so much that I can not wait that long.” was selected from the AMR 2.0 dataset and exemplified in Section 1. Underlined words in the Output Example column of each task are modified from the original sentence, while ~~crossed-out~~ words are deleted from the original sentence.

Task	Output Description	Output Example
Hierarchical Clause Annotation	Finite clauses and non-finite clauses separated by a comma	(1) If I do not check, (2) I get very anxious, (3) which does sort of go away after 15–30 min, (4) but often the anxiety is so much (5) that I can not wait that long.
Clause Identification	Finite clauses, non-tensed verb phrases, coordinators, and subordinators	(1) If (2) I do not check, (3) I get very anxious, (4) which (5) does sort of go away after 15–30 min, (6) but (7) often the anxiety is so much (8) that (9) I can not wait that long.
Split-and-Rephrase	Shorter sentences	(1) If I do not check, I get very anxious. (2) The anxiety does sort of go away after 15–30 min. (3) But often the anxiety is so much that I can not wait that long.
Text Simplification	Sentences with simpler syntax	(1) If I do not check. (2) I get very anxious. (3) The anxiety lasts for 15–30 min. (4) ~~But~~ I am often too anxious to wait that long.
Simple Sentence Decomposition	Simple sentences with only one clause	(1) If I do not check. (2) I get very anxious. (3) The anxiety does sort of go away after 15–30 min. (4) ~~But~~ Often the anxiety is so much. (5) ~~that~~ I can not wait that long.

Table 3. Three main types of clause hierarchy and the clines of their clause integration tightness degree.

Type	Cline of Clause Integration Tightness Degree
Matthiessen [22]	Embedded > Hypotaxis > Parataxis > Cohesive Devices > Coherence
Hopper and Traugott [23]	Subordination > Hypotaxis > Parataxis
Payne [14]	Compound Verb > Clausal Argument > Relative > Adverbial > Coordinate > Sentence

Table 4. Examples of sentences with different types of inter-clause relations. Clauses are segmented by square brackets and clause marks $C_{i}$. The underlined, double-underlined, and wave-underlined words are coordinators, subordinators, and antecedents, respectively.

	Relation	Example Sentence
Coordination
(1)	And	[He should have been here at five]$_{C_{1}}$ [and he’s not here yet.]$_{C_{2}}$
Subordination
(2)	Subjective	[What we follow]$_{C_{1}}$ [is a foreign security strategic philosophy.]$_{C_{2}}$
(3)	Objective	[He knows]$_{C_{1}}$ [what it takes to start a business here.]$_{C_{2}}$
(4)	Predicative	[The reason is]$_{C_{1}}$ [that you lack confidence.]$_{C_{2}}$
(5)	Appositive	[I’ve accepted defeat]$_{C_{1}}$ [that this year of my life is a failure.]$_{C_{2}}$
(6)	Relative	[We’ve entered into an age]$_{C_{1}}$ [when dreams can be achieved.]$_{C_{2}}$
(7)	Adverbial	[He’d need to do his exam]$_{C_{1}}$ [before he went.]$_{C_{2}}$

Table 5. Inter-annotator agreement (IAA) on the 16% of sentences in the HCA corpus that were double-annotated, for ten annotators numbered 1 to 10. Note that bold and underlined figures indicate the highest and lowest consistencies for the corresponding metrics, respectively.

Annotators	Clause P	Clause R	Clause F$_{1}$	Interrelation Span	Interrelation Nuc.	Interrelation Rel.	Interrelation Full
1, 2	99.9	99.8	99.8	97.9	97.5	94.3	94.1
1, 3	100	100	100	98.1	97.8	94.2	94.1
2, 4	99.8	99.5	99.6	97.5	97.3	93.9	93.8
1, 5	99.0	98.3	98.6	98.0	97.7	94.2	94.0
6, 7	99.6	99.1	99.3	97.3	97.0	93.8	93.6
4, 8	99.4	99.3	99.3	97.9	93.9	93.7	93.5
1, 9	99.0	98.6	98.8	97.2	97.0	93.9	93.8
5, 10	98.8	98.0	98.4	97.3	97.1	93.6	93.4

Table 6. Main statistics of the hierarchical clause annotation dataset based on AMR 2.0 (HCA-AMR2.0). * means that some input sequences contain multiple sentences, and the coordination $MulSnt$ is necessary for the inter-sentence relations in these cases. ★ indicates that $Adverbial$ can be divided into nine sub-types such as $Condition$, $Concession$, and $Purpose$. Note that “#” represents the number of the subsequent item.

Item	Occurrence	Relation	Occurrence
# Sentences (S)	39,260	MulSnt *	3249
# of S with HCA	19,376	And	14,952
# of S in Train Set	17,885	Or	974
# of S in Dev Set	740	But	3704
# of S in Test Set	751	Subjective	992
# of Tokens (T)	521,805	Objective	8741
# of Clauses (C)	57,330	Predicative	1009
# of Avg. T/S	26.9	Appositive	667
# of Avg. T/C	9.1	Relative	7095
# of Avg. C/S	3.1	Adverbial ^★	7777

Table 7. Main statistics of the HCA-AMR2.0, GUM, STAC, and RST-DT datasets. Note that “#” represents the number of the subsequent item, “Unit” or “U” represents the clause or elementary discourse unit (EDU), “S” represents sentences, and “Rels.” represents inter-clause/EDU relations. Thus, “# Units/Sentences” means the number of units or sentences, “# Avg. U/S” means the average number of units per sentence, and “# Avg. Rels./U” means the average number of inter-clause/EDU relations per unit.

Dataset	# U/S (Train)	# U/S (Dev)	# U/S (Test)	# Avg. U/S	# Rels.	# Rel. Types	# Avg. Rels./U
HCA-AMR2.0	52,758/17,885	2222/740	2350/751	3.1	49,160	18	0.86
GUM	14,766/6346	2219/976	2283/999	2.3	-	-	-
STAC	9887/8754	1154/991	1547/1342	1.1	-	-	-
RST-DT	17,646/6671	1797/716	2346/928	2.6	19,778	18	0.91
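As a reading aid for the “# Avg. Rels./U” column, the short sketch below recomputes it from the unit and relation counts in Table 7. It assumes the column is the total number of relations divided by the total number of units over the train, dev, and test splits; the dictionary layout and function name are illustrative only.

```python
# Unit counts per split (train, dev, test) and total relation counts,
# copied from Table 7. GUM and STAC are omitted because the table
# reports no relation counts for them.
STATS = {
    "HCA-AMR2.0": {"units": (52_758, 2_222, 2_350), "relations": 49_160},
    "RST-DT": {"units": (17_646, 1_797, 2_346), "relations": 19_778},
}


def avg_relations_per_unit(dataset: str) -> float:
    """Total relations divided by total units across all splits (assumed definition)."""
    entry = STATS[dataset]
    return entry["relations"] / sum(entry["units"])


for name in STATS:
    print(f"{name}: {avg_relations_per_unit(name):.2f}")
```

Under this assumption, the two-decimal results agree with the values listed in the table for both datasets.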

Table 8. Hardware and software used in our experiments.

Environment	Value
Hardware
CPU	Intel i9-10900K @ 3.7 GHz (10-core) (Intel Corporation, Santa Clara, CA, USA)
GPU	NVIDIA RTX 3090 Ti (24 GB) (Nvidia Corporation, Santa Clara, CA, USA)
Memory	64 GB
Software
Python	v3.8.16
PyTorch	v1.12.1
Anaconda	v4.10.1
CUDA	v11.3
IDE	PyCharm v2022.2.3

Table 9. Final hyper-parameter configuration of the clause segmentation and clause parsing models. Note that “#” represents the number of the subsequent item.

Layer	Hyper-Parameter	Value
Clause Segmentation Model
Character Embedding (Bi-LSTM)	layer	1
	hidden_size	64
	dropout	0.2
Word Embedding	fastText	300
Word Embedding	Electra	1024 (large)
Feature Embedding	POS/Lemma/DP	100
Bi-LSTM	layer	1
	hidden_size	512
	dropout	0.1
Trainer	optimizer	AdamW
	learning rate	5e-4, 1e-4
	# epochs	60
	patience of early stopping	10
	validation criteria	+span_f1
Clause Parsing Model
Word Embedding	pretrained language model	768/1024 (base/large)
FFN	hidden_size	512
FFN	dropout	0.2
Trainer	optimizer	AdamW
	learning rate	2e-4, 1e-5
	weight decay	0.01
	batch size (# spans/actions)	5
	# epochs	20
	patience of early stopping	5
	gradient clipping	1.0
	validation criteria	RST-Parseval-Full

Table 10. Performances of the adapted DisCoDisCo model on HCA-AMR2.0 for clause segmentation, and performances of DisCoDisCo and GumDrop on three datasets for the contrastive task, discourse segmentation. Note that * and ° indicate gold annotated features from the corresponding dataset and silver features annotated by Stanza, respectively. Bold numbers are the best scores on each dataset. All the experiments on the clause segmentation task were conducted for five runs with different seeds, and the experimental results were averaged.

Task	Dataset	Model	P	R	F $_{1}$
Discourse Segmentation	GUM	GumDrop [38]	96.5	90.8	93.5
		- all feats. *	97.7	87.4	92.3
		DisCoDisCo [17]	93.9	94.4	94.2
		- all feats. *	92.7	92.6	92.6
	STAC	GumDrop [38]	95.3	95.4	95.3
		- all feats. *	85.0	76.7	80.6
		DisCoDisCo [17]	96.3	93.6	94.9
		- all feats. *	91.8	92.1	91.9
	RST-DT	GumDrop [38]	94.9	96.5	95.7
		- all feats. *	96.3	94.6	95.4
		DisCoDisCo [17]	96.4	96.9	96.6
		- all feats. *	96.8	95.9	96.4
Clause Segmentation	HCA-AMR2.0	DisCoDisCo	92.9	89.7	91.3
		- lem. °	86.8	93.9	90.2
		- dp. °	91.0	87.7	89.3
		- pos °	91.4	87.6	89.4
		- all feats. °	89.2	85.4	87.2
		- fastText	90.5	82.7	86.4
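The F$_{1}$ column in Table 10 is the harmonic mean of precision (P) and recall (R). The helper below is a generic sketch of that formula, not the evaluation script used in the experiments. Because the clause segmentation scores are averaged over five runs, recomputing F$_{1}$ from the averaged P and R can differ from a reported value in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# DisCoDisCo with all features on HCA-AMR2.0 (P and R taken from Table 10).
full_model_f1 = f1_score(92.9, 89.7)
print(round(full_model_f1, 1))
```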

Table 11. Performances of the top-down and bottom-up parsers with various pretrained language models (PLMs) for the clause parsing and discourse parsing tasks, evaluated by four RST-Parseval metrics, i.e., span, nuclearity (nuc.), relation (rel.), and full. Standard deviations over three runs follow the ± sign. Bold numbers are the best scores for each task with each model. Note that we only conducted experiments on the HCA-AMR2.0 and RST-DT datasets for clause parsing and discourse parsing, respectively.

Model	PLM	Discourse Parsing: Span	Discourse Parsing: Nuc.	Discourse Parsing: Rel.	Discourse Parsing: Full	Clause Parsing: Span	Clause Parsing: Nuc.	Clause Parsing: Rel.	Clause Parsing: Full
Top-Down	BERT	92.6 ± 0.53	85.7 ± 0.41	75.4 ± 0.45	74.7 ± 0.54	96.0 ± 0.20	92.1 ± 0.27	85.8 ± 0.51	85.7 ± 0.47
	RoBERTa	94.1 ± 0.46	88.4 ± 0.46	79.6 ± 0.17	78.7 ± 0.11	96.5 ± 0.02	93.1 ± 0.09	87.1 ± 0.23	87.1 ± 0.22
	SpanBERT	94.1 ± 0.15	88.8 ± 0.19	79.4 ± 0.49	78.5 ± 0.39	96.4 ± 0.13	92.6 ± 0.12	86.1 ± 0.15	85.9 ± 0.15
	XLNet	94.8 ± 0.39	89.5 ± 0.39	80.5 ± 0.59	79.5 ± 0.53	96.6 ± 0.07	93.5 ± 0.20	87.0 ± 0.57	86.9 ± 0.56
	DeBERTa	94.2 ± 0.33	89.0 ± 0.16	80.1 ± 0.43	79.1 ± 0.32	96.6 ± 0.02	93.3 ± 0.07	87.2 ± 0.04	87.2 ± 0.10
Bottom-Up	BERT	91.9 ± 0.34	84.4 ± 0.31	74.4 ± 0.37	73.8 ± 0.30	96.3 ± 0.22	92.3 ± 0.36	86.1 ± 0.37	86.0 ± 0.32
	RoBERTa	94.4 ± 0.12	89.0 ± 0.34	80.4 ± 0.47	79.7 ± 0.51	96.4 ± 0.16	92.8 ± 0.20	86.6 ± 0.56	86.6 ± 0.58
	SpanBERT	93.9 ± 0.24	88.2 ± 0.19	79.3 ± 0.37	78.4 ± 0.29	96.5 ± 0.09	92.6 ± 0.06	86.4 ± 0.20	86.2 ± 0.22
	XLNet	94.7 ± 0.31	89.4 ± 0.24	81.2 ± 0.27	80.4 ± 0.34	96.9 ± 0.10	93.6 ± 0.09	87.4 ± 0.33	87.3 ± 0.31
	DeBERTa	94.6 ± 0.38	89.8 ± 0.65	81.0 ± 0.64	80.2 ± 0.70	97.0 ± 0.10	94.0 ± 0.17	87.8 ± 0.39	87.7 ± 0.38

Table 12. Clause-parsing results with large versions of pretrained language models (PLMs), XLNet and DeBERTa (RST-Parseval). $^{†}$ marks the large version of a PLM. Standard deviations over three runs follow the ± sign. Bold numbers are the best scores for each model.

Model	PLM	Span	Nuc.	Rel.	Full
Top-Down	XLNet	96.6 ± 0.07	93.5 ± 0.20	87.0 ± 0.57	86.9 ± 0.56
	XLNet $^{†}$	96.7 ± 0.27	93.6 ± 0.33	87.6 ± 0.36	87.6 ± 0.37
	DeBERTa	96.6 ± 0.02	93.3 ± 0.07	87.2 ± 0.04	87.2 ± 0.10
	DeBERTa $^{†}$	97.0 ± 0.14	94.0 ± 0.29	87.6 ± 0.69	87.6 ± 0.61
Bottom-Up	XLNet	96.9 ± 0.10	93.6 ± 0.09	87.4 ± 0.33	87.3 ± 0.31
	XLNet $^{†}$	97.0 ± 0.34	93.7 ± 0.46	87.6 ± 0.67	87.6 ± 0.67
	DeBERTa	97.0 ± 0.10	94.0 ± 0.17	87.8 ± 0.39	87.7 ± 0.38
	DeBERTa $^{†}$	97.4 ± 0.08	94.5 ± 0.13	88.6 ± 0.27	88.5 ± 0.28

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Fan, Y.; Li, B.; Sataer, Y.; Gao, M.; Shi, C.; Cao, S.; Gao, Z. Hierarchical Clause Annotation: Building a Clause-Level Corpus for Semantic Parsing with Complex Sentences. Appl. Sci. 2023, 13, 9412. https://doi.org/10.3390/app13169412
