What is syntactic analysis?
Syntactic analysis examines the grammatical structure of sentences, or of parts of sentences, in order to work out their logical meaning.
While lexical techniques such as lemmatization focus purely on feature extraction and data cleaning, syntactic analysis examines the relationships between words and the grammatical structure of sentences. This is essential to understanding what a sentence actually means.
Consider these sentences:
- The lion is a ferocious animal.
- Ferocious is animal a the lion.
Since both sentences contain the same words, lexical analysis alone cannot tell that the second sentence is syntactically incorrect and does not make sense.
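To make this concrete, here is a minimal sketch in Python. It assumes a very simple "bag of words" view of lexical analysis, and the helper name `bag_of_words` is made up for illustration. It shows that the two sentences look identical once word order is thrown away.

```python
from collections import Counter

def bag_of_words(sentence: str) -> Counter:
    """Lowercase the sentence, drop the final period, and count each word."""
    return Counter(sentence.lower().rstrip(".").split())

grammatical = "The lion is a ferocious animal."
scrambled = "Ferocious is animal a the lion."

# Same words, same counts: a purely lexical view cannot tell them apart.
same_bag = bag_of_words(grammatical) == bag_of_words(scrambled)

# Only when word order is kept do the two sentences differ.
same_order = (grammatical.lower().rstrip(".").split()
              == scrambled.lower().rstrip(".").split())

print(same_bag)    # True
print(same_order)  # False
```

Any check that works only on word counts will treat the scrambled sentence as equivalent to the grammatical one, which is why word order has to be analyzed separately.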
Syntactic analysis is required to examine the relationship between the words in a sentence and the grammatical structure of a sentence in order to derive the true meaning of the sentence.
You could refer to syntactic analysis as the process of analyzing the strings of symbols in natural language in conformance with grammatical rules.
There are many elements of sentences that lexical analysis ignores, which syntactic analysis accounts for. For example, lexical processing typically discards stop words, even though removing them can change the entire meaning of a sentence. Lexical analysis also does not identify the parts of speech of the words in the sentence being analyzed.
Syntactic analysis also pays attention to the order of the words in a sentence, another aspect that is completely ignored by lexical analysis. It also considers the morphology of the words in the sentence, which lexical analysis simply cannot account for.
How do we do syntactic analysis?
When you perform syntactic analysis of sentences with the tools of traditional grammar, there are six steps that you can follow:
- Segmentation I: Identifying clause boundaries and word boundaries
- Classification I: Determining the parts of speech
- Segmentation II: Identifying constituents
- Classification II: Determining the syntactic categories for the constituents
- Determining the grammatical functions of the constituents
- Drawing the syntactic structure
What is the purpose of syntactic analysis?
Its purpose is to understand the structure of input text, from the smallest basic symbols, all the way to sentences, and then derive logical meaning from it.
Syntactic analysis is an extremely important aspect of natural language processing (NLP) because it assists in figuring out the grammatical meaning of any sentence.
What are the levels of syntactic analysis?
Here are the levels of syntactic analysis:
1. Part-of-speech (POS) tagging
This is the first level of syntactic analysis. Part-of-speech tagging is a vital part of syntactic analysis and involves tagging words in the sentence as verbs, adverbs, nouns, adjectives, prepositions, etc.
Knowing the part of speech of each word helps to disambiguate its role, and therefore the meaning of the sentence. Most other parsing techniques build on part-of-speech tags.
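As a rough illustration, part-of-speech tagging can be sketched as a dictionary lookup. The lexicon and tag names below are assumptions invented for this example. Real taggers learn tags from annotated corpora and use context to resolve ambiguous words.

```python
# Hypothetical hand-written lexicon, for illustration only.
LEXICON = {
    "the": "DET", "a": "DET",
    "lion": "NOUN", "animal": "NOUN",
    "is": "VERB",
    "ferocious": "ADJ",
}

def pos_tag(sentence: str) -> list[tuple[str, str]]:
    """Tag each word by lexicon lookup; unknown words get the tag 'UNK'."""
    words = sentence.lower().rstrip(".").split()
    return [(word, LEXICON.get(word, "UNK")) for word in words]

tags = pos_tag("The lion is a ferocious animal.")
for word, tag in tags:
    print(f"{word}\t{tag}")
```

A lookup like this cannot handle words that take different tags in different contexts, which is the main reason practical taggers are statistical or neural.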
2. Constituency parsing
Constituency parsing involves grouping the words of a sentence into constituents (phrases) on the basis of their grammatical role in the sentence.
Noun phrases, verb phrases, and prepositional phrases are the most common constituents, while other constituents such as adverb phrases and nominals also exist.
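The idea of grouping words into constituents can be sketched with a very small noun-phrase chunker. This is shallow chunking rather than full constituency parsing, and the function name, the tag set, and the rule "determiners and adjectives followed by a noun form a noun phrase" are simplifying assumptions made for this example.

```python
def chunk_noun_phrases(tagged):
    """Group runs of DET/ADJ words that end in a NOUN into 'NP' chunks.

    Every other word is emitted on its own with the label 'O' (outside).
    """
    chunks, pending = [], []
    for word, tag in tagged:
        if tag in ("DET", "ADJ"):
            pending.append(word)              # may start or extend a noun phrase
        elif tag == "NOUN":
            chunks.append(("NP", pending + [word]))
            pending = []
        else:
            # Pending words were not followed by a noun: emit them singly.
            chunks.extend(("O", [w]) for w in pending)
            chunks.append(("O", [word]))
            pending = []
    chunks.extend(("O", [w]) for w in pending)
    return chunks

# Hand-tagged input, assumed for illustration.
tagged = [("the", "DET"), ("lion", "NOUN"), ("is", "VERB"),
          ("a", "DET"), ("ferocious", "ADJ"), ("animal", "NOUN")]

chunks = chunk_noun_phrases(tagged)
print(chunks)
# [('NP', ['the', 'lion']), ('O', ['is']), ('NP', ['a', 'ferocious', 'animal'])]
```

A full constituency parser would go further and nest these groups into a tree, for example placing the verb and the second noun phrase inside a verb phrase.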
3. Dependency parsing
Dependency parsing is widely used in free-word-order languages. In dependency parsing, dependencies are formed between the words themselves.
When two words have dependencies between them, one word is the head while the other one is the child or the dependent.
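A dependency analysis can be stored as a list of head-dependent links. The sketch below is hand-annotated rather than produced by a parser. It assumes a Universal Dependencies-style convention in which the predicate noun "animal" is the root and the copula "is" depends on it. The helper names are made up for this example.

```python
# (dependent, relation, head) triples for "The lion is a ferocious animal".
# Hand-written under the stated assumption; the root has no head.
DEPENDENCIES = [
    ("The", "det", "lion"),
    ("lion", "nsubj", "animal"),
    ("is", "cop", "animal"),
    ("a", "det", "animal"),
    ("ferocious", "amod", "animal"),
    ("animal", "root", None),
]

def dependents_of(head):
    """Return every word whose head is the given word."""
    return [dep for dep, _, h in DEPENDENCIES if h == head]

def head_of(word):
    """Return the head of the given word, or None for the root."""
    for dep, _, h in DEPENDENCIES:
        if dep == word:
            return h
    return None

print(dependents_of("animal"))  # ['lion', 'is', 'a', 'ferocious']
print(head_of("The"))           # lion
```

Because the links connect words directly, no phrase-level nodes are needed, which is part of why this representation suits languages with flexible word order.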
What’s the difference between lexical and syntactic analysis?
Lexical analysis focuses on data cleaning and feature extraction with the help of techniques like stemming, lemmatization, correction of misspelled words, etc.
On the other hand, syntactic analysis aims to identify the roles played by words in a sentence, interpret the relationship between words, and interpret the grammatical structure of sentences.
If you’d like an example, look at these sentences:
- Tom is a wise man.
- Is Tom a man wise?
All the words are the same in both sentences, but only the first sentence is syntactically correct and can be understood with ease.
However, you cannot make these distinctions when you just make use of basic lexical processing techniques. To make these distinctions, you would need to employ more sophisticated syntax processing techniques so that you can understand the relationship between individual words in a sentence.
There are quite a few aspects of a sentence that syntactic analysis takes into account but lexical analysis does not. Some of these aspects include:
The order and meaning of words
Syntactic analysis seeks to extract the dependencies between the words in a text. If the order of the words is changed, those dependencies change too, and the sentence becomes harder to comprehend.
Retaining Stop-Words
If the stop-words are removed, the entire meaning of the sentence might change.
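Here is a short sketch of how that can happen. The stop-word list is a small hypothetical one written for this example. Real lists vary by library, and some of them do include negations such as "not".

```python
# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"this", "is", "not", "the", "a"}

def remove_stop_words(sentence):
    """Lowercase, drop the final period, and filter out stop words."""
    words = sentence.lower().rstrip(".").split()
    return [w for w in words if w not in STOP_WORDS]

positive = remove_stop_words("This movie is good.")
negative = remove_stop_words("This movie is not good.")

print(positive)  # ['movie', 'good']
print(negative)  # ['movie', 'good']
```

After filtering, the two opposite statements become identical, so the negation is lost to any later processing step.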
Morphology of Words
Stemming and lemmatization reduce words to their base form, which discards grammatical information such as tense and number and thereby changes the grammar of the sentence.
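The effect can be seen with a deliberately crude suffix-stripping stemmer. The suffix list and the function `crude_stem` are assumptions made for this sketch. This is not the Porter algorithm or any library's stemmer.

```python
# Hypothetical suffix list; real stemmers use far more careful rules.
SUFFIXES = ("ing", "ed", "s")

def crude_stem(word):
    """Strip the first matching suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

past = [crude_stem(w) for w in "she walked home".split()]
present = [crude_stem(w) for w in "she walks home".split()]

print(past)     # ['she', 'walk', 'home']
print(present)  # ['she', 'walk', 'home']
```

Both sentences reduce to the same sequence, so the tense marking that distinguished them is no longer available.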
Parts-of-speech of Words in a Sentence
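It is critical to identify the right part of speech of a word, because the same word can play different roles: "book" is a noun in "read a book" but a verb in "book a flight".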
What is derivation in syntactic analysis?
To obtain the input string from the start symbol of a grammar, you need a sequence of production rules. A derivation is that sequence of production-rule applications. While parsing is carried out, two decisions must be made at each step: which non-terminal to replace, and which production rule to replace it with.
There are two types of derivation, which differ in how they decide which non-terminal to replace next.
Left-most Derivation
In a left-most derivation, the left-most non-terminal in the sentential form is replaced at every step, so the sentential form is expanded from left to right. A sentential form produced this way is referred to as a left-sentential form.
Right-most Derivation
In a right-most derivation, the right-most non-terminal in the sentential form is replaced at every step, so the sentential form is expanded from right to left. A sentential form produced this way is referred to as a right-sentential form.
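The two derivation orders can be compared with a small sketch. The toy grammar (S → NP VP, NP → Det N, VP → V NP, plus a few word rules), the example sentence, and the function `derive` are all assumptions made for this illustration. The function applies a given list of productions in order, always expanding either the left-most or the right-most non-terminal.

```python
NON_TERMINALS = {"S", "NP", "VP", "Det", "N", "V"}

def derive(steps, leftmost=True):
    """Apply (lhs, rhs) productions in order and return each sentential form."""
    form = ["S"]
    forms = [" ".join(form)]
    for lhs, rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym in NON_TERMINALS]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"next non-terminal is {form[i]}, not {lhs}")
        form[i:i + 1] = rhs                   # replace the non-terminal in place
        forms.append(" ".join(form))
    return forms

leftmost_steps = [
    ("S", ["NP", "VP"]), ("NP", ["Det", "N"]), ("Det", ["the"]),
    ("N", ["lion"]), ("VP", ["V", "NP"]), ("V", ["chases"]),
    ("NP", ["Det", "N"]), ("Det", ["the"]), ("N", ["deer"]),
]
rightmost_steps = [
    ("S", ["NP", "VP"]), ("VP", ["V", "NP"]), ("NP", ["Det", "N"]),
    ("N", ["deer"]), ("Det", ["the"]), ("V", ["chases"]),
    ("NP", ["Det", "N"]), ("N", ["lion"]), ("Det", ["the"]),
]

leftmost = derive(leftmost_steps, leftmost=True)
rightmost = derive(rightmost_steps, leftmost=False)
print(" => ".join(leftmost))
print(" => ".join(rightmost))
```

Both runs use the same productions and end in the same sentence, "the lion chases the deer". Only the order in which the non-terminals are expanded differs, which is visible in the intermediate sentential forms.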