What is dependency parsing?
Dependency parsing examines the dependencies between the words in a sentence to determine its grammatical structure. It breaks a sentence into components and works on the premise that there are direct links (or dependencies) between the linguistic units in a sentence.
In a typed dependency structure, relations between linguistic units or words are indicated with directed arcs, and each relationship between words is labeled with a dependency tag.
When there is a dependency between two words, one word is the head while the other is the dependent (or child). The Universal Dependencies v2 taxonomy now includes 37 universal syntactic relations, and a vast range of language-specific tags exists in addition to these.
Dependency parsing can identify the subjects and objects of a verb, while also showing you which words modify or describe the subject.
You could consider dependency parsing to be the process of representing every word in a sentence as a node and linking each node to its dependents, thereby defining the grammatical structure of that sentence.
What is a dependency tree?
Dependency trees are directed graphs that follow three rules:
- They have a single designated root node that does not have any incoming arcs.
- Each vertex, other than the root node, has exactly one incoming arc.
- A unique path exists between the root node and every single vertex in the set of vertices.
These rules work together to ensure that every word has exactly one head, that the dependency structure is connected, and that a unique directed path leads from the single root node to each word in the sentence.
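To make these rules concrete, here is a minimal Python sketch (not tied to any library) that represents a parse as a list of (word, head index) pairs and checks the rules. The hand-written parse of "I wore a t-shirt" is for illustration only.

# A dependency parse as (word, head_index) pairs; head index 0 is the root.
parse = [('I', 2), ('wore', 0), ('a', 4), ('t-shirt', 2)]
heads = [head for _, head in parse]

# The one-head-per-word rule holds by construction: each word stores one head.
# Single-root rule: exactly one word attaches directly to the root.
assert heads.count(0) == 1

# Unique-path rule: following head links from any word must reach the root
# without revisiting a node, i.e. the structure contains no cycles.
for i in range(1, len(parse) + 1):
    seen, node = set(), i
    while node != 0:
        assert node not in seen, 'cycle detected'
        seen.add(node)
        node = heads[node - 1]
print('valid dependency tree')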
How is NLTK used to perform dependency parsing?
The Natural Language Toolkit (NLTK), a set of libraries and programs for statistical Natural Language Processing (NLP) of human language, can be used to perform dependency parsing.
There are several methods for performing dependency parsing with NLTK. Two of these techniques are:
Probabilistic, projective dependency parser
Probabilistic, projective dependency parsers predict the parses of new sentences by making use of natural language data gathered from hand-parsed sentences. These parsers can make mistakes, and they work with a limited collection of training data.
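As a rough sketch of this approach, NLTK ships a ProbabilisticProjectiveDependencyParser that can be trained on a small CoNLL-formatted sample; the conll_data2 string used below is bundled with NLTK for its own demonstrations, and the Dutch test sentence mirrors NLTK's example.

from nltk.parse import ProbabilisticProjectiveDependencyParser
from nltk.parse.dependencygraph import DependencyGraph, conll_data2

# Build dependency graphs from the hand-parsed sample that ships with NLTK.
graphs = [DependencyGraph(entry) for entry in conll_data2.split('\n\n') if entry]

parser = ProbabilisticProjectiveDependencyParser()
parser.train(graphs)

# Parse a sentence drawn from the same (Dutch) sample data.
for tree in parser.parse(['Cathy', 'zag', 'hen', 'wild', 'zwaaien', '.']):
    print(tree)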
Stanford parser
The Stanford NLP Group’s CoreNLP offers NLP tools in Java. You can make use of this Java library along with NLTK to parse dependencies in Python. This parser supports a large number of languages, including, but not limited to, English, Chinese, German, and Arabic.
You’d start by downloading the Stanford CoreNLP zip file and Stanford CoreNLP model jar file from the CoreNLP website.
You can run these three commands to download the required libraries and unzip the zip file.
wget https://nlp.stanford.edu/software/stanford-corenlp-4.2.2.zip
wget https://nlp.stanford.edu/software/stanford-corenlp-4.2.2-models-english.jar
unzip /content/stanford-corenlp-4.2.2.zip
After you download these libraries, you can import the StanfordDependencyParser from NLTK.
To visualize the dependencies that CoreNLP generates, you can either extract a labeled, directed NetworkX graph object using the dependency.nx_graph() function, or you can generate a DOT definition in the Graph Description Language using the dependency.to_dot() function. The DOT definition can be rendered as a graph using Graphviz.
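Putting these steps together, here is a sketch of parsing with NLTK's StanfordDependencyParser; the jar paths are assumptions based on where the commands above unpack the files, so adjust them to your setup.

from nltk.parse.stanford import StanfordDependencyParser

# Paths below assume the zip was unpacked in the current directory.
path_to_jar = 'stanford-corenlp-4.2.2/stanford-corenlp-4.2.2.jar'
path_to_models_jar = 'stanford-corenlp-4.2.2-models-english.jar'

parser = StanfordDependencyParser(path_to_jar=path_to_jar,
                                  path_to_models_jar=path_to_models_jar)

# raw_parse returns an iterator of DependencyGraph objects.
dependency = next(parser.raw_parse('I wore a t-shirt'))

# Each triple holds (head, relation, dependent).
for triple in dependency.triples():
    print(triple)

graph = dependency.nx_graph()  # labeled, directed NetworkX graph
dot = dependency.to_dot()      # DOT definition to render with Graphviz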
What are the other methods to implement dependency parsing in Python?
Here are two other ways to implement dependency parsing in Python:
Using spaCy
You can make use of spaCy, an open-source Python library for Natural Language Processing, to implement dependency parsing.
To start, you’d want to install spaCy and load the language model that you need to use. The smallest English model available in spaCy is en_core_web_sm, which is about 12 MB in size. You can check out the other available models by going through the spaCy English models listing.
spaCy even offers a built-in dependency visualizer known as displaCy that you can use to generate dependency graphs for sentences.
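Here is a minimal sketch of that workflow, assuming the en_core_web_sm model has already been downloaded (for example with python -m spacy download en_core_web_sm).

import spacy
from spacy import displacy

nlp = spacy.load('en_core_web_sm')
doc = nlp('I wore a t-shirt')

# Each token records its dependency label and a pointer to its head.
for token in doc:
    print(token.text, token.dep_, token.head.text)

# displaCy draws the dependency graph; serve() starts a local web server.
displacy.serve(doc, style='dep')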
Using Stanza
The Stanford NLP Group also developed Stanza. Stanza offers a neural network NLP pipeline that can be customized, as well as a Python wrapper over the Stanford CoreNLP package, which makes it easier to use the CoreNLP features without downloading the jar files.
You start by installing Stanza. Then you need to import Stanza and download the required language model. The Stanza documentation lists the available language models that you can make use of.
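As a minimal sketch of that setup, assuming you want the English model, install Stanza from the command line and then download the model from Python:

pip install stanza

import stanza

# Download the English model; pass another language code as needed.
stanza.download('en')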
Then you initialize the neural pipeline using the stanza.Pipeline() function. The first parameter is the language to use. The optional processors parameter, which can be a dictionary or a comma-separated string, configures the processors to use in the pipeline.
You can access a list of all the processors on the Pipeline and Processors documentation page. Some processors need to be preceded by other processors in the pipeline, otherwise they will not function. As an example, the pos processor requires the tokenize and mwt processors, so you would have to make use of these two processors in the pipeline as well if you intend to use the pos processor.
Now, you pass your sentence through the pipeline and store the results in the doc variable. If you print doc.sentences, you’ll see an entry for every sentence that was passed through the pipeline, each containing all the token information and linguistic features for that sentence.
You can then call the print_dependencies() function for each of the sentences in the doc object. This function prints tuples with three values: the token, the index of its head, and the dependency relation.
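Here is a short sketch of the whole pipeline, using the processor chain described above.

import stanza

# depparse needs tokenize, mwt, pos, and lemma to run earlier in the pipeline.
nlp = stanza.Pipeline('en', processors='tokenize,mwt,pos,lemma,depparse')
doc = nlp('I wore a t-shirt')

# Prints (token, head index, dependency relation) tuples for each sentence.
for sentence in doc.sentences:
    sentence.print_dependencies()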
What is the difference between Dependency Parsing and Constituency Parsing?
Dependency Parsing
- Dependency parsing expresses the syntax of a sentence in terms of dependencies between individual words rather than in terms of the sentence's phrase structure.
- Dependency parsing commonly uses algorithms trained on treebanks, such as arc-eager transition-based parsing, often combined with beam search.
- For example, in the sentence "I wore a t-shirt", dependency parsing would identify the relations between the words, such as the verb "wore" acting as the head of both its subject "I" and its object "t-shirt".
Constituency Parsing
- Constituency parsing is very helpful in visualizing the entire syntactical structure of a sentence with the help of a parse tree, which is useful in word processing systems for grammar checking.
- Constituency parsing can be attained with multiple algorithms, such as the Cocke-Kasami-Younger (CKY) chart-parsing algorithm, typically combined with probabilistic context-free grammars (PCFGs).
- Carrying forward the same example, constituency parsing would break down the entire sentence and show its structure by grouping words into constituents and labeling them with parts of speech and phrase types, such as the nouns, adjectives, and verbs in the sentence.
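To contrast the two outputs, here is a small NLTK sketch that draws a constituency tree for the same sentence; the bracketing is hand-written for illustration, using Penn Treebank-style labels (S = sentence, NP = noun phrase, VP = verb phrase), not the output of a trained parser.

from nltk import Tree

# Hand-written constituency bracketing for "I wore a t-shirt".
tree = Tree.fromstring('(S (NP (PRP I)) (VP (VBD wore) (NP (DT a) (NN t-shirt))))')
tree.pretty_print()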