14 Textrank - Introduction

After learning about the Pagerank algorithm and specifically how it was implemented, I wanted to use it for some NLP to see how it faired. Luckily, this Textrank paper had the same idea so I can learn from their findings.


This paper is fundamentally investigating the graph structure and how a graph of global information can provide compelling results by finding the importance of each vertex globally, rather than relying only on local vertex information. This has been fundamental to the Internet as we know it today by being able to sort out important vertices.

This paper investigates the application of TextRank on two language processing tasks: keyword and sentence extraction.

The Model

The ranking algorithm is based on an iteratively solved graph network built with vertices and edges. The Pagerank is defined as S(Vi)=(1d)+djIn(Vi)1Out(Vj)S(Vj)S\left(V_{i}\right)=(1-d)+d * \sum_{j \in I n\left(V_{i}\right)} \frac{1}{\left|O u t\left(V_{j}\right)\right|} S\left(V_{j}\right).

The Pagerank builds a markov matrix, that with enough iterations converges on an “importance” score derived from the network globally. The paper notes their damping factor dd is set to 0.85 as in the Pagerank implementation.

Text as a graph

It is not trivial to move from a graph of websites to a graph of words of a corpus because the relationships are different, so thought must go into deciding how to build the graph. For one, the website implementation is unweighted, so each vote is equal; however, with text, the connections are less binary. The paper chose to use weighted graphs to address this issue.

Their model solution is similar to Pagerank with the inclusion of addressing the weighting issue:

WS(Vi)=(1d)+dVjIn(Vi)wjiVkOut(Vj)wjkWS(Vj)W S\left(V_{i}\right)=(1-d)+d * \sum_{V_{j} \in I n\left(V_{i}\right)} \frac{w_{j i}}{\sum_{V_{k} \in O u t\left(V_{j}\right)} w_{j k}} W S\left(V_{j}\right)