Stanford NLP REST Server
I’ve implemented a small REST server on top of Stanford’s NLP software, specifically its part-of-speech (POS) tagging and named entity recognition functions. It could easily be extended to cover more functions, but this is what I need right now.
It works fairly well as a replacement for the equivalent functions in AWS Comprehend. Comprehend is great, but at around $1 per MB of text analyzed, it can get a little expensive if you’re just playing around with large corpora. It’s not a drop-in replacement, but translating between the two output formats should be one-to-one.
You can get the code on GitHub here: https://github.com/dpjanes/iotdb-nlp/tree/main/stanford/server, and there’s documentation there on how to install it and so forth. The server code is necessarily licensed the same as Stanford’s: GPLv3.
It accepts both POST and GET requests, and has an optional security module if you want to keep it running as a service. There are a few rough edges; feel free to send patches if you’ve made changes.
Here’s an example of Entity detection:
{
"delta": 0.011,
"items": [
{
"begin": 16,
"document": "Smith",
"end": 21,
"score": 0.8121054261060908,
"tag": "PERSON",
"token": "entity"
},
{
"begin": 36,
"document": "London",
"end": 42,
"score": 0.9966959414574724,
"tag": "LOCATION",
"token": "entity"
},
{
"begin": 44,
"document": "Ontario",
"end": 51,
"score": 0.8270727690781802,
"tag": "LOCATION",
"token": "entity"
}
]
}
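To make the "one-to-one" translation concrete, here is a minimal Python sketch that renames the fields in the response above to the ones Comprehend uses for detected entities (Text, Type, Score, BeginOffset, EndOffset). The function name `to_comprehend_entities` is my own, and the sketch assumes the tag names line up with Comprehend's entity types, which holds for PERSON and LOCATION but may need a lookup table for other tags.

```python
import json

# Copy of the entity response shown above.
response = json.loads("""
{
  "delta": 0.011,
  "items": [
    {"begin": 16, "document": "Smith", "end": 21,
     "score": 0.8121054261060908, "tag": "PERSON", "token": "entity"},
    {"begin": 36, "document": "London", "end": 42,
     "score": 0.9966959414574724, "tag": "LOCATION", "token": "entity"},
    {"begin": 44, "document": "Ontario", "end": 51,
     "score": 0.8270727690781802, "tag": "LOCATION", "token": "entity"}
  ]
}
""")


def to_comprehend_entities(response):
    """Rename this server's item fields to Comprehend-style entity fields.

    Hypothetical helper: assumes tags can be passed through unchanged.
    """
    return {
        "Entities": [
            {
                "Text": item["document"],
                "Type": item["tag"],
                "Score": item["score"],
                "BeginOffset": item["begin"],
                "EndOffset": item["end"],
            }
            for item in response["items"]
            if item["token"] == "entity"  # skip anything that is not an entity
        ]
    }


converted = to_comprehend_entities(response)
for entity in converted["Entities"]:
    print(entity["Type"], entity["Text"], entity["BeginOffset"], entity["EndOffset"])
```

Going the other way is the same rename in reverse, so code written against one format should need only a thin adapter to consume the other.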
And here’s an example of POS detection:
{
"delta": 0.052,
"items": [
{
"score": 0.99,
"document": "The",
"end": 3,
"tag": "DT",
"begin": 0,
"token": "pos"
},
{
"score": 0.99,
"document": "Sign",
"end": 8,
"tag": "VB",
"begin": 4,
"token": "pos"
},
{
"score": 0.99,
"document": "of",
"end": 11,
"tag": "IN",
"begin": 9,
"token": "pos"
},
{
"score": 0.99,
"document": "the",
"end": 15,
"tag": "DT",
"begin": 12,
"token": "pos"
},
{
"score": 0.99,
"document": "Four",
"end": 20,
"tag": "CD",
"begin": 16,
"token": "pos"
},
{
"score": 0.99,
"document": "Chapter",
"end": 29,
"tag": "NN",
"begin": 22,
"token": "pos"
},
{
"score": 0.99,
"document": "I",
"end": 31,
"tag": "PRP",
"begin": 30,
"token": "pos"
},
{
"score": 0.99,
"document": ".",
"end": 32,
"tag": ".",
"begin": 31,
"token": "pos"
},
...
{
"score": 0.99,
"document": "upon",
"end": 1859,
"tag": "IN",
"begin": 1855,
"token": "pos"
},
{
"score": 0.99,
"document": "it",
"end": 1862,
"tag": "PRP",
"begin": 1860,
"token": "pos"
},
{
"score": 0.99,
"document": ".",
"end": 1863,
"tag": ".",
"begin": 1862,
"token": "pos"
},
{
"score": 0.99,
"document": "”",
"end": 1864,
"tag": "''",
"begin": 1863,
"token": "pos"
}
]
}
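Once you have a response like this, most of the work is just filtering the items list. The following Python sketch uses the first eight items from the output above to count tags, pull out words by tag prefix, and sanity-check the character offsets. The helper names are my own and not part of the server.

```python
from collections import Counter

# First eight items from the POS response above.
items = [
    {"score": 0.99, "document": "The", "end": 3, "tag": "DT", "begin": 0, "token": "pos"},
    {"score": 0.99, "document": "Sign", "end": 8, "tag": "VB", "begin": 4, "token": "pos"},
    {"score": 0.99, "document": "of", "end": 11, "tag": "IN", "begin": 9, "token": "pos"},
    {"score": 0.99, "document": "the", "end": 15, "tag": "DT", "begin": 12, "token": "pos"},
    {"score": 0.99, "document": "Four", "end": 20, "tag": "CD", "begin": 16, "token": "pos"},
    {"score": 0.99, "document": "Chapter", "end": 29, "tag": "NN", "begin": 22, "token": "pos"},
    {"score": 0.99, "document": "I", "end": 31, "tag": "PRP", "begin": 30, "token": "pos"},
    {"score": 0.99, "document": ".", "end": 32, "tag": ".", "begin": 31, "token": "pos"},
]


def tag_counts(items):
    """Count how often each POS tag appears."""
    return Counter(item["tag"] for item in items)


def words_with_tag(items, prefix):
    """Return the words whose tag starts with prefix, e.g. 'NN' for nouns."""
    return [item["document"] for item in items if item["tag"].startswith(prefix)]


def offsets_consistent(items):
    """Check that each item's begin/end span is as long as its text."""
    return all(item["end"] - item["begin"] == len(item["document"]) for item in items)


print(tag_counts(items).most_common(3))
print(words_with_tag(items, "NN"))
print(offsets_consistent(items))
```

Matching on a tag prefix rather than an exact tag means "NN" would also pick up plural and proper noun tags such as NNS and NNP if they appear in the output.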