Setting up Stanford Named Entity Recognizer on Ubuntu

David Janes
2 min read · Oct 15, 2020

Some simple NLP stuff, as an alternative to, e.g., AWS Comprehend.

“Stanford NER is a Java implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names.”

The software is from here:
https://nlp.stanford.edu/software/CRF-NER.html

First make sure you have Java:

sudo apt-get install default-jre
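
If you are not sure whether Java is already there, a quick check like the one below can save a reinstall. This is only a sketch: `have_cmd` is a helper name I made up, and as far as I know the 4.0.0 release wants Java 8 or newer, so treat that as an assumption and check the version line it prints.

```shell
# Hypothetical helper (not part of Stanford NER): succeed only if a command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
  # java -version writes to stderr; merge it into stdout and show the first line.
  java -version 2>&1 | head -n 1
else
  echo "java not found; install a JRE first (e.g. the default-jre package)"
fi
```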

Then it’s quite straightforward:

wget https://nlp.stanford.edu/software/stanford-ner-4.0.0.zip
unzip stanford-ner-4.0.0.zip
cd stanford-ner-4.0.0

And to test:

$ sh ner.sh sample.txt
The/O fate/O of/O Lehman/ORGANIZATION Brothers/ORGANIZATION ,/O the/O beleaguered/O investment/O bank/O ,/O hung/O in/O the/O balance/O on/O Sunday/O as/O Federal/ORGANIZATION Reserve/ORGANIZATION officials/O and/O the/O leaders/O of/O major/O financial/O institutions/O continued/O to/O gather/O in/O emergency/O meetings/O trying/O to/O complete/O a/O plan/O to/O rescue/O the/O stricken/O bank/O ./O
Several/O possible/O plans/O emerged/O from/O the/O talks/O ,/O held/O at/O the/O Federal/ORGANIZATION Reserve/ORGANIZATION Bank/ORGANIZATION of/ORGANIZATION New/ORGANIZATION York/ORGANIZATION and/O led/O by/O Timothy/PERSON R./PERSON Geithner/PERSON ,/O the/O president/O of/O the/O New/ORGANIZATION York/ORGANIZATION Fed/ORGANIZATION ,/O and/O Treasury/ORGANIZATION Secretary/O Henry/PERSON M./PERSON Paulson/PERSON Jr./PERSON ./O
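
The word/TAG output is easy to read but awkward to use, so here is a rough sketch of turning it into a list of entities. It assumes the format shown above (space-separated tokens, tag after the last slash, O for "not an entity"). `extract_entities` is a name I made up, not something shipped with Stanford NER. The demo line is a fragment of the sample output, so it runs without Java.

```shell
# Hypothetical helper: group consecutive word/TAG tokens that share a non-O tag.
# Reads ner.sh-style output on stdin, prints one "TAG<TAB>entity" line per run.
extract_entities() {
  awk '
    # Print the entity collected so far (if any) and reset the state.
    function emit() {
      if (tag != "" && tag != "O") print tag "\t" text
      tag = ""; text = ""
    }
    {
      for (i = 1; i <= NF; i++) {
        # Split on the LAST slash so a token containing "/" keeps its word part.
        n = match($i, "/[^/]*$")
        if (n == 0) continue
        word = substr($i, 1, n - 1)
        t = substr($i, n + 1)
        if (t == tag && t != "O") {
          text = text " " word
        } else {
          emit()
          tag = t; text = word
        }
      }
      emit()   # never merge entities across output lines
    }
  '
}

# Demo on a fragment of the sample output.
printf '%s\n' 'led/O by/O Timothy/PERSON R./PERSON Geithner/PERSON ,/O the/O president/O of/O the/O New/ORGANIZATION York/ORGANIZATION Fed/ORGANIZATION ,/O' | extract_entities
```

With the real thing you would pipe it: sh ner.sh sample.txt | extract_entities. One limitation: two different entities with the same tag sitting right next to each other get glued together, because this output format has no begin/end markers.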

You can also run it directly with Java, in this case with the “inlineXML” output format:

$ java -mx700m \
-cp "./stanford-ner.jar:./lib/*" \
edu.stanford.nlp.ie.crf.CRFClassifier \
-loadClassifier ./classifiers/english.all.3class.distsim.crf.ser.gz \
-textFile FILENAME \
-outputFormat inlineXML
The fate of <ORGANIZATION>Lehman Brothers</ORGANIZATION>, the beleaguered investment bank, hung in the balance on Sunday as <ORGANIZATION>Federal Reserve</ORGANIZATION> officials and the leaders of major financial institutions continued to gather in emergency meetings trying to complete a plan to rescue the stricken bank. Several possible plans emerged from the talks, held at the <ORGANIZATION>Federal Reserve Bank of New York</ORGANIZATION> and led by <PERSON>Timothy R. Geithner</PERSON>, the president of the <ORGANIZATION>New York Fed</ORGANIZATION>, and <ORGANIZATION>Treasury</ORGANIZATION> Secretary <PERSON>Henry M. Paulson Jr</PERSON>.
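
The inline XML is handy if you only want the entities back out. A minimal sketch, assuming only the three tags of the 3class model (PERSON, ORGANIZATION, LOCATION) and no nested tags; `list_xml_entities` is a made-up name. The demo uses a fragment of the output above, so no Java is needed to try it.

```shell
# Hypothetical helper: list entities found in inlineXML output read on stdin.
# Prints one "TAG<TAB>entity" line per tagged span.
list_xml_entities() {
  # grep -o emits each matching <TAG>...</TAG> span on its own line;
  # awk then splits on the angle brackets: field 2 is the tag, field 3 the text.
  grep -oE '<(PERSON|ORGANIZATION|LOCATION)>[^<]*</(PERSON|ORGANIZATION|LOCATION)>' |
    awk -F'[<>]' '{ print $2 "\t" $3 }'
}

# Demo on a fragment of the sample output.
printf '%s\n' 'held at the <ORGANIZATION>Federal Reserve Bank of New York</ORGANIZATION> and led by <PERSON>Timothy R. Geithner</PERSON>, the president' | list_xml_entities
```

Appending | sort | uniq -c | sort -rn gives a quick count of the most frequently mentioned entities.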

It also has a server mode, basically a socket send/response thing, which I totally would not expose on the Internet (it is a plain, unauthenticated socket):

$ java -mx500m \
-cp stanford-ner.jar edu.stanford.nlp.ie.NERServer \
-port 9191 \
-loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz
$ telnet localhost 9191
I know that Sherlock Homes lived in London, UK.
I/O know/O that/O Sherlock/PERSON Homes/PERSON lived/O in/O London/LOCATION ,/O UK/LOCATION ./O
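
If you would rather script it than type into telnet, something like this should work. It is a sketch under a few assumptions: `nc` (netcat) is installed, the server is already running in another terminal, and it answers one line per connection and then closes it, which is how it appears to behave. `ner_query`, `NER_HOST` and `NER_PORT` are names I made up for this example.

```shell
# Hypothetical helper: send one line of text to a running NERServer and print the reply.
# NER_HOST and NER_PORT are invented for this sketch; the defaults match the example above.
ner_query() {
  printf '%s\n' "$1" | nc "${NER_HOST:-localhost}" "${NER_PORT:-9191}"
}

# Usage, with the server from the example above already running:
#   ner_query "I know that Sherlock Homes lived in London, UK."
```

Keeping the host at localhost by default fits the point about not exposing the port publicly.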
