Caltech Library Workshop – Text as Data: An Introduction to Natural Language Processing (Zoom Session)

Wednesday, May 24, 2023
12:00pm to 1:15pm
Online Event

This introduction to Natural Language Processing (NLP) covers the management and analysis of text using the core Python programming language and the open source libraries NLTK (Natural Language Toolkit) and spaCy. Some prior experience with Python programming will be useful but is not required. The three workshops, each running 75 minutes, cover the following topics:

Friday, May 5, 12:00-1:15pm: Text processing in Python

  • strings and their properties
  • strings as iterables, lists
  • comparing and searching strings
  • regular expressions
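
As a rough illustration of the kind of material these topics refer to, here is a minimal sketch using only the Python standard library. The sample string and variable names are made up for this example and are not taken from the workshop materials.

```python
import re

text = "Text as Data: An Introduction to Natural Language Processing"

# Strings are sequences: they support slicing and iteration.
first_word = text[:4]
capitals = [ch for ch in text if ch.isupper()]

# Splitting on whitespace turns the string into a list of tokens.
words = text.split()

# Comparing and searching: case-insensitive equality and substring lookup.
same = "NLP".lower() == "nlp"
position = text.find("Natural")  # index of the first match, or -1 if absent

# Regular expressions: collect every capitalized word.
capitalized = re.findall(r"\b[A-Z][a-z]+\b", text)

print(first_word, len(words), position, capitalized)
```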

Friday, May 19, 12:00-1:15pm: NLTK

  • text preprocessing (spellchecking, stemming and lemmatization)
  • word contexts, frequency distribution
  • part-of-speech tagging
  • named entity recognition
  • sentiment analysis
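
To give a sense of what a frequency distribution is, the sketch below counts word occurrences with the standard library's `collections.Counter`. It is a stand-in for illustration only and does not use NLTK itself. The sample sentence and the simple regex tokenizer are assumptions made for this example.

```python
import re
from collections import Counter

sample = "The cat sat on the mat. The dog sat on the log."

# Naive tokenization: lowercase the text and keep runs of letters.
tokens = re.findall(r"[a-z]+", sample.lower())

# Count how often each token occurs.
freq = Counter(tokens)
top_three = freq.most_common(3)

print(top_three)
```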

Wednesday, May 24, 12:00-1:15pm: spaCy

  • statistical modeling of text
  • word vectors and similarity
  • processing pipelines
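
Word-vector similarity is commonly measured with cosine similarity. The sketch below computes it in plain Python on tiny hand-made vectors. The three-dimensional vectors are invented for illustration; real word vectors come from trained models and have many more dimensions. The sketch does not call spaCy.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, not taken from any trained model.
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(vectors["cat"], vectors["dog"])
cat_car = cosine_similarity(vectors["cat"], vectors["car"])

print(round(cat_dog, 3), round(cat_car, 3))
```

With these made-up numbers, "cat" scores closer to "dog" than to "car", which is the kind of behaviour word vectors are meant to capture.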

Registration is required:

For more information, please contact Stephen Davison by email at [email protected] or visit