Lecture 2: Lexical association measures and hypothesis testing

Pre-lecture Readings.

  • Lexical association
    • Named entities: http://www.nltk.org/book/ch07.html
      • Information extraction architecture
        • raw text->sentence segmentation->takenization0<part of speech tagging->entity detection->relation detection
        • chunking: segments and labels multi-token sequences as illustrated in 2.1.../images/chunk-segmentation.png
          • Noun-phrase (NP) chunking
          • tag patterns: describe sequences of tagged words
          • Chunking with Regular Expressions
          • Exploring Text Corpora
        • Chinking: define a chink to be a sequence of tokens that is not included in a chunk
        • Representing chunks: tags & trees
        • Training Classifier-Based Chunkers
    • Accurate methods for the statistics of surprise and coincidence
  • An introduction to lexical association measures:
    • We use the term lexical association to refer to the association between words.
    • 3 types of association:
      • collocational association: restricting combination of words into phrases
      • semantic association: reflecting the semantic relationship between words
      • d cross-language association: corresponding to potential translations of words between different languages
  • Topics:
    • Knowledge/ data->zipf
    • Hypothesis testing
    • Collations
    • NER
  • Knowledge/ data
    • Rationalist:
    • Empiricist: driven by observation
    • Zipf’s law: the frequency of any word is inversely proportional to its rank in the frequency table.
      •                                      Empiricist                          Rationalist
      • core                              frequent
      • periphery                     infrequent                     exception relation
    • Example 1
      • Alice like it. a book.
      • Question: what does Alice like?
      • Instead of using a string of word, use a tree structure to represent a sentence.
      •              CP
      •    spec           IP
      • what         Alice like
    • Example 2 (Rationalist)
      • Which book did mary say that Bill thinks Alice like?
      • He dropped a book about info theory.
      • What did he drop a book about? (questionable) What did he write a book about? (Good)
      • He dropped [a book about info theory].
      • S                                   NP
    • Example 3
      • What does Mary read?
      •        SBarQ
      • WhNP                             SQ
      • WP           VR           NP             VBR                   NP
      •  what         odes     mary              read                  T*-1
    • Balance between rationalist and empiricist
    • Accuracy vs. coverage
    • Data-driven algorithms less concerned with over generation
    • Semi-supervised learning
    • How to relate the algorithm to the wat we learned.
    • 80-20 rule
    • Remove the low-frequency word and focus on the frequent words for the structure.
  • Hypothesis testing (Chi^2 test)
    • P(H|E) H: hypothesis, E: evidence
    • P(q = 0.7 | 76 success, 24 failures)
    • pmf:
    • R = P(H1|E) / P(H2|E)
    • Recipe for statistical hypothesis
      • Step 1: H0: null hypothesis (what we think is not true)
        • Example: coin flip: coin has only one side.
      • Step 2: Test statistic: number we compute based on evidence collection
      • Step 3: Identify test statistics distribution: if H0 were true.
      • Step 4: Pick a threshold to convince that the H0 is not true.
      • Step 5: Do an experiment and calculate the test statistic.
      • Step 6: If it sufficiently improbable assuming H0 is true, should reject H0.
        • H0: q = 0.05, H1: q != 0.05
    • Example:
      • H0: P(H) = 0.05
      • R~Binomial(n, p)
      • Likelihood func: b(r,n,p) = C(n r) p^r (1-p)^(n-r)
      • Test statistics: # heads
      • Threshold:
        • alpha = 10e-10 Type II: “miss”, “false negative”
        • alpha = 0.2 Type I: “false positive”
        • Typically, people choose alpha = 0.05
      • The value on the edge (alpha = 2.5%) is critical value.
      • Tips:
        • alpha is selected in advance. (p-hacking)
        • Cannot be used to prove something is true, only can show that something is not true.
        • Chi^2? X^2
    • Difference between Chi^2 test and T-test
    • How do I know how many times of experiment is convincing?
    • Collocation:  collocation is a sequence of words or terms that co-occur more often than would be expected by chance.
      • Kict the bucket = to die

Leave a Reply

Your email address will not be published. Required fields are marked *