CMSC773: HW1

  • Question 2:
    • Word order:
      • Explanation: Word order refers to the order in which words appear in a sentence:
        • Alexa, when is your birthday?
        • (Alexa answers)
        • Alexa, when your birthday is?
        • (Alexa answers)
        • This tests whether Alexa handles incorrect word order.
    • Inflectional morphology:
      • Explanation:
  • Question 3:
  • Question 4:
  • Question 5:
  • Question 6:
    • New definitions:
      • log-entropy weighting
      • cosine similarity

CMSC773: pre-course knowledge

  • Dependency Parsing
    • Dependency: focuses on relations between words
      • Typed: edges carry a label indicating the relationship between the words
      • Untyped: only indicates which words depend on which, without labels
    • Phrase structure: focuses on identifying phrases and their recursive structure
    • Dependency parsing methods:
      • Shift reduce:
        • Predict from left to right
        • Fast, but slightly less accurate
        • MaltParser
      • Spanning tree:
        • Calculate full tree at once
        • Slightly more accurate, slower
        • MSTParser, Eda
      • Cascaded chunking
        • Chunk words into phrases, find heads, delete non-heads, repeat
        • CaboCha
    • Example
      • Maximum Spanning Tree
        • Each dependency is an edge in a directed graph
        • Assign each edge a score
        • Keep the tree with the highest score
      • Cascaded Chunking
        • For Japanese
      • Shift-Reduce
        • Two data structures:
          • Queue: of unprocessed words
          • Stack: of partially processed words
        • Process: at each point, choose one of:
          • Shift: move one word from queue to stack
          • Reduce left: top word on stack is head of second word
          • Reduce right: second word on stack is head of top word
        • Classification for shift-reduce
          • We have one weight vector per action “shift”, “reduce left”, “reduce right”: Ws, Wl, Wr
          • Compute a feature vector from the queue and stack: φ(queue, stack)
          • Take the dot product with each weight vector to get one score per action: Ss = Ws · φ(queue, stack), Sl = Wl · φ(queue, stack), Sr = Wr · φ(queue, stack)
          • Choose the action with the highest score
        • Features for Shift Reduce:
          • Features should generally cover at least the last stack entries and first queue entry
        • Algorithm:
          • Input:
            • weights: Ws, Wl, Wr
            • queue: [(1, word1, Pos1), (2, word2, Pos2), …]
            • stack: [(0, ROOT, ROOT)]
          • Output:
            • heads = [-1, head1, head2, …]
          • Pseudo code: see the sketch after this list
          • Training
            • Can be trained using perceptron algorithm
            • Run the parser; whenever the correct action corr differs from the classifier's predicted action ans, update the weights
            • Shift-reduce training algorithm
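
A minimal Python sketch of the shift-reduce loop above, combining the two data structures, the scoring step, and the parsing algorithm. The feature templates, the dict-of-dicts weight representation, and the function names (extract_features, score, shift_reduce_parse) are my own illustrative choices, not from the course materials.

```python
# Arc-standard shift-reduce dependency parsing (illustrative sketch).
# Weight vectors are sparse dicts {feature_string: float}, one per action.

def extract_features(queue, stack):
    """Toy phi(queue, stack): top two stack entries and the first queue entry."""
    feats = []
    if stack:
        feats += ["s0.w=" + stack[-1][1], "s0.p=" + stack[-1][2]]
    if len(stack) > 1:
        feats += ["s1.w=" + stack[-2][1], "s1.p=" + stack[-2][2]]
    if queue:
        feats += ["q0.w=" + queue[0][1], "q0.p=" + queue[0][2]]
    return feats

def score(w, feats):
    """Dot product of a sparse weight dict with binary features."""
    return sum(w.get(f, 0.0) for f in feats)

def shift_reduce_parse(words, weights):
    """words:   [(1, word1, pos1), (2, word2, pos2), ...]
    weights: {"shift": {...}, "reduce_left": {...}, "reduce_right": {...}}
    Returns heads, where heads[i] is the head of word i and heads[0] = -1 (ROOT)."""
    queue = list(words)
    stack = [(0, "ROOT", "ROOT")]
    heads = [-1] * (len(words) + 1)

    while queue or len(stack) > 1:
        feats = extract_features(queue, stack)
        # Score only the actions that are legal in the current configuration.
        scores = {}
        if queue:
            scores["shift"] = score(weights["shift"], feats)
        if len(stack) > 1:
            scores["reduce_left"] = score(weights["reduce_left"], feats)
            scores["reduce_right"] = score(weights["reduce_right"], feats)
        action = max(scores, key=scores.get)

        if action == "shift":
            stack.append(queue.pop(0))          # move one word from queue to stack
        elif action == "reduce_left":
            heads[stack[-2][0]] = stack[-1][0]  # top of stack is head of second word
            del stack[-2]
        else:  # reduce_right
            heads[stack[-1][0]] = stack[-2][0]  # second word is head of top word
            del stack[-1]
    return heads
```

Training would follow the perceptron recipe above: run the parser against a sentence with a gold tree, and whenever the oracle action (corr) differs from the predicted action (ans), add φ(queue, stack) to the weights for corr and subtract it from the weights for ans.
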
    • CKY (Cocke-Kasami-Younger) Parsing Algorithm & Earley Parsing Algorithms
      • Definitions:
        • CFG (Context-free Grammar): G = (N, Σ, P, S)
          • N is a finite set of non-terminal symbols
          • Σ is a finite set of terminal symbols (tokens)
          • P is a finite set of productions (rules)
            • Production: (α, β) ∈ P is written α → β, where α ∈ N and β ∈ (N ∪ Σ)*
          • S is the start symbol in N
        • CNF (Chomsky Normal Form):
          A context-free grammar in which the right-hand side of every production is restricted to either two non-terminals or a
          single terminal, i.e. each production has the form A → BC or A → a.

          Any CFG can be converted to a weakly equivalent grammar in CNF.
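          As a toy illustration (my own example, not from the lecture notes): the rule VP → V NP PP is not in CNF because its
          right-hand side has three symbols; introducing a fresh non-terminal X turns it into VP → V X and X → NP PP, both of the
          form A → BC. A rule that mixes a terminal with non-terminals, e.g. VP → eats NP, becomes VP → V_eats NP plus the
          lexical rule V_eats → eats.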
      • Parsing Algorithms:
        • Should know:
          • CFGs are the basis for describing the (syntactic) structure of NL sentences
          • Parsing algorithms are the core of NL analysis systems
          • Recognition vs. Parsing:
            • Recognition: deciding membership in the language
            • Parsing: recognition + producing a parse tree for the input
          • Parsing is more “difficult” than recognition (time complexity)
          • Ambiguity: an input may have exponentially many parses.
        • Top-down vs. bottom-up
          • Top-down (goal driven): from start symbol down
          • Bottom-up (data driven): from the symbols up
        • Naive vs. backtracking vs. dynamic programming
          • Naive: enumerate everything
          • Backtracking: try something, discard partial solutions
          • DP: save partial solutions in a table
        • CKY: Bottom-up DP (see the sketch after this list)
        • Earley Parsing: top-down DP
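
A minimal CKY recognizer sketch in Python, to make the "bottom-up DP" idea concrete. The toy CNF grammar, the example sentence, and the dict-based chart representation are my own illustrations, not from the course materials.

```python
from collections import defaultdict

# Toy grammar in CNF (illustrative): binary rules A -> B C and lexical rules A -> word.
BINARY = {
    ("NP", "VP"): {"S"},
    ("DET", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICAL = {
    "the": {"DET"},
    "dog": {"N"},
    "cat": {"N"},
    "chased": {"V"},
}

def cky_recognize(words, start="S"):
    """Bottom-up dynamic programming: chart[(i, j)] is the set of non-terminals
    that can derive the span words[i:j]."""
    n = len(words)
    chart = defaultdict(set)

    # Width-1 spans are filled by the lexical rules.
    for i, w in enumerate(words):
        chart[(i, i + 1)] |= LEXICAL.get(w, set())

    # Wider spans combine two adjacent sub-spans via the binary rules.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                # split point
                for B in chart[(i, k)]:
                    for C in chart[(k, j)]:
                        chart[(i, j)] |= BINARY.get((B, C), set())

    # Recognition: is the start symbol derivable over the whole input?
    return start in chart[(0, n)]

print(cky_recognize("the dog chased the cat".split()))  # -> True
```

Parsing (as opposed to recognition) additionally stores back-pointers for each chart entry so a tree can be reconstructed; Earley parsing explores the same search space top-down with dotted rules and does not require the grammar to be in CNF.
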
    • Chart Parsing:
    • Lambda Calculus
    • Perceptron Learning
    • Restricted Boltzmann Machines
    • Statistical Hypothesis Testing
    • Latent Semantic Analysis
    • Latent Dirichlet Allocation
    • Gibbs Sampling
    • Inter-rater Reliability
    • IBM Models for MT
    • Phrase-based Statistical MT
    • Hierarchical Phrase-based Statistical MT
    • Lambda Reductions
    • Probabilistic Soft Logic
    • Open Information Extraction
    • Semantic Role Labeling
    • Combinatory Categorial Grammar (CCG)
    • Tree Adjoining Grammar
    • Abstract Meaning Representation (AMR)
    • Wikification