- Question 2:
- Word order:
- Explanation: Word order refers to the structure of a sentence:
- Alexa, when is your birthday?
- (Alexa answers)
- Alexa, when your birthday is?
- (Alexa answers)
- This tests whether Alexa handles incorrect word order.
- Inflectional morphology:
- Explanation:
- Question 3:
- Question 4:
- Question 5:
- Question 6:
- New definitions:
- log-entropy weighting
- cosine similarity
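- These two terms are only listed here; as a hedged illustration (the exact variant covered in class may differ), one common log-entropy scheme weights each cell of a term-document matrix by log(1 + tf) times a global entropy weight, and cosine similarity then compares the weighted vectors:
```python
import numpy as np

def log_entropy_weight(counts):
    """counts: term-by-document matrix of raw frequencies (each term assumed to
    occur at least once, and more than one document).  One common variant:
    local weight log(1 + tf_ij) times global weight
    1 + sum_j p_ij * log(p_ij) / log(n_docs), with p_ij = tf_ij / gf_i."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    gf = counts.sum(axis=1, keepdims=True)        # global frequency of each term
    p = counts / gf
    plogp = np.zeros_like(p)
    mask = p > 0
    plogp[mask] = p[mask] * np.log(p[mask])       # treat 0 * log(0) as 0
    g = 1.0 + plogp.sum(axis=1, keepdims=True) / np.log(n_docs)
    return np.log1p(counts) * g

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| * |v|)"""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```
- With a weighted matrix W, cosine_similarity(W[:, i], W[:, j]) compares two documents; in latent semantic analysis the comparison is usually done after an SVD projection.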
Category: CMSC773
CMSC773: pre-course knowledge
- Dependency Parsing
- Dependency: focuses on relations between words
- Typed: a label indicating the relationship between the words
- Untyped: only which words depend on which, with no labels (a toy example follows below)
- Phrase structure: focuses on identifying phrases and their recursive structure
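- As a toy illustration of the typed/untyped distinction (the sentence and labels are my own example, not from the notes; labels follow Universal Dependencies conventions):
```python
# Dependencies for the sentence "She ate fresh fish".
# Untyped: each word only records which word it depends on.
untyped = {"She": "ate", "fresh": "fish", "fish": "ate", "ate": "ROOT"}

# Typed: the same arcs, each labeled with the relation between head and dependent.
typed = [
    ("ate",  "She",   "nsubj"),  # nominal subject
    ("ate",  "fish",  "obj"),    # direct object
    ("fish", "fresh", "amod"),   # adjectival modifier
    ("ROOT", "ate",   "root"),
]
```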
- Dependency parsing methods:
- Shift reduce:
- Predict from left to right
- Fast, but slightly less accurate
- MaltParser
- Spanning tree:
- Calculate full tree at once
- Slightly more accurate, slower
- MSTParser, Eda
- Cascaded chunking
- Chunk words into phrases, find heads, delete non-heads, repeat
- CaboCha
- Shift reduce:
- Example
- Maximum Spanning Tree
- Each dependency is an edge in a directed graph
- Assign each edge a score
- Keep the tree with the highest score
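- A minimal sketch of the edge-scoring idea (the scores below are made up for illustration). The greedy per-word argmax shown here can create cycles; a real system such as MSTParser runs the Chu-Liu/Edmonds algorithm to extract the highest-scoring tree:
```python
import numpy as np

# scores[h][d] = score of an arc from head h to dependent d (index 0 is ROOT).
# The numbers are invented for illustration only.
words = ["ROOT", "She", "ate", "fish"]
NEG = -np.inf
scores = np.array([
    # columns: ROOT, She, ate, fish (dependents)
    [NEG, 1.0, 9.0, 2.0],   # head = ROOT
    [NEG, NEG, 3.0, 0.5],   # head = She
    [NEG, 8.0, NEG, 7.0],   # head = ate
    [NEG, 0.2, 1.0, NEG],   # head = fish
])

# Greedy approximation: every dependent keeps its highest-scoring head.
# This can produce cycles; Chu-Liu/Edmonds repairs them to guarantee a tree.
heads = {words[d]: words[int(np.argmax(scores[:, d]))] for d in range(1, len(words))}
print(heads)  # {'She': 'ate', 'ate': 'ROOT', 'fish': 'ate'}
```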
- Cascaded Chunking: for Japanese
- Shift-Reduce
- Two data structures:
- Queue: of unprocessed words
- Stack: of partially processed words
- Process: at each point, choose one of:
- Shift: move one word from queue to stack
- Reduce left: top word on stack is head of second word
- Reduce right: second word on stack is head of top word
- Classification for shift-reduce
- We have a weight vector for each of “shift”, “reduce left”, and “reduce right”: Ws, Wl, Wr
- Calculate feature functions from the queue and stack: φ(queue, stack)
- Multiply the weights by the feature functions to get scores: Ss = Ws · φ(queue, stack) (and likewise Sl, Sr)
- Take the action with the highest score
- Features for Shift Reduce:
- Features should generally cover at least the last stack entries and the first queue entry; a small sketch of such features and scores follows
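- A small sketch of the classifier (the feature names and sparse-dict weight representation are my own choices, not from the notes); stack and queue entries are (index, word, POS) tuples as in the algorithm's Input section below:
```python
def phi(queue, stack):
    """Binary features over the top two stack entries and the first queue entry."""
    feats = set()
    if len(stack) >= 1:
        feats.add("s0_word=" + stack[-1][1])
        feats.add("s0_pos=" + stack[-1][2])
    if len(stack) >= 2:
        feats.add("s1_word=" + stack[-2][1])
        feats.add("s1_pos=" + stack[-2][2])
    if queue:
        feats.add("q0_word=" + queue[0][1])
        feats.add("q0_pos=" + queue[0][2])
    return feats

def score(w, feats):
    """Sparse dot product: weights are a dict, features are binary."""
    return sum(w.get(f, 0.0) for f in feats)

def best_action(w_s, w_l, w_r, queue, stack):
    """Pick the highest-scoring action among shift, reduce-left, reduce-right."""
    feats = phi(queue, stack)
    scores = {"shift": score(w_s, feats),
              "reduce_left": score(w_l, feats),
              "reduce_right": score(w_r, feats)}
    return max(scores, key=scores.get)
```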
- Algorithm:
- Input:
- weights: Ws, Wl, Wr
- queue: [(1, word1, POS1), (2, word2, POS2), …]
- stack: [(0, ROOT, ROOT)]
- Output:
- heads = [-1, head1, head2, …]
- Pseudo Code
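- The pseudo-code entry above has no body in these notes; the following is one possible Python rendering of the loop just described, reusing the phi, score, and best_action helpers sketched earlier (details such as the forced shift/reduce cases are my own choices):
```python
def shift_reduce_parse(queue, w_s, w_l, w_r):
    """queue: [(1, word1, POS1), (2, word2, POS2), ...]; returns the heads list."""
    heads = [-1] * (len(queue) + 1)        # heads[0] stays -1 for ROOT
    stack = [(0, "ROOT", "ROOT")]
    while queue or len(stack) > 1:
        if len(stack) < 2:                 # nothing to reduce: must shift
            action = "shift"
        elif not queue:                    # nothing to shift: must reduce
            feats = phi(queue, stack)
            action = ("reduce_left" if score(w_l, feats) >= score(w_r, feats)
                      else "reduce_right")
        else:
            action = best_action(w_s, w_l, w_r, queue, stack)
        if action == "shift":
            stack.append(queue.pop(0))
        elif action == "reduce_left":      # top of stack is head of the second word
            dep = stack.pop(-2)
            heads[dep[0]] = stack[-1][0]
        else:                              # reduce_right: second word is head of the top word
            dep = stack.pop(-1)
            heads[dep[0]] = stack[-1][0]
    return heads
```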
- Training
- Can be trained using perceptron algorithm
- Do parsing; if the correct answer corr differs from the classifier's answer ans, update the weights (see the update sketch below)
- Shift-reduce training algorithm
- Input:
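- The notes do not spell out the training pseudo-code; here is a hedged sketch of the perceptron update step described above (how the oracle action corr is derived from the gold tree depends on the training setup and is omitted):
```python
def perceptron_update(weights, feats, corr, ans):
    """weights: {"shift": {...}, "reduce_left": {...}, "reduce_right": {...}}.
    Called during training whenever the classifier's action ans differs from
    the correct action corr for the current (queue, stack) features."""
    if corr == ans:
        return
    for f in feats:
        weights[corr][f] = weights[corr].get(f, 0.0) + 1.0   # promote correct action
        weights[ans][f] = weights[ans].get(f, 0.0) - 1.0     # demote predicted action
```
- Training repeats this over the corpus for several iterations, usually with weight averaging to reduce overfitting.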
- CKY (Cocke-Kasami-Younger) Parsing Algorithm & Earley Parsing Algorithms
- Definitions:
- CFG (Context-free Grammar): G = (N, Σ, P, S)
- N is a finite set of non-terminal symbols
- Σ is a finite set of terminal symbols (tokens)
- P is a finite set of productions (rules)
- Production: (α, β) ∈ P will be written α → β, where α is in N and β is in (N ∪ Σ)*
- S is the start symbol in N
- CNF (Chomsky Normal Form): a context-free grammar where the right-hand side of each production is restricted to either two non-terminals or one terminal, i.e., every production has the form A → BC or A → a. Any CFG can be converted to a weakly equivalent grammar in CNF (for example, S → NP VP PP can be binarized into S → NP X and X → VP PP).
- Parsing Algorithms:
- Should know:
- CFGs are the basis for describing the (syntactic) structure of NL sentences
- Parsing algorithms are the core of NL analysis systems
- Recognition vs. Parsing:
- Recognition: deciding membership in the language
- Parsing: recognition plus producing a parse tree for the input
- Parsing is more “difficult” than recognition (time complexity)
- Ambiguity: an input may have exponentially many parses.
- Top-down vs. Bottom-up
- Top-down (goal driven): from start symbol down
- Bottom-up (data driven): from the symbols up
- Naive vs. dynamic programming
- Naive: enumerate everything
- Backtracking: try something, discard partial solutions
- DP: save partial solutions in a table
- CKY: Bottom-up DP
- Earley Parsing: top-down DP
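- Since CKY is the bottom-up DP option above, here is a minimal recognizer sketch (the toy CNF grammar and sentence are assumptions for illustration, not course material); a full parser would additionally store backpointers to recover the tree:
```python
from collections import defaultdict

# Toy CNF grammar: binary rules A -> B C and lexical rules A -> terminal.
binary_rules = [("S", "NP", "VP"), ("VP", "V", "NP"), ("NP", "DT", "N")]
lexical_rules = [("DT", "the"), ("N", "dog"), ("N", "cat"), ("V", "saw")]

def cky_recognize(words):
    """Bottom-up DP: chart[(i, j)] holds every non-terminal deriving words[i:j]."""
    n = len(words)
    chart = defaultdict(set)
    for i, w in enumerate(words):                  # spans of length 1 (lexical rules)
        for lhs, term in lexical_rules:
            if term == w:
                chart[(i, i + 1)].add(lhs)
    for span in range(2, n + 1):                   # longer spans, shortest first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):              # every split point
                for lhs, b, c in binary_rules:
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        chart[(i, j)].add(lhs)
    return "S" in chart[(0, n)]

print(cky_recognize("the dog saw the cat".split()))   # True
```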
- Chart Parsing:
- Lambda Calculus
- Perceptron Learning
- Restricted Boltzmann Machines
- Statistical Hypothesis Testing
- Latent Semantic Analysis
- Latent Dirichlet Allocation
- Gibbs Sampling
- Inter-rater Reliability
- IBM Models for MT
- Phrase-based Statistical MT
- Hierarchical Phrase-based Statistical MT
- Lambda Reductions
- Probabilistic Soft Logic
- Open Information Extraction
- Semantic Role Labeling
- Combinatory Categorial Grammar (CCG)
- Tree Adjoining Grammar
- Abstract Meaning Representation (AMR)
- Wikification