Talk: Learning and Efficiency of Outcomes in Games

  • By Eva Tardos
  • Repeated games
    • player’s value/cost is additive over periods while playing
    • players try to learn what is best from past data
    • what can we say about the outcome? how long do players have to stay to ensure OK social welfare?
  • Result: routing, limit for very small users
    • Theorem:
      • In any network with continuous, non-decreasing cost functions and small users,
      • the cost of Nash with rates r_i for all i <= the cost of OPT with rates 2r_i for all i
    • Nash equilibrium: a stable solution where no player has an incentive to deviate.
    • Price of Anarchy = cost of worst Nash equilibrium / cost of social optimum
  • Examples of price of anarchy bounds (a standard one is sketched below)
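A standard illustration is Pigou's two-link network (a well-known example, not spelled out in these notes):

```latex
% Pigou's example: one unit of traffic from s to t over two parallel links,
% with cost functions c_1(x) = 1 (constant) and c_2(x) = x (linear).
% Nash: all traffic takes link 2, so C_Nash = 1 * 1 = 1.
% OPT: split the traffic evenly, C_OPT = 1/2 * 1 + 1/2 * 1/2 = 3/4.
\mathrm{PoA} = \frac{C_{\mathrm{Nash}}}{C_{\mathrm{OPT}}} = \frac{1}{3/4} = \frac{4}{3}
```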
  • Price of anarchy in auctions
    • First-price auction
    • All-pay auction…
    • Other applications include:
      • public goods
      • fair sharing
      • Walrasian Mechanism
  • Repeated game that is slowly changing
    • Dynamic population model
      • at each step t, each player i is replaced with an arbitrary new player with probability p
      • in a population of N players, Np players are replaced in expectation each step (toy simulation below)
      • the population changes all the time: players need to adjust
      • players stay long enough…
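A toy simulation of the replacement model (my own check, not from the talk): with N players and per-step replacement probability p, roughly Np players turn over each step.

```python
import random

N, p, steps = 1000, 0.01, 200
players = list(range(N))
next_id = N
replaced_counts = []
for _ in range(steps):
    replaced = 0
    for i in range(N):
        if random.random() < p:      # player i leaves this step
            players[i] = next_id     # an arbitrary new player arrives
            next_id += 1
            replaced += 1
    replaced_counts.append(replaced)

print(sum(replaced_counts) / steps)  # ≈ N * p = 10 in expectation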
  • Learning in repeated game
    • what is learning?
    • does learning lead to finding a Nash equilibrium?
    • fictitious play = best respond to the past history of the other players; goal: “pre-play” as a way to learn to play Nash
  • Can we find a better idea while the game is being played?
    • Change of focus: the outcome of learning while playing
  • Nash equilibrium of the one-shot game?
    • Nash equilibrium of the one-shot game: stable actions a with no regret for any alternate strategy x:
    • cost_i(x, a_-i) >= cost_i(a) for all alternate strategies x
  • Behavior is far from stable
  • no regret without stability: learning
    • no regret: for any fixed action x (costs in [0,1]):
      • sum_t cost_i(a^t) <= sum_t cost_i(x, a_-i^t) + error
      • error <= √T; if the error is o(T), the algorithm is called no-regret
  • Outcome of no-regret learning in a fixed game
    • limit distribution sigma of play (action vectors a = (a_1, a_2, …, a_n))
  • No-regret learning as a behavior model:
    • Pro:
      • no need for a common prior or rationality assumptions about opponents
      • behavioral assumption: if there is a consistently good strategy, please notice!
      • algorithmic: many simple rules ensure approximately no regret
      • Behavior model ….
  • Distribution of smallest rationalizable multiplicative regret
    • strictly positive regret: the learning phase may be better than no-regret
  • Today (with d options), simple algorithms guarantee (one is sketched below):
    • sum_t cost_i(a^t) <= sum_t cost_i(x, a_-i^t) + √(T log d)
    • sum_t cost_i(a^t) <= (1 + epsilon) sum_t cost_i(x, a_-i^t) + log(d)/epsilon
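A minimal sketch of one such rule, the multiplicative-weights (Hedge) update, which achieves the bounds above; the setup and names are illustrative, not from the talk:

```python
import random

def hedge(cost_rounds, d, epsilon):
    """Hedge / multiplicative weights over d actions.

    cost_rounds yields one length-d list of costs in [0, 1] per round.
    Returns the total cost incurred by the sampled actions.
    """
    weights = [1.0] * d
    total = 0.0
    for costs in cost_rounds:
        norm = sum(weights)
        probs = [w / norm for w in weights]
        action = random.choices(range(d), weights=probs)[0]
        total += costs[action]
        # penalize each action multiplicatively by its observed cost
        weights = [w * (1.0 - epsilon) ** c for w, c in zip(weights, costs)]
    return total

# epsilon ≈ √(log d / T) recovers the additive √(T log d) regret bound;
# a fixed epsilon gives the (1 + epsilon)-multiplicative bound with
# log(d)/epsilon overhead.
```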
  • Quality of learning outcome
  • Proof technique: Smoothness
  • Learning and price of anarchy
  • Learning in dynamic games
    • Dynamic population model
      • at each step t, each player i is replaced with an arbitrary new player with probability p
    • how should they learn from data?
  • Need for adaptive learning
  • Adapting result to dynamic populations
    • inequality we wish to have
  • Change in optimum solution
  • Use differential privacy -> stable solution: a differentially private algorithm's output barely changes when a few players change, so the benchmark solution remains stable as the population churns

How to write MP4 with OpenCV3

When trying to write an MP4 file (H264), I tried the following code:

And I got error saying:

This problem is solved by passing the fourcc as a number directly to cv2.VideoWriter(), i.e.
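A minimal sketch of the fix; the filename, fps, and frame size are placeholders, and 0x00000021 is the numeric fourcc value suggested in the thread referenced below:

```python
import cv2

# Passing the fourcc as a number instead of
# cv2.VideoWriter_fourcc(*'H264') avoids the encoder lookup error here.
fps, size = 30, (1280, 720)
writer = cv2.VideoWriter('output.mp4', 0x00000021, fps, size)

# frames must be uint8 BGR arrays matching `size`:
# writer.write(frame)
writer.release()
```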

reference:

https://devtalk.nvidia.com/default/topic/1029451/jetson-tx2/-python-what-is-the-four-characters-fourcc-code-for-mp4-encoding-on-tx2/

Paper Reading: View Direction and Bandwidth Adaptive 360 Degree Video Streaming using a Two-Tier System

Each segment is coded as a base-tier (BT) chunk, and multiple enhancement-tier (ET) chunks.

BT chunks:

represent the entire 360 view at a low bit rate and are pre-fetched into a long display buffer, which smooths network jitter and guarantees that any desired FOV can be rendered with minimal stalls.

ET chunks:

represent portions of the 360 view (the predicted FOV) at higher bit rates and are fetched into a short buffer, so quality can adapt quickly to view-direction and bandwidth changes.
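A minimal sketch of my reading of the two-tier fetch logic (all names and thresholds are illustrative, not code from the paper):

```python
def next_fetch(bt_buffer_s, et_buffer_s, bandwidth_mbps, predicted_fov):
    """Toy two-tier scheduler: keep a long base-tier buffer for stall
    protection, then spend the remaining bandwidth on short-horizon
    enhancement-tier chunks for the predicted FOV."""
    BT_TARGET_S = 30.0   # long BT buffer smooths network jitter
    ET_TARGET_S = 2.0    # short ET buffer tracks the view direction
    if bt_buffer_s < BT_TARGET_S:
        return ("BT", "entire 360 view at low rate")
    if et_buffer_s < ET_TARGET_S:
        rate = "high" if bandwidth_mbps > 10 else "medium"
        return ("ET", f"{predicted_fov} at {rate} rate")
    return ("idle", None)

print(next_fetch(10.0, 0.5, 20.0, "front"))  # -> ('BT', ...)
print(next_fetch(35.0, 0.5, 20.0, "front"))  # -> ('ET', 'front at high rate')
```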

Facebook 360 video:
https://code.facebook.com/posts/1638767863078802
Assessment:
https://code.facebook.com/posts/2058037817807164

PointNet, PointNet++, and PU-Net

point cloud -> deep network -> classification / segmentation / super-resolution

traditional classification / segmentation:

project the points onto a 2D plane and use 2D classification / segmentation methods

point clouds are unordered sets, so the network must be permutation-invariant (see the sketch below)

point (Vec3) -> feature vector (Vec5) -> normalize (to the bounds of the point cloud)

N points:

segmentation:

features from N points -> N x K scores (each point gets a class)

classification:

features from N points -> K x 1 vector (scores over K classes)
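A minimal numpy sketch of the permutation-invariance idea (not the paper's code; weights and sizes are illustrative): a shared per-point function followed by a symmetric max pool, so the global feature ignores point order.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 1024, 10                   # points, classes
W1 = rng.normal(size=(3, 64))     # shared per-point weights (stand-in for an MLP)
W2 = rng.normal(size=(64, K))

points = rng.normal(size=(N, 3))  # unordered point cloud

per_point = np.maximum(points @ W1, 0)  # shared feature per point, N x 64
global_feat = per_point.max(axis=0)     # symmetric max pool -> order-invariant
class_scores = global_feat @ W2         # classification head: K scores

# segmentation: concatenate the global feature back onto each point;
# a further shared MLP would map N x 128 -> N x K per-point scores
seg_input = np.concatenate([per_point, np.tile(global_feat, (N, 1))], axis=1)

# permutation check: shuffling the points leaves the global feature unchanged
shuffled = rng.permutation(points)
assert np.allclose(np.maximum(shuffled @ W1, 0).max(axis=0), global_feat)
```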

 

Lecture 10: Neural Network

  1. Deep learning
  2. Representation learning
  3. Rule-based
    1. high explainability
  4. Linguistic supervision
  5. Semi-supervision
    1. a small set of labeled data
    2. a large set of unlabeled data
  6. Recurrent-level supervision
  7. Language structure

description length DL = size(lexicon) + size(encoding)

  1. lex1
    1. do
    2. the kitty
    3. you
    4. like
    5. see
  2. Lex2
    1. do
    2. you
    3. like
    4. see
    5. the
    6. kitty
  3. How to evaluate the two lexicons? (see the toy computation after this list)
    1. lex1 has 5 words; lex2 has 6 words
    2. Potential sequences
      1. lex1: 1 3 5 2, 5 2, 1 3 4 2
      2. lex2: 1 3 5 2 6, 5 2 6, 1 3 4 2 6
  4. MDL: minimum description length
    1. unsupervised
    2. prosodic bootstrapping
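A toy computation of the comparison above, assuming unit cost per lexicon entry and per encoding token (a real MDL would count bits):

```python
lex1 = ["do", "the kitty", "you", "like", "see"]
lex2 = ["do", "you", "like", "see", "the", "kitty"]

enc1 = [[1, 3, 5, 2], [5, 2], [1, 3, 4, 2]]           # sequences under lex1
enc2 = [[1, 3, 5, 2, 6], [5, 2, 6], [1, 3, 4, 2, 6]]  # sequences under lex2

def dl(lexicon, encodings):
    # DL = size(lexicon) + size(encoding), in unit-cost terms
    return len(lexicon) + sum(len(seq) for seq in encodings)

print(dl(lex1, enc1))  # 5 + 10 = 15
print(dl(lex2, enc2))  # 6 + 13 = 19 -> lex1 yields the shorter description
```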

Boltzmann machine

Lexical space

relatedness vs. similarity

  • use near neighbors: similarity
  • use far neighbors: relatedness

WordSim-353 (WS-353) has both similarity and relatedness subsets

loss function:

 

project:

Part 1: potential methods

  • LDA
  • readability
  • syntactic analysis