Talk: Learning efficiency of outcome in games

  • By Eva Tardos
  • Repeated games
    • player’s value/cost additive over periods, while playing
    • players try to learn what is the best from past data
    • what can we say about the outcome? how long do they have to stay to ensure OK social welfare?
  • Result: routing, limit for very small users
    • Theorem:
      • In any network with continuous, non-decreasing cost functions and small users
      • cost of Nash with rates ri for all i <= cost of opt with rate 2ri for all i
    • Nash equilibrium: stable solution where no player had incentive to deviate.
    • Price of Anarchy = cost of worst Nash equilibrium / social optimum cost;
  • Examples of price of anarchy bounds
  • Price of anarchy in auctions
    • First price is auction
    • All pay auction…
    • Other applications include:
      • public goods
      • fair sharing
      • Walrasian Mechanism
  • Repeated game that is slowly changing
    • Dynamic population model
      • at each step t each player I is replaced with an arbitrary new player with probability p
      • in a population of N players, each step, Np players replaced in expectation
      • population changes all the time: need to adjust
      • players stay long enough…
  • Learning in repeated game
    • what is learning?
    • Does learning lead to finding Nash equilibrium?
    • fictitious play = best respond to past history of other players goal: “pre-play” as a way to learn to play Nash
  • Find a better idea when the game is playing?
    • Change of focus: outcome of learning in playing
  • Nash equilibrium of the one shot game?
    • Nash equilibrium of the one-shot game: stable actions a with no regret for any alternate strategy x.
    • cost_i(x, a_-i) >= cost_i(a)
  • Behavior is far from stable
  • no regret without stability: learning
    • no regret: for any fixed action x (cost \in [0,1]):
      • sum_t(cost_i(a^t)) <= sum_t(cost_i(x, a_-i^t)) + error
      • error <= √T (if o(T) called no-regret)
  • Outcome of no-regret learning in a fixed game
    • limit distribution sigma of play (action vectors a = (a1, a2,…,an))
  • No regret leanring as a behavior model:
    •  Pro:
      • no need for common prior or rationality assumption on opponents
      • behavioral assumption: if there is a consistently good strategy: please notice!
      • algorithm: many simple rules ensure regret approx.
      • Behavior model ….
  • Distribution of smallest rationalizable multiplicative regret
    • strictly positive regret: learning phase maybe better than no-regret
  • Today (with d options):
    • sum_t(cost_i(a^t)) <= sum_t(cost_i(x, a_-i^t)) + √Tlogd
    • sum_t(cost_i(a^t)) <= (1 + epsilon)sum_t(cost_i(x, a_-i^t)) + log(d)/epsilon
  • Quality of learning outcome
  • Proof technique: Smoothness
  • Learning and price of anarchy
  • Learning in dynamic games
    • Dynamic population model
      • at each step t each player I is replaced with an arbitrary new player with probability p
    • how should they learn from data?
  • Need for adaptive learning
  • Adapting result to dynamic populations
    • inequality we wish to have
  • Change in optimum solution
  • Use differential privacy -> stable solution

PointNet, PointNet++, and PU-Net

point cloud -> deep network -> classification / segmentation / super-resolution

traditional classification / segmentation:

projection onto 2D plane and use 2D classification / segmentation

unordered set

point(Vec3) -> feature vector (Vec5) -> normalize (end with the bound of the pointcloud)

N points:

segmentation:

feature from N points ->NxK classes of each point (each point will have a class)

classification:

feature from N points -> K x 1 vector (K classes)