Striatum – Contextual Bandit in Python

Striatum (https://github.com/ntucllab/striatum) is a Python package for contextual bandits.

What is a contextual bandit?

It’s a simple special case of reinforcement learning. In a contextual bandit, when an instance of data arrives, we take one action, receive a reward, and update the model — but we don’t consider how the action affects future decisions. If you are not familiar with reinforcement learning, that description probably makes little sense, so here are some tutorials (in Chinese) by taweihuang, one of the developers of the Striatum project:

  1. Basic definitions and theory of multi-armed bandits
  2. The Gambler’s AI, Part 1: The Slot-Machine (Bandit) Problem
  3. The Gambler’s AI, Part 2: Bandits in Clinical Trials
  4. The Gambler’s AI, Part 3: Bandit Investment Strategies
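The observe-context, act, reward, update loop described above can be sketched in a few lines. This is a minimal epsilon-greedy simulation in plain Python — purely illustrative, not Striatum’s API — where each arm’s value is estimated separately per context and there is no long-term planning:

```python
import random

def run(n_rounds=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy contextual bandit on a toy problem with two
    contexts and two arms; arm 0 is better in context 0, arm 1 in
    context 1.  Returns the average reward per round."""
    rng = random.Random(seed)
    true_reward = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}
    counts = {(c, a): 0 for c in (0, 1) for a in (0, 1)}
    sums = {(c, a): 0.0 for c in (0, 1) for a in (0, 1)}
    total = 0.0
    for _ in range(n_rounds):
        context = rng.randint(0, 1)      # an instance of data arrives
        if rng.random() < epsilon:       # explore: random arm
            arm = rng.randint(0, 1)
        else:                            # exploit: best empirical mean
            est = [sums[(context, a)] / counts[(context, a)]
                   if counts[(context, a)] else 0.0 for a in (0, 1)]
            arm = max((0, 1), key=lambda a: est[a])
        reward = 1.0 if rng.random() < true_reward[(context, arm)] else 0.0
        counts[(context, arm)] += 1      # update the model immediately;
        sums[(context, arm)] += reward   # no reasoning about future rounds
        total += reward
    return total / n_rounds
```

With a large enough gap between arms, the average reward converges toward the best arm’s rate in each context, minus the small exploration cost.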

What does striatum mean?

Everyone asks me this question when I introduce Striatum. The striatum (紋狀體) is a part of the brain. Honestly, I’m not entirely sure of the details, since it was yangarbiter who chose the name. I believe it’s the brain region involved in the human reward system, which is important for how we learn. A contextual bandit is also driven by rewards, so we decided to use this name.

Why do we want Striatum?

Before we started this project, there was another Python package, StreamingBandit, that implements some contextual bandit algorithms. It provides a web interface where you fill in core parts of the algorithm and then run the experiment. However, it is not well modularized, so I think it is hard to use in other applications.

Therefore, we aimed to build a library that is very flexible and performant enough to be used in production. In many real-world applications (e.g., recommender systems), the number of actions and the number of requests per second can be very large, so model storage, action storage, and history storage must be managed carefully. Because storage management differs widely among systems, we let the user define how each is done:

  1. model storage: how to save the model after each update and how to load it back
  2. history storage: how to store each request and look it up when the (delayed) reward arrives
  3. action storage: how to add/remove actions and define special properties of each action
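The three abstractions above can be sketched as simple in-memory classes. The class and method names below are illustrative — they mirror the concepts rather than Striatum’s exact API — and a production backend would replace the dictionaries with a database:

```python
class MemoryModelStorage:
    """Holds the model between updates; a DB-backed version would
    serialize/deserialize it instead of keeping a reference."""
    def __init__(self):
        self._model = None
    def save_model(self, model):
        self._model = model
    def get_model(self):
        return self._model

class MemoryHistoryStorage:
    """Records each (context, action) pair so a delayed reward can be
    matched back to the request that produced it."""
    def __init__(self):
        self._rows, self._next_id = {}, 0
    def add_history(self, context, action):
        hid, self._next_id = self._next_id, self._next_id + 1
        self._rows[hid] = {"context": context, "action": action,
                           "reward": None}
        return hid
    def add_reward(self, history_id, reward):
        self._rows[history_id]["reward"] = reward

class MemoryActionStorage:
    """Tracks the available actions and their per-action properties."""
    def __init__(self):
        self._actions = {}
    def add(self, action_id, properties=None):
        self._actions[action_id] = properties or {}
    def remove(self, action_id):
        del self._actions[action_id]
    def get(self, action_id):
        return self._actions[action_id]
```

Because the bandit algorithm only talks to these interfaces, swapping the in-memory versions for, say, ORM-backed ones does not touch the algorithm code.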

Striatum itself provides only the core contextual bandit algorithms and some common storage implementations (e.g., in-memory storage). I have tried it with Django’s ORM, and implementing a custom storage backend is easy.

History and current situation

yangarbiter and I started this project in May 2016. At the time, many people in our lab were researching contextual bandits, but nobody had written a library for it. Personally, I find contextual bandits even more practical than active learning, but our lab’s main open-source effort before this was libact, a library for active learning.

Contextual bandits have quite a few applications, such as all kinds of recommender systems that need to optimize reward online. Many reinforcement learning problems also reduce to contextual bandits once certain information is discarded. I thought a good contextual bandit library would be a big help, so we started developing one.

So far I have led the design of Striatum’s various interfaces and the direction of development. taweihuang helped out for a stretch in the middle, writing some prototypes of simulations and algorithms (which I later spent a long time revising, because they really were prototypes).

Here’s the important part: we have essentially stopped developing it. One goal of this post is to find someone to take over, ideally someone from our lab. The reason is simple: I currently have no application that needs it, and probably won’t in the near future. The other developers are in much the same situation, so the project has stalled.

The more serious open issues right now: the EXP4 algorithm was poorly written at the start and is not very practical, and after a major architectural change nobody got around to fixing it. Also, model storage is not optimized per algorithm, so when using a database the entire model is repeatedly saved and loaded on every update; this is especially severe for LinUCB. Volunteers to take over and fix these are very welcome.
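To see why LinUCB suffers most, note that its model is a set of per-arm matrices, and each update only changes one arm’s state. A minimal sketch of per-arm LinUCB state — illustrative only, not Striatum’s code — where an update touches a single arm and so could be persisted on its own instead of re-saving the whole model:

```python
import numpy as np

class LinUCBArm:
    """Per-arm LinUCB state: a d x d design matrix A and a
    reward-weighted context sum b.  Updating one arm leaves every
    other arm's (A, b) untouched, so a per-arm storage backend
    would only need to re-save this one pair."""
    def __init__(self, d, alpha=1.0):
        self.A = np.eye(d)
        self.b = np.zeros(d)
        self.alpha = alpha

    def ucb(self, x):
        """Upper confidence bound score for context vector x."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return float(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))

    def update(self, x, reward):
        """Rank-one update after observing a reward for context x."""
        self.A += np.outer(x, x)
        self.b += reward * x
```

A whole-model save/load writes every arm’s A and b on each round; with per-arm storage the write cost drops to one (A, b) pair per update, which matters when the number of arms is large.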
