Striatum (https://github.com/ntucllab/striatum) is a Python package for contextual bandit algorithms.
What is contextual bandit?
It’s a simple special case of reinforcement learning. In contextual bandit, when an instance of data arrives we take one action, receive the reward, and update the model, without considering how the action affects future decisions. If you are not familiar with reinforcement learning, I’m pretty sure this sounds cryptic, so here are some tutorials (in Chinese) by taweihuang, one of the developers of the Striatum project:
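To make the loop above concrete, here is a minimal sketch of the contextual bandit protocol: observe a context, pick one action, receive a reward for that action only, and update. The names are illustrative (this is not Striatum’s API), and the epsilon-greedy policy here ignores the context for simplicity; a real contextual policy such as LinUCB would condition its estimates on it.

```python
import random

class EpsilonGreedyBandit:
    """Epsilon-greedy over per-action running-average rewards."""

    def __init__(self, n_actions, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.counts = [0] * n_actions     # times each action was taken
        self.values = [0.0] * n_actions   # running average reward per action
        self.rng = random.Random(seed)

    def get_action(self, context):
        # Explore with probability epsilon, otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.values[a])

    def reward(self, action, reward):
        # Incremental running average; only the chosen action is updated.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

# Toy environment: action 1 pays off far more often than action 0.
policy = EpsilonGreedyBandit(n_actions=2, epsilon=0.1, seed=42)
env_rng = random.Random(7)
for _ in range(1000):
    context = None  # the instance of data would go here
    action = policy.get_action(context)
    reward = 1.0 if env_rng.random() < (0.8 if action == 1 else 0.2) else 0.0
    policy.reward(action, reward)
```

Note that the policy never sees what reward the other action would have given; that partial feedback is what distinguishes bandits from supervised learning.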
- basic definition and theory of multi-armed bandit
- 賭徒的人工智慧1：吃角子老虎 (Bandit) 問題 (The Gambler’s AI 1: the slot-machine (bandit) problem)
What does striatum mean?
Everyone asks me this question when I introduce Striatum. Striatum is 紋狀體 in Chinese. Honestly, I’m not entirely sure what it is, because it was yangarbiter who chose the name. As far as I know, it is a part of the brain involved in the human reward system, which is important for how we learn. Contextual bandit is also a kind of reward system, so we decided to use this name.
Why do we want Striatum?
Before we started this project, there was another Python package, StreamingBandit, that implements some contextual bandit algorithms. It provides a web interface where you can fill in some core parts of an algorithm and then run the experiment. However, it is not well modularized, so I think it is very hard to use in other applications.
Therefore, we aim to build a library that is very flexible and performs well enough to be used in production. In many real-world applications (e.g., recommender systems), the number of actions and the number of requests per second can be very large, so the model storage, action storage, and history storage must be managed very carefully. Because storage management differs greatly among systems, we make it possible for users to define how to do it:
- model storage: how to store the model after updating and how to load the model
- history storage: how to store the request and find it when we get the (delayed) reward
- action storage: how to add/remove actions and define some special properties of each action
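The three storage roles above can be sketched as plain Python classes. The class and method names here are illustrative, not Striatum’s actual interface; the point is that a policy only talks to storage objects, so the in-memory versions below could be swapped for, say, a Django ORM backend without touching the algorithm.

```python
import itertools

class MemoryModelStorage:
    """Stores the learned model; save after each update, load before use."""
    def save_model(self, model):
        self._model = model
    def get_model(self):
        return self._model

class MemoryHistoryStorage:
    """Records each request so its (possibly delayed) reward can be matched later."""
    def __init__(self):
        self._ids = itertools.count()
        self._rows = {}
    def add_history(self, context, action):
        history_id = next(self._ids)
        self._rows[history_id] = {"context": context, "action": action, "reward": None}
        return history_id
    def add_reward(self, history_id, reward):
        # The reward may arrive long after the action was taken.
        self._rows[history_id]["reward"] = reward
    def get_history(self, history_id):
        return self._rows[history_id]

class MemoryActionStorage:
    """Tracks the available actions and their per-action properties."""
    def __init__(self):
        self._actions = {}
    def add(self, action_id, properties=None):
        self._actions[action_id] = properties or {}
    def remove(self, action_id):
        del self._actions[action_id]
    def ids(self):
        return list(self._actions)

# Usage: record a request now, attach its delayed reward later.
history = MemoryHistoryStorage()
actions = MemoryActionStorage()
actions.add("news_42", {"topic": "sports"})
hid = history.add_history(context={"user": "u1"}, action="news_42")
history.add_reward(hid, 1.0)
```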
Striatum only provides the core contextual bandit algorithms and some common storage implementations (e.g., in-memory storage). I have actually tried it with Django’s ORM, and implementing the storage was easy.
History and current situation
Contextual bandit actually has quite a lot of applications, for example all kinds of recommender systems that need to optimize reward online. Many reinforcement learning problems can also be reduced to contextual bandit after discarding some information. I felt that a good, usable contextual bandit library would be very helpful, so I started developing one.
So far I have led the design of Striatum’s various interfaces and its development direction. taweihuang helped for a stretch in the middle, writing prototypes of some simulations and algorithms (which I later spent a long time revising, because they really were prototypes).
The most serious problems right now: the EXP4 algorithm was not written well at the beginning, and since it is not very practical, after a fairly large architectural change I never got around to fixing it. Also, the model storage is not optimized per algorithm, so when using a database the entire model gets saved/loaded over and over; this is especially severe for LinUCB. Volunteers to take over and fix these are welcome!
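To see why LinUCB is hit hardest, consider its per-action state. A textbook disjoint LinUCB keeps a d-by-d matrix A and a d-vector b for every action, but each update only touches the chosen action’s pair, so a naive model storage that serializes the whole model on every update does far more I/O than necessary. The sketch below is that textbook algorithm, not Striatum’s implementation.

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: a separate linear model (A, b) per action."""

    def __init__(self, action_ids, context_dim, alpha=1.0):
        self.alpha = alpha
        # Per-action sufficient statistics; only one pair changes per update,
        # so a per-action storage backend could persist just that pair.
        self.A = {a: np.eye(context_dim) for a in action_ids}
        self.b = {a: np.zeros(context_dim) for a in action_ids}

    def get_action(self, x):
        def ucb(a):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]            # ridge-regression estimate
            return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return max(self.A, key=ucb)

    def reward(self, action, x, r):
        # Rank-one update of the chosen action's statistics only.
        self.A[action] += np.outer(x, x)
        self.b[action] += r * x

# Usage: after repeated rewards for action "a" on this context,
# the policy prefers "a" there.
policy = LinUCB(["a", "b"], context_dim=2)
x = np.array([1.0, 0.0])
for _ in range(10):
    policy.reward("a", x, r=1.0)
```

An optimized model storage would persist only the updated (A, b) pair (or even just the rank-one delta) instead of the full per-action dictionary.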