flexs.baselines.explorers.dqn¶
DQN explorer.
class flexs.baselines.explorers.dqn.DQN(model, rounds, sequences_batch_size, model_queries_per_batch, starting_sequence, alphabet, log_file=None, memory_size=100000, train_epochs=20, gamma=0.9, device='cpu')[source]¶

Bases: flexs.explorer.Explorer
DQN explorer class.

DQN Explorer implementation, based on https://colab.research.google.com/drive/1NsbSPn6jOcaJB_mp9TmkgQX7UrRIrTi0.

The algorithm works as follows: for each of N experiment rounds, collect samples with the current policy, then update the Q-network:

Q(s, a) <- Q(s, a) + alpha * (R(s, a) + gamma * max_a' Q(s', a') - Q(s, a))
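The update rule above can be illustrated in tabular form. This is a minimal sketch of the same rule, not the library's implementation: the actual explorer approximates Q(s, a) with a neural network trained over train_epochs epochs.

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply Q(s,a) <- Q(s,a) + alpha * (R(s,a) + gamma * max_a' Q(s',a') - Q(s,a))."""
    # Bootstrap from the best action value available in the next state.
    best_next = max(Q[next_state].values()) if Q.get(next_state) else 0.0
    td_target = reward + gamma * best_next
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q

# Two states, two actions, all values start at zero.
Q = {"s0": {"a0": 0.0, "a1": 0.0}, "s1": {"a0": 0.0, "a1": 0.0}}
q_update(Q, "s0", "a0", reward=1.0, next_state="s1")
print(Q["s0"]["a0"])  # 0.1 == alpha * (1.0 + gamma * 0.0 - 0.0)
```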
pick_action(all_measured_seqs)[source]¶

Pick an action. Generates a new string representing the state, along with its associated reward.
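A sequence-mutation action of this kind can be sketched as choosing a (position, residue) pair and applying it to the current sequence. This is a hypothetical illustration of the state transition only; the real pick_action selects the action via the Q-network rather than at random.

```python
import random

def mutate(sequence, alphabet, rng):
    """Pick a random (position, new_residue) action and apply it to the sequence.

    Illustrative only: the DQN explorer scores actions with its Q-network
    instead of sampling them uniformly.
    """
    pos = rng.randrange(len(sequence))
    # Force an actual change by excluding the current residue.
    residue = rng.choice([c for c in alphabet if c != sequence[pos]])
    return sequence[:pos] + residue + sequence[pos + 1:]

rng = random.Random(0)
new_seq = mutate("MKT", "ACDEFGHIKLMNPQRSTVWY", rng)
print(new_seq)  # same length as "MKT", exactly one residue changed
```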
propose_sequences(measured_sequences_data)[source]¶

Propose the top sequences_batch_size sequences for evaluation.

Return type: Tuple[ndarray, ndarray]
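The Tuple[ndarray, ndarray] return value pairs proposed sequences with their predicted fitness scores. The following is an illustrative sketch of that shape, assuming candidates are ranked by a model's score (the helper name top_candidates is hypothetical, not part of the library):

```python
import numpy as np

def top_candidates(candidates, scores, sequences_batch_size):
    """Return the sequences_batch_size highest-scoring candidates.

    Mirrors the (sequences, scores) ndarray pair that propose_sequences
    returns; the ranking model itself is omitted here.
    """
    scores = np.asarray(scores, dtype=float)
    # Indices of the scores in descending order, truncated to the batch size.
    order = np.argsort(scores)[::-1][:sequences_batch_size]
    return np.array(candidates)[order], scores[order]

seqs, preds = top_candidates(["AAA", "AAC", "ACA"], [0.2, 0.9, 0.5], 2)
print(list(seqs), list(preds))  # ['AAC', 'ACA'] [0.9, 0.5]
```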