flexs.baselines.explorers.ppo

PPO explorer.

class flexs.baselines.explorers.ppo.PPO(model, rounds, sequences_batch_size, model_queries_per_batch, starting_sequence, alphabet, log_file=None)[source]

Bases: flexs.explorer.Explorer

Explorer that uses Proximal Policy Optimization (PPO).

The algorithm is:
    for N experiment rounds
        collect samples with policy
        train policy on samples

A simpler baseline than DyNAPPOMutative with similar performance.
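A minimal usage sketch (hedged): the landscape and model construction below are placeholders, since this page documents only the explorer, and run() is assumed from the base flexs.explorer.Explorer API to drive the collect-samples / train-policy loop above.

    import flexs

    # Placeholders: substitute any flexs landscape and flexs model from your
    # installation; their constructors are not documented on this page.
    landscape = ...
    model = ...

    explorer = flexs.baselines.explorers.PPO(
        model,
        rounds=10,                        # N experiment rounds
        sequences_batch_size=100,         # sequences proposed per round
        model_queries_per_batch=2000,     # model evaluation budget per round
        starting_sequence="MKYTAE",       # illustrative seed sequence
        alphabet="ACDEFGHIKLMNPQRSTVWY",  # protein alphabet
    )

    # Assumed return convention of Explorer.run(): proposed sequences plus
    # run metadata.
    sequences_df, metadata = explorer.run(landscape)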

add_last_seq_in_trajectory(experience, new_seqs)[source]

Add the last sequence in an episode’s trajectory.

Given a trajectory object, checks whether it marks the final step of an episode. Since the environment ends the episode when the score is non-increasing, the associated maximum-valued sequence is added to the batch.

If the episode is ending, it changes the “current sequence” of the environment to the next one in last_batch, so that when the environment resets, mutants are generated from that new sequence.
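To make the episode-termination logic concrete, here is a self-contained toy sketch (not the library's implementation): it walks one episode's scores, stops at the first non-increasing step, and records the maximum-valued sequence in the batch.

    from typing import Dict, List

    def record_episode_best(sequences: List[str], scores: List[float],
                            batch: Dict[str, float]) -> None:
        # Toy illustration only: the episode ends as soon as the score stops
        # increasing, and the best sequence seen so far is added to the batch.
        best_seq, best_score = sequences[0], scores[0]
        for seq, score in zip(sequences[1:], scores[1:]):
            if score <= best_score:      # non-increasing score ends the episode
                break
            best_seq, best_score = seq, score
        batch[best_seq] = best_score

    batch = {}
    record_episode_best(["AAA", "AAC", "ACC"], [0.1, 0.4, 0.3], batch)
    print(batch)  # {'AAC': 0.4}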

propose_sequences(measured_sequences_data)[source]

Propose the top sequences_batch_size sequences for evaluation.

Return type

Tuple[ndarray, ndarray]
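Usage sketch, continuing the instantiation example above; the column names in the measured-sequences DataFrame and the interpretation of the returned arrays (proposals and their model-predicted scores) are assumptions about the FLEXS data layout, not guaranteed by this page.

    import pandas as pd

    # Assumed schema for previously measured sequences.
    measured = pd.DataFrame(
        {
            "sequence": ["AAA", "AAC", "ACC"],
            "true_score": [0.1, 0.4, 0.3],
            "round": [0, 0, 0],
        }
    )

    seqs, scores = explorer.propose_sequences(measured)
    # seqs:   ndarray of proposed sequences (up to sequences_batch_size of them)
    # scores: ndarray of predicted fitness values for each proposal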