flexs.baselines.explorers.dyna_ppo

DyNA-PPO explorer.

class flexs.baselines.explorers.dyna_ppo.DynaPPO(landscape, rounds, sequences_batch_size, model_queries_per_batch, starting_sequence, alphabet, log_file=None, model=None, num_experiment_rounds=10, num_model_rounds=1, env_batch_size=4)[source]

Bases: flexs.explorer.Explorer

Explorer which implements DynaPPO.

This RL-based sequence design algorithm works as follows:
    for r in rounds:
        train_policy(experimental_data_rewards[r])
        for m in model_based_rounds:
            train_policy(model_fitness_rewards[m])

An episode for the agent begins with an empty sequence, and at each timestep one new residue is generated and appended until the desired sequence length is reached. The reward is zero at every timestep except the last, where it is reward = lambda * sequence_density + sequence_fitness, with sequence_density measuring the density of previously proposed sequences near the finished one.
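
To make the terminal reward concrete, the following is a minimal sketch, assuming a Hamming-ball notion of density and illustrative values for lambda and the radius; it is not the environment's actual implementation:

    # Illustrative sketch of the terminal reward described above (not the
    # environment's actual code). The Hamming-ball density, `lam`, and
    # `radius` are assumptions made for illustration.

    def hamming_distance(a: str, b: str) -> int:
        return sum(x != y for x, y in zip(a, b))

    def terminal_reward(seq, proposed_seqs, sequence_fitness, lam=0.1, radius=2):
        # One way to define "density": how many previously proposed sequences
        # lie within a small Hamming radius of the finished sequence.
        sequence_density = sum(
            hamming_distance(seq, s) <= radius for s in proposed_seqs
        )
        return lam * sequence_density + sequence_fitness

    # Example: reward for a finished 5-mer given two earlier proposals.
    print(terminal_reward("ACGUA", ["ACGUU", "GGGGG"], sequence_fitness=0.8))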

As described above, this explorer generates sequences constructively.

Paper: https://openreview.net/pdf?id=HklxbgBKvr
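
For orientation, here is a minimal usage sketch. It assumes a flexs.Landscape instance named landscape and a starting sequence wt are already in scope, and uses the standard Explorer.run entry point; the hyperparameter values are illustrative, not recommended settings:

    from flexs.baselines.explorers.dyna_ppo import DynaPPO

    # Minimal usage sketch. `landscape` (a flexs.Landscape) and `wt` (a
    # starting sequence on that landscape) are assumed to be defined already.
    explorer = DynaPPO(
        landscape=landscape,
        rounds=10,
        sequences_batch_size=100,
        model_queries_per_batch=2000,
        starting_sequence=wt,
        alphabet="UCGA",
        num_experiment_rounds=10,
        num_model_rounds=1,
    )

    # Explorer.run drives the propose/measure loop for the requested number of
    # rounds and returns a DataFrame of measured sequences plus run metadata.
    sequences_data, metadata = explorer.run(landscape)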

add_last_seq_in_trajectory(experience, new_seqs)[source]

Add the last sequence in an episode’s trajectory.

Given a trajectory object, checks whether it marks the end of an episode. Since the environment ends the episode when the score becomes non-increasing, the maximum-valued sequence from that episode is added to the batch.

If the episode is ending, it changes the “current sequence” of the environment to the next one in last_batch, so that when the environment resets, mutants are generated from that new sequence.
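
If the agent is trained with tf_agents (an assumption here; the text above only describes the callback's behavior), a per-step observer like this one would typically be bound to its extra argument and handed to an episode driver. The sketch below shows only that wiring pattern; explorer, tf_env, and collect_policy are placeholders:

    from functools import partial

    from tf_agents.drivers import dynamic_episode_driver

    # Hedged wiring sketch (assumes tf_agents; not the explorer's verbatim
    # code). `explorer`, `tf_env`, and `collect_policy` are placeholders for an
    # existing DynaPPO instance, its TF environment, and the collection policy.
    new_seqs = {}  # filled with sequence -> score pairs as episodes finish

    driver = dynamic_episode_driver.DynamicEpisodeDriver(
        tf_env,
        collect_policy,
        observers=[partial(explorer.add_last_seq_in_trajectory, new_seqs=new_seqs)],
        num_episodes=1,
    )
    driver.run()  # each finished episode contributes its best sequence to new_seqs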

propose_sequences(measured_sequences_data)[source]

Propose top sequences_batch_size sequences for evaluation.

Return type

Tuple[ndarray, ndarray]
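
Continuing the usage sketch above, a single proposal round might look as follows; the DataFrame columns follow the usual FLEXS convention for measured sequence data and are an assumption here:

    import pandas as pd

    # Continues the earlier sketch: `explorer`, `landscape`, and `wt` as before.
    # The column names follow the usual FLEXS convention and are an assumption.
    measured = pd.DataFrame(
        {
            "sequence": [wt],
            "true_score": landscape.get_fitness([wt]),
            "round": [0],
        }
    )

    seqs, preds = explorer.propose_sequences(measured)
    # `seqs` holds up to sequences_batch_size proposed sequences; `preds` holds
    # the surrogate model's predicted fitness for each, aligned element-by-element.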

class flexs.baselines.explorers.dyna_ppo.DynaPPOEnsemble(seq_len, alphabet, r_squared_threshold=0.5, models=None)[source]

Bases: flexs.model.Model

Ensemble from the DyNA-PPO paper.

Ensembles many models together but only uses those with an $r^2$ above a certain threshold (on validation data) at test-time.

train(sequences, labels)[source]

Train the ensemble, calculating $r^2$ values on a holdout set.
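
The gating idea can be sketched as follows; this is an illustrative re-implementation using scikit-learn's r2_score and a random holdout split, not the actual DynaPPOEnsemble code, and the sklearn-style fit/predict interface of the members is an assumption:

    import numpy as np
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    # Illustrative re-implementation of the r^2 gating idea (not the actual
    # DynaPPOEnsemble code). Members are assumed to expose sklearn-style
    # fit/predict; sequence featurization is left to the members themselves.
    class R2GatedEnsemble:
        def __init__(self, members, r_squared_threshold=0.5):
            self.members = members
            self.threshold = r_squared_threshold
            self.r_squared = np.zeros(len(members))

        def train(self, sequences, labels):
            # Hold out 20% of the data to score each member's r^2.
            seq_tr, seq_val, y_tr, y_val = train_test_split(
                sequences, labels, test_size=0.2
            )
            for i, member in enumerate(self.members):
                member.fit(seq_tr, y_tr)
                self.r_squared[i] = r2_score(y_val, member.predict(seq_val))

        def predict(self, sequences):
            # Average only the members whose holdout r^2 clears the threshold;
            # fall back to all members if none do.
            kept = [
                m for m, r2 in zip(self.members, self.r_squared)
                if r2 >= self.threshold
            ]
            kept = kept or self.members
            return np.mean([m.predict(sequences) for m in kept], axis=0)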

class flexs.baselines.explorers.dyna_ppo.DynaPPOMutative(landscape, rounds, sequences_batch_size, model_queries_per_batch, starting_sequence, alphabet, log_file=None, model=None, num_experiment_rounds=10, num_model_rounds=1)[source]

Bases: flexs.explorer.Explorer

Explorer which implements DynaPPO.

Note that unlike the other DynaPPO explorer, this one is mutative rather than constructive. Specifically, instead of starting from an empty sequence and generating residues one by one, this explorer starts from a complete sequence (selected by a fitness threshold so that episodes begin from good sequences) and mutates it until the mutant's fitness starts to decrease, at which point the episode ends.

This has proven to be a stronger algorithm than the original DyNA-PPO.

Paper: https://openreview.net/pdf?id=HklxbgBKvr
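
A toy sketch of the mutative episode loop described above, using random single-residue mutations in place of the learned PPO policy; all names here are illustrative:

    import random

    # Toy sketch of a mutative episode (illustrative only: the real explorer
    # picks mutations with a learned PPO policy, not uniformly at random).
    def mutative_episode(start_seq, alphabet, fitness_fn, max_steps=50):
        seq = start_seq
        best_fitness = fitness_fn(seq)
        for _ in range(max_steps):
            pos = random.randrange(len(seq))
            mutant = seq[:pos] + random.choice(alphabet) + seq[pos + 1:]
            mutant_fitness = fitness_fn(mutant)
            if mutant_fitness < best_fitness:
                break  # the episode ends once fitness starts to decrease
            seq, best_fitness = mutant, mutant_fitness
        return seq, best_fitness

    # Example: maximize the fraction of 'A's starting from an 8-mer.
    print(mutative_episode("UCGAUCGA", "UCGA", lambda s: s.count("A") / len(s)))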

add_last_seq_in_trajectory(experience, new_seqs)[source]

Add the last sequence in an episode’s trajectory.

Given a trajectory object, checks whether it marks the end of an episode. Since the environment ends the episode when the score becomes non-increasing, the maximum-valued sequence from that episode is added to the batch.

If the episode is ending, it changes the “current sequence” of the environment to the next one in last_batch, so that when the environment resets, mutants are generated from that new sequence.

propose_sequences(measured_sequences_data)[source]

Propose top sequences_batch_size sequences for evaluation.

Return type

Tuple[ndarray, ndarray]