  1. #31
    Firbolg in the Playground
    Join Date
    Dec 2010

    Default Re: Help a non-programmer install GPT-2 (and its prerequisites)

    Quote Originally Posted by Bohandas
    That, plus changing a bunch of tabs to spaces, worked. I got it working shortly after you posted it.

    One more question. Is there any way to make it automatically log output text to a file? I recently lost some outputs because I forgot to copy and paste before closing the program.
    The line that says 'print(text)' is where the outputs get displayed, so you could write the text to a file instead, like this (at the same indentation level as the print):

    Code:
    f = open("output.log", "a")  # "a" = append mode, so each run adds to the file
    f.write(text)
    f.close()
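    (A slightly safer variant of the same idea, so the file still gets closed if the script errors out partway; the output.log name is just illustrative, and the "\n" keeps samples from running together:)

    Code:
    with open("output.log", "a") as f:
        f.write(text + "\n")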
    Or if you're running this from a console, you can add '> output.log' to the command to redirect the output to a file instead of the console, like 'python whatever_program_file.py arg1 arg2 arg3 > output.log', where whatever_program_file.py is the name of this code file and arg1 arg2 arg3 are whatever command-line arguments you're currently using. Note that '>' overwrites the file on each run; use '>>' instead if you want to append.
    Last edited by NichG; 2020-10-14 at 06:18 AM.

  2. #32
    Firbolg in the Playground
     
    Bohandas's Avatar

    Join Date
    Feb 2016

    Default Re: Help a non-programmer install GPT-2 (and its prerequisites)

    Quote Originally Posted by Bohandas
    Additionally, is there any way to use a saved model older than the latest one? The checkpoint file doesn't seem to spawn copies when the training program autosaves the way the other files do.
    I figured this one out, btw. The checkpoint file can be opened with Notepad, and it apparently just points the program to the name of the data file to load, so if it said something like:

    Code:
    model_checkpoint_path: "model-221"
    all_model_checkpoint_paths: "model-210"
    all_model_checkpoint_paths: "model-221"
    and you wanted to use an earlier model file, you would just change the "221" on the first line to the correct number.
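    (Alternatively, the same thing can be done in code instead of by hand-editing the file: the sampling script picks its checkpoint with tf.train.latest_checkpoint, and that line can be swapped for an explicit path. A sketch, assuming the stock script's variable names and that model-210 is the checkpoint you want:)

    Code:
    # instead of: ckpt = tf.train.latest_checkpoint(os.path.join('models', model_name))
    ckpt = os.path.join('models', model_name, 'model-210')  # the checkpoint you want
    saver.restore(sess, ckpt)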
    "If you want to understand biology don't think about vibrant throbbing gels and oozes, think about information technology" -Richard Dawkins

    Omegaupdate Forum

    WoTC Forums Archive + Indexing Project

    PostImage, a free and sensible alternative to Photobucket

    Temple+ Modding Project for Atari's Temple of Elemental Evil

    Morrus' RPG Forum (EN World v2)

  3. #33
    Firbolg in the Playground
     
    Bohandas's Avatar

    Join Date
    Feb 2016

    Default Re: Help a non-programmer install GPT-2 (and its prerequisites)

    I can't remember what button to press to interrupt the generation process.

  4. #34
    Firbolg in the Playground
    Join Date
    Dec 2010

    Default Re: Help a non-programmer install GPT-2 (and its prerequisites)

    Ctrl-C will kill the process. Is there a built-in thing?
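    (In Python specifically, Ctrl-C raises a KeyboardInterrupt exception rather than silently killing the process, so a script can catch it and shut down cleanly. A sketch, with a do-nothing loop standing in for the sampler:)

    Code:
    import time

    try:
        while True:          # hypothetical stand-in for the generation loop
            time.sleep(1)
    except KeyboardInterrupt:
        print("Interrupted; exiting cleanly")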

  5. #35
    Firbolg in the Playground
     
    Bohandas's Avatar

    Join Date
    Feb 2016

    Default Re: Help a non-programmer install GPT-2 (and its prerequisites)

    Quote Originally Posted by NichG
    Ctrl-C will kill the process. Is there a built-in thing?
    Thank you. That's what I was thinking of.

    Quote Originally Posted by NichG
    So there's a bit in the code that says:

    Code:
    while True:
       raw_text = input("Model prompt >>> ")
       while not raw_text:
          print('Prompt should not be empty!')
          raw_text = input("Model prompt >>> ")
    That's what's getting the text that you're going to process. You could replace that with something like:

    Code:
    f = open(filename, "r")   # filename would be e.g. "sample.txt"
    raw_text = f.read()
    f.close()
    Note that Python is picky about indentation, so when you remove that outer 'while True:' loop you have to make sure that everything under it is de-indented by one level, consistently. E.g. if you had

    Code:
    while True:
       aaa
       while False:
          bbb
       ccc
    ddd
    Then removing the outer while loop, it would become:

    Code:
    aaa
    while False:
       bbb
    ccc
    ddd
    As for checkpoints: GPT-2 checkpoints should be rather large, no? So it's not surprising that it doesn't store each one separately. You'd have to pick which checkpoints you want to preserve somehow, e.g. by copying them to a separate location during the training process.
    Is there any simple way to get this to keep spitting out samples over and over again? Setting a value for nsamples in the console doesn't seem to work.

  6. #36
    Firbolg in the Playground
    Join Date
    Dec 2010

    Default Re: Help a non-programmer install GPT-2 (and its prerequisites)

    Quote Originally Posted by Bohandas
    Thank you. That's what I was thinking of.



    Is there any simple way to get this to keep spitting out samples over and over again? Setting a value for nsamples in the console doesn't seem to work.
    Based on the code I pasted before, it looks like nsamples would work. Can you paste the current version of your code?

    Edit: Okay, maybe the issue is that you're setting nsamples in the console. You have to change what gets passed to the function wherever it's actually called. You could change nsamples=1 in the function definition to, say, nsamples=10, or pass it as an argument in the console: interact_model(nsamples = ...)
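    (And since the bottom of the file wraps the function with fire.Fire(interact_model), the same parameter should also work as a command-line flag, e.g. 'python whatever_program_file.py --nsamples=10', reusing the placeholder filename from earlier; that's how fire exposes function arguments.)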
    Last edited by NichG; 2021-03-27 at 11:38 PM.

  7. #37
    Firbolg in the Playground
     
    Bohandas's Avatar

    Join Date
    Feb 2016

    Default Re: Help a non-programmer install GPT-2 (and its prerequisites)

    I tried changing it to 3 in the code and it still just displayed one thing. I think maybe it's processing several but not displaying them.

    Here's the code as it stands:

    Code:
    #!/usr/bin/env python3
    
    import fire
    import json
    import os
    import numpy as np
    import tensorflow as tf
    
    import model, sample, encoder
    
    def interact_model(
        model_name='117M',
        seed=None,
        nsamples=3,
        batch_size=1,
        length=None,
        temperature=1,
        top_k=0,
        top_p=0.0
    ):
        """
        Interactively run the model
        :model_name=117M : String, which model to use
        :seed=None : Integer seed for random number generators, fix seed to reproduce
         results
        :nsamples=1 : Number of samples to return total
        :batch_size=1 : Number of batches (only affects speed/memory).  Must divide nsamples.
        :length=None : Number of tokens in generated text, if None (default), is
         determined by model hyperparameters
        :temperature=1 : Float value controlling randomness in boltzmann
         distribution. Lower temperature results in less random completions. As the
         temperature approaches zero, the model will become deterministic and
         repetitive. Higher temperature results in more random completions.
        :top_k=0 : Integer value controlling diversity. 1 means only 1 word is
         considered for each step (token), resulting in deterministic completions,
         while 40 means 40 words are considered at each step. 0 (default) is a
         special setting meaning no restrictions. 40 generally is a good value.
        :top_p=0.0 : Float value controlling diversity. Implements nucleus sampling,
         overriding top_k if set to a value > 0. A good setting is 0.9.
        """
        if batch_size is None:
            batch_size = 1
        assert nsamples % batch_size == 0
    
        enc = encoder.get_encoder(model_name)
        hparams = model.default_hparams()
        with open(os.path.join('models', model_name, 'hparams.json')) as f:
            hparams.override_from_dict(json.load(f))
    
        if length is None:
            length = hparams.n_ctx // 2
        elif length > hparams.n_ctx:
            raise ValueError("Can't get samples longer than window size: %s" % hparams.n_ctx)
    
        with tf.Session(graph=tf.Graph()) as sess:
            context = tf.placeholder(tf.int32, [batch_size, None])
            np.random.seed(seed)
            tf.set_random_seed(seed)
            output = sample.sample_sequence(
                hparams=hparams, length=length,
                context=context,
                batch_size=batch_size,
                temperature=temperature, top_k=top_k, top_p=top_p
            )
    
            saver = tf.train.Saver()
            ckpt = tf.train.latest_checkpoint(os.path.join('models', model_name))
            saver.restore(sess, ckpt)
    
            f = open("sample.txt","r")
            raw_text = f.read()
            f.close()
            context_tokens = enc.encode(raw_text)
            generated = 0
            for _ in range(nsamples // batch_size):
                out = sess.run(output, feed_dict={
                    context: [context_tokens for _ in range(batch_size)]
                })[:, len(context_tokens):]
            for i in range(batch_size):
                generated += 1
                text = enc.decode(out[i])
                print("=" * 40 + " SAMPLE " + str(generated) + " " + "=" * 40)
                print(text)
            print("=" * 80)
    
    if __name__ == '__main__':
        fire.Fire(interact_model)

  8. #38
    Firbolg in the Playground
    Join Date
    Dec 2010

    Default Re: Help a non-programmer install GPT-2 (and its prerequisites)

    It looks like maybe you dropped an indent. Instead of:

    Code:
            for _ in range(nsamples // batch_size):
                out = sess.run(output, feed_dict={
                    context: [context_tokens for _ in range(batch_size)]
                })[:, len(context_tokens):]
            for i in range(batch_size):
                generated += 1
                text = enc.decode(out[i])
                print("=" * 40 + " SAMPLE " + str(generated) + " " + "=" * 40)
                print(text)
    I think maybe it should be:

    Code:
            for _ in range(nsamples // batch_size):
                out = sess.run(output, feed_dict={
                    context: [context_tokens for _ in range(batch_size)]
                })[:, len(context_tokens):]
                for i in range(batch_size):
                    generated += 1
                    text = enc.decode(out[i])
                    print("=" * 40 + " SAMPLE " + str(generated) + " " + "=" * 40)
                    print(text)
    That second 'for' should be at the same indentation level as 'out = ...'
    Last edited by NichG; 2021-03-28 at 08:54 PM.

  9. #39
    Dragon in the Playground Moderator
     
    Peelee's Avatar

    Join Date
    Dec 2009
    Location
    Birmingham, AL
    Gender
    Male

    Default Re: Help a non-programmer install GPT-2 (and its prerequisites)

    The Mod on the Silver Mountain: Thread timed out.
    Cuthalion's art is the prettiest art of all the art. Like my avatar.

    Number of times Roland St. Jude has sworn revenge upon me: 2

  10. #40
    Archmage in the Playground Moderator
     
    truemane's Avatar

    Join Date
    Mar 2007
    Location
    Grognardia
    Gender
    Male

    Default Re: Help a non-programmer install GPT-2 (and its prerequisites)

    Metamagic Mod: thread re-opened.
    (Avatar by Cuthalion, who is great.)

  11. #41
    Firbolg in the Playground
     
    Bohandas's Avatar

    Join Date
    Feb 2016

    Default Re: Help a non-programmer install GPT-2 (and its prerequisites)

    Quote Originally Posted by NichG
    It looks like maybe you dropped an indent. Instead of:

    Code:
            for _ in range(nsamples // batch_size):
                out = sess.run(output, feed_dict={
                    context: [context_tokens for _ in range(batch_size)]
                })[:, len(context_tokens):]
            for i in range(batch_size):
                generated += 1
                text = enc.decode(out[i])
                print("=" * 40 + " SAMPLE " + str(generated) + " " + "=" * 40)
                print(text)
    I think maybe it should be:

    Code:
            for _ in range(nsamples // batch_size):
                out = sess.run(output, feed_dict={
                    context: [context_tokens for _ in range(batch_size)]
                })[:, len(context_tokens):]
                for i in range(batch_size):
                    generated += 1
                    text = enc.decode(out[i])
                    print("=" * 40 + " SAMPLE " + str(generated) + " " + "=" * 40)
                    print(text)
    That second 'for' should be at the same indentation level as 'out = ...'
    Thank you. This worked perfectly.

    One other question. I've found that if I try to use the "--noise" argument while training a model, I get an error message saying "AttributeError: module 'tensorflow' has no attribute 'random'". Have you had any experience with this?

  12. #42
    Firbolg in the Playground
    Join Date
    Dec 2010

    Default Re: Help a non-programmer install GPT-2 (and its prerequisites)

    Quote Originally Posted by Bohandas
    Thank you. This worked perfectly.

    One other question. I've found that if I try to use the "--noise" argument while training a model, I get an error message saying "AttributeError: module 'tensorflow' has no attribute 'random'". Have you had any experience with this?
    Nope. I thought it could be because the library moved where it put the random stuff in its hierarchy, but I checked and tf.random should exist, at least in recent versions.
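    (One guess, not something I've reproduced: the tf.random namespace only showed up partway through the TensorFlow 1.x line, around 1.13 if I remember right, and older installs only have top-level spellings like tf.random_uniform. So if your TensorFlow predates that and the training script calls something under tf.random, renaming the call might work. A hypothetical sketch, with stand-ins for whatever tensors the script actually uses:)

    Code:
    import tensorflow as tf

    # hypothetical stand-ins for the training script's values:
    batch = tf.zeros([1, 10], dtype=tf.int32)   # a batch of token ids
    noise = 0.1                                 # the --noise fraction

    # newer TF: mask = tf.random.uniform(tf.shape(batch)) < noise
    # older TF only has the top-level name:
    mask = tf.random_uniform(tf.shape(batch)) < noise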

    I should also say, I don't actually use this particular implementation anymore. I'd probably use Huggingface's implementations, since they have a standardized interface and they're expanding to things like a community-trained GPT-3 model. But if you finally have this one working, I can understand not wanting to switch.
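    (For anyone finding this later, a minimal sketch of what the Huggingface route looks like; it assumes 'pip install transformers', and the prompt and settings are just illustrative:)

    Code:
    from transformers import pipeline

    # downloads GPT-2 and wraps it in a standard text-generation interface
    generator = pipeline('text-generation', model='gpt2')
    samples = generator("Model prompt goes here", max_length=100,
                        do_sample=True, num_return_sequences=3)
    for s in samples:
        print(s['generated_text'])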

  13. #43
    Firbolg in the Playground
     
    Bohandas's Avatar

    Join Date
    Feb 2016

    Default Re: Help a non-programmer install GPT-2 (and its prerequisites)

    I'm actually not sure if my computer would be able to run GPT-3. It barely runs GPT-2.

  14. #44
    Firbolg in the Playground
    Join Date
    Dec 2010

    Default Re: Help a non-programmer install GPT-2 (and its prerequisites)

    One of the ways to do this kind of stuff is to use Google Colab instances. You can basically use one for free for something like 12 hours a day, and you can get a GPU allocation that can run most of the stuff people are playing with. The code loads in a Jupyter Notebook interface, and for a lot of the good ones there's a simple 'enter prompt here and press Run All' kind of workflow.

    I'm not sure there's one for the community GPT-3 yet, but these things exist for DALL-E and various other interfaces to CLIP (the 'specify a sentence and it will draw the image' kind of models that have exploded this year).

  15. #45
    Firbolg in the Playground
     
    Bohandas's Avatar

    Join Date
    Feb 2016

    Default Re: Help a non-programmer install GPT-2 (and its prerequisites)

    Do you know the recommended system requirements to run the 345M model locally, and/or the minimum requirements to run it without risking a system crash (as happened the last time I attempted to train the 345M model)?

  16. #46
    Archmage in the Playground Moderator
     
    truemane's Avatar

    Join Date
    Mar 2007
    Location
    Grognardia
    Gender
    Male

    Default Re: Help a non-programmer install GPT-2 (and its prerequisites)

    Metamagic Mod: thread Necromancy.
