Tech Help: Help a non-programmer install GPT-2 (and its prerequisites)



Bohandas
2020-09-04, 04:07 AM
I've been trying to install GPT-2 (https://github.com/minimaxir/gpt-2-simple), the program described in this thread (https://forums.giantitp.com/showthread.php?595533-Co-DMing-with-an-AI&p=24254144), but I have no idea how to install its prerequisites like Tensorflow and Jupyter, or even any clear idea of what, specifically, those other programs do.

Can somebody give me step by step instructions on how to set this stuff up?

NichG
2020-09-04, 07:25 AM
I think I used the PyTorch version (which is now included in the transformers package https://github.com/huggingface/transformers). It's probably also useful to install jupyter so you can work with notebooks. Basically, install a python3 environment, install pip for python3 if it isn't done automatically, and then (either in a console on Linux, or in the console that comes with Python 3 on Windows, iirc):

pip install torch transformers jupyter

If you want to use tensorflow and the old repository I linked in the other thread, it's just:

pip install tensorflow

Maybe just using this would be easier: https://github.com/graykode/gpt-2-Pytorch
A bonus of using that codebase is that they have a Google Colab set up already that you should be able to just run (https://colab.research.google.com/github/graykode/gpt-2-Pytorch/blob/master/GPT2_Pytorch.ipynb)
I'm not sure it's set up for fine-tuning on your own data though, but I bet you can find an existing Colab notebook somewhere that is (tutorial here: https://towardsdatascience.com/fine-tuning-gpt2-on-colab-gpu-for-free-340468c92ed but it looks like it requires coding). It also saves you the trouble of dealing with hardware requirements on your end. Just be sure that if you fine-tune it, you download the model before your session times out, so you can use it again in the future without having to retrain!
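If it helps to see where this is heading: once those packages are installed, generating text with the small GPT-2 model through the transformers library is only a few lines. A minimal sketch (the prompt is made up, and details like download size and defaults vary a bit by transformers version):

from transformers import pipeline

# Downloads the small GPT-2 weights (roughly 500 MB) the first time it runs.
generator = pipeline("text-generation", model="gpt2")

result = generator("The party entered the dungeon and", max_length=60, num_return_sequences=1)
print(result[0]["generated_text"])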

Bohandas
2020-09-04, 03:16 PM
has anybody built a frontend for this?

NichG
2020-09-04, 08:37 PM
There are some online, but they're expensive to host and so have been going commercial. There used to be 'talk to Transformer' but it became a commercial app called InferKit. There's also AI Dungeon, which is a more narrow application but the GPT-2 version is still free to interact with I think.

Fine-tuning in particular takes a few hours of training on GPU, so you'd certainly be paying for it if someone else were hosting the model.

So, I suppose there's probably a business opportunity here to make an all-in-one package you could just download, train, and generate with.

Bohandas
2020-09-04, 10:16 PM
Would the procedure you described run locally?

NichG
2020-09-05, 02:26 AM
Which? The Colab stuff runs on Google resources, but it's free. If you install Python/PyTorch/etc, it runs locally on your machine. AI Dungeon and the like run on their servers.

Bohandas
2020-09-06, 06:20 AM
I was referring to the python version. I think I'll take another shot at trying to install it later today. I don't have a lot of faith in my chances though. Every tutorial I've found so far seems to assume that the user knows how to use Python.

(Are those code blocks in the github page you linked supposed to be typed into the python console? Or are they supposed to be copied and pasted into a .py file or something like that?)

NichG
2020-09-06, 07:02 AM
I was referring to the python version. I think I'll take another shot at trying to install it later today. I don't have a lot of faith in my chances though. Every tutorial I've found so far seems to assume that the user knows how to use Python.

(Are those code blocks in the github page you linked supposed to be typed into the python console? Or are they supposed to be copied and pasted into a .py file or something like that?)

Yeah, these things are still pretty much programming-required outside of the occasional web service people launch... But if you're up for it, the programming you need is relatively simple compared to other kinds of programming (well, depending on how good the developers of each implementation were at compartmentalizing things, at least). I suppose part of it is that Python tends to be run as a script rather than compiled into standalone executables like C++/etc stuff tends to be, so people leave a lot more of the programmer-level interface exposed when sharing Python stuff. E.g. people will just write command-line options before implementing GUIs, and rather than making installers that come with all the libraries packaged alongside, they'll just assume you'll install the needed stuff in your development environment on your own so they don't have to ship a 100+ MB Tensorflow library alongside a 3 KB source file :)

As far as the code blocks go:

If you're using Jupyter, you have something like an interactive code editor where you can paste things in blocks and hit Shift-Enter to run the blocks one at a time. It's convenient because you can put the boilerplate up at the top of the file, then run a bunch of different commands with all of that boilerplate already executed. Once it's set up, it runs a server on your machine and you connect to the notebook through your web browser. You could also put the code in a .py file, but then you need something to control the .py file from the command line, which would take a bit more programming.
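For example (a made-up pair of cells, just to show the shape of it), the first cell holds the slow setup you run once, and the second is a cheap cell you tweak and re-run:

# Cell 1 -- boilerplate, run once per session with Shift-Enter
import tensorflow as tf   # slow import; model setup would live here too

# Cell 2 -- re-run this one as often as you like afterwards
prompt = "The wizard opened the door and"
print(len(prompt.split()), "words in the prompt")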

Bohandas
2020-09-06, 05:28 PM
Do you know of any youtube videos that show every step of the process from beginning to end? I found a few tutorials, but they all seem to skip steps.

Or any scripts that do the whole thing?

Or anywhere where I can download a zip file of a directory with everything set up and ready to go? (Like a completely set-up copy of python itself with tensorflow etc., plus all the scripts needed, with nothing omitted.)

EDIT:
And can anybody explain why none of the scripts on either https://github.com/nshepperd/gpt-2 or https://github.com/rish-16/gpt2client seem to do anything? Do they need to be in a certain folder? Do I need to drag a file onto them? Do they still need to have a bunch of other stuff set up first?

EDIT:
Or is there anywhere where I can get it explained step-by-step as if I were a small child or a rubber duck?

Bohandas
2020-09-06, 08:45 PM
Ok, update: I've been following the directions at https://medium.com/@ngwaifoong92/beginners-guide-to-retrain-gpt-2-117m-to-generate-custom-text-content-8bb5363d8b7f


I tried to do a preliminary test of it after Part 1 by skipping the fine-tuning stage (sections 2 and 3) and going straight to Step 4.

However, I keep getting the error message:

Traceback (most recent call last):
File "generate_unconditional_samples.py", line 7, in <module>
import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'

when I type:

C:\Programs\gpt-2-finetuning\src>python generate_unconditional_samples.py --model_name 117M

NichG
2020-09-06, 08:50 PM
Ok. Update I've been following the directions at https://medium.com/@ngwaifoong92/beginners-guide-to-retrain-gpt-2-117m-to-generate-custom-text-content-8bb5363d8b7f


I tried to do a preliminary test of it after Part 1 by skipping the fine-tuning stage (sections 2 and 3) and going straight to Step 4

HOWEVER I keep getting the error message:

Traceback (most recent call last):
File "generate_unconditional_samples.py", line 7, in <module>
import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'

when I type:

C:\Programs\gpt-2-finetuning\src>python generate_unconditional_samples.py --model_name 117M

That looks like you didn't install tensorflow yet. Did you do the thing with pip?

For the thing with gpt2client, that's a library that you install and can then import into another Python program. So you install that with pip (or by downloading it and running python setup.py install), and then write some code to import and call it (like the example code blocks they provide). The advantage of that library is that it's potentially just five lines of code to use it: one to import the library, one to instantiate and download the model you want, one to load your text from a file, one to fine-tune, and one to generate new text.
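Roughly what those five lines look like, going by the gpt2client README (a sketch from memory, so treat the exact method names and arguments as approximate rather than gospel):

from gpt2_client import GPT2Client

gpt2 = GPT2Client('117M')                   # pick which model size to use
gpt2.load_model()                           # downloads the weights the first time
my_corpus = 'my_training_text.txt'          # hypothetical path to your own text file
gpt2.finetune(my_corpus, return_text=True)  # fine-tune on your text
gpt2.generate(n_samples=1)                  # generate new text from the tuned model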

The gpt-2 repository on the other hand looks like it comes with a bunch of specific scripts that are used to do different things with the model - so the train.py script to train/fine-tune, etc. It's probably a bit more complex to figure out which scripts you should be using yourself, and which are there as supporting/include files.

Bohandas
2020-09-06, 09:47 PM
Ok, so how do I install tensorflow? I've tried looking it up and got like a dozen different answers, none of which I fully understood.

Is there a way to do it just by typing things into the console, like with the other modules mentioned in the GPT tutorial I found earlier (https://medium.com/@ngwaifoong92/beginners-guide-to-retrain-gpt-2-117m-to-generate-custom-text-content-8bb5363d8b7f)?

EDIT:
Can you tell me what went wrong here:

C:\Programs\gpt-2-finetuning\src>pip3 install --upgrade tensorflow
ERROR: Could not find a version that satisfies the requirement tensorflow (from versions: none)
ERROR: No matching distribution found for tensorflow

EDIT:
Ok, figured that one out. Apparently Tensorflow isn't compatible with Python 3.8 so I had to downgrade it to 3.6

I seem to have gotten Tensorflow installed, but now I'm getting another error, which I will write down tomorrow after I get some sleep.

NichG
2020-09-07, 01:39 AM
Ok, so how do I install tensorflow? I've tried looking it up and got like a dozen different answers, none of which I fully understood.

Is there a way to do it just by typing things into the console, like with the other modules mentioned in the GPT tutorial I found earlier (https://medium.com/@ngwaifoong92/beginners-guide-to-retrain-gpt-2-117m-to-generate-custom-text-content-8bb5363d8b7f)?

EDIT:
Can you tell me what went wrong here:

C:\Programs\gpt-2-finetuning\src>pip3 install --upgrade tensorflow
ERROR: Could not find a version that satisfies the requirement tensorflow (from versions: none)
ERROR: No matching distribution found for tensorflow

EDIT:
Ok, figured that one out. Apparently Tensorflow isn't compatible with Python 3.8 so I had to downgrade it to 3.6

I seem to have gotten Tensorflow installed now but now I'm getting another error which I will write down tomorrow after I get some sleep

Okay, glad you're making progress at least!

Bohandas
2020-09-07, 02:57 PM
Ok, here's the new issue

C:\Programs\gpt-2-finetuning\src>python generate_unconditional_samples.py --model_name lyric 117M
2020-09-07 15:33:03.364750: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-09-07 15:33:03.395537: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
File "generate_unconditional_samples.py", line 9, in <module>
import model, sample, encoder
File "C:\Programs\gpt-2-finetuning\src\model.py", line 3, in <module>
from tensorflow.contrib.training import HParams
ModuleNotFoundError: No module named 'tensorflow.contrib'

Bohandas
2020-09-07, 05:24 PM
https://www.youtube.com/watch?v=rSCBvu_kijo

Got it running after upgrading python to 3.7.9 and DOWNGRADING Tensorflow to 1.7 (https://stackoverflow.com/questions/48435006/modulenotfounderror-no-module-named-tensorflow-contrib-lite-toco-python)

EDIT:
Now I just need to test the fine-tuning function

Bohandas
2020-09-10, 02:26 AM
New problem. My computer has crashed twice while running this (although not before I got some nice outputs). It stopped responding and the monitor glitched out in such a way that it looked like it was a painting that someone had smeared with a rag.

Any advice on stopping it from crashing?

NichG
2020-09-10, 01:01 PM
New problem. My computer has crashed twice while running this (although not before I got some nice outputs). It stopped responding and the monitor glitched out in such a way that it looked like it was a painting that someone had smeared with a rag.

Any advice on stopping it from crashing?

Sounds like the GPU got pushed too hard. You could try running on CPU and see if that fixes it. Also if you can look at the GPU load (on Linux this would be nvidia-smi, not sure where it is in Windows) then you can see if you're using too much memory or running hot or something. If it's a memory issue, you might be able to ask Tensorflow to reserve less (https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory).
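For reference, the usual way to do that in TF 1.x (the version these scripts need) looks roughly like the snippet below; the session is created inside the repo's own scripts, so this is a sketch of the change rather than a drop-in file:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # grab GPU memory as needed instead of all at once
# config.gpu_options.per_process_gpu_memory_fraction = 0.7  # or hard-cap it at ~70%

# wherever the script currently does `with tf.Session(graph=tf.Graph()) as sess:`
with tf.Session(graph=tf.Graph(), config=config) as sess:
    pass  # ...the existing model-building and sampling code goes here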

Bohandas
2020-09-15, 03:23 AM
I think it may have been the use of the 355M model that was overtaxing my machine. I've only used the 117M model since then and have had no crashes, even after leaving the computer training the model for nearly 24 hours.

On that note, how long do you think is an appropriate amount of time to train the model for? Is 1000 iterations too many? Also, do you know if they have a listing anywhere of recommended system specs to run the medium and large models?

NichG
2020-09-15, 03:37 AM
I think it may have been use of the 355M model that was overtaxing my machine. I've only used the 117M model since then and have had no crashes since then, even after leaving the computer training the model for nearly 24 hours.

On that note, how long do you think is an appropriate amount to train the program for? Is 1000 iterations too many? Also, do you know if they have a listing anywhere of recommended system specs to run the medium and large models?

It depends on the dataset, but if you're fine-tuning I'd tend towards fewer rather than more. The proper way to do it is to check for overfitting: have a hold-out set of data, and evaluate the model on the hold-out set when you train, and then stop when the hold-out set starts to get worse rather than better. I'd think more like 5 or 10 iterations rather than 1000 for fine-tuning unless you have a large amount of data to fine-tune with.

I don't have detailed specs, but my guess is you need at least 8 GB of memory on the card, if not 12 GB, for the 355M model. For the 1.5 billion parameter model it supposedly requires 32 GB, so...
However, you could squeeze out a bit more if you quantize the model to 16-bit floats rather than the default 32-bit (this may require coding). Training also takes more memory than inference (maybe about a factor of 2 difference, though there are methods to reduce that such as gradient checkpointing, so if the library you're using implements those it won't be quite so bad).
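As very rough arithmetic: 355 million parameters at 4 bytes each is about 1.4 GB just to hold the weights in 32-bit, or roughly half that in 16-bit; training then adds optimizer state and activations on top, which is why the practical requirements end up several times the raw model size.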

Bohandas
2020-09-15, 03:17 PM
I'm gonna have to go back and check if I even have this configured to use GPU assistance at all. (Is there any quick way to check this, btw?)

EDIT:
Also, is there any way to increase the outputted sample length above 1024?

NichG
2020-09-15, 09:54 PM
I'm gonna have to go back and check if I even have this configured to use GPU assistance at all. (Is there any quick way to check this, btw?)

EDIT:
Also, is there any way to increase the outputted sample length above 1024?

To see if you're using the GPU, the easiest thing would be to look at the nvidia control panel and see if GPU memory usage goes up when you run the thing.

You can modify the generate_unconditional_samples.py thing to specify a certain length:

def sample_model(
    model_name='117M',
    seed=None,
    nsamples=0,
    batch_size=1,
    length=None,
    temperature=1,
    top_k=0,
    top_p=0.0
):

Change length=None to whatever value you want.

Also, inside train.py there are a bunch of commandline arguments. This one is relevant to memory stuff: --memory_saving_gradients
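Incidentally, since these scripts expose their function arguments through the fire library (the same fire.Fire(...) pattern as in the interactive script), you should also be able to set the length from the command line without editing the file, along the lines of:

python generate_unconditional_samples.py --model_name 117M --length 512

One caveat: the model's context window (n_ctx, 1024 tokens for GPT-2) is a hard ceiling and the scripts check for it, so for anything longer you'd generate several samples rather than one giant one.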

Bohandas
2020-09-19, 12:30 AM
I'd think more like 5 or 10 iterations rather than 1000 for fine-tuning unless you have a large amount of data to fine-tune with.

The .txt file is about 8 megabytes.

NichG
2020-09-19, 06:39 AM
The .txt file is about 8 megabytes.

So I guess I'd try maybe 10 iterations and see where that gets you, and push further if it's still too generic.

Bohandas
2020-09-28, 02:26 AM
Regarding the interactive sample program (https://github.com/nshepperd/gpt-2/blob/finetuning/src/interactive_conditional_samples.py). The tutorial (https://medium.com/@ngwaifoong92/beginners-guide-to-retrain-gpt-2-117m-to-generate-custom-text-content-8bb5363d8b7f) said something about modifying it to take text files as inputs in order to allow inputs with line breaks. Do you have any idea how to do that?

EDIT:
Additionally, is there any way to use a saved model older than the latest one? The checkpoint file doesn't seem to spawn copies when the training program autosaves the way the other files do.

NichG
2020-09-28, 04:55 AM
Regarding the interactive sample program (https://github.com/nshepperd/gpt-2/blob/finetuning/src/interactive_conditional_samples.py). The tutorial (https://medium.com/@ngwaifoong92/beginners-guide-to-retrain-gpt-2-117m-to-generate-custom-text-content-8bb5363d8b7f) said something about modifying it to take text files as inputs in order to allow inputs with line breaks. Do you have any idea how to do that?

EDIT:
Additionally, is there any way to use a saved model older than the latest one? The checkpoint file doesn't seem to spawn copies when the training program autosaves the way the other files do.

So there's a bit in the code that says:



while True:
    raw_text = input("Model prompt >>> ")
    while not raw_text:
        print('Prompt should not be empty!')
        raw_text = input("Model prompt >>> ")


That's what's getting the text that you're going to process. You could replace that with something like:



f = open(filename,"r")
raw_text = f.read()
f.close()


Note that Python is picky about indentation, so when you remove that outer 'while True:' loop you have to make sure that everything under it is de-indented to the same level correctly. E.g. if you had



while True:
    aaa
    while False:
        bbb
        ccc
    ddd


Then removing the outer while loop, it would become:



aaa
while False:
    bbb
    ccc
ddd


As far as checkpoints, GPT-2 checkpoints should be rather large, no? So it's not surprising if they're not storing each one separately. You'd have to pick which checkpoints you want to preserve somehow (e.g. copy them to a separate file during the training process).

Bohandas
2020-10-01, 11:33 PM
So there's a bit in the code that says:



while True:
    raw_text = input("Model prompt >>> ")
    while not raw_text:
        print('Prompt should not be empty!')
        raw_text = input("Model prompt >>> ")


That's what's getting the text that you're going to process. You could replace that with something like:



f = open(filename,"r")
raw_text = f.read()
f.close()


Note that Python is picky about indentation, so when you remove that outer 'while True:' loop you have to make sure that everything under it is de-indented to the same level correctly. E.g. if you had



while True:
    aaa
    while False:
        bbb
        ccc
    ddd


Then removing the outer while loop, it would become:



aaa
while False:
    bbb
    ccc
ddd



So would the "r" part be the name of the file I'm going to have it read? Or do I replace the word "filename" with the filename? And if so, do I need to put the full filepath in?


As far as checkpoints, GPT-2 checkpoints should be rather large, no? So it's not surprising if they're not storing each one separately. So you'd have to pick which checkpoints you want to preserve somehow (e.g. copy them to a separate file during the training process, for example).

I don't mean the model file, I mean the file actually named "checkpoint", which is only 1 KB.

NichG
2020-10-04, 11:17 AM
The "r" thing tells it to read the file ("w" means write, "rb" means read binary, etc). Replace filename with the actual name of the file (in quotes).

If the checkpoints are only 1 KB, I don't think you can generate anything from them. It must be some other kind of statistics-tracking output file?
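And to make the first point concrete, a hypothetical example: if your prompt lived in a file called prompt.txt in the same folder as the script, the block would become the following (use a full path instead if the file lives somewhere else):

f = open("prompt.txt", "r")   # "r" = open for reading; e.g. open("C:\\stories\\prompt.txt", "r") for a full path
raw_text = f.read()
f.close()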

Bohandas
2020-10-04, 10:18 PM
Ok, I've got:


#!/usr/bin/env python3

import fire
import json
import os
import numpy as np
import tensorflow as tf

import model, sample, encoder

def interact_model(
    model_name='117M',
    seed=None,
    nsamples=1,
    batch_size=1,
    length=None,
    temperature=1,
    top_k=0,
    top_p=0.0
):
    """
    Interactively run the model
    :model_name=117M : String, which model to use
    :seed=None : Integer seed for random number generators, fix seed to reproduce
     results
    :nsamples=1 : Number of samples to return total
    :batch_size=1 : Number of batches (only affects speed/memory). Must divide nsamples.
    :length=None : Number of tokens in generated text, if None (default), is
     determined by model hyperparameters
    :temperature=1 : Float value controlling randomness in boltzmann
     distribution. Lower temperature results in less random completions. As the
     temperature approaches zero, the model will become deterministic and
     repetitive. Higher temperature results in more random completions.
    :top_k=0 : Integer value controlling diversity. 1 means only 1 word is
     considered for each step (token), resulting in deterministic completions,
     while 40 means 40 words are considered at each step. 0 (default) is a
     special setting meaning no restrictions. 40 generally is a good value.
    :top_p=0.0 : Float value controlling diversity. Implements nucleus sampling,
     overriding top_k if set to a value > 0. A good setting is 0.9.
    """
    if batch_size is None:
        batch_size = 1
    assert nsamples % batch_size == 0

    enc = encoder.get_encoder(model_name)
    hparams = model.default_hparams()
    with open(os.path.join('models', model_name, 'hparams.json')) as f:
        hparams.override_from_dict(json.load(f))

    if length is None:
        length = hparams.n_ctx // 2
    elif length > hparams.n_ctx:
        raise ValueError("Can't get samples longer than window size: %s" % hparams.n_ctx)

    with tf.Session(graph=tf.Graph()) as sess:
        context = tf.placeholder(tf.int32, [batch_size, None])
        np.random.seed(seed)
        tf.set_random_seed(seed)
        output = sample.sample_sequence(
            hparams=hparams, length=length,
            context=context,
            batch_size=batch_size,
            temperature=temperature, top_k=top_k, top_p=top_p
        )

        saver = tf.train.Saver()
        ckpt = tf.train.latest_checkpoint(os.path.join('models', model_name))
        saver.restore(sess, ckpt)

f = open("sample.txt","r")
raw_text = f.read()
f.close()
context_tokens = enc.encode(raw_text)
generated = 0
for _ in range(nsamples // batch_size):
    out = sess.run(output, feed_dict={
        context: [context_tokens for _ in range(batch_size)]
    })[:, len(context_tokens):]
    for i in range(batch_size):
        generated += 1
        text = enc.decode(out[i])
        print("=" * 40 + " SAMPLE " + str(generated) + " " + "=" * 40)
        print(text)
print("=" * 80)

if __name__ == '__main__':
    fire.Fire(interact_model)


But it's outputting the error:


Traceback (most recent call last):
File "text_insert_conditional_samples.py", line 73, in <module>
context_tokens = enc.encode(raw_text)
NameError: name 'enc' is not defined

NichG
2020-10-05, 08:22 AM
Ok, I've got:

...


Looks like you removed one too many indents, so the stuff that used to be in 'def interact_model' is now just being executed. Try:


#!/usr/bin/env python3

import fire
import json
import os
import numpy as np
import tensorflow as tf

import model, sample, encoder

def interact_model(
    model_name='117M',
    seed=None,
    nsamples=1,
    batch_size=1,
    length=None,
    temperature=1,
    top_k=0,
    top_p=0.0
):
    """
    Interactively run the model
    :model_name=117M : String, which model to use
    :seed=None : Integer seed for random number generators, fix seed to reproduce
     results
    :nsamples=1 : Number of samples to return total
    :batch_size=1 : Number of batches (only affects speed/memory). Must divide nsamples.
    :length=None : Number of tokens in generated text, if None (default), is
     determined by model hyperparameters
    :temperature=1 : Float value controlling randomness in boltzmann
     distribution. Lower temperature results in less random completions. As the
     temperature approaches zero, the model will become deterministic and
     repetitive. Higher temperature results in more random completions.
    :top_k=0 : Integer value controlling diversity. 1 means only 1 word is
     considered for each step (token), resulting in deterministic completions,
     while 40 means 40 words are considered at each step. 0 (default) is a
     special setting meaning no restrictions. 40 generally is a good value.
    :top_p=0.0 : Float value controlling diversity. Implements nucleus sampling,
     overriding top_k if set to a value > 0. A good setting is 0.9.
    """
    if batch_size is None:
        batch_size = 1
    assert nsamples % batch_size == 0

    enc = encoder.get_encoder(model_name)
    hparams = model.default_hparams()
    with open(os.path.join('models', model_name, 'hparams.json')) as f:
        hparams.override_from_dict(json.load(f))

    if length is None:
        length = hparams.n_ctx // 2
    elif length > hparams.n_ctx:
        raise ValueError("Can't get samples longer than window size: %s" % hparams.n_ctx)

    with tf.Session(graph=tf.Graph()) as sess:
        context = tf.placeholder(tf.int32, [batch_size, None])
        np.random.seed(seed)
        tf.set_random_seed(seed)
        output = sample.sample_sequence(
            hparams=hparams, length=length,
            context=context,
            batch_size=batch_size,
            temperature=temperature, top_k=top_k, top_p=top_p
        )

        saver = tf.train.Saver()
        ckpt = tf.train.latest_checkpoint(os.path.join('models', model_name))
        saver.restore(sess, ckpt)

        f = open("sample.txt","r")
        raw_text = f.read()
        f.close()
        context_tokens = enc.encode(raw_text)
        generated = 0
        for _ in range(nsamples // batch_size):
            out = sess.run(output, feed_dict={
                context: [context_tokens for _ in range(batch_size)]
            })[:, len(context_tokens):]
            for i in range(batch_size):
                generated += 1
                text = enc.decode(out[i])
                print("=" * 40 + " SAMPLE " + str(generated) + " " + "=" * 40)
                print(text)
        print("=" * 80)

if __name__ == '__main__':
    fire.Fire(interact_model)

Bohandas
2020-10-13, 11:54 PM
That, plus changing a bunch of tabs to spaces, worked. I got it working shortly after you posted it.

One more question. Is there any way to make it automatically log output text to a file? I recently lost some outputs because I forgot to copy and paste before closing the program.

NichG
2020-10-14, 06:17 AM
That, plus changing a bunch of tabs to spaces, worked. I got it working shortly after you posted it.

One more question. Is there any way to make it automatically log output text to a file? I recently lost some outputs because I forgot to copy and paste before closing the program.

The line that says 'print(text)' is where the outputs are being generated. So you could instead write that to a file, like (at the same indent level where print is):



f = open("output.log","a")
f.write(text)
f.close()


Or if you're running this from a console, you can add '> output.log' to print to a file rather than printing to the console. Like 'python whatever_program_file.py arg1 arg2 arg3 > output.log' where whatever_program_file.py is the name of this code file, and the arg1 arg2 arg3 are whatever and however many command-line arguments you're currently using.
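(One thing to watch with that: a single '>' starts output.log fresh each run, while '>>' appends to it, so use '>>' if you want to keep accumulating samples across runs. The f.write version above opens the file in append mode ("a"), so it already accumulates.)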

Bohandas
2020-11-04, 01:05 AM
Additionally, is there any way to use a saved model older than the latest one? The checkpoint file doesn't seem to spawn copies when the training program autosaves the way the other files do.

I figured this one out btw. It can be opened with notepad and apparently it just points the program to the correct name for the data file, so if it said something like:


model_checkpoint_path: "model-221"
all_model_checkpoint_paths: "model-210"
all_model_checkpoint_paths: "model-221"

and you were using an earlier model file, you would just change the "221" there to the correct number.
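For instance, to go back to the model-210 snapshot listed above, the file would be edited to read:

model_checkpoint_path: "model-210"
all_model_checkpoint_paths: "model-210"
all_model_checkpoint_paths: "model-221"

since the loader in these scripts (tf.train.latest_checkpoint) just restores whatever model_checkpoint_path points at.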

Bohandas
2021-02-25, 02:45 AM
I can't remember what button to press to interrupt the generation process

NichG
2021-02-25, 04:15 AM
Ctrl-C will kill a process. Is there a built-in thing?

Bohandas
2021-03-27, 06:40 PM
Ctrl-C will kill a process. Is there a built in thing?

Thank you. That's what I was thinking of


So there's a bit in the code that says:



while True:
    raw_text = input("Model prompt >>> ")
    while not raw_text:
        print('Prompt should not be empty!')
        raw_text = input("Model prompt >>> ")


That's what's getting the text that you're going to process. You could replace that with something like:



f = open(filename,"r")
raw_text = f.read()
f.close()


Note that Python is picky about indentation, so when you remove that outer 'while True:' loop you have to make sure that everything under it is de-indented to the same level correctly. E.g. if you had



while True:
    aaa
    while False:
        bbb
        ccc
    ddd


Then removing the outer while loop, it would become:



aaa
while False:
    bbb
    ccc
ddd


As far as checkpoints, GPT-2 checkpoints should be rather large, no? So it's not surprising if they're not storing each one separately. So you'd have to pick which checkpoints you want to preserve somehow (e.g. copy them to a separate file during the training process, for example).

Is there any simple way to get this to keep spitting out samples over and over again? Setting a value for nsamples in the console doesn't seem to work.

NichG
2021-03-27, 11:37 PM
Thank you. That's what I was thinking of



Is there any simple way to get this to keep spitting out samples over and over again? Setting a value for nsamples in the console doesn't seem to work.

Based on the code I pasted before, it looks like nsamples would work. Can you paste the current version of your code?

Edit: Okay, maybe the issue is that you're setting nsamples in the console. You have to change what you pass to that function, wherever it gets called. You could change nsamples=1 in the function definition to, say, nsamples = 10 or whatever. Or pass it as an argument in the console: interact_model(nsamples = ...)

Bohandas
2021-03-28, 02:59 PM
I tried changing it to 3 in the code and it still just displayed one thing. I think maybe it's processing several but not displaying them.

Here's the code as it stands



#!/usr/bin/env python3

import fire
import json
import os
import numpy as np
import tensorflow as tf

import model, sample, encoder

def interact_model(
    model_name='117M',
    seed=None,
    nsamples=3,
    batch_size=1,
    length=None,
    temperature=1,
    top_k=0,
    top_p=0.0
):
    """
    Interactively run the model
    :model_name=117M : String, which model to use
    :seed=None : Integer seed for random number generators, fix seed to reproduce
     results
    :nsamples=1 : Number of samples to return total
    :batch_size=1 : Number of batches (only affects speed/memory). Must divide nsamples.
    :length=None : Number of tokens in generated text, if None (default), is
     determined by model hyperparameters
    :temperature=1 : Float value controlling randomness in boltzmann
     distribution. Lower temperature results in less random completions. As the
     temperature approaches zero, the model will become deterministic and
     repetitive. Higher temperature results in more random completions.
    :top_k=0 : Integer value controlling diversity. 1 means only 1 word is
     considered for each step (token), resulting in deterministic completions,
     while 40 means 40 words are considered at each step. 0 (default) is a
     special setting meaning no restrictions. 40 generally is a good value.
    :top_p=0.0 : Float value controlling diversity. Implements nucleus sampling,
     overriding top_k if set to a value > 0. A good setting is 0.9.
    """
    if batch_size is None:
        batch_size = 1
    assert nsamples % batch_size == 0

    enc = encoder.get_encoder(model_name)
    hparams = model.default_hparams()
    with open(os.path.join('models', model_name, 'hparams.json')) as f:
        hparams.override_from_dict(json.load(f))

    if length is None:
        length = hparams.n_ctx // 2
    elif length > hparams.n_ctx:
        raise ValueError("Can't get samples longer than window size: %s" % hparams.n_ctx)

    with tf.Session(graph=tf.Graph()) as sess:
        context = tf.placeholder(tf.int32, [batch_size, None])
        np.random.seed(seed)
        tf.set_random_seed(seed)
        output = sample.sample_sequence(
            hparams=hparams, length=length,
            context=context,
            batch_size=batch_size,
            temperature=temperature, top_k=top_k, top_p=top_p
        )

        saver = tf.train.Saver()
        ckpt = tf.train.latest_checkpoint(os.path.join('models', model_name))
        saver.restore(sess, ckpt)

        f = open("sample.txt","r")
        raw_text = f.read()
        f.close()
        context_tokens = enc.encode(raw_text)
        generated = 0
        for _ in range(nsamples // batch_size):
            out = sess.run(output, feed_dict={
                context: [context_tokens for _ in range(batch_size)]
            })[:, len(context_tokens):]
        for i in range(batch_size):
            generated += 1
            text = enc.decode(out[i])
            print("=" * 40 + " SAMPLE " + str(generated) + " " + "=" * 40)
            print(text)
        print("=" * 80)

if __name__ == '__main__':
    fire.Fire(interact_model)

NichG
2021-03-28, 08:54 PM
It looks like maybe you dropped an indent. Instead of:



for _ in range(nsamples // batch_size):
    out = sess.run(output, feed_dict={
        context: [context_tokens for _ in range(batch_size)]
    })[:, len(context_tokens):]
for i in range(batch_size):
    generated += 1
    text = enc.decode(out[i])
    print("=" * 40 + " SAMPLE " + str(generated) + " " + "=" * 40)
    print(text)


I think maybe it should be:



for _ in range(nsamples // batch_size):
    out = sess.run(output, feed_dict={
        context: [context_tokens for _ in range(batch_size)]
    })[:, len(context_tokens):]
    for i in range(batch_size):
        generated += 1
        text = enc.decode(out[i])
        print("=" * 40 + " SAMPLE " + str(generated) + " " + "=" * 40)
        print(text)


That second 'for' should be at the same indentation level as 'out = ...'

Peelee
2021-03-29, 09:03 AM
The Mod on the Silver Mountain: Thread timed out.

truemane
2021-03-30, 07:22 AM
Metamagic Mod: thread re-opened.

Bohandas
2021-05-01, 03:09 PM
It looks like maybe you dropped an indent. Instead of:



for _ in range(nsamples // batch_size):
    out = sess.run(output, feed_dict={
        context: [context_tokens for _ in range(batch_size)]
    })[:, len(context_tokens):]
for i in range(batch_size):
    generated += 1
    text = enc.decode(out[i])
    print("=" * 40 + " SAMPLE " + str(generated) + " " + "=" * 40)
    print(text)


I think maybe it should be:



for _ in range(nsamples // batch_size):
    out = sess.run(output, feed_dict={
        context: [context_tokens for _ in range(batch_size)]
    })[:, len(context_tokens):]
    for i in range(batch_size):
        generated += 1
        text = enc.decode(out[i])
        print("=" * 40 + " SAMPLE " + str(generated) + " " + "=" * 40)
        print(text)


That second 'for' should be at the same indentation level as 'out = ...'

Thank you. This worked perfectly.

One other question. I've found that if I try to use the "--noise" argument while training a model I get an error message saying "AttributeError: module 'tensorflow' has no attribute 'random'". Have you had any experience with this?

NichG
2021-05-01, 06:48 PM
Thank you. This worked perfectly.

One other question. I've found that if I try to use the "--noise" argument while training a model I get an error message saying "AttributeError: module 'tensorflow' has no attribute 'random'". Have you had any experience with this?

Nope. I thought it could be because the library moved where it put the random stuff in its hierarchy, but I checked and tf.random should exist.

I should also say, I don't actually use this particular implementation anymore. I'd probably use Huggingface's implementations since they have a standardized interface, and they're expanding to things like a community-trained GPT-3 model for example. But if you finally have this one working I can understand not wanting to switch.
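For what that looks like, a short sketch against the transformers interface (swap the "gpt2" name for whichever checkpoint you want, including a fine-tuned one saved to disk):

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The dragon circled the tower and", return_tensors="pt").input_ids
out = model.generate(ids, max_length=60, do_sample=True, top_k=40)
print(tok.decode(out[0], skip_special_tokens=True))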

Bohandas
2021-06-08, 12:00 AM
I'm actually not sure if my computer would be able to run GPT-3. It barely runs GPT-2.

NichG
2021-06-08, 12:56 AM
One of the ways to do this kind of stuff is to use Google Colab instances. You can basically use one for free for something like 12 hours a day, and you can get a GPU allocation that can run most of the stuff people are playing with. The code loads in a Jupyter Notebook interface, and for a lot of the good ones there's a simple 'enter prompt here and press Run All' kind of workflow to it.

I'm not sure there's one for the community GPT-3 yet, but these things exist for DALL-E and various other interfaces to CLIP (various kinds of 'specify a sentence and it will draw the image' models that have exploded this year).

Bohandas
2021-10-02, 02:26 PM
Do you know what the recommended system requirements are to run the 345M model locally, and/or the minimum system requirements to run it without risking a system crash (as happened the last time I attempted to train the 345M model)?

truemane
2021-10-02, 07:02 PM
Metamagic Mod: thread Necromancy