CoCoGen

Language Models of Code are Few-Shot Commonsense Learners

The official repository for “Language Models of Code are Few-Shot Commonsense Learners” (Madaan et al., EMNLP’2022).

This paper addresses the general task of structured commonsense reasoning: generating a graph given a natural language input. We address this family of tasks by framing the problem as a code generation task and prompting large language models of code (e.g., Codex).
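For example, a proscript plan can be rendered as a Python class whose methods name the steps and whose get_relations method lists the edges; the sample output later in this README uses exactly this format (the MakeCoffee plan below is a made-up illustration):

class MakeCoffee:

    title = "make coffee"
    steps = 4

    def step0(self):
        return "boil water"

    def step1(self):
        return "add coffee grounds to mug"

    def step2(self):
        return "pour water over grounds"

    def step3(self):
        return "make coffee"

    def get_relations(self):
        return [
            "step0 -> step2",
            "step1 -> step2",
            "step2 -> step3",
        ]

The model is prompted with a few fully worked classes followed by an incomplete one, and its completion is parsed back into a graph.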



Running CoCoGen

An OpenAI API key is required to run the jobs. To get an API key, register at https://openai.com/blog/openai-codex/.

The key should be exported in the environment variable OPENAI_API_KEY.

Please note that as of October 2022, Codex is free to use for non-commercial purposes with an API key.
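Each job script wraps completion requests to the OpenAI API. The snippet below is a minimal, illustrative sketch of such a call using the 2022-era openai Python client; the prompt, max_tokens, and temperature here are placeholders rather than the repo's actual values:

import os
import openai

# The scripts expect the key in the OPENAI_API_KEY environment variable.
openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    engine="code-davinci-002",
    prompt="class MakeCoffee:\n    ...",  # few-shot prompt + incomplete class
    max_tokens=300,
    temperature=0.0,
)
print(response["choices"][0]["text"])  # the generated completion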

  1. Proscript script generation:
python -u src/api/query_openai_over_tasks.py --task_file_path data/proscript_script_gen/dev.jsonl --num_tasks -1 \
    --output_file_path data/proscript_script_gen/dev_outputs.jsonl \
    --prompt_path data/proscript_script_gen/prompt.txt --job_type proscript-literal \
    --engine text-davinci-002 --max_requests_per_min 10

Here:

* `--task_file_path`: path to the file containing the tasks to be run
* `--num_tasks`: number of tasks to be run. If -1, all tasks in the file will be run.
* `--output_file_path`: path to the file to which the outputs will be written
* `--prompt_path`: path to the file containing the prompt. The prompt is created from the train split.
* `--job_type`: the type of job to run. This is used to determine the code converter (a minimal sketch of the idea follows the commands below). Currently, the following converters are supported: "explagraphs", "proscript-hashmap-init", "proscript-hashmap-sep", "proscript-networkx", "proscript-literal", "proscript-oo"
* `--engine`: the API engine to use. See the complete list of available engines at https://beta.openai.com/docs/api-reference/introduction.
* `--max_requests_per_min`: the maximum number of API requests issued per minute
  1. Proscript edge prediction:
python -u src/api/query_openai_over_tasks.py --task_file_path data/proscript_edge_prediction/dev.jsonl --num_tasks -1 \
    --output_file_path data/proscript_edge_prediction/dev_outputs.jsonl \
    --prompt_path data/proscript_edge_prediction/prompt.txt --job_type "proscript-edge-pred-literal" \
    --engine text-davinci-002 --max_requests_per_min 10
  1. Explagraphs:
python -u src/api/query_openai_over_tasks.py --task_file_path data/explagraphs/dev.jsonl --num_tasks -1 \
    --output_file_path data/explagraphs/dev_outputs.jsonl \
    --prompt_path data/explagraphs/prompt.txt --job_type "explagraphs-literal" \
    --engine text-davinci-002 --max_requests_per_min 10
  1. Propara:
python -u src/api/query_openai_over_tasks.py --task_file_path data/propara/test.jsonl --num_tasks -1 \
    --output_file_path data/propara/test_outputs.jsonl \
    --prompt_path data/propara/prompt.txt --job_type "propara-func" \
    --engine text-davinci-002 --max_requests_per_min 10
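The `--job_type` flag selects a converter that serializes a graph into code and parses generated code back into a graph. As a rough illustration of the idea (our sketch, not the repo's actual converter code), a "literal"-style converter might render a graph in the output format documented below as a Python dict literal:

def graph_to_code(graph: dict) -> str:
    # Render a graph (title, steps, relations) as a Python dict literal.
    # Illustrative only: the real converters are selected via --job_type.
    lines = ["graph = {"]
    lines.append(f'    "title": {graph["title"]!r},')
    lines.append('    "steps": [')
    for step in graph["schema"]:
        lines.append(f"        {step!r},")
    lines.append("    ],")
    lines.append('    "relations": [')
    for rel in graph["relations"]:
        lines.append(f"        {rel!r},")
    lines.append("    ],")
    lines.append("}")
    return "\n".join(lines)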

Sample Outputs

Sample outputs for each task are located in the `outputs/` directory:

outputs/
├── explagraphs
├── propara
├── proscript_edge_pred
└── proscript_script_generation

Since we are not permitted by the authors of proscript to release the test split, the reference outputs for proscript_script_generation and proscript_edge_pred have been removed.

Output format

Each output file is a jsonl file, where each line is a JSON object with several common fields:

{

    "prompt": "the dynamic prompt created for the task",
    "reference_graph": "the graph/table for the task (G in the paper)",
    "reference_code": "python code for the graph (T + G_c in the paper)",
    "codex_response": {
        response from codex
    },
    "generated_code": "code generated by codex",
    "generated_graph": {
        same format as reference_graph, obtained by parsing the generated code
    }
}
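Because every record carries both reference_graph and generated_graph, a quick sanity check over an output file only requires comparing their relation lists. The edge-F1 computation below is our own sketch, not the repo's official evaluation:

import json

def mean_edge_f1(path: str) -> float:
    # Compare generated vs. reference edges for every record in the jsonl file.
    f1s = []
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            ref = set(record["reference_graph"]["relations"])
            gen = set(record["generated_graph"]["relations"])
            overlap = len(ref & gen)
            p = overlap / len(gen) if gen else 0.0
            r = overlap / len(ref) if ref else 0.0
            f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return sum(f1s) / len(f1s)

print(mean_edge_f1("data/proscript_script_gen/dev_outputs.jsonl"))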

For example, for edge prediction:

{
    "scenario": "have bowl for cut ingredients",
    "closest_queries": [
        "mix ingredients together",
        "add the measured ingredients to bowl",
        "take out a cutting board and knife",
        "gather the ingredients",
        "measure out ingredients",
        "put all ingredients into bowl and mix",
        "prepare the ingredients with a knife",
        "make a list of ingredients needed",
        "copy list from website to paper",
        "put dry ingredients in one bowl",
        "prepare a meal"
    ],
    "prompt": "the dynamic prompt created for the task"",
        "reference_graph": {
        "title": "have bowl for cut ingredients",
        "num_steps": 7,
        "schema": [
            "step0: Move toward where work is to be done",
            "step1: Find the location of desired bowl",
            "step2: Grab the bowl firmly",
            "step3: Set the bowl down",
            "step4: Walk toward the bowls location",
            "step5: get out ingredients",
            "step6: have bowl for cut ingredients"
        ],
        "relations": [
            "step1 -> step4",
            "step4 -> step2",
            "step2 -> step0",
            "step0 -> step3",
            "step3 -> step6",
            "step5 -> step1"
        ]
    },
    "reference_code": "class HaveBowlForCutIngredients:\n\n    title = \"have bowl for cut ingredients\"\n    steps = 7\n\n    def step0(self):\n        return \"Move toward where work is to be done\"\n\n    def step1(self):\n        return \"Find the location of desired bowl\"\n\n    def step2(self):\n        return \"Grab the bowl firmly\"\n\n    def step3(self):\n        return \"Set the bowl down\"\n\n    def step4(self):\n        return \"Walk toward the bowls location\"\n\n    def step5(self):\n        return \"get out ingredients\"\n\n    def step6(self):\n        return \"have bowl for cut ingredients\"\n\n    def get_relations(self):\n        return [\n            \"step1 -> step4\",\n            \"step4 -> step2\",\n            \"step2 -> step0\",\n            \"step0 -> step3\",\n            \"step3 -> step6\",\n            \"step5 -> step1\",\n        ]\n",
    "input_prompt_code": "class HaveBowlForCutIngredients:\n\n    title = \"have bowl for cut ingredients\"\n    steps = 7\n\n    def step0(self):\n        return \"Move toward where work is to be done\"\n\n    def step1(self):\n        return \"Find the location of desired bowl\"\n\n    def step2(self):\n        return \"Grab the bowl firmly\"\n\n    def step3(self):\n        return \"Set the bowl down\"\n\n    def step4(self):\n        return \"Walk toward the bowls location\"\n\n    def step5(self):\n        return \"get out ingredients\"\n\n    def step6(self):\n        return \"have bowl for cut ingredients\"\n\n",
    "codex_response": {
        "id": "",
        "object": "text_completion",
        "created": ,
        "model": "code-davinci:002",
        "choices": [
            {
                "text": "    def get_relations(self):\n        return [\n            \"step4 -> step1\",\n            \"step1 -> step2\",\n            \"step2 -> step3\",\n            \"step3 -> step6\",\n            \"step5 -> step0\",\n            \"step5 -> step4\",\n        ]\n",
                "index": 0,
                "logprobs": null,
                "finish_reason": "stop"
            }
        ]
    },
    "generated_code": "class HaveBowlForCutIngredients:\n\n    title = \"have bowl for cut ingredients\"\n    steps = 7\n\n    def step0(self):\n        return \"Move toward where work is to be done\"\n\n    def step1(self):\n        return \"Find the location of desired bowl\"\n\n    def step2(self):\n        return \"Grab the bowl firmly\"\n\n    def step3(self):\n        return \"Set the bowl down\"\n\n    def step4(self):\n        return \"Walk toward the bowls location\"\n\n    def step5(self):\n        return \"get out ingredients\"\n\n    def step6(self):\n        return \"have bowl for cut ingredients\"\n\n    def get_relations(self):\n        return [\n            \"step4 -> step1\",\n            \"step1 -> step2\",\n            \"step2 -> step3\",\n            \"step3 -> step6\",\n            \"step5 -> step0\",\n            \"step5 -> step4\",\n        ]\n",
    "generated_graph": {
        "title": "have bowl for cut ingredients",
        "num_steps": 7,
        "schema": [
            "step0: Move toward where work is to be done",
            "step1: Find the location of desired bowl",
            "step2: Grab the bowl firmly",
            "step3: Set the bowl down",
            "step4: Walk toward the bowls location",
            "step5: get out ingredients",
            "step6: have bowl for cut ingredients"
        ],
        "relations": [
            "step4 -> step1",
            "step1 -> step2",
            "step2 -> step3",
            "step3 -> step6",
            "step5 -> step0",
            "step5 -> step4"
        ]
    },
    "elapsed_time": 0.0003092289
}
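The "stepX -> stepY" strings in relations are straightforward to parse into a graph object, e.g., a networkx DiGraph (the "proscript-networkx" job type suggests the repo also works with networkx, but this particular helper is our own illustration):

import networkx as nx

def relations_to_digraph(relations):
    # Parse "stepX -> stepY" strings into a directed graph.
    g = nx.DiGraph()
    for rel in relations:
        src, dst = (node.strip() for node in rel.split("->"))
        g.add_edge(src, dst)
    return g

g = relations_to_digraph(["step4 -> step1", "step1 -> step2", "step2 -> step3"])
print(list(nx.topological_sort(g)))  # one valid ordering of the steps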

End-to-end workflow

Step 1: Create task files

python src/prompting/make_task_file.py --inpath data/propara/train.json --outpath data/propara/code/func/train.jsonl --job_type "propara-func"

python src/prompting/make_task_file.py --inpath data/propara/test.json --outpath data/propara/code/func/test.jsonl --job_type "propara-func"

python src/prompting/make_task_file.py --inpath data/propara/dev.json --outpath data/propara/code/func/dev.jsonl --job_type "propara-func"

Step 2: Create the prompt

python src/prompting/make_codex_prompt.py data/propara/code/func/train.jsonl 6 > data/propara/code/func/prompt.txt
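Conceptually, this step concatenates the code renderings of a few train examples into a single few-shot prompt. The sketch below is our reading of the step, assuming the positional argument 6 is the number of train examples to include and that task files carry the same reference_code field as the outputs:

import json

def make_prompt(train_path: str, n: int) -> str:
    # Concatenate the code form of the first n train examples into one prompt.
    blocks = []
    with open(train_path) as f:
        for line in f:
            if len(blocks) == n:
                break
            blocks.append(json.loads(line)["reference_code"])
    return "\n\n".join(blocks)

print(make_prompt("data/propara/code/func/train.jsonl", 6))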

Step 3: Run inference

python -u src/codexapi/query_openai_over_tasks.py --task_file_path data/propara/code/func/test.jsonl --num_tasks -1 --output_file_path data/propara/code/func/test_output.jsonl --prompt_path data/propara/code/func/prompt.txt --job_type "propara-func" --engine code-davinci-002 --max_requests_per_min 10  --max_tokens 800
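The --max_requests_per_min flag throttles calls to the API. Conceptually it is just a pause between consecutive requests (an illustration, not the script's exact code):

import time

def run_with_throttle(tasks, query_fn, max_requests_per_min=10):
    # Pause between consecutive API calls to stay under the rate limit.
    delay = 60.0 / max_requests_per_min
    for task in tasks:
        yield query_fn(task)
        time.sleep(delay)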

Step 4: Evaluate

python src/eval/propara/eval_results.py --raw_output_file data/propara/code/func/test_output.jsonl --output_file  data/propara/predictions/test_output.json 

This evaluates data/propara/code/func/test_output.jsonl and writes the predictions to data/propara/predictions/test_output.json.


Creating dynamic prompts

python src/prompting/knnprompt/make_knn_like_prompt.py \
       --train_file_path ${TRAIN_EXAMPLES_PATH} \
       --test_file_path ${TEST_FILE_PATH} \
       --k $K \
       --output_file_path ${KNN_TEST_FILE_PATH} \
       --query_field ${QFIELD}

Here:

* `--train_file_path`: path to the train split from which nearest neighbors are retrieved
* `--test_file_path`: path to the file containing the test examples
* `--k`: the number of nearest neighbors used to build each prompt
* `--output_file_path`: path to the file to which the tasks with dynamic prompts will be written
* `--query_field`: the field of each example used as the retrieval query

By default, we use "all-mpnet-base-v2" as our similarity model, but it can be changed using `--retrieval_model_name`.
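Under the hood, retrieval amounts to embedding the query field of each example with the similarity model and keeping the k most similar train examples. A minimal sketch with the sentence-transformers library (the script's internals may differ; the queries below are made up):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")

train_queries = ["gather the ingredients", "prepare a meal"]  # query_field values from the train split
test_query = "have bowl for cut ingredients"

train_emb = model.encode(train_queries, convert_to_tensor=True)
test_emb = model.encode(test_query, convert_to_tensor=True)

# Cosine similarity between the test query and every train query.
scores = util.cos_sim(test_emb, train_emb)[0]
top_k = scores.topk(k=2)
print([train_queries[i] for i in top_k.indices])

For example: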

python src/prompting/knnprompt/make_knn_like_prompt.py --train_file_path data/explagraphs/train.jsonl --test_file_path data/explagraphs/dev.jsonl --k 21 --output_file_path data/explagraphs/dev_kst_task.jsonl  --query_field belief

Once this command finishes, we can query Codex as usual using the following:

python -u src/codexapi/query_openai_over_tasks.py --task_file_path data/explagraphs/dev_kst_task.jsonl --num_tasks -1 --output_file_path data/explagraphs/dev_kst_task_OUTPUT.jsonl  --job_type explagraph-relation --engine code-davinci-002

Note that no prompt path is specified, as the task file data/explagraphs/dev_kst_task.jsonl contains a prompt for each example.