Project Board

TODOs

Implement HuggingFace/Llama experiment module
Implement OpanAI experiment module

10-6-23 Pivot

2 real world motivations

C to Rust for memory safety
Web front-end framework update. New web front-end frameworks appear every few month, and old frameworks also keep getting big updates. We could implement the same webpage with different frameworks in many languages and each one would have their own advantages. As a web project grow, the original framework it was built on may not be advantages anymore, so it could be very helpful if we can translate the codebase from one web framework to another while keeping the same functionality.

Current state of LLM Code Translation

Understanding the Effectiveness of Large Language Models in Code Translation seems to be a good exploration of current use of LLM in this field. From this paper we can see:
- LLM usage is very basic. Mostly, people just prompt with some task description + minimal prompt engineering. This can also be seen from the leaderboards of some code translation datasets. Generally, code translation seems to be a way to show the ability of a multi-lingual code generation model, and not much work is focused on optimizing LLM sampling /generation/prompt-engineering for this specific task.
- LLM makes uncharacteristically trivial errors. One of the findings in this paper is that among all the failed translation attempts by LLMs, most of them failed due to compilation error, even for the models that performs extremely well like GPT-4 (~47% success rate). They would make mistakes like violation of semantic restrictions and dependency errors that are not often seen when I was trying ChatGPT on coding questions. From the paper we see most bugs can be trace back to the difference between languages. By following the code from source language, the LLM can forgot about the rules in the target language. (e.g. ll = 1 works in go but ll ll = 1 would not work in C++ since ll is a reserved keyword)

Potential method

Base on the findings from this paper, using stochastic sampling with a high level planner seems a promising direction
We could use MCMC-like sampling methods to condition each LLM translation on (original code + high-level plan + semantic rules). This would allow LLM to keep follow the source code, but not so blindly that it makes trivial errors in target language.

Dataset

There is some existing dataset like CodeXGLUE
We could also use any benchmark, dataset, or code base with test cases and just run the tests in the target language. The "Understanding" paper used dataset like "EvalPlus", and projects with APIs for command-line processing including Apache Commons CLI(Java), Click (Python)

Proposed Nextstep

Expanding literature search in LLM code translation. See what is the method used by the most current state of art.
Try LLM code translation on ChatGPT and Llama, and see if they indeed make trivial errors.
Test if condition LLM on high level plan by simply adding high level description to translation prompt can improve translation result.
Find suitable C or web codebase/dataset that can be used as benchmark.

10-12-23 Experiments

Verify: Does LLM make more trivial mistakes when translating?

Naive code translate attempt(Chat-GPT)
- Translating 50 APPS python solutions to rust
- Result:
  - compile@1: 62.5%, compile@any: 92%
  - pass@1: 42.6%, pass@any: 70%
Code translation with Chat-GPT code summary attempt(Chat-GPT)
- Translating 50 APPS python solutions to rust
- Provided with summary of source code by Chat-GPT
- Result:
  - compile@1: 44.7%, compile@any: 100%
  - pass@1: 28.3%, pass@any:70%
Directly solve APPS attempt (Chat-GPT)
- Giving Chat-GPT APPS task prompts directly
- Result:
  - compile@1: 81.8%, compile@any: 100%
  - pass@1: 18.6%, pass@any:56%

Appendix

Translation with summary prompt example

Translate the following python program using the given python code and summary of program behavior

SUMMARY:

This code prompts the user to enter an integer and then enters a loop based on that integer. Within the loop, the code prompts the user to enter another integer and creates empty lists and sets to store various values.

The code then enters another loop based on the second integer entered. Within this loop, it prompts the user to enter a string and appends it to the 'mass' list. It also checks the first and last character of the string to determine if it matches certain conditions. Based on the conditions, it adds the string to specific sets and lists, and increments different counters.

After the second loop, the code checks if certain conditions are met and either prints -1 and continues to the next iteration of the loop, or proceeds to calculate and print a value based on the counters and sets/lists created earlier. The code then enters another loop using the calculated value and performs calculations based on the contents of the sets/lists to create another list containing certain values. Finally, it prints the contents of the list.

CODE:

for _ in range(int(input())):
    n = int(input())
    mass = []
    zo = 0
    oz = 0
    zz = 0
    oo = 0
    ozs = []
    zos = []
    ozss = set()
    zoss = set()
    for j in range(n):
        k = input()
        mass.append(k)
        if k[0] == '0' and k[-1] == '1':
            zoss.add(k)
            zos.append(j + 1)
            zo += 1
        elif k[0] == '1' and k[-1] == '0':
            ozss.add(k)
            ozs.append(j + 1)
            oz += 1
        elif k[0] == '0' and k[-1] == '0':
            zz += 1
        else:
            oo += 1
    if zz and oo and not oz and not zo:
        print(-1)
        continue
    else:
        if zo > oz:
            print((zo - oz) // 2)
            ans = []
            need = (zo - oz) // 2
            i = 0
            while need:
                zzz = mass[zos[i] - 1][len(mass[zos[i] - 1]) - 1:: -1]
                if zzz not in ozss:
                    ans.append(zos[i])
                    need -= 1
                i += 1
            print(*ans)
        else:
            print((oz - zo) // 2)
            ans = []
            need = (oz - zo) // 2
            i = 0
            while need:
                zzz = mass[ozs[i] - 1][len(mass[ozs[i] - 1]) - 1:: -1]
                if zzz not in zoss:
                    ans.append(ozs[i])
                    need -= 1
                i += 1
            print(*ans)

10-20-23 Dataset Update

Challanges with realistic code datasets

Realistic C code tend to be large

LLM have limited context window
- LLAMA-2: 4k tokens
- GPT-3 2K
- GPT-4 32k
Most C libraries have much more tokens
[Impossible] Transpiling and testing part by part
- Rewrite one part of C program with rust code and run unit tests
  - Possible for unsafe rust using Rust FFI. However previous C -> Unsafe Rust attempts exist and are successful.
  - For safe Rust, many "unsafe" C behaviors effects program globally(e.g. dangling pointers in some queue implementation), so impossible to transpile part of C program to Safe Rust
Thus, our method must learn to translate code larger than their context window.
- We test by translating full C programs/libraries, and run all test cases with Rust FFI.

Code could be seen by LLM during training

All code not handwritten specifically to create such a dataset could be uploaded on line and seen by LLM during training.
Past NLP benchmarks either ask people to write questions, or just ignore the possibility of questions being seen by LLM
We may need to find ways to verify testing code is equivalent to handwritten code to LLM
- Potentially by checking LLM confidence in given code or wether LLM has memorized given code

Potential Next Steps

Start with testing using minimal C libraries like:
- klib: implementation for basic data structure & functionalities (e.g. hashing, sorting, dynamic array, & string manipulation)
- cdsa: implementation of simple data structures including list, queue, stack.
Begin brainstorming method to process code larger than LLM context window.

Examples: Queue implementation

From cdsa

#include <assert.h>
#include <stddef.h>

struct Queue;
struct QueueNode;

typedef struct Queue Queue;
typedef struct QueueNode QueueNode;

struct Queue {
    QueueNode *head;
    QueueNode *tail;
    size_t size;
};

struct QueueNode {
    QueueNode *next;
};

void queue_init(Queue *queue) {
    assert(queue);

    queue->head = NULL; // Init with dangling pointer
    queue->tail = NULL;
    queue->size = 0;
}

QueueNode* queue_peek(const Queue *queue) {
    assert(queue);

    return queue->head;
}

size_t queue_size(const Queue *queue) {
    assert(queue);

    return queue->size;
}

int queue_empty(const Queue *queue) {
    assert(queue);

    return queue->size == 0;
}

void queue_push(Queue *queue, QueueNode *node) {
    assert(queue && node);

    if (!queue->head) {
        queue->head = node;
    } else {
        queue->tail->next = node;
    }

    queue->tail = node;
    node->next = NULL;

    ++queue->size;
}

QueueNode *queue_pop(Queue *queue) {
    QueueNode *n;

    assert(queue);

    n = queue->head;

    if (!n) {
        return NULL;
    }

    if (!n->next) {
        queue->tail = NULL;
    }

    queue->head = n->next;
    n->next = ((QueueNode*) 0x100);

    --queue->size;

    return n;
}

void queue_remove_all(Queue *queue) {
    assert(queue);

    if (queue->head) {
        queue->head->next = ((QueueNode*) 0x100);
        queue->tail->next = ((QueueNode*) 0x100);
    }

    queue->head = NULL;
    queue->tail = NULL;
    queue->size = 0;
}

Safe Rust Implementation From Blog

#![allow(unused)]
fn main() {
struct Queue<T> {
  queue: Vec<T>,
}

impl<T> Queue<T> {
  fn new() -> Self {
    Queue { queue: Vec::new() }
  }

  fn queue_size(&self) -> usize {
    self.queue.len()
  }

  fn queue_push(&mut self, item: T) {
    self.queue.push(item)
  }

  fn queue_pop(&mut self) -> T {
    self.queue.remove(0)
  }
  fn queue_empty(&self) -> bool {
    self.queue.is_empty()
  }

  fn queue_peek(&self) -> Option<&T> {
    self.queue.first()
  }
}
}

Rust Translation by Chat-GPT-3.5

Given cdsa code and "Translate this c code to safe rust, keep the function names"

Resulting function names are wrong and unsafe.

#![allow(unused)]
fn main() {
use std::ptr;

pub struct Queue {
    head: *mut QueueNode,
    tail: *mut QueueNode,
    size: usize,
}

pub struct QueueNode {
    next: *mut QueueNode,
}

impl Queue {
    pub fn new() -> Self {
        Queue {
            head: ptr::null_mut(),
            tail: ptr::null_mut(),
            size: 0,
        }
    }

    pub fn peek(&self) -> Option<*mut QueueNode> {
        if self.head.is_null() {
            None
        } else {
            Some(self.head)
        }
    }

    pub fn size(&self) -> usize {
        self.size
    }

    pub fn is_empty(&self) -> bool {
        self.size == 0
    }

    pub fn push(&mut self, node: *mut QueueNode) {
        if self.head.is_null() {
            self.head = node;
        } else {
            unsafe {
                (*self.tail).next = node;
            }
        }

        self.tail = node;
        unsafe {
            (*node).next = ptr::null_mut();
        }

        self.size += 1;
    }

    pub fn pop(&mut self) -> Option<*mut QueueNode> {
        if self.head.is_null() {
            None
        } else {
            let n = self.head;

            if unsafe { (*n).next.is_null() } {
                self.tail = ptr::null_mut();
            }

            self.head = unsafe { (*n).next };
            unsafe {
                (*n).next = 0x100 as *mut QueueNode;
            }

            self.size -= 1;
            Some(n)
        }
    }

    pub fn remove_all(&mut self) {
        if !self.head.is_null() {
            let null_ptr = 0x100 as *mut QueueNode;
            unsafe {
                (*self.head).next = null_ptr;
                (*self.tail).next = null_ptr;
            }
        }

        self.head = ptr::null_mut();
        self.tail = ptr::null_mut();
        self.size = 0;
    }
}

}

10-26-23 Challenge in Rust2C

Dataset Continued

C test cases cannot be trusted to test Rust code
- Rust use Option (enum) instead of NULL pointer, but Option cannot be safely used in C
- Validity of Lifetime Annotations cannot be tested in C test cases
- Of all the pointers in Rust only reference with & and Box can be safely used in C
- Some C test cases tests the internal implementation details, which is impossible for Rust to align with
Thus, naively running C unit tests on translated Rust would not work
Possible solution
- Handwrite test cases
  - Cost to find enough people who knows Rust to provide code
- Only deal with CLI tools or whole programs that uses some I/O interface
  - Greatly limits possible code space
  - Cannot effectively measure how well C & Rust codes align with each other
- Find ways to automatically generate equivalent tests from C to rust
  - Ambiguity, though may be solved by listing assumptions at each translation
  - C does not have a universally used tests framework.

Human in the loop required

Ambiguity that cannot be inferred from C code alone

e.g. Choice of pointers

Given some struct defined using template in C

template <typename T> 
struct Node { 
  T *next; 
};

There could be many ways to implement that in Rust, each one has its own trade off.

// using reference
// lifetime has to be specified 
struct Node<'a T> { 
  next: Box<T>; 
};

// pointer with box
// immutable, owns the object
struct Node<T> { 
  next: Box<T>; 
};

// pointer with Cell
// mutable, T has to be Copy
struct Node<T> { 
  next: Cell<T>; 
};

// pointer with RefCell
// keep track of references in runtime
// could panic and could cause problems when dereferencing
struct Node<T> { 
  next: RefCell<T>; 
};

Some C code does not have equivalent Safe Rust implementation
- A general purpose doubly linked list is impossible in Safe Rust
  - std::collections::LinkedList is implemented with unsafe rust.
  - details: Learning Rust With Entirely Too Many Linked Lists
We must design a framework so user could resolve ambiguity and approve usage of unsafe rust
- Idea1: Using PPL to achieve constrained LLM generation
- Idea2: At each time-step, LLM generate a Plan/Rust code with holes and places for users to edit to resolve ambiguity

11-2-23 Baseline

Experiments Set-up

C++ (src_lang) to Rust (tar_lang).
Code from code contests dataset
Test translation correctness by running I/O test cases
100 C++ programs used as translation source in total
10 Rust translations for each C++ program
Using chatGPT as LLM

Techniques used

Naive Translate
- "Translate {src_lang} to {tar_lang}:{code}."
Polite Translation
- "Please translate the following {src_lang} code to {tar_lang} code:{code}`"
Chain of Thought version 1
- "First, understand the functionality of the following {src_lang} code. Then, translate the {src_lang} code to {tar_lang} while maintaining the same functionality, ensuring that the translated code can be successfully compiled. {code}
Chain of Thought version 2
- Use problem description as summary to aid translation
  - Problem description: {summary}. Please translate to {tar_lang} according to the following {src_lang} code and the description of the coding problem it solved: {code}
One-Shot (TODO)

Result

Naive Translate
- Compile success rate: 32.0%
- Translation success rate: 21.8%
Polite Translation
- Compile success rate: 33.0%
- Translation success rate: 16.4%
Chain of Thought version 1
- Compile success rate: 27.4%
- Translation success rate: 18.4%
Chain of Thought version 2
- Compile success rate: 40.9%
- Translation success rate: 3.9%

Translation Examples

Problem

Let's call a string a phone number if it has length 11 and fits the pattern "8xxxxxxxxxx", where each "x" is replaced by a digit.

For example, "80123456789" and "80000000000" are phone numbers, while "8012345678" and "79000000000" are not.

You have n cards with digits, and you want to use them to make as many phone numbers as possible. Each card must be used in at most one phone number, and you don't have to use all cards. The phone numbers do not necessarily have to be distinct.

Input

The first line contains an integer n the number of cards with digits that you have (1 <= n <= 100).

The second line contains a string of n digits (characters "0", "1", ..., "9") s_1, s_2, ..., s_n. The string will not contain any other characters, such as leading or trailing spaces.

Output

If at least one phone number can be made from these cards, output the maximum number of phone numbers that can be made. Otherwise, output 0.

Source C++

#include <bits/stdc++.h>
using namespace std;
int main() {
  int i, n, cnt = 0, temp;
  scanf("%d", &n);
  for (i = 1; i <= n; i++) {
    scanf("%01d", &temp);
    if (temp == 8) cnt++;
  }
  printf("%d\n", min(cnt, n / 11));
}

C2Rust (syntactic translation)

#![allow(dead_code, mutable_transmutes, non_camel_case_types, non_snake_case, non_upper_case_globals, unused_assignments, unused_mut)]
#![register_tool(c2rust)]
#![feature(register_tool)]
extern "C" {
    fn printf(_: *const libc::c_char, _: ...) -> libc::c_int;
    fn scanf(_: *const libc::c_char, _: ...) -> libc::c_int;
}
unsafe fn main_0() -> libc::c_int {
    let mut i: libc::c_int = 0;
    let mut n: libc::c_int = 0;
    let mut cnt: libc::c_int = 0 as libc::c_int;
    let mut temp: libc::c_int = 0;
    scanf(b"%d\0" as *const u8 as *const libc::c_char, &mut n as *mut libc::c_int);
    i = 1 as libc::c_int;
    while i <= n {
        scanf(
            b"%01d\0" as *const u8 as *const libc::c_char,
            &mut temp as *mut libc::c_int,
        );
        if temp == 8 as libc::c_int {
            cnt += 1;
        }
        i += 1;
    }
    printf(
        b"%d\n\0" as *const u8 as *const libc::c_char,
        min(cnt, n / 11 as libc::c_int),
    );
    return 0;
}
pub fn main() {
    unsafe { ::std::process::exit(main_0() as i32) }
}

Naive Translate

use std::cmp;

fn main() {
    let mut i: i32;
    let n: i32;
    let mut cnt: i32 = 0;
    let mut temp: i32;
    scanf("%d", &n);
    for i in 1..=n {
        scanf("%01d", &temp);
        if temp == 8 {
            cnt += 1;
        }
    }
    printf("%d\n", cmp::min(cnt, n / 11));
}

Error: printf not used in rust have to use print! macro. scanf is also not defined

Polite Translation (Does Not Compile)

use std::io::{self, Read};

fn main() -> io::Result<()> {
    let mut input = String::new();
    io::stdin().read_line(&mut input)?;
    let n: usize = input.trim().parse()?;
    
    let mut cnt = 0;
    let mut temp = String::new();
    for _ in 0..n {
        io::stdin().read_line(&mut temp)?;
        let temp: i32 = temp.trim().parse()?;
        if temp == 8 {
            cnt += 1;
        }
    }
    println!("{}", cnt.min(n / 11));
    
    Ok(())
}

Error: From return type only std::io::Error is accepted but parse()? will return ParseIntError which cannot be cast to std::io::Error

Chain of Thought version 1 (Compiles But Tests Failed)

use std::io::{self, Read};

fn main() {
    let mut input = String::new();
    io::stdin().read_to_string(&mut input).unwrap();
    
    let n: usize = input.trim().parse().unwrap();
    let mut cnt = 0;
    
    for _ in 0..n {
        let mut temp = String::new();
        io::stdin().read_line(&mut temp).unwrap();
        let temp: i32 = temp.trim().parse().unwrap();
        
        if temp == 8 {
            cnt += 1;
        }
    }
    
    println!("{}", cnt.min(n as i32 / 11));
}

Error: read_line instead of reading just one character.

Chain of Thought version 2 (success)

use std::io::{self, BufRead};

fn main() {
    let stdin = io::stdin();
    let mut lines = stdin.lock().lines().map(|x| x.unwrap());
    
    let n: usize = lines.next().unwrap().parse().unwrap();
    let s: String = lines.next().unwrap();
    
    let mut cnt = 0;
    for ch in s.chars() {
        if ch == '8' {
            cnt += 1;
        }
    }
    
    let result = std::cmp::min(cnt, n / 11);
    println!("{}", result);
}

Research Log of Han