Project Board

TODOs

  • Implement HuggingFace/Llama experiment module
  • Implement OpanAI experiment module

In Progress

Finished

10-6-23 Pivot

2 real world motivations

  1. C to Rust for memory safety
  2. Web front-end framework update. New web front-end frameworks appear every few month, and old frameworks also keep getting big updates. We could implement the same webpage with different frameworks in many languages and each one would have their own advantages. As a web project grow, the original framework it was built on may not be advantages anymore, so it could be very helpful if we can translate the codebase from one web framework to another while keeping the same functionality.

Current state of LLM Code Translation

  • Understanding the Effectiveness of Large Language Models in Code Translation seems to be a good exploration of current use of LLM in this field. From this paper we can see:
    • LLM usage is very basic. Mostly, people just prompt with some task description + minimal prompt engineering. This can also be seen from the leaderboards of some code translation datasets. Generally, code translation seems to be a way to show the ability of a multi-lingual code generation model, and not much work is focused on optimizing LLM sampling /generation/prompt-engineering for this specific task.
    • LLM makes uncharacteristically trivial errors. One of the findings in this paper is that among all the failed translation attempts by LLMs, most of them failed due to compilation error, even for the models that performs extremely well like GPT-4 (~47% success rate). They would make mistakes like violation of semantic restrictions and dependency errors that are not often seen when I was trying ChatGPT on coding questions. From the paper we see most bugs can be trace back to the difference between languages. By following the code from source language, the LLM can forgot about the rules in the target language. (e.g. ll = 1 works in go but ll ll = 1 would not work in C++ since ll is a reserved keyword)

Potential method

  • Base on the findings from this paper, using stochastic sampling with a high level planner seems a promising direction
  • We could use MCMC-like sampling methods to condition each LLM translation on (original code + high-level plan + semantic rules). This would allow LLM to keep follow the source code, but not so blindly that it makes trivial errors in target language.

Dataset

  • There is some existing dataset like CodeXGLUE
  • We could also use any benchmark, dataset, or code base with test cases and just run the tests in the target language. The "Understanding" paper used dataset like "EvalPlus", and projects with APIs for command-line processing including Apache Commons CLI(Java), Click (Python)

Proposed Nextstep

  • Expanding literature search in LLM code translation. See what is the method used by the most current state of art.
  • Try LLM code translation on ChatGPT and Llama, and see if they indeed make trivial errors.
  • Test if condition LLM on high level plan by simply adding high level description to translation prompt can improve translation result.
  • Find suitable C or web codebase/dataset that can be used as benchmark.

10-12-23 Experiments

Verify: Does LLM make more trivial mistakes when translating?

  • Naive code translate attempt(Chat-GPT)
    • Translating 50 APPS python solutions to rust
    • Result:
      • compile@1: 62.5%, compile@any: 92%
      • pass@1: 42.6%, pass@any: 70%
  • Code translation with Chat-GPT code summary attempt(Chat-GPT)
    • Translating 50 APPS python solutions to rust
    • Provided with summary of source code by Chat-GPT
    • Result:
      • compile@1: 44.7%, compile@any: 100%
      • pass@1: 28.3%, pass@any:70%
  • Directly solve APPS attempt (Chat-GPT)
    • Giving Chat-GPT APPS task prompts directly
    • Result:
      • compile@1: 81.8%, compile@any: 100%
      • pass@1: 18.6%, pass@any:56%

Appendix

Translation with summary prompt example

Translate the following python program using the given python code and summary of program behavior

SUMMARY:

This code prompts the user to enter an integer and then enters a loop based on that integer. Within the loop, the code prompts the user to enter another integer and creates empty lists and sets to store various values.

The code then enters another loop based on the second integer entered. Within this loop, it prompts the user to enter a string and appends it to the 'mass' list. It also checks the first and last character of the string to determine if it matches certain conditions. Based on the conditions, it adds the string to specific sets and lists, and increments different counters.

After the second loop, the code checks if certain conditions are met and either prints -1 and continues to the next iteration of the loop, or proceeds to calculate and print a value based on the counters and sets/lists created earlier. The code then enters another loop using the calculated value and performs calculations based on the contents of the sets/lists to create another list containing certain values. Finally, it prints the contents of the list.

CODE:

for _ in range(int(input())):
    n = int(input())
    mass = []
    zo = 0
    oz = 0
    zz = 0
    oo = 0
    ozs = []
    zos = []
    ozss = set()
    zoss = set()
    for j in range(n):
        k = input()
        mass.append(k)
        if k[0] == '0' and k[-1] == '1':
            zoss.add(k)
            zos.append(j + 1)
            zo += 1
        elif k[0] == '1' and k[-1] == '0':
            ozss.add(k)
            ozs.append(j + 1)
            oz += 1
        elif k[0] == '0' and k[-1] == '0':
            zz += 1
        else:
            oo += 1
    if zz and oo and not oz and not zo:
        print(-1)
        continue
    else:
        if zo > oz:
            print((zo - oz) // 2)
            ans = []
            need = (zo - oz) // 2
            i = 0
            while need:
                zzz = mass[zos[i] - 1][len(mass[zos[i] - 1]) - 1:: -1]
                if zzz not in ozss:
                    ans.append(zos[i])
                    need -= 1
                i += 1
            print(*ans)
        else:
            print((oz - zo) // 2)
            ans = []
            need = (oz - zo) // 2
            i = 0
            while need:
                zzz = mass[ozs[i] - 1][len(mass[ozs[i] - 1]) - 1:: -1]
                if zzz not in zoss:
                    ans.append(ozs[i])
                    need -= 1
                i += 1
            print(*ans)

10-20-23 Dataset Update

Challanges with realistic code datasets

Realistic C code tend to be large

  • LLM have limited context window
    • LLAMA-2: 4k tokens
    • GPT-3 2K
    • GPT-4 32k
  • Most C libraries have much more tokens
  • [Impossible] Transpiling and testing part by part
    • Rewrite one part of C program with rust code and run unit tests
      • Possible for unsafe rust using Rust FFI. However previous C -> Unsafe Rust attempts exist and are successful.
      • For safe Rust, many "unsafe" C behaviors effects program globally(e.g. dangling pointers in some queue implementation), so impossible to transpile part of C program to Safe Rust
  • Thus, our method must learn to translate code larger than their context window.
    • We test by translating full C programs/libraries, and run all test cases with Rust FFI.

Code could be seen by LLM during training

  • All code not handwritten specifically to create such a dataset could be uploaded on line and seen by LLM during training.
  • Past NLP benchmarks either ask people to write questions, or just ignore the possibility of questions being seen by LLM
  • We may need to find ways to verify testing code is equivalent to handwritten code to LLM
    • Potentially by checking LLM confidence in given code or wether LLM has memorized given code

Potential Next Steps

  • Start with testing using minimal C libraries like:
    • klib: implementation for basic data structure & functionalities (e.g. hashing, sorting, dynamic array, & string manipulation)
    • cdsa: implementation of simple data structures including list, queue, stack.
  • Begin brainstorming method to process code larger than LLM context window.

Examples: Queue implementation

From cdsa

#include <assert.h>
#include <stddef.h>

struct Queue;
struct QueueNode;

typedef struct Queue Queue;
typedef struct QueueNode QueueNode;

struct Queue {
    QueueNode *head;
    QueueNode *tail;
    size_t size;
};

struct QueueNode {
    QueueNode *next;
};

void queue_init(Queue *queue) {
    assert(queue);

    queue->head = NULL; // Init with dangling pointer
    queue->tail = NULL;
    queue->size = 0;
}

QueueNode* queue_peek(const Queue *queue) {
    assert(queue);

    return queue->head;
}

size_t queue_size(const Queue *queue) {
    assert(queue);

    return queue->size;
}

int queue_empty(const Queue *queue) {
    assert(queue);

    return queue->size == 0;
}

void queue_push(Queue *queue, QueueNode *node) {
    assert(queue && node);

    if (!queue->head) {
        queue->head = node;
    } else {
        queue->tail->next = node;
    }

    queue->tail = node;
    node->next = NULL;

    ++queue->size;
}

QueueNode *queue_pop(Queue *queue) {
    QueueNode *n;

    assert(queue);

    n = queue->head;

    if (!n) {
        return NULL;
    }

    if (!n->next) {
        queue->tail = NULL;
    }

    queue->head = n->next;
    n->next = ((QueueNode*) 0x100);

    --queue->size;

    return n;
}

void queue_remove_all(Queue *queue) {
    assert(queue);

    if (queue->head) {
        queue->head->next = ((QueueNode*) 0x100);
        queue->tail->next = ((QueueNode*) 0x100);
    }

    queue->head = NULL;
    queue->tail = NULL;
    queue->size = 0;
}

Safe Rust Implementation From Blog

#![allow(unused)]
fn main() {
struct Queue<T> {
  queue: Vec<T>,
}

impl<T> Queue<T> {
  fn new() -> Self {
    Queue { queue: Vec::new() }
  }

  fn queue_size(&self) -> usize {
    self.queue.len()
  }

  fn queue_push(&mut self, item: T) {
    self.queue.push(item)
  }

  fn queue_pop(&mut self) -> T {
    self.queue.remove(0)
  }
  fn queue_empty(&self) -> bool {
    self.queue.is_empty()
  }

  fn queue_peek(&self) -> Option<&T> {
    self.queue.first()
  }
}
}

Rust Translation by Chat-GPT-3.5

Given cdsa code and "Translate this c code to safe rust, keep the function names"

Resulting function names are wrong and unsafe.

#![allow(unused)]
fn main() {
use std::ptr;

pub struct Queue {
    head: *mut QueueNode,
    tail: *mut QueueNode,
    size: usize,
}

pub struct QueueNode {
    next: *mut QueueNode,
}

impl Queue {
    pub fn new() -> Self {
        Queue {
            head: ptr::null_mut(),
            tail: ptr::null_mut(),
            size: 0,
        }
    }

    pub fn peek(&self) -> Option<*mut QueueNode> {
        if self.head.is_null() {
            None
        } else {
            Some(self.head)
        }
    }

    pub fn size(&self) -> usize {
        self.size
    }

    pub fn is_empty(&self) -> bool {
        self.size == 0
    }

    pub fn push(&mut self, node: *mut QueueNode) {
        if self.head.is_null() {
            self.head = node;
        } else {
            unsafe {
                (*self.tail).next = node;
            }
        }

        self.tail = node;
        unsafe {
            (*node).next = ptr::null_mut();
        }

        self.size += 1;
    }

    pub fn pop(&mut self) -> Option<*mut QueueNode> {
        if self.head.is_null() {
            None
        } else {
            let n = self.head;

            if unsafe { (*n).next.is_null() } {
                self.tail = ptr::null_mut();
            }

            self.head = unsafe { (*n).next };
            unsafe {
                (*n).next = 0x100 as *mut QueueNode;
            }

            self.size -= 1;
            Some(n)
        }
    }

    pub fn remove_all(&mut self) {
        if !self.head.is_null() {
            let null_ptr = 0x100 as *mut QueueNode;
            unsafe {
                (*self.head).next = null_ptr;
                (*self.tail).next = null_ptr;
            }
        }

        self.head = ptr::null_mut();
        self.tail = ptr::null_mut();
        self.size = 0;
    }
}

}

10-26-23 Challenge in Rust2C

Dataset Continued

  • C test cases cannot be trusted to test Rust code
    • Rust use Option (enum) instead of NULL pointer, but Option cannot be safely used in C
    • Validity of Lifetime Annotations cannot be tested in C test cases
    • Of all the pointers in Rust only reference with & and Box can be safely used in C
    • Some C test cases tests the internal implementation details, which is impossible for Rust to align with
  • Thus, naively running C unit tests on translated Rust would not work
  • Possible solution
    • Handwrite test cases
      • Cost to find enough people who knows Rust to provide code
    • Only deal with CLI tools or whole programs that uses some I/O interface
      • Greatly limits possible code space
      • Cannot effectively measure how well C & Rust codes align with each other
    • Find ways to automatically generate equivalent tests from C to rust
      • Ambiguity, though may be solved by listing assumptions at each translation
      • C does not have a universally used tests framework.

Human in the loop required

  • Ambiguity that cannot be inferred from C code alone
    • e.g. Choice of pointers
      • Given some struct defined using template in C

        template <typename T> 
        struct Node { 
          T *next; 
        }; 
        
      • There could be many ways to implement that in Rust, each one has its own trade off.

        // using reference
        // lifetime has to be specified 
        struct Node<'a T> { 
          next: Box<T>; 
        }; 
        
        // pointer with box
        // immutable, owns the object
        struct Node<T> { 
          next: Box<T>; 
        }; 
        
        // pointer with Cell
        // mutable, T has to be Copy
        struct Node<T> { 
          next: Cell<T>; 
        }; 
        
        // pointer with RefCell
        // keep track of references in runtime
        // could panic and could cause problems when dereferencing
        struct Node<T> { 
          next: RefCell<T>; 
        }; 
        
  • Some C code does not have equivalent Safe Rust implementation
  • We must design a framework so user could resolve ambiguity and approve usage of unsafe rust
    • Idea1: Using PPL to achieve constrained LLM generation
    • Idea2: At each time-step, LLM generate a Plan/Rust code with holes and places for users to edit to resolve ambiguity

11-2-23 Baseline

Experiments Set-up

  • C++ (src_lang) to Rust (tar_lang).
  • Code from code contests dataset
  • Test translation correctness by running I/O test cases
  • 100 C++ programs used as translation source in total
  • 10 Rust translations for each C++ program
  • Using chatGPT as LLM

Techniques used

  • Naive Translate
    • "Translate {src_lang} to {tar_lang}:{code}."
  • Polite Translation
    • "Please translate the following {src_lang} code to {tar_lang} code:{code}`"
  • Chain of Thought version 1
    • "First, understand the functionality of the following {src_lang} code. Then, translate the {src_lang} code to {tar_lang} while maintaining the same functionality, ensuring that the translated code can be successfully compiled. {code}
  • Chain of Thought version 2
    • Use problem description as summary to aid translation
      • Problem description: {summary}. Please translate to {tar_lang} according to the following {src_lang} code and the description of the coding problem it solved: {code}
  • One-Shot (TODO)

Result

  • Naive Translate
    • Compile success rate: 32.0%
    • Translation success rate: 21.8%
  • Polite Translation
    • Compile success rate: 33.0%
    • Translation success rate: 16.4%
  • Chain of Thought version 1
    • Compile success rate: 27.4%
    • Translation success rate: 18.4%
  • Chain of Thought version 2
    • Compile success rate: 40.9%
    • Translation success rate: 3.9%

Translation Examples

Problem

Let's call a string a phone number if it has length 11 and fits the pattern "8xxxxxxxxxx", where each "x" is replaced by a digit.

For example, "80123456789" and "80000000000" are phone numbers, while "8012345678" and "79000000000" are not.

You have n cards with digits, and you want to use them to make as many phone numbers as possible. Each card must be used in at most one phone number, and you don't have to use all cards. The phone numbers do not necessarily have to be distinct.

Input

The first line contains an integer n the number of cards with digits that you have (1 <= n <= 100).

The second line contains a string of n digits (characters "0", "1", ..., "9") s_1, s_2, ..., s_n. The string will not contain any other characters, such as leading or trailing spaces.

Output

If at least one phone number can be made from these cards, output the maximum number of phone numbers that can be made. Otherwise, output 0.

Source C++

#include <bits/stdc++.h>
using namespace std;
int main() {
  int i, n, cnt = 0, temp;
  scanf("%d", &n);
  for (i = 1; i <= n; i++) {
    scanf("%01d", &temp);
    if (temp == 8) cnt++;
  }
  printf("%d\n", min(cnt, n / 11));
}

C2Rust (syntactic translation)

#![allow(dead_code, mutable_transmutes, non_camel_case_types, non_snake_case, non_upper_case_globals, unused_assignments, unused_mut)]
#![register_tool(c2rust)]
#![feature(register_tool)]
extern "C" {
    fn printf(_: *const libc::c_char, _: ...) -> libc::c_int;
    fn scanf(_: *const libc::c_char, _: ...) -> libc::c_int;
}
unsafe fn main_0() -> libc::c_int {
    let mut i: libc::c_int = 0;
    let mut n: libc::c_int = 0;
    let mut cnt: libc::c_int = 0 as libc::c_int;
    let mut temp: libc::c_int = 0;
    scanf(b"%d\0" as *const u8 as *const libc::c_char, &mut n as *mut libc::c_int);
    i = 1 as libc::c_int;
    while i <= n {
        scanf(
            b"%01d\0" as *const u8 as *const libc::c_char,
            &mut temp as *mut libc::c_int,
        );
        if temp == 8 as libc::c_int {
            cnt += 1;
        }
        i += 1;
    }
    printf(
        b"%d\n\0" as *const u8 as *const libc::c_char,
        min(cnt, n / 11 as libc::c_int),
    );
    return 0;
}
pub fn main() {
    unsafe { ::std::process::exit(main_0() as i32) }
}

Naive Translate

use std::cmp;

fn main() {
    let mut i: i32;
    let n: i32;
    let mut cnt: i32 = 0;
    let mut temp: i32;
    scanf("%d", &n);
    for i in 1..=n {
        scanf("%01d", &temp);
        if temp == 8 {
            cnt += 1;
        }
    }
    printf("%d\n", cmp::min(cnt, n / 11));
}

Error: printf not used in rust have to use print! macro. scanf is also not defined

Polite Translation (Does Not Compile)

use std::io::{self, Read};

fn main() -> io::Result<()> {
    let mut input = String::new();
    io::stdin().read_line(&mut input)?;
    let n: usize = input.trim().parse()?;
    
    let mut cnt = 0;
    let mut temp = String::new();
    for _ in 0..n {
        io::stdin().read_line(&mut temp)?;
        let temp: i32 = temp.trim().parse()?;
        if temp == 8 {
            cnt += 1;
        }
    }
    println!("{}", cnt.min(n / 11));
    
    Ok(())
}

Error: From return type only std::io::Error is accepted but parse()? will return ParseIntError which cannot be cast to std::io::Error

Chain of Thought version 1 (Compiles But Tests Failed)

use std::io::{self, Read};

fn main() {
    let mut input = String::new();
    io::stdin().read_to_string(&mut input).unwrap();
    
    let n: usize = input.trim().parse().unwrap();
    let mut cnt = 0;
    
    for _ in 0..n {
        let mut temp = String::new();
        io::stdin().read_line(&mut temp).unwrap();
        let temp: i32 = temp.trim().parse().unwrap();
        
        if temp == 8 {
            cnt += 1;
        }
    }
    
    println!("{}", cnt.min(n as i32 / 11));
}

Error: read_line instead of reading just one character.

Chain of Thought version 2 (success)

use std::io::{self, BufRead};

fn main() {
    let stdin = io::stdin();
    let mut lines = stdin.lock().lines().map(|x| x.unwrap());
    
    let n: usize = lines.next().unwrap().parse().unwrap();
    let s: String = lines.next().unwrap();
    
    let mut cnt = 0;
    for ch in s.chars() {
        if ch == '8' {
            cnt += 1;
        }
    }
    
    let result = std::cmp::min(cnt, n / 11);
    println!("{}", result);
}