= Design Notes

== Problems

Translating C to Go is harder than it looks.

Jan says: In the general case it is impossible to turn a C char* into
a Go []byte. For concrete C code it is probably often possible,
depending partly on the author's C coding style. The first problem
this runs into is that Go does not guarantee that a slice's backing
array keeps a stable address, because Go stacks are movable: when a
goroutine's stack grows, the runtime copies it to a new location. C
expects the opposite: a pointer never changes on its own. The runtime
does update real Go pointers when it moves a stack, but not addresses
held as integers, so some code will fail, for example C code that
stores a pointer in an integer and converts it back later.

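A minimal Go sketch of the problem, under stated assumptions: CharPtr
is a hypothetical type, not part of ccgo or modernc.org/cc, that
models a C char* as a reference to the whole backing slice plus an
integer offset. A plain sub-slice can stand in for a pointer only
while the code moves forward through the buffer; C code that steps
backwards (p[-1]) has no []byte equivalent. Keeping the whole slice,
which holds a real Go pointer that the runtime updates when memory
moves, also avoids relying on a stable numeric address.

```go
package main

import "fmt"

// CharPtr is a hypothetical model of a C `char *`: the whole backing
// slice plus an integer offset into it. The slice header holds a real
// Go pointer, so the runtime keeps it valid even if the array moves
// (for example when a goroutine stack is grown and copied).
type CharPtr struct {
	base []byte
	off  int
}

// Add mimics C pointer arithmetic (p + n); n may be negative.
func (p CharPtr) Add(n int) CharPtr { return CharPtr{p.base, p.off + n} }

// Deref mimics *p. It panics on an out-of-range access, where C would
// have undefined behavior.
func (p CharPtr) Deref() byte { return p.base[p.off] }

// Set mimics *p = c. The write goes to the shared backing array.
func (p CharPtr) Set(c byte) { p.base[p.off] = c }

// CString mimics reading a NUL-terminated C string starting at p.
func (p CharPtr) CString() string {
	end := p.off
	for end < len(p.base) && p.base[end] != 0 {
		end++
	}
	return string(p.base[p.off:end])
}

func main() {
	buf := []byte("hello\x00")

	// C: char *p = buf + 3;
	p := CharPtr{buf, 3}

	// A sub-slice can stand in for buf + 3 while the code only moves forward...
	s := buf[3:]
	fmt.Println(string(s[:2])) // lo

	// ...but C's p[-1] has no slice equivalent: s[-1] does not compile,
	// because a slice cannot be indexed or re-sliced below its first element.
	fmt.Printf("%c\n", p.Add(-1).Deref()) // l

	// C: p[-3] = 'j'; then read buf as a C string.
	p.Add(-3).Set('j')
	fmt.Println(CharPtr{buf, 0}.CString()) // jello
}
```

The price of this representation is an extra word per pointer and a
bounds check on every access, where C would simply have undefined
behavior.
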
INSERT CODE EXAMPLES ILLUSTRATING THE PROBLEM HERE

== How the parser works

There are no comment nodes in the C AST. Instead, every cc.Token has a
Sep field: https://godoc.org/modernc.org/cc/v3#Token

When the parser is configured to do so, Sep holds all the white space
preceding the token, including any comments, combined into one value.
So we have the white space and comment information for every token in
the AST. The final white space/comments, preceding EOF, are available
in the TrailingSeperator field of the AST:
https://godoc.org/modernc.org/cc/v3#AST.

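To make the representation concrete, here is a toy lexer in Go that
attaches preceding white space and /* */ comments to the next token
and returns whatever precedes EOF separately. The Token type and the
lex function are hypothetical simplifications, not the cc/v3 lexer;
they handle only identifiers, single-character punctuators, white
space and block comments.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a hypothetical, simplified stand-in for cc.Token.
type Token struct {
	Sep   string // white space and comments preceding the token
	Value string // the token text itself
}

func isSpace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' || c == '\v'
}

func isIdent(c byte) bool {
	return c == '_' || c >= '0' && c <= '9' || c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z'
}

// lex splits src into tokens. Non-token text (white space and /* */
// comments) is accumulated and attached to the next token's Sep.
// Whatever is left before EOF is returned as the trailing separator.
func lex(src string) (toks []Token, trailing string) {
	var sep strings.Builder
	for i := 0; i < len(src); {
		switch {
		case isSpace(src[i]):
			sep.WriteByte(src[i])
			i++
		case strings.HasPrefix(src[i:], "/*"):
			n := len(src) - i // unterminated comment: take the rest
			if end := strings.Index(src[i+2:], "*/"); end >= 0 {
				n = end + 4 // "/*" + comment body + "*/"
			}
			sep.WriteString(src[i : i+n])
			i += n
		case isIdent(src[i]):
			j := i
			for j < len(src) && isIdent(src[j]) {
				j++
			}
			toks = append(toks, Token{Sep: sep.String(), Value: src[i:j]})
			sep.Reset()
			i = j
		default: // any other byte is a one-character punctuator
			toks = append(toks, Token{Sep: sep.String(), Value: src[i : i+1]})
			sep.Reset()
			i++
		}
	}
	return toks, sep.String()
}

func main() {
	toks, trailing := lex("/* an example comment */\nint x; /* bye */\n")
	for _, t := range toks {
		fmt.Printf("Sep=%q Value=%q\n", t.Sep, t.Value)
	}
	fmt.Printf("trailing=%q\n", trailing)
	// Output:
	// Sep="/* an example comment */\n" Value="int"
	// Sep=" " Value="x"
	// Sep="" Value=";"
	// trailing=" /* bye */\n"
}
```

Because every byte of the input ends up either in some token's Sep,
in a token's Value, or in the trailing separator, the original text,
comments included, can be reproduced from the token stream alone.
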
To get the lexically first white space/comment for any node, use
tokenSeparator():
https://gitlab.com/cznic/ccgo/-/blob/6551e2544a758fdc265c8fac71fb2587fb3e1042/v3/go.go#L1476

comment() does the same, but with a default value:
https://gitlab.com/cznic/ccgo/-/blob/6551e2544a758fdc265c8fac71fb2587fb3e1042/v3/go.go#L1467

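The lookup itself can be sketched as a depth-first, left-to-right
walk that stops at the first token. This is not ccgo's
tokenSeparator() or comment(); Node, firstSeparator and commentOr are
hypothetical names, and treating an empty separator as missing in
commentOr is an assumption about what "a default value" means.

```go
package main

import "fmt"

// Token is a hypothetical, simplified stand-in for cc.Token.
type Token struct {
	Sep   string // white space and comments preceding the token
	Value string
}

// Node is a hypothetical AST node: either a single token or a list of
// child nodes in source order.
type Node struct {
	Tok      *Token
	Children []*Node
}

// firstSeparator returns the Sep of the lexically first token under n,
// found by a depth-first, left-to-right walk. ok is false if n contains
// no tokens at all.
func firstSeparator(n *Node) (sep string, ok bool) {
	if n == nil {
		return "", false
	}
	if n.Tok != nil {
		return n.Tok.Sep, true
	}
	for _, c := range n.Children {
		if sep, ok := firstSeparator(c); ok {
			return sep, true
		}
	}
	return "", false
}

// commentOr is like firstSeparator but returns dflt when the node has
// no tokens or its first token has an empty separator.
func commentOr(n *Node, dflt string) string {
	if sep, ok := firstSeparator(n); ok && sep != "" {
		return sep
	}
	return dflt
}

func main() {
	// Roughly the declaration "/* doc */\nint x;".
	decl := &Node{Children: []*Node{
		{Children: []*Node{{Tok: &Token{Sep: "/* doc */\n", Value: "int"}}}},
		{Tok: &Token{Sep: " ", Value: "x"}},
		{Tok: &Token{Value: ";"}},
	}}
	fmt.Printf("%q\n", commentOr(decl, "\n"))             // "/* doc */\n"
	fmt.Printf("%q\n", commentOr(decl.Children[2], "\n")) // "\n"
}
```

The walk only has to descend until it meets the first token, so its
cost is proportional to the depth of the leftmost path, not to the
size of the node.
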
== Looking forward

Eric says: The way I picture the translator working, the output of a
ccgo translation of a module at any given time is a file of pseudo-Go
code in which some spans may be enclosed in Unicode bracketing
characters (currently the guillemets « and », U+00AB and U+00BB)
meaning "this is not Go yet". The brackets intentionally make the Go
compiler barf. They express a color on the AST nodes: a node is
either C-colored (not yet translated) or Go-colored (translated).

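One way to model the coloring is sketched below in Go, assuming a
hypothetical Span type whose Go flag records whether a piece of
output has been translated. Rendering wraps every C-colored span in
guillemets, which reproduces the pseudo-Go notation used in the
example that follows.

```go
package main

import (
	"fmt"
	"strings"
)

// Span is a hypothetical piece of translator output. Go is the span's
// color: true once it has been translated to Go, false while it is C.
type Span struct {
	Text string
	Go   bool
}

// render concatenates spans, wrapping each C-colored span in
// guillemets (U+00AB, U+00BB) so that the Go compiler rejects the
// file until every span has been translated.
func render(spans []Span) string {
	var b strings.Builder
	for _, s := range spans {
		if s.Go {
			b.WriteString(s.Text)
			continue
		}
		b.WriteString("«")
		b.WriteString(s.Text)
		b.WriteString("»")
	}
	return b.String()
}

// allGo reports whether no C-colored spans remain.
func allGo(spans []Span) bool {
	for _, s := range spans {
		if !s.Go {
			return false
		}
	}
	return true
}

func main() {
	call := []Span{
		{`printf(`, false},
		{`"Hello, World!\n"`, true},
		{`)`, false},
	}
	fmt.Println(render(call)) // «printf(»"Hello, World!\n"«)»
	fmt.Println(allGo(call))  // false
}
```

Keeping the color as data, rather than only as brackets in the text,
makes "are we done?" a simple check over the spans.
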
So, for example, if I'm translating hello.c with a ruleset that does
not include printf -> fmt.Printf, this:

---------------------------------------------------------
#include <stdio.h>

/* an example comment */

int main(int argc, char *argv[])
{
    printf("Hello, World!\n");
}
---------------------------------------------------------

becomes this without any explicit rules at all:

---------------------------------------------------------
package main

«#include <stdio.h>»

/* an example comment */

func main() {
	«printf(»"Hello, World!\n"«)»
}
---------------------------------------------------------

Then, when the rule printf -> fmt.Printf is added, it becomes:

---------------------------------------------------------
package main

import (
	"fmt"
)

/* an example comment */

func main() {
	fmt.Printf("Hello, World!\n")
}
---------------------------------------------------------

because with that rule the AST node corresponding to the printf call
can be translated and colored "Go". This implies an import of fmt.
We then observe that no C-colored spans remain, and drop the
#include.

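The rule step can be sketched in Go under simplifying assumptions:
Rule, Call, translate and importBlock are hypothetical names, a rule
maps a single C function name to a Go function and the package it
needs, and the module consists of just the one call. When a rule
applies, the call is colored Go and its import is recorded; once
nothing C-colored is left, the #include is replaced by the import
block.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Rule is a hypothetical translation rule such as printf -> fmt.Printf.
type Rule struct {
	GoFunc string // replacement Go function, e.g. "fmt.Printf"
	Import string // package the replacement needs, e.g. "fmt"
}

// Call is a hypothetical, minimal AST node for a C function call whose
// arguments have already been translated to Go.
type Call struct {
	Func string
	Args []string
}

// translate renders a call. With a matching rule the call is colored
// Go and its import is recorded; otherwise the C parts stay bracketed.
func translate(c Call, rules map[string]Rule, imports map[string]bool) (string, bool) {
	args := strings.Join(c.Args, ", ")
	r, ok := rules[c.Func]
	if !ok {
		return "«" + c.Func + "(»" + args + "«)»", false
	}
	imports[r.Import] = true
	return r.GoFunc + "(" + args + ")", true
}

// importBlock renders the import declaration implied by the
// translated nodes, with paths sorted for stable output.
func importBlock(imports map[string]bool) string {
	paths := make([]string, 0, len(imports))
	for p := range imports {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	var b strings.Builder
	b.WriteString("import (\n")
	for _, p := range paths {
		fmt.Fprintf(&b, "\t%q\n", p)
	}
	b.WriteString(")\n")
	return b.String()
}

func main() {
	call := Call{Func: "printf", Args: []string{`"Hello, World!\n"`}}
	rules := map[string]Rule{"printf": {GoFunc: "fmt.Printf", Import: "fmt"}}

	imports := map[string]bool{}
	body, isGo := translate(call, rules, imports)
	if isGo {
		// No C-colored spans remain, so the #include is dropped and
		// replaced by the imports the translated nodes require.
		fmt.Print(importBlock(imports))
	} else {
		fmt.Println("«#include <stdio.h>»")
	}
	fmt.Println(body)
}
```

With an empty rules map the same program prints the bracketed
#include and «printf(»"Hello, World!\n"«)», matching the rule-free
output above.
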
// end