Intro

Assembly is a tough language to write in. It is both excessively verbose and abstruse, which means it needs detailed, clear comments. While writing ROMs for the GameBoy, I often found myself frustrated by the number of comments I had to write and by how they looked. Proper formatting would improve their readability, but it takes time. I thought the problem of formatting text to match a style guide was interesting in its own right, so I took a detour and wrote a formatter to run as part of my ROM build process.

An artist's rendering of Nintendo developers trying to figure out what the intern did to their codebase

Approach

The idea was to scan the code and look for sets of “features”. For my purposes, a feature was one of two things:

  1. A set of consecutive lines of code (a code group) with comments that are all indented to the same level.
  2. A set of consecutive lines of comments (a comment group) that are all indented to the same level.
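The grouping step can be sketched roughly as follows. This is a minimal illustration, not asmfmt's actual code: the `Feature` record and the `classify` and `find_features` names are hypothetical, semicolons inside string literals are not handled, and a blank line is assumed to end a comment group but not a code group (following rule 1.1 of the style specification).

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical record for one feature: its kind, shared indentation, and line indices."""
    kind: str                                  # "code" or "comment"
    indent: str                                # leading whitespace shared by the group
    lines: list = field(default_factory=list)  # indices into the source lines


def classify(line):
    """Return (kind, indent), where kind is "blank", "comment" or "code"."""
    stripped = line.lstrip()
    indent = line[: len(line) - len(stripped)]
    if not stripped:
        return "blank", indent
    if stripped.startswith(";"):
        return "comment", indent
    return "code", indent


def find_features(lines):
    """Group consecutive lines of the same kind and indentation into features."""
    features = []
    current = None
    for i, line in enumerate(lines):
        kind, indent = classify(line)
        if kind == "blank":
            # Assumption: a blank line closes a comment group but not a code group.
            if current is not None and current.kind == "comment":
                current = None
            continue
        if current is None or current.kind != kind or current.indent != indent:
            current = Feature(kind, indent)
            features.append(current)
        current.lines.append(i)
    return features
```

Run on the example later in this post, this yields a code group, a comment group, and a second code group, matching how the formatter treats them.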

The script records the locations of these features, along with some additional information, such as their length and whether they contain styling that indicates the code has already been formatted. The reason for detecting already-formatted code is a practical one. I often make a quick change to an assembly file while debugging and then need to reassemble it to test that change. If the formatter is not idempotent, that is, if running it on already-formatted text changes that text again, then the text will get messier with every build.
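Detecting that a comment group has already been formatted is what lets a second run leave it alone. A minimal check, assuming a group counts as formatted when its first and last lines are `;-...-;` borders and every line in between starts and ends with `;` (rules 2.1 to 2.3 of the style specification), might look like this; `is_boxed` is a hypothetical name.

```python
import re

# A border line: optional indent, ';', one or more '-', ';', optional trailing whitespace.
BORDER = re.compile(r"^\s*;-+;\s*$")


def is_boxed(group):
    """Return True if a comment group already carries a ;-...-; border."""
    if len(group) < 3:
        return False
    first, *middle, last = group
    if not (BORDER.match(first) and BORDER.match(last)):
        return False
    # Every text line between the borders must start and end with a semicolon.
    return all(
        line.strip().startswith(";") and line.strip().endswith(";")
        for line in middle
    )
```

Skipping groups for which this returns True, and boxing the rest, is one way to keep the comment-group half of the formatter idempotent.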

After identifying features, the formatter makes a second pass over the code and formats the groups identified as unformatted. Specifically:

  1. In the case of code groups, all trailing comments are indented either to the maximum indent level of any comment in the group or, when the -g option is given, to the global document maximum.

  2. In the case of comment groups, a border will be placed around the comment.
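Rule 1 amounts to padding each line's code so that every trailing comment starts at the same column. Below is a minimal sketch, assuming the first `;` on a line starts its comment and every line in the group shares the same leading whitespace (a tab counts as one character, so mixed indentation would misalign). `align_comments` and its `column` parameter are hypothetical names; passing a document-wide column instead of the default corresponds to the global mode.

```python
def align_comments(group, column=None):
    """Align the trailing comments of a code group to a single column.

    column defaults to the right-most comment position in the group (the local
    level); pass a document-wide maximum to get global alignment instead.
    """
    if column is None:
        column = max((line.index(";") for line in group if ";" in line), default=0)
    aligned = []
    for line in group:
        if ";" not in line:
            aligned.append(line)  # code without a comment is left untouched
            continue
        cut = line.index(";")
        code, comment = line[:cut].rstrip(), line[cut:].rstrip()
        aligned.append(code.ljust(column) + comment)
    return aligned
```

Applied to the first two lines of the example that follows, this pads `cp a, $20` so its comment lines up with the one after `jr nz, .UpdateBook2Status`, and running it again on its own output changes nothing.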

An example of what I mean:

	cp a, $20 ; Is the pointer on the first book?
	jr nz, .UpdateBook2Status ; If not, check the second


	; this comment breaks up the code

	ld a, c ; Get the return code of make_line_blink to see if something had broken
	jr z, .UpdateBookSeventeenStatusToRead ; Update the status to read

becomes

	cp a, $20                 ; Is the pointer on the first book?
	jr nz, .UpdateBook2Status ; If not, check the second


	;--------------------------------;
	; this comment breaks up the code;
	;--------------------------------;

	ld a, c                                ; Get the return code of make_line_blink to see if something had broken
	jr z, .UpdateBookSeventeenStatusToRead ; Update the status to read

Style Specification

From the code comments:

    1) If a comment is preceded by code, that comment must be indented to the local or global comment level
    1.1) Whitespace is the main determinant for blocks of code. Newlines will not cause a code group to be split
    2) If the comment is not preceded by code, it must be surrounded like so

    ;-------------------;
    ; This is a comment ;
    ; nice, huh?        ;
    ;-------------------;
    
    2.1) Must start with a ;-...-;
    2.2) Lines with text must start and end with a ;
    2.3) Must end with a ;-...-;
    2.4) Whitespace before the leading semicolon, and after the semicolon but preceding any text, is preserved as is
    2.5) Whitespace is the main determinant for block comments. Different whitespace arrangements will lead to different block comments being formatted
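Rules 2 to 2.3 can be sketched as a function that boxes one comment group. This hypothetical `box_comment` assumes all lines share the first line's indentation and follows the layout of the specification example above (one space of padding before the closing `;`, border as wide as the padded text). The earlier before/after example in this post pads without that space, so treat the exact width as an assumption. In line with rule 2.4, whitespace before the leading semicolon and between the semicolon and the text is kept; only trailing whitespace is trimmed.

```python
def box_comment(group):
    """Surround a comment group with ;-...-; borders, padding each line to a common width."""
    first = group[0]
    indent = first[: len(first) - len(first.lstrip())]  # leading whitespace is kept
    bodies = [line.strip() for line in group]            # e.g. "; nice, huh?"
    width = max(len(body) for body in bodies)
    border = indent + ";" + "-" * width + ";"
    boxed = [border]
    for body in bodies:
        # Pad to width + 1 so the closing ';' sits one column past the longest text.
        boxed.append(indent + body.ljust(width + 1) + ";")
    boxed.append(border)
    return boxed
```

Fed the two lines `; This is a comment` and `; nice, huh?`, it reproduces the four-line box shown in the specification.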

Using asmfmt

asmfmt offers the following options:

python FormatAsm.py -h
usage: FormatAsm.py [-h] [-o [OUTPUT [OUTPUT ...]]] [-g] [input [input ...]]

Format an ASM file.

positional arguments:
  input                 Location(s) of the ASM file(s) to be imported

optional arguments:
  -h, --help            show this help message and exit
  -o [OUTPUT [OUTPUT ...]], --output [OUTPUT [OUTPUT ...]]
                        Location(s) of output ASM file(s)
  -g, --global_indent   Adjust comments at the end of code lines to a global
                        indent level

If -o is not provided, the script formats the input files in place.
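For readers curious how such a command line can be declared, here is a hypothetical `argparse` setup that produces equivalent options, plus a helper that pairs each input with its output and falls back to in-place formatting when `-o` is absent. `build_parser`, `resolve_targets`, and the one-output-per-input check are assumptions, not taken from FormatAsm.py.

```python
import argparse


def build_parser():
    """Declare options matching the help text above (a reconstruction, not the original)."""
    parser = argparse.ArgumentParser(description="Format an ASM file.")
    parser.add_argument("input", nargs="*",
                        help="Location(s) of the ASM file(s) to be imported")
    parser.add_argument("-o", "--output", nargs="*",
                        help="Location(s) of output ASM file(s)")
    parser.add_argument("-g", "--global_indent", action="store_true",
                        help="Adjust comments at the end of code lines to a global indent level")
    return parser


def resolve_targets(args):
    """Pair each input with an output path; without -o, each file is its own output."""
    outputs = args.output or args.input  # no -o (or an empty -o) means format in place
    if len(outputs) != len(args.input):
        raise SystemExit("expected one output path per input file")
    return list(zip(args.input, outputs))
```

For example, `resolve_targets(build_parser().parse_args(["main.asm"]))` returns `[("main.asm", "main.asm")]`, i.e. an in-place format.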

The source code for asmfmt is available on GitHub.

Future work

If I were to revisit this, I would rewrite it in a more object-oriented fashion. Each feature specification would hold its match pattern as a property and would return a string containing the formatted feature. This would make the formatter easier to extend.
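A rough sketch of that design, under my own assumptions: `FeatureSpec`, `CommentGroup`, `CodeGroup`, `SPECS`, and `spec_for` are hypothetical names, and the `format` bodies are deliberately simplified versions of the two rules. The point is the shape: each spec carries its own pattern and formatting, so adding a feature means adding a subclass rather than editing a central loop.

```python
import re
from abc import ABC, abstractmethod


class FeatureSpec(ABC):
    """Hypothetical base class: a feature knows its match pattern and how to format itself."""

    pattern: re.Pattern  # each subclass supplies a compiled regex

    def matches(self, line):
        return self.pattern.match(line) is not None

    @abstractmethod
    def format(self, lines):
        """Return the formatted feature as a single string."""


class CommentGroup(FeatureSpec):
    pattern = re.compile(r"^\s*;")  # the line is only a comment

    def format(self, lines):
        indent = lines[0][: len(lines[0]) - len(lines[0].lstrip())]
        width = max(len(line.strip()) for line in lines)
        border = indent + ";" + "-" * width + ";"
        body = [indent + line.strip().ljust(width + 1) + ";" for line in lines]
        return "\n".join([border, *body, border])


class CodeGroup(FeatureSpec):
    pattern = re.compile(r"^\s*[^;\s]")  # the line starts with an instruction

    def format(self, lines):
        col = max((line.index(";") for line in lines if ";" in line), default=0)
        out = []
        for line in lines:
            if ";" in line:
                cut = line.index(";")
                line = line[:cut].rstrip().ljust(col) + line[cut:]
            out.append(line)
        return "\n".join(out)


# Extending the formatter means writing a new subclass and registering it here.
SPECS = [CommentGroup(), CodeGroup()]


def spec_for(line):
    """Return the first spec whose pattern matches the line, or None (e.g. blank lines)."""
    return next((spec for spec in SPECS if spec.matches(line)), None)
```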

While writing the script, I also ran into many edge cases, mostly caused by a feature occurring at the end of a document. Leaving these unhandled breaks the script's idempotency, but I'd like to implement the fixes for them more neatly.
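Since these bugs show up as broken idempotency, one way to catch them is a property check that formats each sample twice and compares the results, over copies of the sample that end the document in different ways. This is a minimal sketch; `fmt` stands for any text-to-text formatter function, and the helper names are hypothetical.

```python
def is_idempotent(fmt, text):
    """True if formatting already-formatted text leaves it unchanged."""
    once = fmt(text)
    return fmt(once) == once


def end_variants(snippet):
    """Hypothetical helper: the snippet ending the document in a few different ways."""
    body = snippet.rstrip("\n")
    return [body, body + "\n", body + "\n\n"]


def non_idempotent_cases(fmt, snippets):
    """Collect every variant on which fmt fails the idempotency check."""
    return [
        variant
        for snippet in snippets
        for variant in end_variants(snippet)
        if not is_idempotent(fmt, variant)
    ]
```

For instance, a toy formatter that unconditionally appends a newline fails on every variant, while one that normalises the file to end in exactly one newline passes them all.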

Appendix

  1. Using the psutil package to see what files are open in a Python process. I ran into issues where the temporary files I created were being held open by Python. This was confusing at first, as I was using Python's with open(…) idiom to handle all files. It turns out that tempfile.mkstemp returns an already-open OS-level file descriptor along with the file's name, and that descriptor needs to be closed as well. Overall a silly mistake that was fixed by re-reading the docs, but while debugging I came across the psutil package, which provides a handy way to see what files are open at a given point in a script. Shoutout to this Stack Overflow post.
import psutil
# List the files currently held open by this Python process
print(psutil.Process().open_files())
  2. Other text formatters. Links to other tools to make your code less annoying to read, which served as inspiration for this project:
    • Gofmt: A formatter specifically for Go.
    • Black: A formatter for Python, designed as a Unix command-line tool.
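The `tempfile.mkstemp` pitfall from the first appendix item is easy to avoid once you know about it. Because `mkstemp` returns an already-open OS-level file descriptor alongside the path, reopening the path with `open()` leaves that descriptor dangling. One way around it is to hand the descriptor to `os.fdopen` inside a `with` block so it is closed along with the file object. `write_via_tempfile` is a hypothetical helper, and writing the temp file next to the target so `os.replace` can swap it into place is a design choice assumed here, not necessarily what asmfmt does.

```python
import os
import tempfile


def write_via_tempfile(path, text):
    """Write text to a temporary file next to path, then move it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp returns (fd, name); fd is already open and must be closed by us.
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".asm")
    try:
        # os.fdopen takes ownership of fd, so leaving the with-block closes it.
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
        # Same directory as the target, so the rename stays on one filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        os.remove(tmp_name)  # don't leave stray temp files behind on failure
        raise
```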