HW1 FAQ
 
New content will be added at the top
 
I have a doubt regarding dealing with the string literals.
 
String literals are supposed to start with a " ( double quotes) and end
with another " (double quote ).
 
It may not contain any " (double quote) within.
 
But this case would not occur because, once a (") double quote is
encountered, the string literal is considered to be complete. 
 
For example,
  
 let b = "asdf "efghtids" ; let x = 5; let y = "basd";
 
 would read "asdf " as a string lit, efghtids as an ident, and " ; let x = 5;
 let y = " as another string lit.
 
Is my understanding right?
 
 
You are correct about what the scanner should do, although it would eventually give you an error because it would reach the eof before terminating the
string started by the last " and containing only ;.  It is true that you can't put a " (without escaping it as \") in the middle of a string_lit.  The scanner does not, though, ever explicitly detect that you put a " in the middle of a string_lit; it just gives the wrong tokenization for the rest of the input.
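 
The behavior described above can be sketched roughly as follows. This is only an illustration, not the required implementation: the class StringLitSketch and the method scanStringLit are hypothetical names, and the sketch works on a String rather than on the Reader your scanner actually uses. It does not check whether the character after a \ is a legal escape.

```java
// Hypothetical helper, not part of the assignment's provided classes.
final class StringLitSketch {
    /**
     * Scans a string literal whose opening quote is at index start.
     * Returns the index just past the closing quote, or -1 if the input
     * ends before an unescaped closing quote is found.
     */
    static int scanStringLit(String input, int start) {
        int i = start + 1;                 // skip the opening quote
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\') {
                i += 2;                    // skip the escape character and the character after it
            } else if (c == '"') {
                return i + 1;              // the first unescaped quote ends the literal
            } else {
                i++;
            }
        }
        return -1;                         // end of input reached with the string still open
    }
}
```

Called on the example input at its first quote, this stops right after "asdf ". Called at the last quote, it returns -1, which is the point where the scanner would report the error.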
 
 
 
How can I take back and resubmit an assignment before the due date?
 
In the toolbar go to Assignments. Then choose the submitted tab.  This shows a list of your 
submitted assignments.  There should be a (very non-intuitive) icon that looks like a hand 
holding a paper on the far right.  Click that to take back the assignment.  After you have 
taken it back, you can see it in the inbox tab.
 
Our grammar states:
int_lit ::= 0 but cannot be followed by an ident_part | nonzero_digit
digit*
 
With this grammar
0hello would cause an error
but 1hello would be an int_lit (1) followed by an ident (hello).
 
 
Is this correct or is an ident_part never allowed to directly follow an
int_lit?
 
As specified, 0hello is an error, but 1hello is an intlit followed by an
ident.  This is, of course, not a very good design decision (and in fact
was an oversight) but we'll leave it as specified.
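 
A minimal sketch of the rule as specified, under some assumptions: IntLitSketch, isIdentPart and intLitLength are hypothetical names, ident_part is approximated with Java's own identifier rule (use the definition from the lexical structure in your scanner), and the caller is assumed to pass the index of a digit.

```java
// Hypothetical helper, not part of the assignment's provided classes.
final class IntLitSketch {
    // Assumption: ident_part is approximated by Java's identifier-part test.
    static boolean isIdentPart(char c) {
        return Character.isJavaIdentifierPart(c);
    }

    /**
     * Returns the length of the int_lit starting at index start (which must
     * hold a digit), or -1 if it is a 0 followed by an ident_part.
     */
    static int intLitLength(String input, int start) {
        if (input.charAt(start) == '0') {
            int next = start + 1;
            boolean badFollower = next < input.length() && isIdentPart(input.charAt(next));
            return badFollower ? -1 : 1;   // 0 is a complete int_lit only if nothing ident-like follows
        }
        int i = start;
        while (i < input.length() && input.charAt(i) >= '0' && input.charAt(i) <= '9') {
            i++;                           // nonzero_digit digit*
        }
        return i - start;
    }
}
```

With this sketch, 0hello gives -1 (an error), while 1hello gives length 1, leaving hello to be scanned as an ident.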
 
 
In Lecture 3 slide 2 you were discussing how to handle reserved words.
You gave the example:
ifa is one token, an ident while if3 is two tokens, the reserved word if
followed by the numlit 3.
 
Why is that true? I don't understand why if3 would not also be just an
ident because 3 is still an ident_part and I thought we were to create the
longest possible token match (ie classtest would be the ident classtest
and not the reserved word class followed by the ident test)
 
You're right.  A correct example would have been
 
ifa is one token, an ident while if+ is two tokens, the reserved word if
followed by the PLUS token.
 
Thanks for reading so carefully!
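 
One way to get this behavior is to scan the longest possible ident first and only then check whether the whole text exactly matches a reserved word. The sketch below assumes a tiny illustrative keyword set and a simplified ident_part (letters, digits, underscore); KeywordSketch and scanWord are hypothetical names, and the real keyword list and ident rules are the ones in the lexical structure.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the assignment's provided classes.
final class KeywordSketch {
    // Assumption: illustrative subset only; the real list is in the lexical structure.
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList("if", "class", "let"));

    // Assumption: simplified ident_part.
    static boolean isIdentPart(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    /** Scans the word starting at index start and labels it KEYWORD or IDENT. */
    static String scanWord(String input, int start) {
        int i = start;
        while (i < input.length() && isIdentPart(input.charAt(i))) {
            i++;                                   // take the longest possible match
        }
        String text = input.substring(start, i);
        // The lookup is exact and case-sensitive, so only the whole word can be a keyword.
        return (KEYWORDS.contains(text) ? "KEYWORD(" : "IDENT(") + text + ")";
    }
}
```

Under these assumptions, ifa, if3 and classtest each come out as a single ident, while if+ yields the reserved word if and leaves + for the next token.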
 
 
Are we able to submit our assignment multiple times?
For example, if I submit my assignment early and then later realize it had
a mistake, can I fix the mistake and resubmit if it is still before the
due date? (I have not done this for this assignment but it has happened in
other classes). 
 
Yes, you can resubmit before the due date.  
 
 
If a keyword is found, but it is not lower case as specified in the
lexical structure, should that result in an ErrorToken?  Or is it a valid IdentToken?
 
It should be an IdentToken.  If it follows the rules for being an ident, unless it exactly matches a keyword, it is an ident.
 
 
 
If a multiple character token is found, should the character and line
position returned correspond to the first or last character?
 
It should be the position of the first character—you’ll need to save the position when you are in the start state.

 

 

What is the kind(enumeration) of the binary operators '^' and
'v' in the Token class and how do we distinguish between the
character 'v' and the binary operator 'v' in the language?
 
It's not '^' but '/\', and not 'v' but '\/'. Each operator is a two-character token made of a slash and a backslash. Also, among the separators, '.' is the dot character and '->' (Arrow) is a minus followed by '>'.

 

It is not clear to me what the difference between the ErrorToken class errors BAD_CHAR_AFTER_BACKSLASH and BAD_CHAR_AFTER_ESCAPE is.  In my DFA, when I'm in a string literal and find a "\", I'm looking for an "n", "r", a quote or another "\".  If I don't find one of those then I create an ErrorToken with message BAD_CHAR_AFTER_BACKSLASH.

 

This should be a BAD_CHAR_AFTER_ESCAPE; the \ in a string is called an escape character.  Use BAD_STRING for anything else that can go wrong with a string, like reaching the end of file with an open string.

 

Elsewhere, \ can only appear as part of /\ or \/.  If it appears any other way, you have a BAD_CHAR_AFTER_BACKSLASH error.
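 
The distinction between the two errors can be sketched as below. The names BackslashSketch, validAfterEscape and errorAfterBackslash are hypothetical, the set of legal escapes (n, r, quote, backslash) is taken from the question above, and error kinds are returned as plain strings here only because the real ErrorToken class is not shown.

```java
// Hypothetical helper, not part of the assignment's provided classes.
final class BackslashSketch {
    /** Inside a string literal: is c a legal character after the escape character? */
    static boolean validAfterEscape(int c) {
        return c == 'n' || c == 'r' || c == '"' || c == '\\';
    }

    /**
     * Given the character c that follows a \, returns the name of the error
     * to report, or null if c is acceptable in that context.
     */
    static String errorAfterBackslash(boolean insideString, int c) {
        if (insideString) {
            return validAfterEscape(c) ? null : "BAD_CHAR_AFTER_ESCAPE";
        }
        // Outside a string, a token that starts with \ can only be the \/ operator.
        return c == '/' ? null : "BAD_CHAR_AFTER_BACKSLASH";
    }
}
```

The parameter is an int so that the -1 returned at end of file can be passed in directly; it is rejected in both contexts.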

 

 

I can't find a home in my DFA for the error BAD_INTLIT.  From the lexical structure, it is clear that a 0 can only be followed by a character that is not an ident_part.  When this was violated I created an ErrorToken for BAD_CHAR_AFTER_0.  But assuming that you start with any other digit besides 0, I'm simply collecting characters until I get a non-digit or end of file, so I can't see a home for BAD_INTLIT.  Am I misinterpreting the lexical structure?

 

This is not part of the lexical structure itself but you use BAD_INTLIT if you get a NumberFormatException (indicating the number is too big) when you try to compute the int value.
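 
A minimal sketch of that check, assuming the digits have already been collected into a String. IntValueSketch and intValueOrNull are hypothetical names, and returning null stands in for creating the BAD_INTLIT ErrorToken.

```java
// Hypothetical helper, not part of the assignment's provided classes.
final class IntValueSketch {
    /**
     * Returns the int value of the digit string, or null at the point where
     * the scanner would create an ErrorToken with BAD_INTLIT instead.
     */
    static Integer intValueOrNull(String digits) {
        try {
            return Integer.parseInt(digits);
        } catch (NumberFormatException e) {
            return null;   // the digits do not fit in an int
        }
    }
}
```

For example, 2147483647 converts normally, while 2147483648 is one past the largest int and takes the error path.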

 

 

What is the exact definition of ascii_char - anything with character code between 1 and 127?  Or is the definition of ascii_char exclusive of some characters in white_space?

 

Whitespace characters are ascii characters.  You really don't need to worry about this--just be able to deal with whatever the Reader's read method returns.  It will either be an int that can be cast to char, or -1 indicating end of file.  The char will either be something you are looking for, or an error.
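 
The read loop this describes looks roughly like the sketch below. ReadLoopSketch and readAll are hypothetical names, and wrapping the IOException in an UncheckedIOException is just a convenience for the example, not something the assignment asks for.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of the assignment's provided classes.
final class ReadLoopSketch {
    /** Reads every character from the reader and returns them as a String. */
    static String readAll(Reader reader) {
        StringBuilder seen = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {   // -1 indicates end of file
                seen.append((char) c);            // any other value can be cast to char
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen.toString();
    }
}
```

In a real scanner, the body of the loop would feed each char to the DFA instead of appending it to a buffer.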

 

 

 
I wanted to understand how the character position and the line number need to change when the program contains whitespace, and to what extent we should handle special scenarios, especially the following: For LF and CR characters, should we give them their original meaning, in which CR returns to the beginning of the SAME line and LF moves to the SAME position on the next line, or should LF and CR each be considered equivalent to CR+LF and take us to the beginning of the next line?  Also, regarding tab characters and other whitespace: how do we know the amount by which char_num should be incremented?  I know for tab it's 4, but what about other whitespace?
 
Our language uses the conventions of Java, i.e. that a line is terminated with either \n, \r or the sequence \r\n.  In all three cases, you increase the line count and set the character count to 0.  The only thing at all tricky is to avoid counting \r\n as two lines.  (The reason for all this is to properly handle program sources following the conventions of different OSs--Unix terminates lines with \n, Windows with \r\n)
 
All whitespace characters, including tabs, are just one character. (You're counting characters in the input, not the way they might be displayed.)  The only thing you ever have to do with the character count is increment it by one or reset it to zero when you start a new line.
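 
The counting rules above can be sketched as a small position tracker. PositionSketch, advance and the field names are hypothetical, and starting the line count at 0 is an assumption; use whatever starting value the assignment specifies.

```java
// Hypothetical helper, not part of the assignment's provided classes.
final class PositionSketch {
    int line = 0;                      // assumption: line numbering starts at 0
    int charPos = 0;                   // characters consumed so far on the current line
    private boolean lastWasCR = false;

    /** Updates the counters for one consumed input character. */
    void advance(char c) {
        if (c == '\n' && lastWasCR) {
            lastWasCR = false;         // second half of \r\n: the line was already counted at the \r
            return;
        }
        lastWasCR = (c == '\r');
        if (c == '\r' || c == '\n') {
            line++;                    // \n, \r and \r\n each end exactly one line
            charPos = 0;
        } else {
            charPos++;                 // every other character, tabs included, counts as one
        }
    }
}
```

Feeding it the characters a, tab, b leaves charPos at 3, and a following \r\n advances the line count by one rather than two.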