bosoli.blogg.se - Unicode caret

UNICODE CARET CODE

Control escapes require exactly one character following \c.

For example, U+000A LINE FEED is ^J in caret notation (because 0x000A = 10 and J is the 10th letter of the alphabet). So, a valid regular expression that matches this symbol would be /\cJ/.
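As a minimal sketch of the caret-notation rule, the snippet below derives the letter for a control character and builds the matching control escape. The helper names `caretLetter` and `controlEscapeRegex` are hypothetical and only used for illustration.

```javascript
// Hypothetical helpers, written for this illustration only.
function caretLetter(charCode) {
  if (!Number.isInteger(charCode) || charCode < 1 || charCode > 26) {
    throw new RangeError('expected a character code from 1 through 26');
  }
  // 'A' has character code 65, so code 1 maps to 'A', 10 to 'J', 26 to 'Z'.
  return String.fromCharCode(64 + charCode);
}

function controlEscapeRegex(charCode) {
  // The backslash is doubled because the pattern is passed as a string.
  return new RegExp('\\c' + caretLetter(charCode));
}

const lineFeed = '\n'; // U+000A LINE FEED
console.log(caretLetter(lineFeed.charCodeAt(0)));   // J
console.log(/\cJ/.test(lineFeed));                  // true
console.log(controlEscapeRegex(10).test(lineFeed)); // true
```

Building the pattern from a string is only needed when the letter is computed at run time; a regular expression literal such as /\cJ/ is simpler when the character is known in advance.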

Because an octal escape consumes up to three digits, '\12' (a single octal character escape equivalent to '\012') is not the same as '\0012' (an octal escape '\001' followed by an unescaped character '2'). By simply zero padding octal escapes, you can avoid this problem.

Note that there’s one exception here: by itself, \0 is not an octal escape sequence. It looks like one, and it’s even equal to \00 and \000, both of which are octal escape sequences, but unless it’s followed by a decimal digit, it acts like a single character escape sequence.

It’s probably easiest to define octal escape syntax using the following regular expression: \\(?.

In regular expressions (not in strings!), any character with a character code from 1 through 26 can be escaped using its caret notation character, prefixed with \c. Control escapes are three characters long.

The \ followed by a new line is not a character escape sequence, but a LineContinuation. The new line doesn’t become part of the string. This is simply a way to spread a string over multiple lines (for easier code editing, for example), without the string actually including any new line characters. I suppose you could think of \ followed by a new line as an escape sequence for the empty string.

Characters without special meaning can be escaped as well (e.g. '\a' == 'a'), but this is of course not needed. However, using \u outside of a Unicode escape sequence, or \x outside of a hexadecimal escape is disallowed by the specification, and causes some engines to throw a syntax error.

Note: IE < 9 treats '\v' as 'v' instead of a vertical tab ('\x0B'). If cross-browser compatibility is a concern, use \x0B instead of \v.

Another thing to note is that the \v and \0 escapes are not allowed in JSON strings.

Octal escape sequences

Any character with a character code lower than 256 (i.e. any character in the extended ASCII range) can be escaped using its octal-encoded character code, prefixed with \. (Note that this is the same range of characters that can be escaped through hexadecimal escapes.) For example, the copyright symbol ('©') has character code 169, which gives 251 in octal notation, so you could write it as '\251'.

Octal escapes can consist of two, three or four characters. '\1', '\01' and '\001' are equivalent; zero padding is not required. Note that if an octal escape (e.g. '\1') is part of a larger string, and it’s immediately followed by a character in the range [0-7] (e.g. 1), the next character will be considered part of the escape sequence until at most three digits are matched.
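The way an octal escape absorbs following digits can be sketched with a small decoder that works on raw source text. Octal escapes inside real string literals are a syntax error in strict mode code, so the examples pass the text with doubled backslashes instead. The helper name `decodeOctalEscapes` and its regular expression are assumptions written for this sketch, not taken from the article.

```javascript
// Hypothetical helper: decodes only octal escapes in raw source text.
// It ignores every other kind of escape (an escaped backslash, for example),
// so it is an illustration rather than a full string-literal parser.
function decodeOctalEscapes(rawText) {
  // A first digit of 0-3 may be followed by up to two more octal digits;
  // a first digit of 4-7 by at most one, which keeps the value below 256.
  const octalEscape = /\\([0-3][0-7]{0,2}|[4-7][0-7]?)/g;
  return rawText.replace(octalEscape, (match, digits) =>
    String.fromCharCode(parseInt(digits, 8))
  );
}

// 251 in octal is 169 in decimal, the copyright symbol.
console.log(decodeOctalEscapes('\\251') === '\u00A9');        // true
// '\12' is one escape (octal 12 = decimal 10, a line feed).
console.log(decodeOctalEscapes('\\12') === '\n');             // true
// '\0012' is the escape '\001' followed by a literal '2'.
console.log(decodeOctalEscapes('\\0012') === '\u0001' + '2'); // true
```

The sketch treats a lone \0 as an octal escape with value 0, which yields the same character as the single character escape.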

Note that the escape character \ makes special characters literal. There’s only one exception to this rule: a \ at the end of a line inside a string literal, as in 'abc\ followed by a new line.
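A minimal illustration of that exception: when the backslash is the last character on its line, the line break that follows it is dropped from the string value. The variable name `continued` is only for illustration.

```javascript
// The second line of the literal starts in column 0 on purpose:
// any leading spaces there would become part of the string.
const continued = 'abc\
def';

console.log(continued === 'abcdef');   // true
console.log(continued.length);         // 6
console.log(continued.includes('\n')); // false
```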

All single character escapes can easily be memorized using the following regular expression: \\.

\": double quote (U+0022 QUOTATION MARK).

\0: null character (U+0000 NULL) (only if the next character is not a decimal digit; otherwise it’s an octal escape sequence).

\v: vertical tab (U+000B LINE TABULATION).

\t: horizontal tab (U+0009 CHARACTER TABULATION).

\r: carriage return (U+000D CARRIAGE RETURN).
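The escapes listed here can be checked against their code points with a short sketch. The table name `singleCharacterEscapes` and the helper `parsesAsJson` are hypothetical; the JSON check at the end is an extra illustration showing that JSON.parse accepts \t and \r but rejects \v and \0.

```javascript
// Each entry: a label for printing, the escape as a real string literal,
// and the code point it is expected to produce.
const singleCharacterEscapes = [
  ['\\r', '\r', 0x000D], // CARRIAGE RETURN
  ['\\t', '\t', 0x0009], // CHARACTER TABULATION
  ['\\v', '\v', 0x000B], // LINE TABULATION
  ['\\0', '\0', 0x0000], // NULL (not followed by a decimal digit here)
  ['\\"', '\"', 0x0022], // QUOTATION MARK
];

for (const [label, value, codePoint] of singleCharacterEscapes) {
  const matches = value.length === 1 && value.charCodeAt(0) === codePoint;
  console.log(label, matches); // true for every entry
}

// Hypothetical helper: reports whether raw JSON text can be parsed.
function parsesAsJson(rawJson) {
  try {
    JSON.parse(rawJson);
    return true;
  } catch (error) {
    return false;
  }
}

console.log(parsesAsJson('"\\t"')); // true
console.log(parsesAsJson('"\\v"')); // false
console.log(parsesAsJson('"\\0"')); // false
```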

Having recently written about character references in HTML and escape sequences in CSS, I figured it would be interesting to look into JavaScript character escapes as well.

Character codes, code points, and code units

A code point (also known as “character code”) is a numerical representation of a specific Unicode character.

For example, the character code of the copyright symbol © is 169, which can be written as 0xA9 in hex. In JavaScript, String#charCodeAt() can be used to get the numeric Unicode code point of any character up to U+FFFF (i.e. the character with code point 0xFFFF, which is 65535 in decimal). Since JavaScript exposes strings as sequences of 16-bit code units (UCS-2/UTF-16), higher code points are represented by a pair of (lower valued) “surrogate” pseudo-characters which together make up the real character. To get the actual character code of these higher code point characters in JavaScript, you’ll have to do some extra work. Basically, JavaScript uses code units rather than code points.

Now that’s out of the way, let’s take a look at the different types of character escape sequences in JavaScript strings. There are some reserved single character escape sequences for use in strings, such as \r, \t, \v, \0 and \".
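To make the difference between code units and code points concrete, here is a small sketch. It uses U+1D306 as an example of a character above U+FFFF; the helper name `codePointFromSurrogates` is hypothetical, and String#codePointAt() (available since ES2015) is shown as one way to do the “extra work” for you.

```javascript
const copyright = '\u00A9'; // ©
console.log(copyright.charCodeAt(0)); // 169

// U+1D306 lies above U+FFFF, so it is stored as two code units.
const tetragram = '\uD834\uDF06';
console.log(tetragram.length);                     // 2
console.log(tetragram.charCodeAt(0).toString(16)); // d834 (high surrogate)
console.log(tetragram.charCodeAt(1).toString(16)); // df06 (low surrogate)

// Hypothetical helper: combines a surrogate pair into one code point.
function codePointFromSurrogates(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

const combined = codePointFromSurrogates(
  tetragram.charCodeAt(0),
  tetragram.charCodeAt(1)
);
console.log(combined.toString(16));                 // 1d306
console.log(combined === tetragram.codePointAt(0)); // true
```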