Encode XML

Since XML is very verbose, you are given a way of encoding it where each tag gets mapped to a pre-defined integer value The language/grammar is as follows:
Element –> Element Attr* END Element END [aka, encode the element tag, then its attributes, then tack on an END character, then encode its children, then another end tag]
Attr –> Tag Value [assume all values are strings] END –> 01
Tag –> some predefined mapping to int
Value –> string value END
Write code to print the encoded version of an xml element (passed in as string).
FOLLOW UP
Is there anything else you could do to (in many cases) compress this even further?

Solution:

	private Map<String, Byte> tagMap;
	private static final Byte[] END = { 0, 1 };
	private List<String> tokens;
	private int currentTokenIndex;

	byte[] encode(char[] input) throws IOException {
		tokenize(input);
		currentTokenIndex = 0;
		ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
		encodeTokens(outputStream);
		return outputStream.toByteArray();
	}

	void encodeTokens(ByteArrayOutputStream output) throws IOException {
		nextToken("<");
		// read tag name
		String tagName = nextToken();
		output.write(getTagCode(tagName));
		// read attributes
		while (!hasNextToken(">") && !hasNextTokens("/", ">")) { // read next attribute
			String key = nextToken();
			nextToken("=");
			String value = nextToken();
			output.write(getTagCode(key));
			for (char c : value.toCharArray()) {
				output.write(c);
			}
			output.write(END[0]);
			output.write(END[1]);
		}
		// end of attributes
		output.write(END[0]);
		output.write(END[1]);
		// finish this element
		if (hasNextTokens("/", ">")) {
			nextToken("/");
			nextToken(">");
		} else {
			nextToken(">");
			// while not the end tag
			while (!hasNextTokens("<", "/")) {
				encodeTokens(output); // encode child
			}
			// ending tag
			nextToken("<");
			nextToken("/");
			nextToken(tagName);
			nextToken(">");
		}
		output.write(END[0]);
		output.write(END[1]);
	}
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: