Random Writer (Markov)

This post implements a Markov approach to predicting what comes next from what has come before. For example, with handwriting, if you can read the letters “TH” and there is one more letter that you cannot make out, you could guess that it is the letter “E”. That is roughly the idea.

In this assignment you read in a file (any file: a story, etc.) and then build a table of seeds, each linked to the character(s) that follow it in the text. The seed length is requested from the user. To understand what a seed is, let's say that we have a line of text

“Hi there, my knickname is genux, and I enjoy to do software development, in many languages and operating systems, because development is just interesting!!”

and we pick a seed length of 3. The first seed is “Hi ” (yes, there is a space in there, because the seed has to be 3 characters long), and the next character, "t", is what gets linked to the seed "Hi ". From the text above you can also see that the seed "ent" (inside both occurrences of "development") has two possible next characters, either "," or " ", and the choice between them is made at random. If you pick " ", the new seed is "nt ", which has only one possible next seed, "t i", and you carry on like this. To start on the problem, I wanted a map of seeds with the next characters attached to each seed, so it would be a

Map<Vector<char> > theKeys;

which means that we are using a map (the seed is the string key) whose values are vectors of the characters that follow that seed. Next we need to read in the text and insert into the map, and this is how I insert a character into the vector associated with a seed key. First I check whether the seed is already in the map; if it is, I start from the vector of values already associated with that key. Either way, I add the new character (insertChar) to the vector and put it back into the map under the seed key (the put method replaces any previous value associated with that key).

inline void addNewChar(Map<Vector<char> > &theKeys, string seedValue, char insertChar)
{
	Vector<char> addResults;
	if (theKeys.containsKey(seedValue))
		addResults = theKeys.get(seedValue);
	addResults.add(insertChar);
	theKeys.put(seedValue, addResults);
}
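For comparison, the same insert can be sketched with the standard library instead of the course's Map and Vector classes. This is only an illustration, assuming a std::map from string to std::vector<char> as the table; the names FollowerTable and addFollower are made up for this sketch and are not part of the assignment code.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for Map<Vector<char> >: seed -> characters seen after it.
using FollowerTable = std::map<std::string, std::vector<char>>;

// Append one follower to a seed's list. operator[] inserts an empty
// vector if the seed is new, and push_back then adds to it in place.
inline void addFollower(FollowerTable &table, const std::string &seed, char next)
{
	table[seed].push_back(next);
}
```

Because operator[] creates the empty vector the first time a seed is seen, there is no need for a separate "is the key already there" check, and the vector is not copied out and put back.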

The next thing is to read the starting seed from the file (the readSeedFromFile function does that and returns the seed as a string). Then, while there are characters left in the file, keep reading, inserting each seed and its next character into the map (Map<Vector<char> > &theKeys).

void setupKeys(Map<Vector<char> > &theKeys, ifstream &infile, int seed)
{
	// obtain the first seed value
	string seedValue = readSeedFromFile(infile,seed);
	char newChar;
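	// peek() reports EOF before the read happens; eof() only becomes true
	// after a read has already failed, which would store one bogus character
	while (infile.peek() != EOF)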
	{
		newChar = nextChar(infile);
		addNewChar(theKeys, seedValue, newChar);
		seedValue = seedValue.substr(1,seedValue.length()-1) + newChar;
	}
}
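To see the sliding seed in action without the course library, here is a rough standard-library sketch of the same loop. It assumes the input is any std::istream; buildTable, buildTableFromString and FollowerTable are names invented for this example.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using FollowerTable = std::map<std::string, std::vector<char>>;

// Read the first seedLength characters as the opening seed, then slide the
// seed along one character at a time, recording each character against the
// seed that came just before it.
FollowerTable buildTable(std::istream &in, int seedLength)
{
	FollowerTable table;
	if (seedLength <= 0)
		return table;
	std::string seed;
	char c;
	while (static_cast<int>(seed.size()) < seedLength && in.get(c))
		seed += c;
	if (static_cast<int>(seed.size()) < seedLength)
		return table;	// input is shorter than one seed
	// get() fails at end of input, which ends the loop cleanly
	while (in.get(c))
	{
		table[seed].push_back(c);
		seed = seed.substr(1) + c;	// drop the first char, append the new one
	}
	return table;
}

// Convenience wrapper so the table can be built from a string in memory.
FollowerTable buildTableFromString(const std::string &text, int seedLength)
{
	std::istringstream in(text);
	return buildTable(in, seedLength);
}
```

Run on the example sentence from earlier with a seed length of 3, this gives the seed "ent" the followers ',' and ' ', and the seed "nt " the single follower 'i'.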

After we have read in the file, we need to find the seed that has the most next characters attached to it, which is what the function below does. The foreach is an iterator that loops through all of the seed (key) values in the Map and checks how many next characters each one has.

string obtainAMaxKey(Map<Vector<char> > &theKeys)
{
	int maxSeed =0;
	string maxKey="";
	Vector<char> values;
	// iterate through the map keys
	foreach (string key in theKeys)
	{
		values = theKeys.get(key);
		if (values.size() > maxSeed)
		{
			maxKey = key;
			maxSeed = values.size();
		}
	}
	return maxKey;
}
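Since every occurrence of a seed adds one character to its vector, the seed with the longest vector is simply the seed that appears most often in the text. A possible standard-library version of the same search, assuming the table is a std::map (mostCommonSeed and FollowerTable are names made up for this sketch):

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

using FollowerTable = std::map<std::string, std::vector<char>>;

// Return the seed with the longest follower list, or "" for an empty table.
// On a tie, the first such seed in std::map's sorted key order is returned.
std::string mostCommonSeed(const FollowerTable &table)
{
	auto best = std::max_element(table.begin(), table.end(),
		[](const FollowerTable::value_type &a, const FollowerTable::value_type &b) {
			return a.second.size() < b.second.size();
		});
	return best == table.end() ? std::string() : best->first;
}
```

Iterating over the map entries directly also avoids copying each vector just to ask for its size.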

The last thing is to output the Markov text, stopping at a count of 2000 characters or when there are no more characters attached to the current seed. We start with the seed returned by the function above, get the Vector for that seed, pick a random index into that Vector, output the character at that index, and update the seed to the next seed. As below.

void outputMarkov(Map<Vector<char> > &theKeys, string startKey)
{
	Randomize();
	int wordCount = startKey.length(), randomKey;
	Vector<char> values;
	while (true)
	{
		if (wordCount >= 2000) break;
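		// a seed that only appears at the very end of the file has no
		// entry in the map, so check for the key before fetching its values
		if (!theKeys.containsKey(startKey) || theKeys.get(startKey).size() == 0)
		{
			cout << "NO MORE KEYS";
			break;
		}
		values = theKeys.get(startKey);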
		randomKey = RandomInteger(0,values.size()-1);
		cout << values[randomKey];
		startKey = startKey.substr(1,startKey.length()-1) + values[randomKey];
		wordCount++;
	}
}
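The same generation loop can be sketched with the standard <random> header in place of Randomize and RandomInteger. This is only a sketch under a few assumptions: the table is a std::map, the text is returned as a string rather than printed, and the starting seed counts towards the character limit. The name generateText is invented for the example.

```cpp
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

using FollowerTable = std::map<std::string, std::vector<char>>;

// Build up to maxChars characters of text, beginning with the seed itself.
// Stops early if the current seed has no recorded followers.
std::string generateText(const FollowerTable &table, std::string seed,
                         std::size_t maxChars, std::mt19937 &rng)
{
	std::string out = seed;
	while (out.size() < maxChars && !seed.empty())
	{
		auto it = table.find(seed);
		if (it == table.end() || it->second.empty())
			break;	// dead end: no character was ever seen after this seed
		const std::vector<char> &followers = it->second;
		// uniform pick over the list; repeated entries make common
		// followers proportionally more likely
		std::uniform_int_distribution<std::size_t> pick(0, followers.size() - 1);
		char next = followers[pick(rng)];
		out += next;
		seed = seed.substr(1) + next;	// slide the seed along
	}
	return out;
}
```

Keeping duplicate characters in each vector is what makes the simple uniform pick work: a character that followed the seed three times in the source text is three times as likely to be chosen as one that followed it once.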

I have attached the zip file, which also includes the PDF file of the requirements. It will work within Visual Studio 2008.
