This is assignment 2 from an older CS107 course. The latest coursework is held on a UNIX server that you need access to, which does not help when you do not have it. I got the files from the CS107 link; there is a zip file at the bottom of that page that has all of the assignments, handouts, etc.
I have included the PDF of the assignment in the above file link, but since the actual assignment includes a few massive files, you can download them from the same CS107 link, which holds the whole coursework that I am using.
As a note, on my Linux setup I need to set an environment variable to tell the program to use Linux (little-endian), by running
export OSTYPE=linux
Basically, the assignment is to load in the cast/movie details (which the base code from the assignment already does for us) and then pull the data out of the structure of the file. The structure of an actor record is (as taken from the PDF file):
- The name of the actor is laid out character by character, as a normal null-terminated C-string. If the length of the actor's name is even, then the string is padded with an extra '\0' so that the total number of bytes dedicated to the name is always an even number. The information that follows the name is most easily interpreted as a short integer, and virtually all hardware constrains any address manipulated as a short * to be even.
- The number of movies in which the actor has appeared, expressed as a two-byte short. (Some people have been in more than 255 movies, so a single byte just isn't enough.) If the number of bytes dedicated to the actor's name (always even) and the short (always 2) isn't a multiple of four, then two additional '\0's appear after the two bytes storing the number of movies. This padding is conditionally done so that the 4-byte integers that follow sit at addresses that are multiples of four.
- An array of offsets into the movieFile image, where each offset identifies one of the actor's films.
The base of the actor information block holds a 4-byte integer giving how many actors are in the block of data. The values that follow are also 4 bytes each: one offset per actor, pointing into the block to where that actor's record is actually stored.
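The actor record layout described above can be sketched as a small stand-alone parser. This is only an illustration, not the assignment's code: ActorRecord and parseActorRecord are hypothetical names, the image is a plain byte vector rather than the mapped file, and the sketch assumes little-endian data read on a little-endian machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical type: the decoded form of one actor record.
struct ActorRecord {
    std::string name;
    std::vector<std::int32_t> movieOffsets;
};

// Decodes the record that starts recordOffset bytes into image.
// Padding is worked out relative to the start of the record.
ActorRecord parseActorRecord(const std::vector<unsigned char>& image,
                             std::size_t recordOffset) {
    assert(recordOffset < image.size());
    const char* base = reinterpret_cast<const char*>(image.data()) + recordOffset;

    ActorRecord record;
    record.name = base;                            // copies up to the first '\0'

    std::size_t cursor = record.name.size() + 1;   // name plus its terminator
    if (cursor % 2 != 0) cursor += 1;              // extra '\0' keeps the short on an even address

    std::int16_t movieCount = 0;
    std::memcpy(&movieCount, base + cursor, sizeof movieCount);
    cursor += sizeof movieCount;
    if (cursor % 4 != 0) cursor += 2;              // two more '\0's align the 4-byte offsets

    for (int i = 0; i < movieCount; ++i) {
        std::int32_t offset = 0;
        std::memcpy(&offset, base + cursor + 4 * i, sizeof offset);
        record.movieOffsets.push_back(offset);
    }
    return record;
}
```

Using memcpy for every multi-byte read avoids depending on the alignment of the buffer, which is the same approach as the getCredits code in this post.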
So the method that you need to write pulls that data out of the actorInfo block of memory. Here is my version of it: the method is getCredits, which takes the cast member (player) and inserts all of the films that the cast member has been in into films.
bool imdb::getCredits(const string& player, vector<film>& films) const {
    int numberOfActors, data;
    char *actorName;
    // the first 4 bytes hold the number of actors
    memcpy(&numberOfActors, (int*)actorInfo.fileMap, 4);
    // loop through the actor offsets, stored at int indices 1..numberOfActors
    for (int i = 1; i <= numberOfActors; i++) {
        memcpy(&data, ((int*)actorInfo.fileMap) + i, 4);
        actorName = ((char*)actorInfo.fileMap + data);
        if (strcmp(actorName, player.c_str()) == 0) {
            // find the padding lengths
            int actorNameLength = strlen(actorName) + 1;  // name plus its '\0'
            int movieNumberPad = actorNameLength % 2;     // extra '\0' so the name uses an even number of bytes
            int fourOffset = (actorNameLength + movieNumberPad + 2) % 4;  // 0 or 2 bytes of padding after the short
            short numberMovies;
            memcpy(&numberMovies, (char*)actorInfo.fileMap + data + actorNameLength + movieNumberPad, 2);
            for (int j = 0; j < numberMovies; j++) {
                int movieOffset;
                film theFilm;
                memcpy(&movieOffset, (char*)actorInfo.fileMap + data + actorNameLength + movieNumberPad + 2 + fourOffset + (j*4), 4);
                theFilm.title = ((char*)movieInfo.fileMap + movieOffset);
                // the year byte follows the title's '\0'; read it as unsigned so deltas above 127 stay positive
                theFilm.year = 1900 + *((unsigned char*)movieInfo.fileMap + movieOffset + theFilm.title.length() + 1);
                films.push_back(theFilm);
            }
            return true;
        }
    }
    return false;
}
The block of memory for the movies is similar to the actor block. As taken from the attached PDF file, here are the details of that block of memory:
- The title of the movie, terminated by a '\0' so the character array behaves as a normal C-string.
- The year the film was released, expressed as a single byte. This byte stores the year - 1900. Since Hollywood is less than 2^8 years old, it was fine to just store the year as a delta from 1900. If the total number of bytes used to encode the name and year of the movie is odd, then an extra '\0' sits in between the one-byte year and the data that follows.
- A two-byte short storing the number of actors appearing in the film, padded with two additional bytes of zeroes if needed.
- An array of four-byte integer offsets, where each integer offset identifies one of the actors in the actorFile. The number of offsets here is, of course, equal to the short integer read during step 3.
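The movie record differs from the actor record mainly in the one-byte year that sits between the title and the padding. A minimal sketch of decoding it follows, again as an illustration only: MovieRecord and parseMovieRecord are hypothetical names, the image is a plain byte vector, and little-endian data on a little-endian machine is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical type: the decoded form of one movie record.
struct MovieRecord {
    std::string title;
    int year = 0;
    std::vector<std::int32_t> actorOffsets;
};

// Decodes the record that starts recordOffset bytes into image.
MovieRecord parseMovieRecord(const std::vector<unsigned char>& image,
                             std::size_t recordOffset) {
    assert(recordOffset < image.size());
    const unsigned char* base = image.data() + recordOffset;

    MovieRecord record;
    record.title = reinterpret_cast<const char*>(base);

    std::size_t cursor = record.title.size() + 1;  // title plus its '\0'
    record.year = 1900 + base[cursor];             // unsigned byte, so deltas up to 255 decode correctly
    cursor += 1;
    if (cursor % 2 != 0) cursor += 1;              // extra '\0' when title plus year is odd

    std::int16_t actorCount = 0;
    std::memcpy(&actorCount, base + cursor, sizeof actorCount);
    cursor += sizeof actorCount;
    if (cursor % 4 != 0) cursor += 2;              // two zero bytes align the 4-byte offsets

    for (int i = 0; i < actorCount; ++i) {
        std::int32_t offset = 0;
        std::memcpy(&offset, base + cursor + 4 * i, sizeof offset);
        record.actorOffsets.push_back(offset);
    }
    return record;
}
```

Reading the year through an unsigned byte matters on platforms where plain char is signed: a delta above 127 would otherwise come out negative.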
Once again, like the actor block, the movie block starts with a 4-byte integer that stores how many movies are in the block of memory, followed by 4-byte offsets into that block, one per movie. Here is my implementation of the method getCast, which takes a movie as a parameter, finds all of its cast (players), and fills the players vector with them.
bool imdb::getCast(const film& movie, vector<string>& players) const {
    int numberOfMovies, data;
    film theFilm;
    char *movieName;
    memcpy(&numberOfMovies, (int*)movieInfo.fileMap, 4);
    // the movie offsets are stored at int indices 1..numberOfMovies
    for (int i = 1; i <= numberOfMovies; i++) {
        memcpy(&data, (int*)movieInfo.fileMap + i, 4);
        movieName = ((char*)movieInfo.fileMap + data);
        theFilm.title = movieName;
        // read the year byte as unsigned so deltas above 127 stay positive
        theFilm.year = 1900 + *((unsigned char*)movieInfo.fileMap + data + theFilm.title.length() + 1);
        if (movie == theFilm) {
            // title + '\0' + year byte is title.length() + 2 bytes; one extra '\0' if that is odd
            int paddingFirst = theFilm.title.length() % 2;
            // paddingSecond = paddingFirst + length of title + 2 (year + \0) and then + 2 for the actors number
            int paddingSecond = (paddingFirst + theFilm.title.length() + 2 + 2) % 4;
            short numberOfActors;
            memcpy(&numberOfActors, (char*)movieInfo.fileMap + data + theFilm.title.length() + 2 + paddingFirst, 2);
            // get the actors offsets and insert into the players list
            for (int k = 0; k < numberOfActors; k++) {
                int offset;
                memcpy(&offset, (char*)movieInfo.fileMap + data + theFilm.title.length() + 2 + paddingFirst + 2 + paddingSecond + (k*4), 4);
                players.push_back((char*)actorInfo.fileMap + offset);
            }
            return true;
        }
    }
    return false;
}
After you have sorted out reading in the information, you have to find out whether there is a link from actor1 to actor2 (six degrees), linking them together through the films they have starred in and the cast they have been with. Here is my breadth-first search setup, as taken from the assignment.
/**
 * @param actor1 - first actor to look for
 * @param actor2 - second actor to look for to link to the first
 * @param db - the imdb database loaded
 */
static bool generateShortestPath(const string& actor1, const string& actor2, const imdb& db) {
    list<path> partialPaths;
    set<string> previouslySeenActors;
    set<film> previouslySeenFilms;
    path actorsFirst(actor1);
    partialPaths.push_front(actorsFirst);
    previouslySeenActors.insert(actor1);  // never route a path back through the starting actor
    while (!partialPaths.empty() && partialPaths.front().getLength() <= 5) {
        path frontPath = partialPaths.front();
        partialPaths.pop_front();
        vector<film> actorsFilms;
        db.getCredits(frontPath.getLastPlayer(), actorsFilms);
        for (size_t filmI = 0; filmI < actorsFilms.size(); filmI++) {
            film filmData = actorsFilms.at(filmI);
            // not seen the film before
            if (previouslySeenFilms.find(filmData) == previouslySeenFilms.end()) {
                previouslySeenFilms.insert(filmData);
                vector<string> movieCast;
                db.getCast(filmData, movieCast);
                for (size_t castI = 0; castI < movieCast.size(); castI++) {
                    // if not already seen
                    if (previouslySeenActors.find(movieCast.at(castI)) == previouslySeenActors.end()) {
                        previouslySeenActors.insert(movieCast.at(castI));
                        path newfrontPath = frontPath;
                        newfrontPath.addConnection(filmData, movieCast.at(castI));
                        if (movieCast.at(castI) == actor2) {
                            cout << newfrontPath << endl;
                            return true;
                        } else
                            partialPaths.push_back(newfrontPath);
                    }
                }
            }
        }
    }
    return false;
}
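To try the search idea without the imdb data files, here is a minimal sketch of the same breadth-first search over small in-memory maps. The names Credits, Casts and shortestChain are hypothetical and not part of the assignment code; plain strings stand in for the film and path classes.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

using Credits = std::map<std::string, std::vector<std::string>>;  // actor -> films
using Casts = std::map<std::string, std::vector<std::string>>;    // film -> actors

// Returns actor, film, actor, ... from `from` to `to`, or an empty vector
// when no chain of at most maxLinks films exists.
std::vector<std::string> shortestChain(const Credits& credits, const Casts& casts,
                                       const std::string& from, const std::string& to,
                                       std::size_t maxLinks = 6) {
    assert(maxLinks > 0);
    std::deque<std::vector<std::string>> frontier;
    frontier.push_back(std::vector<std::string>(1, from));
    std::set<std::string> seenActors;
    seenActors.insert(from);
    std::set<std::string> seenFilms;

    while (!frontier.empty()) {
        std::vector<std::string> path = frontier.front();
        frontier.pop_front();
        // The queue holds paths in non-decreasing length, so once the front
        // path already uses maxLinks films nothing shorter is left to extend.
        if (path.size() / 2 >= maxLinks) break;

        auto creditIt = credits.find(path.back());
        if (creditIt == credits.end()) continue;
        for (const std::string& film : creditIt->second) {
            if (!seenFilms.insert(film).second) continue;    // film already expanded
            auto castIt = casts.find(film);
            if (castIt == casts.end()) continue;
            for (const std::string& actor : castIt->second) {
                if (!seenActors.insert(actor).second) continue;  // actor already reached
                std::vector<std::string> longer = path;
                longer.push_back(film);
                longer.push_back(actor);
                if (actor == to) return longer;
                frontier.push_back(longer);
            }
        }
    }
    return {};
}
```

Because every actor and film is recorded the first time it is reached, the first path that arrives at the target is also a shortest one, which is what makes breadth-first search a good fit here.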
You can do some speed-ups from the assignment's PDF file, but they are mainly the icing on the cake and I thought that I would do assignment 3 first, so if I do the extras for this I will update this post.
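One speed-up that would fit the linear scans in getCredits and getCast is a binary search over the offset table. The sketch below is an assumption on my part rather than something taken from the PDF: it only works if the offsets are stored so that the names they point to are in ascending strcmp order, which this post does not confirm. findRecordOffset is a hypothetical name, and little-endian data on a little-endian machine is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Returns the offset of the record whose name equals `name`, or -1 if absent.
// Assumption: the offset table is sorted so the names ascend under strcmp.
std::int32_t findRecordOffset(const std::vector<unsigned char>& image,
                              const std::string& name) {
    assert(image.size() >= 4);
    const char* base = reinterpret_cast<const char*>(image.data());

    std::int32_t count = 0;
    std::memcpy(&count, base, sizeof count);       // first 4 bytes: number of records

    std::int32_t low = 0;
    std::int32_t high = count;                     // search the half-open range [low, high)
    while (low < high) {
        std::int32_t mid = low + (high - low) / 2;
        std::int32_t offset = 0;
        std::memcpy(&offset, base + 4 + 4 * static_cast<std::size_t>(mid), sizeof offset);
        int cmp = std::strcmp(base + offset, name.c_str());
        if (cmp == 0) return offset;
        if (cmp < 0) low = mid + 1;                // probed name sorts before the target
        else high = mid;                           // probed name sorts after the target
    }
    return -1;
}
```

If the assumption holds, each lookup touches roughly log2(n) names instead of n, which adds up quickly inside the breadth-first search.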
But this was a fun assignment, because of the memory block and finding out the data within it.