From 3151a27482379c9e50f59cde4e3766fb75de0f9d Mon Sep 17 00:00:00 2001
From: tmp1u19 <tmp1u19@soton.ac.uk>
Date: Sat, 29 Feb 2020 20:27:41 +0000
Subject: [PATCH] Add details in README file to explain the project and how to work with it.

---
 README.md | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index ea6e8ef..148441f 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,29 @@
-The aim of this project is to analyse all the files containing reviews for each hotel in different situations.
+The aim of this project is to count the reviews of each hotel and to sort the hotels in descending order of their number of reviews. The data for each hotel is stored in the directory reviews_folder.
 
 The dataset to be used for this project is found at:
 
 https://secure.ecs.soton.ac.uk/notes/comp1204/coursework/dataset/reviews_dataset.tar.gz
 
-The file should be extracted with the following UNIX commands:
+From the terminal, you can extract the folder with the following commands:
 
 gunzip reviews_dataset.tar.gz
 tar xvf reviews_dataset.tar
+
+The script for this project is in countreviews.sh. Its first parameter is the path of the directory
+to extract the data from. If you work in the same place where you downloaded the directory, you can simply
+write its name.
+
+The script loops through the files of the given directory. For each file, it records the file name (without
+any extension or path) and counts the lines in that file that contain "Author". Every line containing
+"Author" introduces a new review, so counting the authors gives the number of reviews.
+
+After looping through all the files and extracting each file name and its number of reviews, the script sorts
+the output by numerical value (-n flag) in reverse (-r flag) by the second key (-k2).
+
+To run the programme, first make it executable with the command:
+
+chmod a+x countreviews.sh
+
+Then execute it (if you work in the same workspace as your data directory):
+
+./countreviews.sh reviews_folder
-- 
GitLab
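
The counting and sorting steps that the README describes could be sketched roughly as follows. This is an illustration only: the actual countreviews.sh is not part of this patch, so the function name count_reviews, the output format "name count", and the assumption that file names contain no spaces are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the behaviour described in the README;
# the real countreviews.sh is not shown in this patch.

# count_reviews DIR: print "<file name> <review count>" for every file
# in DIR, with the most-reviewed hotel first.
count_reviews() {
    dir=$1
    for file in "$dir"/*; do
        # Skip anything that is not a regular file (also covers an empty directory).
        [ -f "$file" ] || continue
        # Keep the file name without its path or extension.
        name=$(basename "$file")
        name=${name%.*}
        # Every line containing "Author" introduces one review.
        count=$(grep -c "Author" "$file")
        echo "$name $count"
    done | sort -n -r -k2   # numeric, reversed, keyed on the second field
}

# Run only when a directory is given, e.g. ./countreviews.sh reviews_folder
if [ "$#" -ge 1 ]; then
    count_reviews "$1"
fi
```

Piping the whole loop into a single sort keeps the per-file counting separate from the ordering, which matches the two stages described in the README text.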