I was wondering if someone can help
me with some data manipulation and collection for an academic
I have a data set of 210 films arranged by rows in Excel- their weekly
revenues, theater counts, etc. in a data set called
the [url removed, login to view] (I have included only a few for viewing). I want this data to be arranged differently for
my research need. In addition, I want a new column to
be created by pulling some additional data from an
internet website: movies.yahoo.com. I explain these below:
1. First, I want the data (from FullData)to
be arranged differently. Each movie should be followed for 8 weeks if it lasts that long. For example, film number 1 in the FullData
- Coffee and Cigarettes - lasted for 10 weeks (see the weekly gross and the weekly screens; but in the newly created data set we stop at week 8 (see Sample Data).
2. I want a new column number for the 9 weeks numbered 0 through
8. Please see column C in Sample Data. This should repeat for every
3. For weeks 1 through 8, I want the corresponding
weekly screen numbers and the weekly gross (revenues). Currently, they are
in a row in the FullData. Screen and revenues do not
exist for week
0 - so there should be a "." to indicate missing data corresponding to week 0.
The variables that do not vary with time such
as the budget, sequel, etc. just repeat themselves 8
times for weeks 1 through 8 (see the Sample Data for Coffee and Cigarettes).
4. Now consider column D in the Sample Data. This is a new column that is not in the FullData. This data needs to be collected fresh.
You need to go to Yahoo Messages under [url removed, login to view] and then go to the Message Board
regarding a specific movie. For Coffee and Cigarettes go to:
[url removed, login to view];e=Pwru6usDdBRWH7y_8tsXQNHY4w5tLBJGFS9fCz0btgpXHtnJcLZBa1isDtHKkG58MWDzlLpvNeGIgBHfp89K9uHxWMWl1SVuH0ewFLZBUGrbjZAqTCwE68q6T10J3hCqfl0-
This is the Community Message Board in Yahoo Movies. The messages in the Yahoo Message Board are dated and are publicly
accessible for free. Here I want the counts of
meassages starting with week 0 where Week 0 is all
days PRIOR to the opening day of the movie. The
message counts for weeks 1, 2, etc. upto week 8 should
follow. Since a movie usually opens on a Friday, Week 1 would be from that Friday (or Thursday if it opens on a Thursday) till the next Thursday. For example, Coffee and Cigarettes opened on May 14 (Friday), 2004 - this info is in FullData under the column "reldate" which stands for release date. Coffee and Cigarettes has only 9 messages in the Board, with the first one appearing on 19-Apr-04 11:41 pm while the most recent one on 25-Aug-05 09:21 am. Therefore, the number of messages under Week 0 is actually 3 (12-May-04 07:46 pm, 29-Apr-04 12:44 am and 19-Apr-04 11:41 pm), and so on... If in a specific week out of the first 8 weeeks, there is no message then it gets a value of "0".