The following are some of the basic objectives of this project:
1. The main objective will be to collect and extract tweets from Twitter. The tweets should come from a particular period, for example from 2012 to 2014.
2. The next objective will be to clean the tweets and organize them so that information can be extracted from them easily. Because tweets can come from many countries, cleaning and arranging them in a consistent way (for example, by country) will make it possible to retrieve information geographically.
3. After the tweets are cleaned and sorted, the next step will be to load the data into the data warehouse, where algorithms will be run on it. Some tools may be required for this step.
4. Once the ETL (extract, transform, load) process is complete, the most important objective will be to apply data mining algorithms to the tweets in order to extract useful information. Depending on the algorithms, various sectors can be considered, such as Automobile, Transportation and Mobile Communication.
5. All of this will require detailed knowledge of data mining algorithms and of how they are used to retrieve information.
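Objectives 1 and 2 (keeping tweets from a fixed period, cleaning them and organizing them geographically) can be illustrated with a minimal Python sketch. It assumes each tweet has already been collected as a dict shaped like a Twitter API v1.1 tweet, with `created_at` (e.g. `Tue Mar 13 10:00:00 +0000 2012`) and `text`, plus a flattened, hypothetical `country_code` field. The cleaning rules (dropping retweet markers, URLs, @mentions and punctuation) are illustrative choices, not a prescribed method.

```python
import re
from datetime import datetime

# Assumption: tweets are dicts with Twitter v1.1-style "created_at" and "text",
# plus a hypothetical flattened "country_code" field.
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"

RT_RE = re.compile(r"^RT\s+")          # leading retweet marker
URL_RE = re.compile(r"https?://\S+")   # links
MENTION_RE = re.compile(r"@\w+")       # @username mentions
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")  # punctuation, '#', emoji, non-ASCII
SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Strip retweet markers, URLs, mentions and punctuation; lowercase."""
    text = RT_RE.sub("", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text.lower())
    return SPACE_RE.sub(" ", text).strip()


def clean_and_filter(tweets, start_year=2012, end_year=2014):
    """Keep tweets from start_year..end_year, cleaned and grouped by country."""
    by_country = {}
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], TWITTER_DATE_FORMAT)
        if not start_year <= created.year <= end_year:
            continue  # outside the chosen period
        text = clean_text(tweet["text"])
        if not text:
            continue  # nothing left after cleaning
        country = tweet.get("country_code") or "UNKNOWN"
        by_country.setdefault(country, []).append(
            {"date": created.date().isoformat(), "text": text}
        )
    return by_country


sample = [
    {"created_at": "Tue Mar 13 10:00:00 +0000 2012",
     "text": "RT @bob: Loving my new phone! http://t.co/abc #android",
     "country_code": "GB"},
    {"created_at": "Mon Jan 05 09:30:00 +0000 2015",
     "text": "Out of range", "country_code": "US"},
]
result = clean_and_filter(sample)
print(result)
```

Because the character filter keeps only `a-z`, digits and whitespace, it also drops non-ASCII characters; that would need revisiting for non-English tweets.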
What I want:
The components that will be required to design the system.
All the flow diagrams, with explanations.
A star schema for the warehouse, with an explanation.
The minimum requirements of the system.
Implementation (important part):
A detailed explanation of how the system was built.
Starting from the collection of tweets: how the tweets were cleaned and what was used to clean them.
How the cleaned tweets were then loaded into the warehouse.
Which algorithms were applied to them.
A detailed explanation of each algorithm.
Testing and evaluation:
How the system was tested and what the output was.
All the test plans.
Screenshots of the output, with explanations.
Any errors encountered, and how they were corrected.
Appendix: all the code and algorithms used in the project.
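For the star schema and the loading step requested above, one possible sketch uses Python's built-in `sqlite3` module as a stand-in warehouse. The table and column names (`fact_tweet`, `dim_date`, `dim_location`, `dim_device`) and the `dim_key`/`load_tweet` helpers are hypothetical; a real warehouse might add further dimensions, such as a sector dimension.

```python
import sqlite3

# Hypothetical star schema: fact_tweet in the centre, one dimension table
# per axis of analysis (date, location, device/OS).
SCHEMA = """
CREATE TABLE dim_date (
    date_id   INTEGER PRIMARY KEY,
    full_date TEXT NOT NULL UNIQUE,
    year      INTEGER NOT NULL,
    month     INTEGER NOT NULL
);
CREATE TABLE dim_location (
    location_id  INTEGER PRIMARY KEY,
    country_code TEXT NOT NULL UNIQUE
);
CREATE TABLE dim_device (
    device_id INTEGER PRIMARY KEY,
    device    TEXT NOT NULL,
    os        TEXT NOT NULL,
    UNIQUE (device, os)
);
CREATE TABLE fact_tweet (
    tweet_id    INTEGER PRIMARY KEY,
    date_id     INTEGER NOT NULL REFERENCES dim_date (date_id),
    location_id INTEGER NOT NULL REFERENCES dim_location (location_id),
    device_id   INTEGER NOT NULL REFERENCES dim_device (device_id),
    tweet_text  TEXT NOT NULL
);
"""


def dim_key(conn, table, id_col, values):
    """Return the surrogate key of a dimension row, inserting the row if new."""
    cols = ", ".join(values)
    marks = ", ".join("?" for _ in values)
    params = tuple(values.values())
    # Table/column names are fixed constants in this sketch, never user input.
    conn.execute(f"INSERT OR IGNORE INTO {table} ({cols}) VALUES ({marks})", params)
    where = " AND ".join(f"{col} = ?" for col in values)
    return conn.execute(f"SELECT {id_col} FROM {table} WHERE {where}", params).fetchone()[0]


def load_tweet(conn, date, country, device, os_name, text):
    """Insert one cleaned tweet: resolve dimension keys, then add the fact row."""
    year, month, _day = (int(part) for part in date.split("-"))
    date_id = dim_key(conn, "dim_date", "date_id",
                      {"full_date": date, "year": year, "month": month})
    location_id = dim_key(conn, "dim_location", "location_id",
                          {"country_code": country})
    device_id = dim_key(conn, "dim_device", "device_id",
                        {"device": device, "os": os_name})
    conn.execute(
        "INSERT INTO fact_tweet (date_id, location_id, device_id, tweet_text) "
        "VALUES (?, ?, ?, ?)",
        (date_id, location_id, device_id, text),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load_tweet(conn, "2012-03-13", "GB", "iPhone", "iOS", "loving my new phone")
load_tweet(conn, "2013-07-01", "GB", "Android phone", "Android", "stuck in traffic again")
load_tweet(conn, "2013-07-02", "US", "iPhone", "iOS", "new car engine sounds great")

# A typical star-schema query: join the fact table to two dimensions and aggregate.
rows = conn.execute("""
    SELECT l.country_code, d.os, COUNT(*)
    FROM fact_tweet AS f
    JOIN dim_location AS l ON f.location_id = l.location_id
    JOIN dim_device AS d ON f.device_id = d.device_id
    GROUP BY l.country_code, d.os
    ORDER BY l.country_code, d.os
""").fetchall()
print(rows)
```

Each dimension value is stored once and referenced by a surrogate key, so the fact table stays narrow and geographic or per-device questions reduce to joins plus `GROUP BY`.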
1) What type of tweets were used, and where did they come from? Are they the ones I provided?
2) How were the tweets cleaned, and what was used for the cleaning?
3) How many tweets were there in total?
4) After cleaning, how was the data loaded into the data warehouse, and what was used? Check the star schema and the rest of the warehouse design.
5) After the data was loaded into the warehouse, what type of algorithm was used? Is it a data mining algorithm?
6) What was the output after applying the algorithm?
7) The output is expected to specify what kind of device or operating system the user is using, along with any other important information about the tweet.
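Question 7 (identifying the user's device or operating system) and the data mining step can be sketched together. The sketch assumes each tweet carries Twitter's `source` field, an HTML anchor naming the posting client such as `Twitter for iPhone`; the keyword tables `DEVICE_RULES` and `SECTOR_KEYWORDS` are hypothetical. The mining step is a deliberately small association-rule count (support and confidence for rules of the form OS → sector), not a full Apriori implementation.

```python
import re
from collections import Counter

# Assumption: "source" looks like
# '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'.
SOURCE_RE = re.compile(r">([^<]+)</a>")

# Hypothetical keyword rules: client-name substring -> (device, OS). Order matters.
DEVICE_RULES = [
    ("iphone", ("iPhone", "iOS")),
    ("ipad", ("iPad", "iOS")),
    ("android", ("Android phone", "Android")),
    ("blackberry", ("BlackBerry", "BlackBerry OS")),
    ("windows phone", ("Windows Phone", "Windows Phone")),
    ("web", ("Desktop browser", "Unknown")),
]

# Hypothetical sector keywords, echoing the sectors named in the objectives.
SECTOR_KEYWORDS = {
    "Automobile": {"car", "engine", "toyota", "bmw"},
    "Transportation": {"train", "bus", "traffic", "flight"},
    "Mobile Communication": {"phone", "signal", "network", "sms"},
}


def device_and_os(source: str):
    """Map the posting client named in 'source' to a (device, OS) pair."""
    match = SOURCE_RE.search(source)
    client = (match.group(1) if match else source).lower()
    for keyword, result in DEVICE_RULES:
        if keyword in client:
            return result
    return ("Unknown", "Unknown")


def sectors(text: str):
    """Return the set of sectors whose keywords appear in the tweet text."""
    words = set(text.lower().split())
    return {name for name, keys in SECTOR_KEYWORDS.items() if words & keys}


def os_sector_rules(tweets, min_support=0.2):
    """Mine rules 'OS -> sector', keeping those with support >= min_support."""
    n = len(tweets)
    os_counts, pair_counts = Counter(), Counter()
    for tweet in tweets:
        _, os_name = device_and_os(tweet["source"])
        os_counts[os_name] += 1
        for sector in sectors(tweet["text"]):
            pair_counts[(os_name, sector)] += 1
    rules = []
    for (os_name, sector), count in pair_counts.items():
        support = count / n                        # share of all tweets
        if support >= min_support:
            confidence = count / os_counts[os_name]  # share of this OS's tweets
            rules.append((os_name, sector, support, confidence))
    return sorted(rules)


IPHONE = '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'
ANDROID = '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>'
sample = [
    {"source": IPHONE, "text": "my phone has no signal on the train"},
    {"source": ANDROID, "text": "stuck in traffic again"},
    {"source": IPHONE, "text": "new car engine sounds great"},
    {"source": "web", "text": "nothing relevant here"},
]
rules = os_sector_rules(sample)
for rule in rules:
    print(rule)
```

Here confidence is the fraction of tweets from a given OS that mention the sector, so a rule such as `Android → Transportation` with confidence 1.0 means every Android tweet in the sample mentioned transport.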