- Open Access
- Authors : Lakshmi S R , Asha N , K C Gouda,
- Paper ID : IJERTCONV4IS29019
- Volume & Issue : ICIOT – 2016 (Volume 4 – Issue 29)
- Published (First Online): 24-04-2018
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Predictive Analytics for Rainfall Prediction
Lakshmi S R, Asha N
Department of Computer Science and Engineering, APS College of Engineering, Kanakpura, Bangalore
K C Gouda
CSIR Fourth Paradigm Institute, NAL Belur Campus, Bangalore, India
Abstract: Rainfall is important for food production planning and water resource management. India is an agricultural country whose economy depends largely on crop productivity, so rainfall prediction is a significant factor. Given the growing importance of rainfall studies in the context of climate change and High Performance Computing, users ranging from farmers to scientists to policy makers need rainfall predictions well in advance for applications such as crop planning and water storage. Data discovery from temporal, spatial and spatio-temporal data is critical for rainfall analysis. Recent growth in observations and model outputs, combined with the increased availability of geographical data, presents new opportunities to apply techniques such as predictive analytics to develop a predictor for multi-scale rainfall forecasting, from 24-hour forecasts to long-range forecasts issued 2-3 months in advance. We therefore developed a predictive analytics system for efficient, real-time prediction of rainfall over India.
Keywords: Agriculture, Ensemble forecasting, Rainfall Forecasting, Prediction.
INTRODUCTION
Agriculture is the predominant occupation in India, accounting for about 52% of employment, and the country's economy is largely based upon crop productivity. A prolonged dry period or heavy rain at critical stages of crop growth and development may significantly reduce crop yield. Rainfall is important for food production planning, water resource management and all activity planning in nature. Rainfall prediction is therefore a significant factor in agricultural countries like India. Rainfall forecasting has been one of the most scientifically and technologically challenging problems around the world in the last century. Rainfall data is available for data mining techniques, which can be used to predict rainfall and thereby support decisions on crop planting in a given area.
Irrigation facilities are inadequate: only 52.6% of the land was irrigated in 2009-10, so farmers still depend on rainfall, specifically the monsoon season. A good monsoon results in robust growth for the economy as a whole, while a poor monsoon leads to sluggish growth. In the present study, a framework for rainfall prediction from past data and present weather conditions is built using predictive analytics in a High Performance Computing environment, where several software packages such as MATLAB and STATISTICA are interfaced with an object-oriented language to develop a multi-scale forecasting platform for rainfall prediction.
RELATED WORK
There have been many attempts to forecast rainfall. Rainfall forecasting can apply to many time horizons, such as short-term, medium-term, and long-term periods: some authors design systems that forecast yearly data, some forecast monthly data, and others forecast daily data.
N. Sen [1] presented a long-range summer monsoon rainfall forecast model based on a power regression technique, using El Niño, Eurasian snow cover, north-west Europe temperature, Europe pressure gradient, wind pattern, Arabian Sea SST, east Asia pressure and south Indian Ocean temperature in the previous year. The experimental results showed that the model error was 4%.
Nkrintra [2] described the development of a statistical forecasting method for summer monsoon rainfall (SMR) over Thailand using multiple linear regression and local polynomial-based nonparametric approaches. SST, sea level pressure (SLP), wind speed, the El Niño Southern Oscillation (ENSO) index and the Indian Ocean Dipole (IOD) were chosen as predictors. The experiments assessed the correlation between observed and forecast rainfall.
Sohn [3] developed a prediction model for the occurrence of heavy rain in South Korea using multiple linear regression, logistic regression, decision trees and artificial neural networks.
M. T. Mebrhatu [4] modeled the prediction of rainfall categories (below normal, normal, above normal) in the highlands of Eritrea. The most influential predictor of rainfall amount was the southern Indian Ocean SST. Experimental results showed that the hit rate for the model was 70%.
H. Hasani [5] proposed a human height prediction model based on multiple polynomial regression, which was used successfully to forecast height growth potential with precision and was helpful in child growth studies.
Vaccari [6] modeled plant motion time series and nutrient recovery data for advanced life support using multivariable polynomial regression.
These studies, however, use simple approaches on very small data sets. Climate data are large and multi-dimensional (latitude, longitude, vertical levels, and time as day, month and year) and span time scales from daily to decadal (i.e., 10 years), so the data mining must be handled in a Big Data environment. In the present work, algorithms are therefore developed for large climate data analysis to study climate change over India.
SYSTEM DESIGN
[Fig 1, high-level design (block diagram). Boxes: Boundary Condition; Initial Condition; CMMACS GCM model; Model output + Predictive Model; Analysis Algorithm; Results; Visualization Tools; Real-time observation and Application (nowcasting).]
Fig 1 shows the high-level design of the rainfall forecasting system. It takes both boundary conditions (e.g., land, ocean, or forest parameters) and initial conditions (i.e., today's conditions) together with the GCM (Global Climate Model) output data. The predictive analysis algorithm is then applied to these data. Finally, the forecast results are obtained through the visualization package (i.e., GrADS).
Step 6: Use the trained predictive model to derive the present-day rainfall state from the principal components obtained from the GCM output and the rainfall state of the previous day.
Design of Multi-Scale Predictive Analytics
- Short Range Prediction (1-6 hrs): nowcasting using real-time satellite data. E.g., IPL matches, event organization, closing schools.
- Medium Range Prediction (3-5 days): cyclone or heavy rainfall using a meso-scale model (WRF) + PA algorithm. E.g., disaster preparedness, warnings for fishermen, closing schools.
- Long Range Prediction (3-6 months): monsoon rain forecasting in April for JJAS using a GCM + PA algorithm. E.g., agriculture planning, water management, food security.
- Climate Prediction (10-100 yrs): understanding climate change and global warming using the CSIR climate model + PA algorithm. E.g., food habits, crop patterns.
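The four forecast ranges listed above could be captured as a small lookup table that selects the tooling for a requested lead time. The following is only an illustrative sketch: the names `ForecastRange`, `RANGES` and `select_range`, and the 30-day month and 365-day year conversions, are assumptions made here and do not come from the original system.

```python
from dataclasses import dataclass
from typing import Optional

HOUR = 1.0
DAY = 24.0 * HOUR
MONTH = 30.0 * DAY   # assumption: 30-day month, for illustration only
YEAR = 365.0 * DAY   # assumption: 365-day year, for illustration only


@dataclass(frozen=True)
class ForecastRange:
    name: str
    min_hours: float
    max_hours: float
    tooling: str


# Lead-time windows and tooling as stated in the design list.
RANGES = [
    ForecastRange("short", 1 * HOUR, 6 * HOUR, "real-time satellite data (nowcasting)"),
    ForecastRange("medium", 3 * DAY, 5 * DAY, "meso-scale model (WRF) + PA algorithm"),
    ForecastRange("long", 3 * MONTH, 6 * MONTH, "GCM + PA algorithm"),
    ForecastRange("climate", 10 * YEAR, 100 * YEAR, "CSIR climate model + PA algorithm"),
]


def select_range(lead_time_hours: float) -> Optional[ForecastRange]:
    """Return the range whose window contains the lead time, else None."""
    for forecast_range in RANGES:
        if forecast_range.min_hours <= lead_time_hours <= forecast_range.max_hours:
            return forecast_range
    return None
```

Because the stated windows are not contiguous, a lead time such as 24 hours falls between the short and medium ranges and `select_range` returns `None` for it.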
IMPLEMENTATION AND RESULTS
The modules are:
- Monsoon model (long range)
- Weather Research and Forecasting model (short range)
- Cyclone model (rainfall day cyclone)
In order to calculate the anomaly of rainfall we use the following formula:
$$\mathrm{Rain}_{\mathrm{Anomaly}}(lat, lon) = \frac{1}{50} \sum_{yr=1}^{50} \mathrm{Rain}(lat, lon, yr)$$
A. Flow of the Predictive Model
Step 1: Apply an unsupervised data classification technique, such as K-means clustering, to the observed multi-site rainfall data in order to identify the rainfall states present in the data.
Step 2: Perform Principal Component Analysis (PCA) to reduce the dimensions of the standardized predictor data, i.e., the CMMACS climate data set. The dimensionally reduced climate variables represent a large fraction of the variability contained in the original data.
Step 3: Train the predictive model(s) to establish the relationship between the input data (current-day standardized, dimensionally reduced climate predictors together with the previous day(s) rainfall state) and the output data (the current-day rainfall state).
Step 4: Apply bias correction to the GCM output data to obtain bias-corrected GCM data.
Step 5: Obtain the principal components of the GCM data by performing PCA on the bias-corrected GCM data using the principal directions obtained during PCA of the CMMACS climate data.
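Step 1 of the flow above (identifying rainfall states by K-means clustering of multi-site rainfall) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the deterministic initialization, the function names, and the three-site daily rainfall values are all hypothetical and chosen only to make the example reproducible.

```python
import math


def init_centroids(points, k):
    # Assumption: a deterministic start that spreads the initial centroids
    # across the points ordered by total rainfall over all sites.
    ordered = sorted(points, key=sum)
    n = len(ordered)
    return [list(ordered[i * (n - 1) // max(k - 1, 1)]) for i in range(k)]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels), one label per point."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical daily rainfall at three sites (one tuple per day).
days = [
    (0.0, 0.2, 0.0), (0.5, 0.0, 0.1), (0.1, 0.3, 0.0),        # dry days
    (12.0, 9.5, 11.0), (10.5, 11.0, 9.0), (11.5, 10.0, 10.5),  # moderate days
    (48.0, 52.0, 45.0), (55.0, 49.0, 51.0),                    # wet days
]
centroids, rainfall_states = kmeans(days, k=3)
```

Each entry of `rainfall_states` is the cluster index for that day, which plays the role of the "rainfall state" used as an input in Step 3. In practice the number of states `k` would have to be chosen from the data; the source does not state the value used.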
Each year is classified by its rainfall anomaly:
- Anomaly between -10 and +10: normal year (e.g., 1998)
- Anomaly less than -10: drought or deficit year (e.g., 2002)
- Anomaly more than +10: flood or excess year (e.g., 1961)
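The classification rule above can be sketched in a few lines. The sketch assumes that the anomaly is the percentage departure of a year's rainfall from the long-term mean, which the ±10 thresholds suggest but the text does not state explicitly. The function names and rainfall totals are hypothetical.

```python
def long_term_mean(rain_by_year):
    """Mean rainfall over all available years."""
    return sum(rain_by_year.values()) / len(rain_by_year)


def percent_anomaly(rain, mean):
    # Assumption: anomaly expressed as percentage departure from the mean.
    return 100.0 * (rain - mean) / mean


def classify_year(anomaly):
    """Apply the -10 / +10 thresholds from the text."""
    if anomaly < -10:
        return "drought/deficit"
    if anomaly > 10:
        return "flood/excess"
    return "normal"


# Hypothetical seasonal rainfall totals for three unnamed years.
rain_by_year = {"year_a": 800.0, "year_b": 650.0, "year_c": 950.0}
mean = long_term_mean(rain_by_year)
classes = {
    year: classify_year(percent_anomaly(rain, mean))
    for year, rain in rain_by_year.items()
}
```

With these made-up totals the mean is 800, so `year_b` (about 19% below the mean) is classed as a deficit year and `year_c` (about 19% above) as an excess year.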
Regression
Regression is an empirical statistical technique that uses the relation between two or more quantitative variables in an observational database so that an outcome variable can be predicted from the others.
Two forms are commonly used: simple linear regression and multiple linear regression. Regression produces an equation (in general a polynomial) describing the relationship between a set of inputs and the corresponding output.
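As a minimal sketch of multiple linear regression with two predictors and no intercept (the form Y = aX1 + bX2 used in this work), the coefficients can be obtained by ordinary least squares from the 2x2 normal equations. The fitting procedure actually used by the authors is not described, so this is an assumption, and the data below are synthetic.

```python
def fit_two_predictor_regression(x1, x2, y):
    """Least-squares fit of y ~ a*x1 + b*x2 (no intercept); returns (a, b)."""
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * t for u, t in zip(x1, y))
    s2y = sum(v * t for v, t in zip(x2, y))
    # Solve [[s11, s12], [s12, s22]] @ [a, b] = [s1y, s2y] by Cramer's rule.
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("predictors are collinear; coefficients are not unique")
    a = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    return a, b


# Synthetic normalized anomalies, generated with a = 0.6 and b = 0.3.
x1 = [0.5, -1.0, 1.5, 0.0, -0.5]
x2 = [1.0, 0.5, -1.0, -0.5, 2.0]
y = [0.6 * u + 0.3 * v for u, v in zip(x1, x2)]
a, b = fit_two_predictor_regression(x1, x2, y)
```

Because the synthetic `y` is an exact linear combination of the predictors, the fit recovers the generating coefficients up to floating-point error. Real anomaly data would leave a residual.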
Here we use past data to develop a linear regression equation to predict monsoon rainfall:
$$Y = aX_1 + bX_2$$
where $Y$ is the normalized rain anomaly, $X_1$ is the normalized anomaly of the location, and $X_2$ is the normalized anomaly of January to April.
The regression equation is used to produce the forecasts. The overall measure adopted for the accuracy of the forecasts is the root-mean-square error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(F_i - O_i)^2}$$
where $F_i$ and $O_i$ are the forecast and observed Indian monsoon rainfall for the $i$-th year. In the present work a predictive analytics approach is carried out to provide a better understanding of the rainfall pattern over all India using 50 years of multi-source data. This work involves the development of several novel algorithms, such as ensemble forecasting, multi-model forecasting of the monsoon rainfall, extreme rainfall events, and flood or drought indices over India, using predictive analytics. The developed predictive algorithm is compared against the traditional statistical methods of forecasting and proposes new directions. Finally, several case studies will be presented using predictive analytics for rainfall predictions, which provide new scientific insights with high societal impact.
Fig. 2 shows the results of 20 years of verification of observed and forecasted rainfall. The arrow along the abscissa denotes the long-term mean rainfall. The two dotted lines represent the actual rainfall.

Table 1. Comparison of PA forecasted rainfall and observed rainfall for the period 1951-1984

| Year | Forecast rainfall | Observed rainfall |
|------|-------------------|-------------------|
| 1951 | 748 | 737 |
| 1952 | 797 | 792 |
| 1953 | 925 | 920 |
| 1954 | 891 | 885 |
| 1969 | 872 | 829 |
| 1970 | 891 | 939 |
| 1971 | 911 | 886 |
| 1972 | 717 | 653 |
| 1973 | 921 | 912 |
| 1974 | 767 | 747 |
| 1975 | 948 | 960 |
| 1976 | 853 | 855 |
| 1977 | 805 | 880 |
| 1978 | 845 | 908 |
| 1979 | 769 | 746 |
| 1980 | 825 | 881 |
| 1981 | 853 | 842 |
| 1982 | 738 | 736 |
| 1983 | 904 | 959 |
| 1984 | 845 | 835 |

Table 2. Rainfall forecasting for all India

| Year | Raw prediction | With PA prediction | Observation |
|------|----------------|--------------------|-------------|
| 2012 | +3 | -2 | -7 |
| 2011 | +6 | +4 | +2 |
| 2010 | -1 | +3 | +2 |
| 2009 | -5 | -8 | -22 |
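The RMSE measure described above can be illustrated with a short sketch. The forecast and observed values below are transcribed from Table 1 as it can be read from the source (20 rows, with the 1974 row inferred from its position), so they should be treated as a reading of the table rather than as verified data. The function name `rmse` is introduced here for illustration.

```python
import math


def rmse(forecast, observed):
    """Root-mean-square error between paired forecast and observed values."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("forecast and observed must be non-empty and equal in length")
    squared_errors = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Table 1 values in row order: 1951-1954, then 1969-1984.
forecast = [748, 797, 925, 891, 872, 891, 911, 717, 921, 767,
            948, 853, 805, 845, 769, 825, 853, 738, 904, 845]
observed = [737, 792, 920, 885, 829, 939, 886, 653, 912, 747,
            960, 855, 880, 908, 746, 881, 842, 736, 959, 835]

table1_rmse = rmse(forecast, observed)
```

The result is in the same units as the rainfall values in Table 1, which the source does not specify.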
[Figure: "2003 JJAS rainfall"; x-axis: month (June, July, August, Sept); y-axis: range, 0 to 140000.]
Fig. 2. Observed and forecasted rainfall for 20 years of verification
Table 1 compares the forecasted and observed rainfall over India for the period 1951-1984; the forecasts are close to the observed data. Table 2 shows the forecast rainfall for the monsoon season (June, July, August and September): in each of the four years listed, the predictive analytics prediction is closer to the observed rainfall than the raw prediction. Fig 3 shows the comparison of the model-predicted and observed monsoon rainfall.
Fig 3. Seasonal rainfall variation in monsoon months
[Figure: "Rain_Anomaly (1951-2003)"; x-axis: year; y-axis: range, -22.5 to 17.5.]
Fig 4. Rainfall anomaly calculated for 53 years of data
Fig 3 shows the seasonal rainfall variation during the months of June to September, averaged over 53 years. Fig 4 shows the rainfall anomaly calculated for 53 years of data.
CONCLUSION
The skill of prediction contributes towards developing methodologies for predicting rainfall at local or regional scale over India from large-scale GCM output of climatological data. With the present work, a predictive analytics package is in place for the first time, with which one can generate accurate and efficient spatio-temporal rainfall forecasts at an affordable cost in a cloud computing and Big Data environment.
REFERENCES
[1] Klaus Julisch, "Data Mining for Intrusion Detection: A Critical Review," in Applications of Data Mining in Computer Security, Daniel Barbara and Sushil Jajodia (eds.), Springer.
[2] P. Guhathakurta (2005), "Long-range monsoon rainfall prediction of 2005 for the districts and sub-division Kerala with artificial neural network," Current Science, 90, 773-779.
[3] M. Rajeevan (2001), "Prediction of Indian summer monsoon: Status, problems and prospects," Current Science.
[4] D. Mark, "Geographical information science: critical issues in an emerging cross-disciplinary research domain," URISA Journal, vol. 12, February 1999.
[5] Tamil Nadu Meteorology Department, Chennai.
[6] European Commission, "Forest Fires in Europe 2007," Technical Report No. 8, 2008.