- Open Access
- Total Downloads : 924
- Authors : Milap J Bhuva, Hiren Kamani
- Paper ID : IJERTV3IS050200
- Volume & Issue : Volume 03, Issue 05 (May 2014)
- Published (First Online): 14-05-2014
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
DNA Indexing & Searching Human Authentication using STR Locus based on PCR Process
Milap J. Bhuva
Department of Information Technology
G.K.B.I.E. Rajkot, India
Hiren Kamani
Assistant Professor of CE/IT Department
-
Rajkot, India
AbstractDNA Indexing & Search Human Authentication System works on bases of DNA profiling. It contains a DNA Data Bank which has DNA Profiles of civilization. DNA Profile stored with DNA STR Locus or DNA sample and necessity persons information. These samples are stored in database by performing algorithmic procedure, it is the major task of the system which is not stored a DNA sample in physically but it stored with digitize unique DNA signature. It also manages the DNA Data Bank and is responsible for developing, providing, and supporting this Program to federal, state, and local crime laboratories in our country and selected international law enforcement crime laboratories to foster the exchange and comparison of forensic DNA evidence from violent crime investigations. The DNA Data Bank Unit also provides administrative management and support to the police or crime investigation departments for various advisory boards, Forensic Science, Department of Justice (DOJ) grant programs, and legislation regarding DNA.
Salient features of this system to resolve
-
Rape case.
-
Murder case.
-
Disputed Paternity & Maternity case.
-
Child Swapping & Kidnapping case.
-
Unknown Dead bodies Identification case.
(Like Suicide, Dead body exists at any place, Mass disaster- example: Uttarakhand floods disaster, an Artificial accident- example: Bomb blast.)
Index Terms forensic science, DNA Profiling, polymerase chain reaction, mass fatality incidents, mass disaster, software, D3S1358, vWA, FGA,D8S1179, D21S11, D18S51, D5S818, D13S317, D7S820, TH01, TPOX, CSF1PO, D16S539
-
INTRODUCTION
The DNA Indexing & Search Human Authentication system which contains DNA Data Bank It is an effective tool for fighting violent crimes which has contains DNA samples of each and every people. Some people feel that these DNA databases are a great help to society, giving new hope to forensic cases that would otherwise have no leads. Others feel
they are an invasion of privacy, and that these databases should consist only of samples collected from violent crime offenders.
DNA Indexing & Search Human Authentication system works with 24 STR locus in which 13 STR locus commonly used in CODIS[1]. In this paper there are 24 STR loci name suggest as CSF1PO, FGA, TH01, TPOX, vWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, D21S11, D2S1338, D19S433, Penta D, Penta E, D1S1656, D2S441, D10S1248, D12S391, D22S1045, D6S1043, SE33.
Information on 24 commonly used STR Loci present in commercial kits
STR Marker
Position (MB)
Repeat Motif
Allele Range
D1S1656
230.905
TAGA
8-20.3
TPOX
1.493
AATG
4-16
D2S441
68.239
TCTA/TCAA
8-17
D2S1338
218.879
TGCC/TTCC
10-31
D3S1358
45.582
TCTA/TCTG
6-26
FGA
155.509
CTTT/TTCC
12.2-51.2
D5S818
123.111
AGAT
4-29
CSF1PO
149.455
AGAT
5-17
SE33
88.987
AGAT/AGAC
3-49
D6S1043
92.450
GATA
8-25
D7S820
83.789
TCTA/TCTG
5-16
D8S1179
125.907
GGAA
6-20
D10S1248
131.093
TCAT
7-19
TH01
2.192
TCAT
3-14
vWA
6.093
TCTA/TCTG
10-25
D12S391
12.450
AGAT/AGAC
13.27.2
D13S317
82.692
TATC
5-17
Penta E
97.374
AAAGA
5-32
D16S539
86.386
GATA
4-17
D18S51
60.949
AGAA
5.3-40
D19S433
30.416
AAGG/TAGG
5.2-20
D21S11
20.554
TCTA/TCTG
12-43.2
Penta D
45.056
AAAGA
1.1-19
D22S1045
37.536
ATT
7-20
These alleleic 24 STR Locus presents or says the unique address for the person identity which is very useful to identify the person.
-
Y-CHROMOSOME DNA TESTING
The Y-chromosome is found only in males and therefore genetic markers along the Y-chromosome can be specific to the male portion of a male-femaleDNA mixture such as is common in sexual assault cases. Y-chromosome markers can also be useful in missing persons investigations, some paternity testing scenarios, historical investigations, and genetic genealogy due to the fact that most of the Y-chromosome (barring mutation) is passed from father to son without changes.
A core set of Y-chromosome STR (Y-STR) loci is widely used in laboratories worldwidefor human identity testing and genetic genealogy. The minimal haplotype loci(MHL) were selected in the late 1990s from a meager set of available Y- STRs.The MHL include DYS19, DYS389I, DYS389II,
DYS390, DYS391, DYS392, DYS393,and the polymorphic, multi-copy marker DYS385[2]. In 2003, the Y- chromosomesubcommittee of the Scientific Working Group on DNA Analysis Methods (SWGDAM)recommended two additional Y-STRs named DYS438 and DYS439[3].
Characteristics of Commonly Used Y-chromosome STR Loci present in commercial kits.
STR Marker
Position (MB)
Repeat Motif
Allele Range
DYS393
3.19
AGAT
8-17
DYS456
4.33
AGAT
13-18
DYS458
7.93
GAAA
14-20
DYS19
10.13
TAGA
10-19
DYS391
12.61
TCTA
6-14
DYS635
12.89
TSTA
17-27
DYS437
12.98
TCTR
13-17
DYS439
13.03
AGAT
8-15
<>DYS389 13.12
TCTR
9-17/24-34
DYS438
13.38
TTTTC
6-14
DYS390
15.78
TCTR
17-28
GATA-H4
17.25
TAGA
8-13
DYS385
19.26
GAAA
7-28
DYS392
21.04
TAT
6-20
DYS448
22.78
AGAGAT
17-24
-
TRADITIONAL USE OF AVAILABLE METHODS
There are three types of DNA fingerprints: RFLPs, VNTRs, and STRs. Restriction fragment length polymorphisms, or RFLPs as they are commonly known, were the first type of DNA fingerprinting which came onto the scene in the mid- 1980s. RFLPs focus on the size differences of certain genetic locations. This technique is the DNA equivalent of screening sand through a progressively finer mesh screens to determine particle sizes[5].
Variable number tandem repeats, or VNTRs represent specific locations on a chromosome in which tandem repeats of 9-80 or more bases repeat a different number of times between
individuals. These regions of DNA are readily analyzed using the RFLP approach and a probe specific to a VNTR locus[6]. The fragments are a little shorter than RFLPs (about 1-2 kilo base pairs), but are created through the exact same process.
Some of the advantages of these types of DNAfingerprints are that they are the most stable and reproducible, which is a valuable trait tohave when you are trying to determine an exact match of a persons DNA, which mustexclude billions of other peoples DNA with a certain degree of confidence. Some of the disadvantages of RFLPs and VNTRs include they are very time consuming (especially the probe hybridization step), relatively large amounts of DNA must be used to obtain an adequate sample, too many polymorphisms may be present for a short probe, and the cost is very high due to labor and time requirements[7]. And also these types of sequence analysis or DNA profiling require fresh blood sample or evidence.
Currently, the most popular method of DNA fingerprinting is Short tandem repeats, or STRs for short. Unlike VNTRs which analyze minisatellites that have repeat sequences of 9-80 base pairs, STRs use microsatellites which have repeat sequences ofonly 2-5 base pairs, introducing the less is more philosophy to the world of DNA fingerprinting. This was a big step forward in forensic science since the length of DNA fragment being analyzed is short enough to be amplified by polymerase chain reaction (PCR)[8], so now we are able to analyze a very small sample of DNA that is quicker and easier than any previously known method and match it to a persons identity. PCR was developed in the mid 1980s and used the same principles that cells use to replicate DNA to amplify the specified region, which is usually between 150-3,000 base pairs in length. In this type of DNA profiling there is no need to require fresh blood sample. It is applying all the available evidence and also for degraded mix blood samples.
Evidence
Possible Location of
DNA on the Evidence
Source of DNA
Baseball bat or
similar Weapon
Handle, end
Sweat, skin, blood, tissue
Hat, bandanna, or
mask
Inside
Sweat, hair, dandruff
Eyeglasses
Nose or ear pieces, lens
Sweat, skin
Facial tissue, cotton
swab
Surface area
Mucus, blood, sweat,
semen, ear wax
Dirty laundry
Surface area
Blood, sweat, semen
Toothpick
Tips
Saliva
Used cigarette
Cigarette butt
Saliva
Stamp or envelope
Licked area
Saliva
Tape or Ligature
Inside/outside surface
Skin, Sweat
Bottle, can, or glass
Sides, mouthpiece
Saliva, Sweat
Used Condom
Inside/outside surface
Semen, vaginal or
rectal cells
Blanket, pillow,
sheet
Surface area
Sweat, hair, semen,
urine, Saliva
Through and
throughBullet
Outside surface
Blood, tissue
Bite mark
Persons skin or clothing
Saliva
Fingernail, partial
fingernail
Scrapings
Blood, sweat, tissue
Sources of information regarding DNA Samples:
STRs are currently the most popular type of DNA fingerprint, since the whole PCR process takes only a few hours, compared to RFLP/VNTR probe hybridization and film exposure which can take several days. STRs can use much smaller samples of DNA than RFLPs/VNTRs, and can even use partially degraded DNA to create a fingerprint.
Thus, the integrity and quality of the DNA sample is not as great a factor with STRs than with the traditional methods of DNA fingerprinting. Thecurrent standard forensic protocol analyses 13 core STR loci which have been carefully chosen for their uniqueness[9]. The only disadvantage of the STR approach is it is sensitive to contaminating DNA, so usually the STR approach is used first, followed by a VNTR analysis if contamination is suspected, and enough DNA is available.
There are various STR markers kits available: Applied Biosystems(Identifiler, MiniFiler, NGM, NGM SElect), Promega (PowerPlex 16, PowerPlex ESI 17, PowerPlex ESX 17, PowerPlex 18D), and Qiagen (ESSplex, IDplex) kits.[4]
-
PROTOTYPE
There are various algorithms are analysed for this kind of purpose. By using dynamic programming approach especially in DNA sequencing Needleman-Wunsch[10] algorithm and Smith-waterman algorithms [12] are more complex in finding exact patternmatching algorithm. By this method the worst case complexity isO(mn). The major advantage of this method is flexibility inadapting to different edit distance functions. The Raita algorithm [11] utilizes the same approach as Horspoolalgorithm[13] toobtaining the shift value after an attempt. Instead of comparingeach character in the pattern with the sliding window from right toleft, the order of comparison in Raita algorithm [11] is carried outby first comparing the rightmost and leftmost characters of thepattern with the sliding window. If they both match, the remainingcharacters are compared from the right to the left. Intuitively, theinitial resemblance can be established by comparing the last andthe first characters of the pattern and the sliding window.Therefore, it is anticipated to further decrease the unnecessarycomparisons. The Aho-Corasick[14] algorithm consists ofconstructing a finite state pattern matching machine from thekeyword and then using the machine to process the text in a singlepass. It can find an occurrence of several patterns in the order ofO(n) time, where n is the length of the text, with pre-processing ofthe patterns in linear time.
In this paper construct anew algorithm whch checks the input DNA profile to find all the occurrence of a allelic pattern within this profile, based on skip ( put _to match the patterns)[15] can be described as follows-
-
Fix the DNA Profile index in a cretin position.
-
Use this position as a starting point of matching.
-
Compare the DNA Profile contents from the defined point with pattern contents.
-
Find the skip value depending on the match number (rangesfrom 1 to y-1)
-
Perform the above sequence while the DNA Profile position does not reaches x-y.
Description ofthe Algorithm:
This algorithm assumes that there is input DNA Profile (T) that has size (x) and there is an Allelic Pattern (P) with size (y). Sothe algorithm proceeds as follows
-
Input DNA Profile (T) of size (x) and Allelic Pattern (P) of size (y).
-
Output starting index of all substring occurrence of (T) that is equal to (P) and output (-1) if no such substring exists.
-
Initialization step: skip = 1, index i of T = 1. Number of occurrence = 0.
-
Check index. If index <= x-y, then go to step 5. Else go to step 12.
-
Now set index j of T = 1, and save i if (k = i).
-
Check j. If j<=y go to step 7, else go to step 8.
-
Compare P(j) and T(k). If they are equal then increase the value of j and k by 1 and go to step 6.
-
Skip if j=skip.
-
Increase number of occurrence.
-
Now do (I +skip).
-
Go to step 4.
-
Return number of occurrence.
-
-
IMPLEMENTATION OF THIS ALGORITHM This algorithm was implemented using C and tested
usingdifferent DNA sequences and DNA Profile with different DNA Profile sizes. However this algorithm will be compared with other algorithms innear future.
-
RESULT ANALYSIS
Due to the lack of resources, mainly the theoretical resultsare considered here. In future experiment based results will becompared for a better understand of this algorithm. This algorithm is very simple to describe and hasfollowing main features
-
Good time complexity.
-
Unlimited sized of DNAProfile can be handled.
-
Unlimited sized of Allelic Pattern can be handled.
-
Skip technique is used.
-
Can be used different ranges of applications.
-
-
FUTURE WORK
Here on this particular paper, mainly emphasizing ontheoretical results. In future I will emphasize more onpractical results. This algorithm will be comparedwith other algorithms that has common features like
-
Multiple DNA ProfileMatching and Searching
-
Maintaining different types of DNA Profiles (Particular Case wise)
-
No processing time
In near future this algorithm will be compared withother algorithms considering other factors.
-
-
CONCLUSION
-
A new algorithm is developed for indexing and searching mechanism for multiple DNA Profiling and Allelic Patterns.This algorithm proves performance which is bettercompared to other algorithms. This is a very light weightalgorithm. The space complexity and the time complexity ofthisalgorithm is less compared to other andalgorithms. This algorithm can work with a very long text and pattern which is a great advantage over other algorithms.
ACKNOWLEDGMENT
I would like to thank my Professor Mr.HirenKamaniand Dr.NikunjBrahmbhattfor his time. I am grateful to him for providing me support. Finally, I acknowledge my Professor Mr.HirenKamanifor his kind suggestions on every step of thiswork.
REFERENCES
-
J. M. Butler*, C. R. HillNational Institute of Standards and Technology, Applied Genetics Group, Gaithersburg, Maryland, United States of America
-
Butler JM. Forensic DNA typing: biology, technology, and genetics of STR markers. 2nd ed. Elsevier: New York, 2005.
-
Gill P. Role of short tandem repeat DNA in forensic casework in the UK-
-past, present, and future perspectives. Bio Techniques 2002;32:366- 72.
-
Budowle, B., et al. (1998). CODIS and PCR-based short tandem repeat loci: DNA Profiling tools. Proceedings of the Second European Symposium on Human Identification, pp. 73-88. Madison, Wisconsin: Promega Corporation.
-
People v. Miles (1991) 577 N.E. Rptr. 2d Series 477 (Ill. App. 4 Dist. 1991), RFLPs (2004)
-
People v. Castro (1989) 144 Misc. 2d Series 956, 545 N.Y. Supp. 2d Series 985 (Sup.Ct.1989), VNTRs.
-
Texas Tech University."Molecular Genetic Markers."Genetics@TTU. Fall 2005, 11thNov. 2005.
-
United States v. Downing (1985) United States Court of Appeals, 3rd Circuit, Vol. 753Federal Series, 2d, pp. 1224. 609 F. Supp. 784 (E.D. Pa. 1985).753 F. Rptr. 2d Series 1227-28 (3rd. Cir. 1985).
-
Hares, D.R. (2011). Expanding the CODIS core loci in the United States.Forensic Sci. Int. Genet.(in press), doi:10.1016/j.fsigen.2011.04.012
-
Needleman, S.B Wunsch, C.D(1970). A general method
applicable to the search for similarities in the amino acidsequence of two proteins. J.Mol.Biol.48,443-453.
-
RajuBhukya, DVLN Somayajulu,An Index Based KPartitionMultiple Pattern Matching Algorithm, Proc.Of International Conference on Advances in Computer Science2010 pp 83-87.
-
Smith,T.F and waterman, M (1981). Identification of commonmolecular subsequences T.mol.Biol.147,195-197.
-
Horspool, R.N., 1980. Practical fast searching in strings. Software practice experience, 10:501-506.
-
Aho, A. V., and M. J. Corasick, Efficient string matching:an aid to bibliographic Search, Communications of theACM 18 (June 1975), pp. 333 340.
-
Ziad A.A Alqadi, MusbahAqel&Ibrahiem M.M.EI Emary,Multiple Skip Multiple Pattern Matching algorithms.IAENGInternational.Vol 34(2),2007.