DNA Indexing & Searching Human Authentication using STR Locus based on PCR Process

Milap J Bhuva; Hiren Kamani

doi:10.17577/IJERTV3IS050200

Volume 03, Issue 05 (May 2014)

Open Access
Paper ID : IJERTV3IS050200
Published (First Online): 14-05-2014
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

Milap J. Bhuva

Department of Information Technology

G.K.B.I.E. Rajkot, India

Hiren Kamani

Assistant Professor of CE/IT Department

Rajkot, India

Abstract: The DNA Indexing & Search Human Authentication System works on the basis of DNA profiling. It contains a DNA Data Bank that holds the DNA profiles of the population. Each DNA profile is stored with its DNA STR loci (or DNA sample) and the necessary personal information. Profiles are entered into the database through an algorithmic procedure. This is the major task of the system: it does not store a physical DNA sample but a digitized, unique DNA signature. The system also manages the DNA Data Bank and is responsible for developing, providing, and supporting this program for federal, state, and local crime laboratories in our country and for selected international law enforcement crime laboratories, in order to foster the exchange and comparison of forensic DNA evidence from violent crime investigations. The DNA Data Bank Unit also provides administrative management and support to police and crime investigation departments for various advisory boards, forensic science, Department of Justice (DOJ) grant programs, and legislation regarding DNA.

Salient features: the system helps to resolve the following kinds of cases

Rape cases.
Murder cases.
Disputed paternity and maternity cases.
Child swapping and kidnapping cases.
Identification of unknown dead bodies (for example suicides, bodies found at any location, mass disasters such as the Uttarakhand floods, and man-made incidents such as bomb blasts).

Index Terms: forensic science, DNA profiling, polymerase chain reaction, mass fatality incidents, mass disaster, software, D3S1358, vWA, FGA, D8S1179, D21S11, D18S51, D5S818, D13S317, D7S820, TH01, TPOX, CSF1PO, D16S539

INTRODUCTION

The DNA Indexing & Search Human Authentication system contains a DNA Data Bank. It is an effective tool for fighting violent crime because it holds DNA samples of every person. Some people feel that these DNA databases are a great help to society, giving new hope to forensic cases that would otherwise have no leads. Others feel they are an invasion of privacy, and that these databases should consist only of samples collected from violent crime offenders.

The DNA Indexing & Search Human Authentication system works with 24 STR loci, 13 of which are the loci commonly used in CODIS[1]. The 24 STR loci considered in this paper are CSF1PO, FGA, TH01, TPOX, vWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, D21S11, D2S1338, D19S433, Penta D, Penta E, D1S1656, D2S441, D10S1248, D12S391, D22S1045, D6S1043, and SE33.

Information on 24 commonly used STR Loci present in commercial kits

STR Marker	Position (Mb)	Repeat Motif	Allele Range
D1S1656	230.905	TAGA	8-20.3
TPOX	1.493	AATG	4-16
D2S441	68.239	TCTA/TCAA	8-17
D2S1338	218.879	TGCC/TTCC	10-31
D3S1358	45.582	TCTA/TCTG	6-26
FGA	155.509	CTTT/TTCC	12.2-51.2
D5S818	123.111	AGAT	4-29
CSF1PO	149.455	AGAT	5-17
SE33	88.987	AGAT/AGAC	3-49
D6S1043	92.450	GATA	8-25
D7S820	83.789	TCTA/TCTG	5-16
D8S1179	125.907	GGAA	6-20
D10S1248	131.093	TCAT	7-19
TH01	2.192	TCAT	3-14
vWA	6.093	TCTA/TCTG	10-25
D12S391	12.450	AGAT/AGAC	13-27.2
D13S317	82.692	TATC	5-17
Penta E	97.374	AAAGA	5-32
D16S539	86.386	GATA	4-17
D18S51	60.949	AGAA	5.3-40
D19S433	30.416	AAGG/TAGG	5.2-20
D21S11	20.554	TCTA/TCTG	12-43.2
Penta D	45.056	AAAGA	1.1-19
D22S1045	37.536	ATT	7-20

Taken together, the alleles at these 24 STR loci act as a unique address for a person's identity, which makes them very useful for identifying that person.
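As an illustration of how such a profile can act as an address, the following minimal sketch stores a profile as a mapping from locus name to an allele pair and compares two profiles locus by locus. The names `Profile`, `normalize`, and `compare_profiles`, the sample allele values, and the exact-match rule are assumptions made for illustration; the paper does not specify a storage format.

```python
from typing import Dict, Tuple

# A profile maps a locus name to the pair of alleles observed at that locus.
# Alleles are kept as strings so that microvariants such as "9.3" stay intact.
# Assumption: every allele is numeric (no sex-marker values such as "X"/"Y").
Profile = Dict[str, Tuple[str, ...]]


def normalize(profile: Profile) -> Profile:
    """Sort each allele pair so that ("9.3", "6") and ("6", "9.3") compare equal."""
    return {locus: tuple(sorted(pair, key=float)) for locus, pair in profile.items()}


def compare_profiles(a: Profile, b: Profile) -> Tuple[int, int]:
    """Return (matching_loci, compared_loci) over the loci typed in both profiles."""
    a, b = normalize(a), normalize(b)
    shared = sorted(set(a) & set(b))
    matches = sum(1 for locus in shared if a[locus] == b[locus])
    return matches, len(shared)


# Hypothetical allele values, for illustration only.
reference = {"TH01": ("6", "9.3"), "TPOX": ("8", "11"), "vWA": ("16", "18")}
evidence = {"TH01": ("9.3", "6"), "TPOX": ("8", "11"), "vWA": ("16", "17")}

print(compare_profiles(reference, evidence))  # (2, 3): vWA differs
```

Only loci present in both profiles are compared, so a partial evidence profile can still be checked against a full reference profile.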

Y-CHROMOSOME DNA TESTING

The Y-chromosome is found only in males, and therefore genetic markers along the Y-chromosome can be specific to the male portion of a male-female DNA mixture such as is common in sexual assault cases. Y-chromosome markers can also be useful in missing persons investigations, some paternity testing scenarios, historical investigations, and genetic genealogy, because most of the Y-chromosome (barring mutation) is passed from father to son without changes.

A core set of Y-chromosome STR (Y-STR) loci is widely used in laboratories worldwide for human identity testing and genetic genealogy. The minimal haplotype loci (MHL) were selected in the late 1990s from a meager set of available Y-STRs. The MHL include DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, and the polymorphic, multi-copy marker DYS385[2]. In 2003, the Y-chromosome subcommittee of the Scientific Working Group on DNA Analysis Methods (SWGDAM) recommended two additional Y-STRs named DYS438 and DYS439[3].

Characteristics of Commonly Used Y-chromosome STR Loci present in commercial kits.

STR Marker	Position (Mb)	Repeat Motif	Allele Range
DYS393	3.19	AGAT	8-17
DYS456	4.33	AGAT	13-18
DYS458	7.93	GAAA	14-20
DYS19	10.13	TAGA	10-19
DYS391	12.61	TCTA	6-14
DYS635	12.89	TSTA	17-27
DYS437	12.98	TCTR	13-17
DYS439	13.03	AGAT	8-15
DYS389	13.12	TCTR	9-17/24-34
DYS438	13.38	TTTTC	6-14
DYS390	15.78	TCTR	17-28
GATA-H4	17.25	TAGA	8-13
DYS385	19.26	GAAA	7-28
DYS392	21.04	TAT	6-20
DYS448	22.78	AGAGAT	17-24
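Because a Y-STR haplotype is normally passed unchanged from father to son, two samples from the same paternal line are expected to agree at every locus. The minimal sketch below lists the minimal haplotype loci at which two samples differ. The function name `differing_loci`, the string encoding of the multi-copy marker DYS385, and all allele values are assumptions chosen for illustration.

```python
from typing import Dict, List

# Minimal haplotype loci as listed in the text. DYS385 is multi-copy,
# so its value is written here as a combined string such as "11-14".
MINIMAL_HAPLOTYPE_LOCI = [
    "DYS19", "DYS389I", "DYS389II", "DYS390",
    "DYS391", "DYS392", "DYS393", "DYS385",
]

Haplotype = Dict[str, str]


def differing_loci(a: Haplotype, b: Haplotype) -> List[str]:
    """List the minimal haplotype loci typed in both samples whose values differ."""
    return [
        locus for locus in MINIMAL_HAPLOTYPE_LOCI
        if locus in a and locus in b and a[locus] != b[locus]
    ]


# Hypothetical values, for illustration only.
father = {"DYS19": "14", "DYS389I": "13", "DYS389II": "29", "DYS390": "24",
          "DYS391": "11", "DYS392": "13", "DYS393": "13", "DYS385": "11-14"}
son = dict(father)  # inherited unchanged, barring mutation
unrelated = dict(father, DYS390="23", DYS385="12-15")

print(differing_loci(father, son))        # []
print(differing_loci(father, unrelated))  # ['DYS390', 'DYS385']
```

An empty result is consistent with a shared paternal line, but it does not single out one individual, since male relatives on that line carry the same haplotype.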

TRADITIONAL USE OF AVAILABLE METHODS

There are three types of DNA fingerprints: RFLPs, VNTRs, and STRs. Restriction fragment length polymorphisms, or RFLPs as they are commonly known, were the first type of DNA fingerprinting and came onto the scene in the mid-1980s. RFLPs focus on the size differences of certain genetic locations. This technique is the DNA equivalent of screening sand through progressively finer mesh screens to determine particle sizes[5].

Variable number tandem repeats, or VNTRs, represent specific locations on a chromosome in which tandem repeats of 9-80 or more bases repeat a different number of times between individuals. These regions of DNA are readily analyzed using the RFLP approach and a probe specific to a VNTR locus[6]. The fragments are a little shorter than RFLPs (about 1-2 kilobase pairs), but are created through the exact same process.

Some of the advantages of these types of DNA fingerprints are that they are the most stable and reproducible, which is a valuable trait when trying to determine an exact match of a person's DNA, since the match must exclude billions of other people's DNA with a certain degree of confidence. The disadvantages of RFLPs and VNTRs are that they are very time consuming (especially the probe hybridization step), relatively large amounts of DNA must be used to obtain an adequate sample, too many polymorphisms may be present for a short probe, and the cost is very high due to labor and time requirements[7]. These types of DNA profiling also require a fresh blood sample or fresh evidence.

Currently, the most popular method of DNA fingerprinting is short tandem repeats, or STRs for short. Unlike VNTRs, which analyze minisatellites that have repeat sequences of 9-80 base pairs, STRs use microsatellites, which have repeat sequences of only 2-5 base pairs, introducing the "less is more" philosophy to the world of DNA fingerprinting. This was a big step forward in forensic science, since the DNA fragment being analyzed is short enough to be amplified by polymerase chain reaction (PCR)[8]. A very small sample of DNA can therefore be analyzed more quickly and easily than with any previously known method and matched to a person's identity. PCR was developed in the mid-1980s and uses the same principles that cells use to replicate DNA to amplify the specified region, which is usually between 150 and 3,000 base pairs in length. This type of DNA profiling does not require a fresh blood sample. It can be applied to all available evidence, including degraded and mixed blood samples.
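To make the idea of a short tandem repeat concrete, the sketch below counts the longest run of back-to-back copies of a repeat motif in a sequence. It is only an illustration under stated assumptions: the function name `longest_tandem_run` and the example sequence with its flanking bases are hypothetical, and in practice alleles are called from PCR fragment lengths rather than by scanning raw sequence. Only whole repeats are counted, so microvariant alleles such as 9.3 are not represented.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive copies of motif found in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    step = len(motif)
    # Try every start position; a run is a chain of motif copies placed end to end.
    for start in range(len(sequence) - step + 1):
        run = 0
        pos = start
        while sequence.startswith(motif, pos):
            run += 1
            pos += step
        best = max(best, run)
    return best


# Hypothetical fragment: made-up flanking bases around eight AATG repeats
# (AATG is the repeat motif listed for TPOX in the table above).
fragment = "GGC" + "AATG" * 8 + "TTC"
print(longest_tandem_run(fragment, "AATG"))  # 8
```

Every start position is tried deliberately: jumping past a short run could miss a longer run that begins inside it when the motif overlaps itself.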

Sources of information regarding DNA Samples:

Evidence	Possible Location of DNA on the Evidence	Source of DNA
Baseball bat or similar weapon	Handle, end	Sweat, skin, blood, tissue
Hat, bandanna, or mask	Inside	Sweat, hair, dandruff
Eyeglasses	Nose or ear pieces, lens	Sweat, skin
Facial tissue, cotton swab	Surface area	Mucus, blood, sweat, semen, ear wax
Dirty laundry	Surface area	Blood, sweat, semen
Toothpick	Tips	Saliva
Used cigarette	Cigarette butt	Saliva
Stamp or envelope	Licked area	Saliva
Tape or ligature	Inside/outside surface	Skin, sweat
Bottle, can, or glass	Sides, mouthpiece	Saliva, sweat
Used condom	Inside/outside surface	Semen, vaginal or rectal cells
Blanket, pillow, sheet	Surface area	Sweat, hair, semen, urine, saliva
Through-and-through bullet	Outside surface	Blood, tissue
Bite mark	Person's skin or clothing	Saliva
Fingernail, partial fingernail	Scrapings	Blood, sweat, tissue

STRs are currently the most popular type of DNA fingerprint, since the whole PCR process takes only a few hours, compared to RFLP/VNTR probe hybridization and film exposure, which can take several days. STRs can use much smaller samples of DNA than RFLPs/VNTRs, and can even use partially degraded DNA to create a fingerprint.

Thus, the integrity and quality of the DNA sample is not as great a factor with STRs as with the traditional methods of DNA fingerprinting. The current standard forensic protocol analyses 13 core STR loci which have been carefully chosen for their uniqueness[9]. The only disadvantage of the STR approach is that it is sensitive to contaminating DNA, so usually the STR approach is used first, followed by a VNTR analysis if contamination is suspected and enough DNA is available.

Various STR marker kits are available: Applied Biosystems (Identifiler, MiniFiler, NGM, NGM SElect), Promega (PowerPlex 16, PowerPlex ESI 17, PowerPlex ESX 17, PowerPlex 18D), and Qiagen (ESSplex, IDplex) kits[4].

PROTOTYPE

Various algorithms were analysed for this purpose. The dynamic programming approaches used in DNA sequence comparison, namely the Needleman-Wunsch[10] and Smith-Waterman[12] algorithms, are more complex than is needed for exact pattern matching. Their worst-case complexity is O(mn), where m and n are the lengths of the two sequences. The major advantage of this method is its flexibility in adapting to different edit distance functions.

The Raita algorithm[11] uses the same approach as the Horspool algorithm[13] to obtain the shift value after an attempt. Instead of comparing each character in the pattern with the sliding window from right to left, the Raita algorithm[11] first compares the rightmost and leftmost characters of the pattern with the sliding window. If they both match, the remaining characters are compared from right to left. Intuitively, an initial resemblance can be established by comparing the last and the first characters of the pattern and the sliding window, so this order is expected to further reduce unnecessary comparisons.

The Aho-Corasick[14] algorithm constructs a finite state pattern matching machine from the keywords and then uses the machine to process the text in a single pass. It can find all occurrences of several patterns in O(n) time plus the number of matches reported, where n is the length of the text, after pre-processing the patterns in time linear in their total length.
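The comparison order described for the Raita algorithm can be sketched as follows. This is a minimal illustration that follows the description in the text (last character, then first character, then the interior from right to left, with a Horspool-style shift table); the function name `raita_search` and the example strings are assumptions, and the code is not taken from the cited references.

```python
from typing import List


def raita_search(text: str, pattern: str) -> List[int]:
    """Return the zero-based start positions of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Horspool shift table: for each character of the pattern except the last,
    # the distance from its rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - idx for idx, ch in enumerate(pattern[:-1])}
    first, last = pattern[0], pattern[-1]
    hits = []
    pos = 0
    while pos <= n - m:
        window_last = text[pos + m - 1]
        # Cheap test first: the rightmost and leftmost characters of the window.
        if window_last == last and text[pos] == first:
            # Then compare the interior characters from right to left.
            k = m - 2
            while k > 0 and text[pos + k] == pattern[k]:
                k -= 1
            if k <= 0:
                hits.append(pos)
        # Shift by the table entry for the window's last character (m if absent).
        pos += shift.get(window_last, m)
    return hits


print(raita_search("GATAGATAGATC", "GATA"))  # [0, 4]
```

The shift depends only on the last character of the current window, so a mismatch found early costs no more than a table lookup.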

This paper constructs a new algorithm which checks the input DNA Profile to find all occurrences of an allelic pattern within that profile. It is based on a skip technique for matching the patterns[15] and can be described as follows:
1. Fix the DNA Profile index at a certain position.
2. Use this position as the starting point of matching.
3. Compare the DNA Profile contents from that point with the pattern contents.
4. Find the skip value depending on the number of matched characters (it ranges from 1 to y-1).
5. Repeat the above sequence until the DNA Profile position passes the last position at which the pattern still fits.

Description of the Algorithm:

This algorithm assumes that there is an input DNA Profile (T) of size (x) and an Allelic Pattern (P) of size (y). The algorithm proceeds as follows, using one-based indices:
1. Input: DNA Profile (T) of size (x) and Allelic Pattern (P) of size (y).
2. Output: the starting index of every substring of (T) that is equal to (P), or (-1) if no such substring exists.
3. Initialization step: skip = 1, index i of T = 1, number of occurrences = 0.
4. Check the index. If i <= x-y+1, go to step 5. Else go to step 12.
5. Set index j of P = 1 and set k = i, so that k scans T from the saved start position i.
6. Check j. If j <= y, go to step 7. Else go to step 8.
7. Compare P(j) and T(k). If they are equal, increase j and k by 1 and go to step 6. Otherwise go to step 8.
8. Determine the skip value from the number of characters matched (j-1).
9. If j > y, the whole pattern matched: output i and increase the number of occurrences.
10. Set i = i + skip.
11. Go to step 4.
12. Return the number of occurrences.
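The steps above can be sketched in code as follows. The source does not state how the skip value is derived from the number of matched characters, so this sketch substitutes a KMP-style border table as a safe skip rule that never jumps over an occurrence. That substitution, the names `border_table` and `skip_search`, the zero-based indices, and the empty-list result (instead of -1) are all assumptions of this example, not the authors' implementation.

```python
from typing import List, Sequence


def border_table(pattern: Sequence) -> List[int]:
    """border[q] = length of the longest proper prefix of pattern[:q] that is also its suffix."""
    border = [0] * (len(pattern) + 1)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[q] != pattern[k]:
            k = border[k]
        if pattern[q] == pattern[k]:
            k += 1
        border[q + 1] = k
    return border


def skip_search(profile: Sequence, pattern: Sequence) -> List[int]:
    """Return the zero-based start index of every occurrence of pattern in profile."""
    x, y = len(profile), len(pattern)
    if y == 0 or y > x:
        return []
    border = border_table(pattern)
    hits = []
    i = 0  # start of the current window in the profile
    while i <= x - y:
        # Count how many leading characters of the pattern match at position i.
        matched = 0
        while matched < y and profile[i + matched] == pattern[matched]:
            matched += 1
        if matched == y:
            hits.append(i)
        # Assumed skip rule: shift so that the longest border of the matched
        # prefix lines up again; always move at least one position.
        i += max(1, matched - border[matched])
    return hits


print(skip_search("GATAGATAGATC", "GATA"))  # [0, 4]
```

Because each window is compared again from its first character, this sketch still takes O(xy) time in the worst case; the skip only reduces the number of windows visited. The function accepts any indexable sequence, so a profile could also be passed as a list of allele tokens.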
IMPLEMENTATION OF THIS ALGORITHM

This algorithm was implemented in C and tested using different DNA sequences and DNA Profiles of different sizes. It will be compared with other algorithms in the near future.
RESULT ANALYSIS

Due to the lack of resources, mainly theoretical results are considered here. In future, experimental results will be compared for a better understanding of this algorithm. The algorithm is very simple to describe and has the following main features:
1. Good time complexity.
2. A DNA Profile of unlimited size can be handled.
3. An Allelic Pattern of unlimited size can be handled.
4. A skip technique is used.
5. It can be used in a wide range of applications.
FUTURE WORK

This paper mainly emphasizes theoretical results. In future I will emphasize practical results more. This algorithm will be compared with other algorithms that have common features such as
1. Multiple DNA Profile matching and searching
2. Maintaining different types of DNA Profiles (case-wise)
3. No processing time

In the near future this algorithm will also be compared with other algorithms on other factors.
CONCLUSION

A new algorithm has been developed as an indexing and searching mechanism for multiple DNA Profiles and Allelic Patterns. On theoretical grounds, this algorithm performs better than other algorithms. It is a very lightweight algorithm. Its space complexity and time complexity are lower than those of other algorithms. This algorithm can work with a very long text and pattern, which is a great advantage over other algorithms.

ACKNOWLEDGMENT

I would like to thank my Professor Mr. Hiren Kamani and Dr. Nikunj Brahmbhatt for their time. I am grateful to them for providing me support. Finally, I acknowledge my Professor Mr. Hiren Kamani for his kind suggestions at every step of this work.

REFERENCES

[1] J. M. Butler, C. R. Hill. National Institute of Standards and Technology, Applied Genetics Group, Gaithersburg, Maryland, United States of America.
[2] Butler JM. Forensic DNA typing: biology, technology, and genetics of STR markers. 2nd ed. Elsevier: New York, 2005.
[3] Gill P. Role of short tandem repeat DNA in forensic casework in the UK: past, present, and future perspectives. BioTechniques 2002;32:366-72.
[4] Budowle, B., et al. (1998). CODIS and PCR-based short tandem repeat loci: DNA Profiling tools. Proceedings of the Second European Symposium on Human Identification, pp. 73-88. Madison, Wisconsin: Promega Corporation.
[5] People v. Miles (1991) 577 N.E. Rptr. 2d Series 477 (Ill. App. 4 Dist. 1991), RFLPs (2004).
[6] People v. Castro (1989) 144 Misc. 2d Series 956, 545 N.Y. Supp. 2d Series 985 (Sup. Ct. 1989), VNTRs.
[7] Texas Tech University. "Molecular Genetic Markers." Genetics@TTU. Fall 2005, 11th Nov. 2005.
[8] United States v. Downing (1985) United States Court of Appeals, 3rd Circuit, Vol. 753 Federal Series, 2d, pp. 1224. 609 F. Supp. 784 (E.D. Pa. 1985). 753 F. Rptr. 2d Series 1227-28 (3rd Cir. 1985).
[9] Hares, D.R. (2011). Expanding the CODIS core loci in the United States. Forensic Sci. Int. Genet. (in press), doi:10.1016/j.fsigen.2011.04.012
[10] Needleman, S.B. and Wunsch, C.D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443-453.
[11] Raju Bhukya, DVLN Somayajulu, An Index Based K Partition Multiple Pattern Matching Algorithm, Proc. of International Conference on Advances in Computer Science 2010, pp. 83-87.
[12] Smith, T.F. and Waterman, M. (1981). Identification of common molecular subsequences. J. Mol. Biol. 147, 195-197.
[13] Horspool, R.N., 1980. Practical fast searching in strings. Software Practice and Experience, 10:501-506.
[14] Aho, A. V., and M. J. Corasick, Efficient string matching: an aid to bibliographic search, Communications of the ACM 18 (June 1975), pp. 333-340.
[15] Ziad A.A. Alqadi, Musbah Aqel & Ibrahiem M.M. El Emary, Multiple Skip Multiple Pattern Matching algorithms. IAENG International. Vol. 34(2), 2007.
