SciCrunch Registry is a curated repository of scientific resources, with a focus on biomedical resources, including tools, databases, and core facilities - visit SciCrunch to register your resource.
Database of compiled, public, deep-sequencing miRNA data with several novel tools to facilitate exploration of massive data sets. The miR-seq browser lets users examine short read alignments alongside secondary structure and read count information in concurrent windows. Features such as sequence editing, sorting, ordering, and import and export of user data are useful for studying iso-miRs, miRNA editing, and modifications. The miRNA-target relation is essential for understanding miRNA function. Coexpression analysis of miRNAs and target mRNAs, based on miRNA-seq and RNA-seq data from the same sample, is visualized in heat-map and network views, where users can investigate the inverse correlation of gene expression and target relations compiled from various databases of predicted and validated targets.
Proper citation: miRGator (RRID:SCR_007793)
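The inverse correlation mentioned in the miRGator entry can be illustrated with a small sketch. It assumes paired expression values for one miRNA and one candidate target mRNA across the same samples and uses a plain Pearson coefficient; the function name and the toy values are hypothetical, and this is not miRGator's own implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Toy values (assumed): miRNA expression rises while the target mRNA falls.
mirna_expr = [1.0, 2.0, 3.0, 4.0]
target_expr = [8.0, 6.0, 4.0, 2.0]
r = pearson(mirna_expr, target_expr)
print(f"r = {r:.2f}")  # a strongly negative r is consistent with repression
```

A coefficient near -1 is what an inverse miRNA-target relation would look like in such a view; real analyses would also need normalization and multiple-testing control.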
http://projects.tcag.ca/humandup/
THIS RESOURCE IS NO LONGER IN SERVICE, documented on July 17, 2013. It contains information about segmental duplications in the human genome. The criteria used to identify regions of segmental duplication are: sequence identity of at least 90%; sequence length of at least 5 kb; and not being entirely composed of repetitive elements. BACKGROUND: Previous studies have suggested that recent segmental duplications, which are often involved in chromosome rearrangements underlying genomic disease, account for some 5% of the human genome. We have developed rapid computational heuristics based on BLAST analysis to detect segmental duplications, as well as regions containing potential sequence misassignments in the human genome assemblies. RESULTS: Our analysis of the June 2002 public human genome assembly revealed that 107.4 of 3,043.1 megabases (Mb) (3.53%) of sequence contained segmental duplications, each at least 5 kb in size and with at least 90% identity. We have also detected that 38.9 Mb (1.28%) of sequence within this assembly is likely to be involved in sequence misassignment errors. Furthermore, we have identified a significant subset (199,965 of 2,327,473, or 8.6%) of single-nucleotide polymorphisms (SNPs) in the public databases that are not true SNPs but are potential paralogous sequence variants. CONCLUSION: Using two distinct computational approaches, we have identified most of the sequences in the human genome that have undergone recent segmental duplications. Near-identical segmental duplications present a major challenge to the completion of the human genome sequence. Potential sequence misassignments detected in this study would require additional efforts to resolve. The segmental duplication data and summary statistics are available for download.
Data for the human genome (based on the May 2004 Human Genome Assembly, hg17): duplication relationships visualized in GBrowse; duplicon pair relationships (GFF); genes within duplication regions (HTML); genome duplication content (MS Excel). The segmental duplication data can be visualized in a genome browser in the GBrowse section. Selected human genome annotation tracks (except the segmental duplication track) have also been obtained from UCSC and loaded into the genome browser. Detailed information (e.g. overlapping genes, overlapping clones, detailed alignment) can be obtained by clicking on a duplication cluster in GBrowse. Both keyword search and BLAT search are available. Analyses based on previous human genome assemblies can be found in the Previous Analyses section. Acknowledgments: We thank The Centre for Applied Genomics at the Hospital for Sick Children (HSC) as well as collaborators worldwide. Supported by Genome Canada, the Howard Hughes Medical Institute International Scholar Program (to S.W.S.), and the HSC Foundation.
Proper citation: Human Genome Segmental Duplication Database (RRID:SCR_007728)
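The three criteria listed for the Human Genome Segmental Duplication Database, and the reported percentages, can be checked with a short sketch. The `Alignment` record and its field names are hypothetical stand-ins for whatever the BLAST-based pipeline actually produced; only the thresholds (90% identity, 5 kb) and the megabase figures come from the entry above.

```python
from dataclasses import dataclass

MIN_IDENTITY_PCT = 90.0  # threshold stated in the entry
MIN_LENGTH_BP = 5_000    # 5 kb, as stated in the entry


@dataclass
class Alignment:
    """Hypothetical record for one pairwise alignment hit."""
    length_bp: int
    percent_identity: float
    repeat_bp: int  # bases of the hit annotated as repetitive elements


def is_segmental_duplication(aln: Alignment) -> bool:
    """Apply the three stated criteria to a single alignment."""
    return (
        aln.percent_identity >= MIN_IDENTITY_PCT
        and aln.length_bp >= MIN_LENGTH_BP
        and aln.repeat_bp < aln.length_bp  # not entirely repeats
    )


def percent_of(part: float, total: float) -> float:
    """Share of `part` in `total`, as a percentage."""
    return 100.0 * part / total


# Figures quoted in the entry for the June 2002 human assembly.
print(round(percent_of(107.4, 3043.1), 2))       # duplicated sequence
print(round(percent_of(38.9, 3043.1), 2))        # suspected misassignment
print(round(percent_of(199_965, 2_327_473), 1))  # suspect SNPs
```

The recomputed shares agree with the 3.53%, 1.28%, and 8.6% given in the description.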
SYSTERS is a database of protein sequences grouped into homologous families and superfamilies. The SYSTERS project aims to provide a meaningful partitioning of the whole protein sequence space by a fully automatic procedure. A refined two-step algorithm assigns each protein to a family and a superfamily. The sequence data underlying SYSTERS release 4 now comprise several protein sequence databases derived from completely sequenced genomes (ENSEMBL, TAIR, SGD and GeneDB), in addition to the comprehensive Swiss-Prot/TrEMBL databases. To augment the automatically derived results, information from external databases such as Pfam and Gene Ontology is added to the web server. Furthermore, users can retrieve pre-processed analyses of families, such as multiple alignments and phylogenetic trees. New query options comprise a batch retrieval tool for functional inference about families based on automatic keyword extraction from sequence annotations. A new access point, PhyloMatrix, allows the retrieval of phylogenetic profiles of SYSTERS families across organisms with completely sequenced genomes. Gene, Human, Vertebrate, Genome, Human ORFs
Proper citation: SYSTERS (RRID:SCR_007955)
http://supfam.org/SUPERFAMILY/
SUPERFAMILY is a database of structural and functional protein annotations for all completely sequenced organisms. The SUPERFAMILY annotation is based on a collection of hidden Markov models, which represent structural protein domains at the SCOP superfamily level. A superfamily groups together domains which have an evolutionary relationship. The annotation is produced by scanning protein sequences from over 1,700 completely sequenced genomes against the hidden Markov models.
Proper citation: SUPERFAMILY (RRID:SCR_007952)
SIMAP provides a database based on a pre-computed similarity matrix covering the similarity space formed by more than 4 million amino acid sequences from public databases and completely sequenced genomes. The database is capable of handling very large datasets and is updated incrementally. For sequence similarity searches and pairwise alignments, we implemented a grid-enabled software system based on FASTA heuristics and the Smith-Waterman algorithm. SimpleSIMAP and AdvancedSIMAP retrieve homologs for given protein sequences, which must already be contained in the SIMAP database. While SimpleSIMAP provides only selected parameters and preconfigured search spaces, AdvancedSIMAP allows the user to specify search space, filtering, and sorting parameters in a flexible manner. Both types of queries result in lists of homologs that are linked in turn to their own homologs. The web interfaces thus allow users to explore the protein world quickly and interactively by homology. Sponsors: SIMAP is supported by the Department of Genome Oriented Bioinformatics of the Technische Universität München and the Institute for Bioinformatics of the GSF-National Research Center for Environment and Health.
Proper citation: SIMAP (RRID:SCR_007927)
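The Smith-Waterman algorithm named in the SIMAP entry computes the best local alignment score between two sequences. The sketch below is a minimal textbook version with a linear gap penalty; the scoring values are illustrative assumptions (protein searches normally use a substitution matrix and affine gaps), and it does not reproduce SIMAP's grid-enabled implementation.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty).

    Scoring parameters are assumed for illustration only.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + step, # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))  # shared "ACG" core
```

Because cells are floored at zero, unrelated flanking sequence does not lower the score of the shared core, which is what makes the method suitable for finding homologous regions inside longer sequences.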
https://github.com/vgteam/vg#vg
Software toolkit to improve read mapping by representing genetic variation in the reference. Provides succinct encoding of the sequences of many genomes.
Proper citation: variation graph (RRID:SCR_024369)
http://projects.tcag.ca/xenodup/
THIS RESOURCE IS NO LONGER IN SERVICE, documented on July 16, 2013. It contains information about segmental duplications in the genomes of chimpanzee, mouse, and rat. The criteria used to identify regions of segmental duplication are: sequence identity of at least 90%; sequence length of at least 5 kb; and not being entirely composed of repetitive elements. BACKGROUND: The high quality of the mouse genome draft sequence and its associated annotations makes it an invaluable biological resource. Identifying recent duplications in the mouse genome, especially in regions containing genes, may highlight important events in recent murine evolution. In addition, detecting recent sequence duplications can reveal potentially problematic regions of the genome assembly. We use BLAST-based computational heuristics to identify large (≥ 5 kb) and recent (≥ 90% sequence identity) segmental duplications in the mouse genome sequence. Here we present a database of recently duplicated regions of the mouse genome found in the mouse genome sequencing consortium (MGSC) February 2002 and February 2003 assemblies. RESULTS: We determined that 33.6 Mb of 2,695 Mb (1.2%) of sequence from the February 2003 mouse genome sequence assembly is involved in recent segmental duplications, which is less than that observed in the human genome (around 3.5-5%). From this dataset, 8.9 Mb (26%) of the duplication content consisted of "unmapped" chromosome sequence. Moreover, we suspect that an additional 18.5 Mb of sequence is involved in duplication artifacts arising from sequence misassignment errors in this genome assembly. By searching for genes that are located within these regions, we identified 675 genes that mapped to duplicated regions of the mouse genome. Sixteen of these genes appear to have been duplicated independently in the human genome. From our dataset we further characterized a 42 kb recent segmental duplication of Mater, a maternal-effect gene essential for embryogenesis in mice.
CONCLUSION: Our results provide an initial analysis of the recently duplicated sequence and gene content of the mouse genome. Many of these duplicated loci, as well as regions identified to be involved in potential sequence misassignment errors, will require further mapping and sequencing to achieve accuracy. A Genome Browser database was set up to display the identified duplication content presented in this work. These data will also be relevant to the growing number of investigators who use the draft genome sequence for experimental design and analysis. The segmental duplication data and summary statistics are available for download and can also be visualized in a genome browser in the GBrowse section. Selected annotation tracks (except the segmental duplication track) have also been obtained from UCSC and loaded into the genome browser. Detailed information (e.g. overlapping genes, overlapping clones, detailed alignment) can be obtained by clicking on a duplication cluster in GBrowse. Both keyword search and BLAT search are available. Analyses based on previous genome assemblies can be found in the Previous Analyses section. Recent Developments: The Non-Human Genome Segmental Duplication Database is continually updated, including archived copies of the analyses of all previous genome assemblies, and will include new species as they become available. Acknowledgments: We thank The Centre for Applied Genomics at the Hospital for Sick Children (HSC) as well as collaborators worldwide. Supported by Genome Canada, the Howard Hughes Medical Institute International Scholar Program (to S.W.S.), and the HSC Foundation.
Proper citation: Non-Human Genome Segmental Duplication Database (RRID:SCR_000470)
Database and integrated tools to improve annotation of the bovine genome and to integrate the genome sequence with other genomics data.
Proper citation: Bovine Genome Database (RRID:SCR_000148)
http://www.cbs.dtu.dk/services/gwBrowser/
An interactive web application for visualizing genomic data of sequenced prokaryotic chromosomes. It allows users to carry out various analyses such as mapping alignments of homologous genes to other genomes, mapping of short sequencing reads to a reference chromosome, and calculating DNA properties such as curvature or stacking energy along the chromosome. The GeneWiz browser produces an interactive graphic that enables zooming from a global scale down to single nucleotides, without changing the size of the plot. Its ability to disproportionally zoom provides optimal readability and increased functionality compared to other browsers. The tool allows the user to select the display of various genomic features, color setting and data ranges. Custom numerical data can be added to the plot allowing, for example, visualization of gene expression and regulation data. Further, standard atlases are pre-generated for all prokaryotic genomes available in GenBank, providing a fast overview of all available genomes, including recently deposited genome sequences.
Proper citation: GeneWiz browser (RRID:SCR_001454)
http://www.worm.mpi-cbg.de/phenobank/cgi-bin/ProjectInfoPage.py
A database that provides primary data from two high-content screens that profile the set of ~900 essential C. elegans genes (~5% of the genome) required for embryo production and/or events during the first two embryonic divisions. PhenoBank houses the movies, scored defects, and phenotypic classification data for the embryo-filming and gonad morphology screens.
Proper citation: PhenoBank (RRID:SCR_000930)
Database and browser that provides a central resource to archive and display associations between genetic variation and high-throughput molecular-level phenotypes. This effort originated with the NIH GTEx roadmap project; however, the scope of this resource will be extended to include any available genotype/molecular phenotype datasets.
Proper citation: GTEx eQTL Browser (RRID:SCR_001618)
https://fungi.ensembl.org/Neurospora_crassa/Info/Index
Its strategy involves Whole Genome Shotgun (WGS) sequencing, in which sequence from the entire genome is generated and reassembled. This method is standard for microbial genome sequencing and has been successfully applied to Drosophila. Neurospora is an ideal candidate for this approach because of the low repeat content of its genome. The Neurospora crassa Database has expanded its scope by including a mitochondrial annotation, incorporating information from the Neurospora compendium, and assigning NCU numbers to tRNAs and rRNAs. They have improved the annotation process to predict untranslated regions and to reduce the number of spurious predictions. As a result, version 3 contains 9,826 genes, 794 fewer than version 2. During the initial phase of a WGS project they sequence both ends of the 4 kb inserts from a plasmid library prepared using randomly sheared and size-selected DNA. The shotgun reads are assembled by recognizing overlapping regions of sequence and making use of the known orientation and distance of the paired reads from each plasmid. Obtaining deep sequence coverage through high levels of sequence redundancy ensures that the majority of the genome is represented in the initial assembly and that the consensus sequence is of high quality. Their approach toward the initial assembly was conservative, meaning they would rather fail to join sequence contigs that might overlap each other than risk making false joins between two closely related but non-overlapping genomic regions. Hence, the initial assembly contains many sequence contigs, and over time these contigs will increase in size and decrease in number as they are joined together. After shotgun sequencing and assembly there was a second phase of sequencing in which additional sequence was obtained from specific regions that were missing from the original assembly or were recognized to be of low quality in the consensus.
The Neurospora crassa sequencing project reflects a close collaboration between the Broad Institute and the Neurospora research community. Principal investigators include Bruce Birren and Chad Nusbaum from the Broad Institute, Matt Sachs at the Oregon Graduate Institute of Science and Technology, Chuck Staben at the University of Kentucky, and Jak Kinsey at the Fungal Genetics Stock Center at the University of Kansas Medical Center. In addition, the project has a larger Advisory Board made up of a number of Neurospora researchers. Sponsors: They have been funded by the National Science Foundation to sequence the N. crassa genome and make the information publicly available.
Proper citation: Neurospora crassa Database (RRID:SCR_001372)
A web-based genome analysis platform that integrates proprietary functional genomic data, metabolic reconstructions, expression profiling, and biochemical and microbiological data with publicly available information. Focused on microbial genomics, it provides better and faster identification of gene function across all organisms. Building upon a comprehensive genomic database integrated with a collection of microbial metabolic and non-metabolic pathways and using proprietary algorithms, it assigns functions to genes, integrates genes into pathways, and identifies previously unknown or mischaracterized genes, cryptic pathways and gene products.
* Automated and manual annotation of genes and genomes
* Analysis of metabolic and non-metabolic pathways to understand organism physiology
* Comparison of multiple genomes to identify shared and unique features and SNPs
* Functional analysis of gene expression microarray data
* Data-mining for target gene discovery
* In silico metabolic engineering and strain improvement
Proper citation: ERGO (RRID:SCR_001243)
Hi. I'm genegeek (aka Catherine Anderson). I realized during my PostDoc that I preferred learning and explaining new results to doing science so I started a non-traditional career of teaching and outreach. I'll be using this space to explore public perception of genetics and other cool molecular biology stuff. I hope to add to the great discussions re: new science discoveries and general understanding of genetics. I've been running an outreach program and enjoy talking to non-experts about their opinions and understanding. I hope my enthusiasm for the topics can come through the screen. My posts are presented as opinion and commentary and do not represent the views of LabSpaces Productions, LLC, my employer, or my educational institution.
Proper citation: Daring Nucleic Adventures - genegeek (RRID:SCR_005215)
Database of the international consortium working together to mutate all protein-coding genes in the mouse using a combination of gene trapping and gene targeting in C57BL/6 mouse embryonic stem (ES) cells. Detailed information on targeted genes is available. The IKMC includes the following programs:
* Knockout Mouse Project (KOMP) (USA)
** CSD, a collaborative team at the Children's Hospital Oakland Research Institute (CHORI), the Wellcome Trust Sanger Institute and the University of California at Davis School of Veterinary Medicine, led by Pieter deJong, Ph.D., CHORI, along with K. C. Kent Lloyd, D.V.M., Ph.D., UC Davis; and Allan Bradley, Ph.D. FRS, and William Skarnes, Ph.D., at the Wellcome Trust Sanger Institute.
** Regeneron, a team at the VelociGene division of Regeneron Pharmaceuticals, Inc., led by David Valenzuela, Ph.D. and George D. Yancopoulos, M.D., Ph.D.
* European Conditional Mouse Mutagenesis Program (EUCOMM) (Europe)
* North American Conditional Mouse Mutagenesis Project (NorCOMM) (Canada)
* Texas A&M Institute for Genomic Medicine (TIGM) (USA)
Products (vectors, mice, ES cell lines) may be ordered from the above programs.
Proper citation: International Knockout Mouse Consortium (RRID:SCR_005574)
http://swissregulon.unibas.ch/fcgi/sr/swissregulon
A database of genome-wide annotations of regulatory sites. The predictions are based on Bayesian probabilistic analysis of a combination of input information including:
* Experimentally determined binding sites reported in the literature.
* Known sequence-specificities of transcription factors.
* ChIP-chip and ChIP-seq data.
* Alignments of orthologous non-coding regions.
Predictions were made using the PhyloGibbs, MotEvo, IRUS and ISMARA algorithms developed in their group, depending on the data available for each organism. Annotations can be viewed in a GBrowse genome browser and can also be downloaded in flat file format.
Proper citation: SwissRegulon (RRID:SCR_005333)
A publicly available database of transposed elements (TEs) located within protein-coding genes of 7 organisms: human, mouse, chicken, zebrafish, fruit fly, nematode and sea squirt. Using TranspoGene, the user can learn about many aspects of the effect these TEs have on their host genes, such as: exonization events (including alternative splicing-related data); insertion of TEs into introns, exons, and promoters; the specific location of the TE within the gene; evolutionary divergence of the TE from its consensus sequence; and involvement in diseases. The TranspoGene database is quickly searchable through its website, supports many kinds of searches, and is available for download. TranspoGene contains information on the specific type and family of each TE, genomic and mRNA location, sequence, supporting transcript accession, and alignment to the TE consensus sequence. The database also contains host-gene-specific data: gene name, genomic location, Swiss-Prot and RefSeq accessions, diseases associated with the gene, and splicing pattern. The TranspoGene and microTranspoGene databases can be used by researchers interested in the effect of TE insertion on the eukaryotic transcriptome.
Proper citation: TranspoGene (RRID:SCR_005634)
A next-generation web-based application that aims to provide an integrated solution for both visualization and analysis of deep-sequencing data, along with simple access to public datasets.
Proper citation: Systems Transcriptional Activity Reconstruction (RRID:SCR_005622)
A knowledgebase of Biochemically, Genetically and Genomically structured genome-scale metabolic network reconstructions. BiGG integrates several published genome-scale metabolic networks into one resource with standard nomenclature which allows components to be compared across different organisms. BiGG can be used to browse model content, visualize metabolic pathway maps, and export SBML files of the models for further analysis by external software packages. Users may follow links from BiGG to several external databases to obtain additional information on genes, proteins, reactions, metabolites and citations of interest.
Proper citation: BiGG Database (RRID:SCR_005809)
http://h-invitational.jp/varygene/
It consists of a Genome Browser, an LD Search System, and the VaryGene 2 system. The Generic Genome Browser is a combination of database and interactive Web page for manipulating and displaying annotations on genomes, while LDSearchSystem is a search system for linkage disequilibrium (LD) bins. VaryGene 2 is a system to search, display, and download our research results on human polymorphism based on publicly available data and annotations of transcripts presented by H-InvDB. VaryGene 2 provides information about single nucleotide polymorphisms (SNPs), deletion-insertion polymorphisms (DIPs), short tandem repeats (STRs), single amino acid repeats (SARs), structural variation (or copy number variations: CNVs), and their relations to the genome, transcripts, and functional domains. Users can search by polymorphisms, transcripts, STRs/SARs, and CNVs.
Proper citation: VarySysDB (RRID:SCR_005880)