SciCrunch Registry is a curated repository of scientific resources, with a focus on biomedical resources, including tools, databases, and core facilities - visit SciCrunch to register your resource.
THIS RESOURCE IS NO LONGER IN SERVICE, documented on August 20, 2019. The COG database has become a powerful tool in the field of comparative genomics. The construction of this database is based on sequence homologies of proteins from different completely sequenced genomes. Highly homologous proteins are assigned to clusters of orthologous groups. The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies. The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. Here is a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi.
The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or approximately 54% of the analyzed eukaryotic 110,655 gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included into the KOGs; addition of new eukaryotic genomes is expected to result in substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of approximately 20% of the KOG set. This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (approximately 1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes.
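The coverage figures quoted above can be checked with simple arithmetic. The following is a minimal sketch, not part of the COG/KOG resource itself; the helper name `coverage` is hypothetical, and the counts are the ones stated in the description.

```python
def coverage(clustered: int, total: int) -> float:
    """Return the percentage of proteins assigned to orthologous clusters."""
    return 100.0 * clustered / total

# Counts as quoted in the resource description above.
cog = coverage(138_458, 185_505)  # proteins in COGs / proteins in 66 unicellular genomes
kog = coverage(59_838, 110_655)   # proteins in KOGs / analyzed eukaryotic gene products

print(f"COG coverage: {cog:.1f}%")
print(f"KOG coverage: {kog:.1f}%")
```

Both ratios round to the percentages given in the text (about 75% for COGs and about 54% for KOGs).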
Proper citation: Phylogenetic Clusters of Orthologous Groups Ranking (RRID:SCR_008223)
http://www.nisc.nih.gov/projects/comp_seq.html
Generates data for use in developing and refining computational tools for comparing genomic sequence from multiple species. The NISC Comparative Sequencing Program's goal is to establish a data resource consisting of sequences for the same set of targeted genomic regions derived from multiple animal species. The broader program includes plans for a diverse set of analytical studies using the generated sequence and the timely publication of a series of papers describing the results of those analyses in peer-reviewed journals. Experimentally, this project involves the shotgun sequencing of mapped BAC clones. For each BAC, an assembly is first performed when a sufficient number of sequence reads have been generated to provide full shotgun coverage of the clone. At that time, the assembled sequence is submitted to the HTGS division of GenBank. Subsequent refinements of the sequence, including the generation of higher-accuracy finished sequence, result in updates to the sequence record in GenBank. By immediately submitting its BAC-derived sequences to GenBank, the program makes its data available as a public service to allow colleagues to speed up their research, consistent with the now well-established routine of sequencing centers participating in the Human Genome Project. At the same time, however, the program has made a considerable investment in acquiring these mapping and sequence data, including sizable efforts of graduate students, postdoctoral fellows, and other trainees. Furthermore, in most cases, large data sets involving multiple BAC sequences from multiple species must first be generated, often taking many months to accumulate, before the planned analysis can be performed and the resulting papers written and submitted for publication.
Proper citation: Comparative Vertebrate Sequencing (RRID:SCR_008213)
The aim of the PEROXISOME database (PeroxisomeDB) is to gather, organize and integrate curated information on peroxisomal genes, their encoded proteins, their molecular function and the metabolic pathways they belong to, and their related disorders. PeroxisomeDB contains the complete peroxisomal proteome of Homo sapiens (encoded by 85 genes) and Saccharomyces cerevisiae (encoded by 61 genes). It now also includes 34 new organism genomes, adding 2426 new peroxisomal homolog proteins. PeroxisomeDB 2.0 integrates the peroxisomal metabolome of the whole microbody family through the new incorporation of the glycosome proteomes of trypanosomatids and the glyoxysome proteome of Arabidopsis thaliana. The site also provides a Peroxisome Metabolome of peroxisomal genes and proteins, their molecular interactions and metabolic pathways, as well as tools for comparative genomics and predictive tools. Sponsors: Peroxisome Database is funded by the Institut de Génétique et de Biologie Moléculaire et Cellulaire.
Proper citation: Peroxisome Database (RRID:SCR_008352)
http://cgap.nci.nih.gov/Chromosomes/Mitelman
The web site includes genomic data for humans and mice, including transcript sequence, gene expression patterns, single-nucleotide polymorphisms, clone resources, and cytogenetic information. Descriptions of the methods and reagents used in deriving the CGAP datasets are also provided. An extensive suite of informatics tools facilitates queries and analysis of the CGAP data by the community. One of the newest features of the CGAP web site is an electronic version of the Mitelman Database of Chromosome Aberrations in Cancer. The data in the Mitelman Database is manually culled from the literature and subsequently organized into three distinct sub-databases, as follows: -The sub-database of cases contains the data that relates chromosomal aberrations to specific tumor characteristics in individual patient cases. It can be searched using either the Cases Quick Searcher or the Cases Full Searcher. -The sub-database of molecular biology and clinical associations contains no data from individual patient cases. Instead, the data is pulled from studies with distinct information about: -Molecular biology associations that relate chromosomal aberrations and tumor histologies to genomic sequence data, typically genes rearranged as a consequence of structural chromosome changes. -Clinical associations that relate chromosomal aberrations and/or gene rearrangements and tumor histologies to clinical variables, such as prognosis, tumor grade, and patient characteristics. It can be searched using the Molecular Biology and Clinical (MBC) Associations Searcher -The reference sub-database contains all the references culled from the literature i.e., the sum of the references from the cases and the molecular biology and clinical associations. It can be searched using the Reference Searcher. 
CGAP has developed six web search tools to help you analyze the information within the Mitelman Database: -The Cases Quick Searcher allows you to query the individual patient cases using the four major fields: aberration, breakpoint, morphology, and topography. -The Cases Full Searcher permits a more detailed search of the same individual patient cases as above, by including more cytogenetic field choices and adding search fields for patient characteristics and references. -The Molecular Biology Associations Searcher does not search any of the individual patient cases. It searches studies pertaining to gene rearrangements as a consequence of cytogenetic aberrations. -The Clinical Associations Searcher does not search any of the individual patient cases. It searches studies pertaining to clinical associations of cytogenetic aberrations and/or gene rearrangements. -The Recurrent Chromosome Aberrations Searcher provides a way to search for structural and numerical abnormalities that are recurrent, i.e., present in two or more cases with the same morphology and topography. -The Reference Searcher queries only the references themselves, i.e., the references from the individual cases and the molecular biology and clinical associations. Sponsors: This database is sponsored by the University of Lund, Sweden and has support from the Swedish Cancer Society and the Swedish Children's Cancer Foundation.
Proper citation: Mitelman Database of Chromosome Aberrations in Cancer (RRID:SCR_012877)
http://www.informatics.jax.org/
Community model organism database for laboratory mouse and authoritative source for phenotype and functional annotations of mouse genes. MGD includes a complete catalog of mouse genes and genome features with integrated access to genetic, genomic and phenotypic information, all serving to further the use of the mouse as a model system for studying human biology and disease. MGD is a major component of the Mouse Genome Informatics. Contains standardized descriptions of mouse phenotypes, associations between mouse models and human genetic diseases, extensive integration of DNA and protein sequence data, and normalized representation of genome and genome variant information. Data are obtained and integrated via manual curation of the biomedical literature, direct contributions from individual investigators and downloads from major informatics resource centers. MGD collaborates with the bioinformatics community on the development and use of biomedical ontologies such as the Gene Ontology (GO) and the Mammalian Phenotype (MP) Ontology.
Proper citation: Mouse Genome Database (RRID:SCR_012953)
Database of human genes that provides concise genomic, proteomic, transcriptomic, genetic and functional information on all known and predicted human genes. Information featured in GeneCards includes orthologies, disease relationships, mutations and SNPs, gene expression, gene function, pathways, protein-protein interactions, related drugs and compounds and direct links to cutting edge research reagents and tools such as antibodies, recombinant proteins, clones, expression assays and RNAi reagents.
Proper citation: GeneCards (RRID:SCR_002773)
Model organism database that serves as central repository and web-based resource for zebrafish genetic, genomic, phenotypic and developmental data. Data represented are derived from three primary sources: curation of zebrafish publications, individual research laboratories and collaborations with bioinformatics organizations. Data formats include text, images and graphical representations. Serves as primary community database resource for laboratory use of zebrafish. Develops and supports integrated zebrafish genetic, genomic, developmental and physiological information and links this information extensively to corresponding data in other model organism and human databases.
Proper citation: Zebrafish Information Network (ZFIN) (RRID:SCR_002560)
ooTFD (object-oriented Transcription Factors Database) is a successor to TFD, the original Transcription Factors Database. This database is aimed at capturing information regarding the polypeptide interactions which comprise and define the properties of transcription factors. ooTFD contains information about transcription factor binding sites, as well as composite relationships within transcription factors, which frequently occur as multisubunit proteins that form a complex interface to cellular processes outside the transcription machinery through protein-protein interactions. ooTFD contains information represented in TFD but also allows the representation of containment, composite, and interaction relationships between transcription factor polypeptides. It is designed to represent information about all transcription factors, both eukaryotic and prokaryotic, basal as well as regulatory factors, and multiprotein complexes as well as monomers.
Proper citation: object-oriented Transcription Factors Database (RRID:SCR_002435)
THIS RESOURCE IS NO LONGER IN SERVICE, documented August 23, 2016. ELISA is an online database that combines functional annotation with structure and sequence homology modeling to place proteins into sequence-structure-function neighborhoods. The atomic unit of the database is a set of sequences and structural templates that those sequences encode. A graph that is built from the structural comparison of these templates is called PDUG (protein domain universe graph). It introduces a method of functional inference through a probabilistic calculation done on an arbitrary set of PDUG nodes. Further, all PDUG structures are mapped onto all fully sequenced proteomes allowing an easy interface for evolutionary analysis and research into comparative proteomics. ELISA is the first database with applicability to evolutionary structural genomics explicitly in mind.
Proper citation: Evolutionary Lineage Inferred from Structural Analysis (RRID:SCR_002343)
DoTS (Database Of Transcribed Sequences) is a human and mouse transcript index created from all publicly available transcript sequences. The input sequences are clustered and assembled to form the DoTS Consensus Transcripts that comprise the index. These transcripts are assigned stable identifiers of the form DT.123456 (and are often referred to as dots). The transcripts are in turn clustered to form putative DoTS Genes. These are assigned stable identifiers of the form DG.1234356. As of September 1, 2004, the DoTS annotation team has manually annotated 43,164 human and 78,054 mouse DoTS Transcripts (DTs), corresponding to 3,939 human and 7,752 mouse DoTS Genes (DGs). Use the manually annotated gene query to see the DoTS Transcripts that have been manually annotated. The focus of the DoTS project is integrating the various types of data (e.g., EST sequences, genomic sequence, expression data, functional annotation) in a structured manner which facilitates sophisticated queries that are otherwise not easy to perform. DoTS is built on the GUS Platform which includes a relational database that uses controlled vocabularies and ontologies to ensure that biologically meaningful queries can be posed in a uniform fashion. An easy way to start using the site is to search for DoTS Transcripts using an existing cDNA or mRNA sequence. Click on the BLAST tab at the top of the page and enter your sequence in the form provided. All the transcripts with significant sequence similarity to your query sequence will be displayed. Or use one of the provided queries to retrieve transcripts using a number of criteria. These queries are listed on the query page, which can also be reached by clicking on the tab marked query at the top of the page. Finally, the boolean query page allows these queries to be combined in a variety of ways. Sponsors: Funding provided by -NIH grant RO1-HG-01539-03 -DOE grant DE-FG02-00ER62893
Proper citation: Database of Transcribed Sequences (RRID:SCR_002334)
http://mga.bionet.nsc.ru/soft/maia-1.0/
Software package of programs for complex segregation analysis in animal pedigrees.
Proper citation: MAIA (RRID:SCR_007153)
The HumanCyc database describes human metabolic pathways and the human genome. By presenting metabolic pathways as an organizing framework for the human genome, HumanCyc provides the user with an extended dimension for functional analysis of Homo sapiens at the genomic level. A computational pathway analysis of the human genome assigned human enzymes to predicted metabolic pathways. Pathway assignments place genes in their larger biological context, and are a necessary step toward quantitative modeling of metabolism. HumanCyc contains the complete genome sequence of Homo sapiens, as presented in Build 31. Data on the human genome from Ensembl, LocusLink and GenBank were carefully merged to create a minimally redundant human gene set to serve as an input to SRI's PathoLogic software, which generated the database and predicted Homo sapiens metabolic pathways from functional information contained in the genome's annotation. SRI did not re-annotate the genome, but worked with the gene function assignments in Ensembl, LocusLink, and GenBank. The resulting pathway/genome database (PGDB) includes information on 28,783 genes, their products and the metabolic reactions and pathways they catalyze. Also included are many links to other databases and publications. The Pathway Tools software/database bundle includes HumanCyc and the Pathway Tools software suite and is available under license. This form of HumanCyc is faster and more powerful than the Web version.
Proper citation: HumanCyc: Encyclopedia of Homo sapiens Genes and Metabolism (RRID:SCR_007050)
http://goblet.molgen.mpg.de/cgi-bin/goblet2008/goblet.cgi
Tool that performs annotation based on GO and pathway terms for anonymous cDNA or protein sequences. It uses the species-independent GO structure and vocabulary, together with a series of protein databases collected from various sites, to perform a detailed GO annotation by sequence similarity searches. The sensitivity and the reference protein sets can be selected by the user. GOblet runs automatically and is available as a public service on our web server. GOblet expects query sequences to be in FASTA format (with header lines). Protein and nucleotide sequences are accepted. The total size of all sequences submitted per request should currently not exceed 50 kb. For security reasons, larger posts will be rejected. Due to limited capacities, the queries may be processed in batches depending on the server load. The output of the BLAST job is filtered automatically and the relevant hits are displayed. In addition, the respective GO terms are shown together with the complete GO hierarchy of parent terms. THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16, 2025.
Proper citation: GOblet (RRID:SCR_006998)
http://deconseq.sourceforge.net/
Software tool to automatically detect and efficiently remove sequence contaminations from genomic and metagenomic datasets. It is easily configurable and provides a user-friendly interface. The user can upload FASTA or FASTQ files and select the databases used for contamination screening, including seven human genomes, bacterial genomes, and viral genomes. The user can set the thresholds interactively and see the results directly using the functionality of the graphical interface. The results can be downloaded in joined or separated files in different formats. The coverage-identity plots provide additional information that can guide the selection of the thresholds using color-coded points and connecting lines.
Proper citation: DeconSeq (RRID:SCR_007006)
Comprehensive catalogue of animal genome size data. Haploid DNA contents (C-values, in picograms) are available for 4972 species (3231 vertebrates and 1741 non-vertebrates) based on 6518 records from 669 published sources. Data may be submitted directly to the database or reprints and notifications of new papers may be sent to database curation staff.
Proper citation: Animal Genome Size Database (RRID:SCR_007551)
http://www.geisha.arizona.edu/geisha/
Online repository for chicken in situ hybridization information. This site presents whole mount in situ hybridization images and corresponding probe and genomic information for genes expressed in chicken embryos in Hamburger Hamilton stages 1-25 (0.5-5 days). The GEISHA project began in 1998 to investigate using high throughput whole mount in situ hybridization to identify novel, differentially expressed genes in chicken embryos. An initial expression screen of approximately 900 genes demonstrated feasibility of the approach, and also highlighted the need for a centralized repository of in situ hybridization expression data. Objectives: The goals of the GEISHA project are to obtain whole mount in situ hybridization expression information for all differentially expressed genes in the chicken embryo between HH stages 1-25, to integrate expression data with the chicken genome browsers, and to offer this information through a user-friendly graphical user interface. In situ hybridization images are obtained from three sources: 1. In house high throughput in situ hybridization screening: cDNAs obtained from several embryonic cDNA libraries or from EST repositories are screened for expression using high throughput in situ hybridization approaches. 2. Literature curation: Agreements with journals permit posting of published in situ hybridization images and related information on the GEISHA site. 3. Unpublished in situ hybridization information from other laboratories: laboratories generally publish only a small fraction of their in situ hybridization data. High quality images for which probe identity can be verified are welcome additions to GEISHA.
Proper citation: GEISHA - Gallus Expression in Situ Hybridization Analysis: A Chicken Embryo Gene Expression Database (RRID:SCR_007440)
http://claire.bardel.free.fr/software.html
Software package to perform phylogeny-based association and localization analysis. Used for association detection and localization of susceptibility sites using haplotype phylogenetic trees. Performs two phylogeny-based analyses: it tests association between a candidate gene and a disease, and it pinpoints markers (SNPs) that are putative disease susceptibility loci.
Proper citation: ALTree (RRID:SCR_007562)
MitoRes is a comprehensive and reliable resource for massive extraction of sequences and sub-sequences of nuclear genes and encoded products targeting mitochondria in metazoa. It has been developed to support high-throughput in-silico analyses for functional genomics studies of mitochondrial biogenesis, metabolism and their pathological dysfunctions. It integrates information from the most accredited world-wide databases to bring together gene, transcript and encoded protein sequences associated with annotations on species name and taxonomic classification, gene name, functional product, organelle localization, protein tissue specificity, Enzyme Classification (EC), Gene Ontology (GO) classification and links to other related public databases. The Cluster section is dedicated to the collection of data on protein clustering of the entire catalogue of MitoRes protein sequences based on all-versus-all global pair-wise alignments for assessing putative intra- and inter-species functional relationships. The current version of MitoRes is based on UniProt release 4 and contains 64 different metazoan species. The explosion of knowledge production in biology in the past two decades has created a critical need for bioinformatic instruments able to manage data and facilitate their retrieval and analysis. Hundreds of biological databases have been produced, and the integration of biological data from these different resources is very important when focusing on the study of a particular layer of biological knowledge. MitoRes is a completely rebuilt edition of the MitoNuc database, which has been extensively modified to deal successfully with the challenges of the post-genomic era.
Its goal is to represent a comprehensive and reliable resource supporting high-quality in-silico analyses aimed at the functional characterization of gene, transcript and amino acid sequences, encoded by the nuclear genome and involved in mitochondrial biogenesis, metabolism and pathological dysfunctions in metazoa. The central features of MitoRes are: # an integrated catalogue of protein, transcript and gene sequences and sub-sequences # a Web-based application composed of a wide spectrum of search/retrieval facilities # a sequence export manager allowing massive extraction of bio-sequences (genes, introns, exons, gene flanking regions, transcripts, UTRs, CDS, proteins and signal peptides) in FASTA, EMBL and GenBank formats. It is an interconnected knowledge management system based on a MySQL relational database, which ensures data consistency and integrity, and on a Web Graphical User Interface (GUI), built in Seagull PHP Framework, offering a wide range of search and sequence extraction facilities. The database is compiled by extracting and integrating information from public resources and data generated by the MitoRes team. The MitoRes database consists of comprehensive sequence entries whose core data are protein, transcript and gene sequences and taxonomic information describing the biological source of the protein. Additional information includes: bio-sequence structure and location, biological function of the protein product, and dynamic links to both external public databases used as data resources and public databases reporting complementary information. The core entity of the MitoRes database is the protein, so that a MitoRes entry is generated for each protein reported in the UniProt database as a nuclear-encoded protein involved in mitochondrial biogenesis and function.
Sponsors: MitoRes has been supported by Ministero Università e Ricerca Scientifica, Italy (PRIN, Programma Biotecnologie legge 95/95-MURST 5, Project MURST Cluster C03/2000, CEGBA). Currently it is supported by operating grants from the Ministero dell'Istruzione, dell'Università e della Ricerca (MIUR), Italy (PNR 2001-2003 (FIRB art.8) D.M. 199, Strategic Program: Post-genome, grant 31-063933 and Project n.2, Cluster C03 L. 488/929).
Proper citation: MitoRes (RRID:SCR_008208)
http://www.ebi.ac.uk/parasites/parasite-genome.html
This website contains information about the genomic sequence of parasites. It also contains multiple search engines to search six frame translations of parasite nucleotide databases for motifs, parasite protein databases for motifs, and parasite protein databases for keywords and text terms. * Guide to Internet Access to Parasite Genome Information * Guide to web-based analysis tools * Parasite Genome BLAST Server: Search a range of parasite specific nucleotide sequence databases with your own sequence. * Parasite Proteome Keyword Search Facility: Search parasite protein databases for keywords and text terms * Parasite Proteome Motif Search Facility: Search parasite protein databases for motifs * Parasite Six Frame Translation Motif Search Facility: Search six frame translations of parasite nucleotide databases for motifs * Genome computing resources: A list of ftp and gopher sites where genome computing applications and other resources can be found.
Proper citation: Parasite genome databases and genome research resources (RRID:SCR_008150)
The University of California Davis Center for Comparative Medicine (CCM) is a cooperative, interdisciplinary research and teaching center that is co-sponsored by the School of Medicine and the School of Veterinary Medicine. CCM faculty members have academic appointments in one or both Schools. The CCM Research Mission is to investigate the pathogenesis of human and animal disease, using animal models or naturally occurring animal diseases. Areas of emphasis include host-agent interactions during infectious disease, intervention and prevention strategies for infectious diseases, cancer, and mouse biology. CCM faculty contribute a broad range of expertise to these areas, including the disciplines of immunology, genomics, pathology, biochemistry, physiology, microbiology, molecular virology, and informatics. Through its robust and interdisciplinary research programs, the CCM provides a rich academic environment for teaching at the professional, graduate, and post-graduate levels within the School of Medicine and School of Veterinary Medicine. Opportunities are available for professional students from both schools to gain research experience. PhD candidates can pursue training opportunities in the CCM's faculty-sponsored research laboratories, with support from a number of training grants. This diverse research environment is intended to attract and train high-quality candidates to the disciplines of comparative medicine, independent and collaborative research, and mouse biology. Sponsors: CCM is supported by UC Davis.
Proper citation: University of California Davis Center for Comparative Medicine (RRID:SCR_008294)