SciCrunch Registry is a curated repository of scientific resources, with a focus on biomedical resources, including tools, databases, and core facilities. Visit SciCrunch to register your resource.
http://weizhong-lab.ucsd.edu/cd-hit/
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on February 28, 2023. Software program for clustering biological sequences, with many applications such as making non-redundant databases, finding duplicates, identifying protein families, filtering sequence errors, and improving sequence assembly. It is very fast and can handle extremely large databases. CD-HIT significantly reduces the computational and manual effort in many sequence analysis tasks, helps in understanding the structure of a dataset, and helps correct bias within it. The CD-HIT package includes CD-HIT, CD-HIT-2D, CD-HIT-EST, CD-HIT-EST-2D, CD-HIT-454, CD-HIT-PARA, PSI-CD-HIT, CD-HIT-OTU, and over a dozen scripts:
* CD-HIT (CD-HIT-EST) groups similar proteins (DNAs) into clusters that meet a user-defined similarity threshold.
* CD-HIT-2D (CD-HIT-EST-2D) compares two datasets and identifies the sequences in db2 that are similar to db1 above a threshold.
* CD-HIT-454 identifies natural and artificial duplicates among pyrosequencing reads.
* CD-HIT-OTU clusters rRNA tags into OTUs.
The usage of the other programs and scripts is described in the CD-HIT user's guide. CD-HIT was originally developed by Dr. Weizhong Li in Dr. Adam Godzik's lab at the Burnham Institute (now Sanford-Burnham Medical Research Institute). THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16, 2025.
Proper citation: CD-HIT (RRID:SCR_007105)
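The greedy incremental clustering idea behind CD-HIT can be sketched as follows: sequences are processed from longest to shortest, and each one joins the first existing cluster whose representative it matches at or above the threshold, or else becomes the representative of a new cluster. This is a simplified illustration, not CD-HIT's code. The function names are hypothetical, and ungapped identity over the shorter sequence stands in for CD-HIT's alignment-based identity and short-word filter.

```python
from typing import Dict, List


def ungapped_identity(a: str, b: str) -> float:
    """Fraction of matching positions over the shorter sequence (no gaps).

    Simplified stand-in: CD-HIT itself uses alignments plus a short-word filter.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.9) -> Dict[str, List[str]]:
    """Greedy incremental clustering: longer sequences become representatives first."""
    clusters: Dict[str, List[str]] = {}  # representative id -> member ids
    # Longest first; ties broken by id so the result is deterministic.
    order = sorted(seqs, key=lambda k: (-len(seqs[k]), k))
    for sid in order:
        for rep in clusters:
            # Join the first cluster whose representative is similar enough.
            if ungapped_identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            clusters[sid] = [sid]  # no match: sid founds a new cluster
    return clusters


if __name__ == "__main__":
    demo = {
        "s1": "MKTAYIAKQRQISFVKSHFSRQ",
        "s2": "MKTAYIAKQRQISFVKSHFSR",
        "s3": "G" * 20,
    }
    print(greedy_cluster(demo, threshold=0.9))
```

Here `s2` joins the cluster represented by the longer `s1`, while the poly-G sequence `s3` founds its own cluster.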
This database provides a platform to query and compare gene expression data during the development of the major model animals (zebrafish, Drosophila, medaka, and mouse). The name 4DXpress stands for expression database in 4D. The four dimensions (4D) can be interpreted either as three spatial dimensions plus time, or as (1) species, (2) gene, (3) developmental stage, and (4) anatomical structure. The database focuses mainly on cross-species comparison. The high-resolution expression data were acquired through whole-mount in situ hybridization, antibody, or transgenic experiments. Data were integrated from several species-specific expression pattern databases, such as ZFIN, BDGP, GXD, and MEPD, or submitted directly by researchers of the participating groups at EMBL. The 4DXpress database is a project within the Centre for Computational Biology at EMBL. It is developed by Yannick Haudry, Thorsten Henrich, and Ivica Letunic and coordinated by Thorsten Henrich. Hugo Berube is developing the 4D ArrayExpress Data Warehouse at EBI to integrate in situ data with microarray data.
Proper citation: Expression Database in 4D (RRID:SCR_007066)
Database containing the DNA sequence and annotation of the entire human chromosome 7, encompassing nearly 158 million nucleotides of DNA and 1917 gene structures. It presents the most up-to-date collation of sequence, gene, and other annotations from all databases (e.g. published Celera data, NCBI, Ensembl, RIKEN, UCSC) as well as unpublished data. To generate a higher-order description, additional structural features such as imprinted genes, fragile sites, and segmental duplications were integrated at the level of the DNA sequence with medical genetic data, including 440 chromosome rearrangement breakpoints associated with disease. The objective of this project is to generate a comprehensive description of human chromosome 7 to facilitate biological discovery, disease gene research, and medical genetic applications. There are over 360 disease-associated genes or loci on chromosome 7. A major challenge ahead will be to represent chromosome alterations, variants, and polymorphisms and their related phenotypes (or lack thereof) in an accessible way. In addition to being a primary data source, this site serves as a weighing station for testing community ideas and information to produce highly curated data for submission to other databases such as NCBI, Ensembl, and UCSC. Therefore, any useful data submitted will be curated and shown in this database. All chromosome 7 genomic clones (cosmids, BACs, YACs) listed in GBrowser and in other data tables are freely distributed.
Proper citation: Chromosome 7 Annotation Project (RRID:SCR_007134)
https://github.com/jstjohn/SimSeq
An Illumina paired-end and mate-pair short-read simulator. The project attempts to model as many of the quirks of Illumina data as possible, including the potential for chimeric reads and the pull-down of non-biotinylated fragments in mate-pair libraries.
Proper citation: SimSeq (RRID:SCR_006947)
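To make the paired-end layout concrete, here is a minimal sketch (not SimSeq's implementation) of how one error-free read pair can be drawn from a reference: a fragment is sampled, read 1 is taken from its 5' end on the forward strand, and read 2 is the reverse complement of its 3' end. The fixed fragment and read lengths and the function names are assumptions for illustration; SimSeq additionally models sequencing errors, chimeras, and mate-pair artifacts, all of which are omitted here.

```python
import random
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_pair(reference: str, fragment_len: int, read_len: int,
                  rng: random.Random) -> Tuple[str, str, int]:
    """Draw one error-free paired-end read pair from a random fragment.

    Hypothetical helper; returns (read1, read2, fragment_start).
    """
    if read_len > fragment_len or fragment_len > len(reference):
        raise ValueError("need read_len <= fragment_len <= len(reference)")
    start = rng.randint(0, len(reference) - fragment_len)  # bounds are inclusive
    fragment = reference[start:start + fragment_len]
    read1 = fragment[:read_len]                       # forward strand, 5' end
    read2 = reverse_complement(fragment[-read_len:])  # reverse strand, 3' end
    return read1, read2, start


if __name__ == "__main__":
    rng = random.Random(42)
    ref = "ACGTTGCAAGGCTTACCGATAGCTAGGCTTAACG"
    r1, r2, pos = simulate_pair(ref, fragment_len=20, read_len=8, rng=rng)
    print(pos, r1, r2)
```

Passing an explicit `random.Random` keeps simulations reproducible; a fuller simulator would draw `fragment_len` from an insert-size distribution instead of fixing it.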
Central public database of experimentally validated human and mouse noncoding fragments with gene enhancer activity, as assessed in transgenic mice. Most of these noncoding elements were selected for testing based on their extreme conservation in other vertebrates or on epigenomic evidence (ChIP-Seq) of putative enhancer marks. Users can retrieve elements near single genes of interest, search for enhancers that target reporter gene expression to a particular tissue, or download entire collections of enhancers with defined tissue specificity or conservation depth.
Proper citation: VISTA Enhancer Browser (RRID:SCR_007973)
https://www.ncbi.nlm.nih.gov/genbank/dbest/
Division of GenBank that contains sequence data and other information on single-pass cDNA sequences, or Expressed Sequence Tags (ESTs), from a number of organisms.
Proper citation: dbEST (RRID:SCR_008132)
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on August 26, 2019. In October 2016, T1DBase merged with its sister site ImmunoBase (https://immunobase.org). As documented in March 2020, ownership of ImmunoBase has been transferred to Open Targets (https://www.opentargets.org), and results for all studies can be explored using Open Targets Genetics (https://genetics.opentargets.org). Database focused on the genetics and genomics of type 1 diabetes (T1D) susceptibility, providing a curated and integrated set of datasets and tools, across multiple species, to support and promote research in this area. The data scope includes annotated genomic sequences for suspected T1D susceptibility regions; genetic data; microarray data; and global datasets, generally from the literature, that are useful for genetics and systems biology studies. The site also includes software tools for analyzing the data.
Proper citation: T1DBase (RRID:SCR_007959)
http://www.baderlab.org/Software/ActiveDriver
A statistical method for interpreting variations in protein sequence (e.g. coding SNPs in the population, SNVs in cancer genomes) in the context of protein post-translational signaling modifications.
Proper citation: ActiveDriver (RRID:SCR_008104)
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on February 23, 2023. Software package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data but also supporting other types of data. QIIME analyzes and transforms raw sequencing data generated on Illumina or other platforms into publication-quality graphics and statistics.
Proper citation: QIIME (RRID:SCR_008249)
Griffin (G-protein-receptor interacting feature finding instrument) is a high-throughput system that predicts GPCR-G-protein coupling selectivity from the GPCR sequence and the ligand molecular weight. The system consists of two parts: 1) an HMM section using family-specific multiple alignments of GPCRs, and 2) an SVM section using physico-chemical feature vectors derived from the GPCR sequence. G-protein coupled receptors (GPCRs), which are composed of seven transmembrane helices, act as interfaces for signal transduction. External stimulation of a GPCR induces coupling with a G-protein (Gi/o, Gq/11, Gs, or G12/13), followed by different kinds of signal transduction into the cell. About half of the drugs on the market are intended to control this GPCR-G-protein binding system, which is therefore an important research target for the development of effective drugs. For this purpose, G-protein activation needs to be monitored effectively and comprehensively by identifying the ligands that bind to each GPCR. Because such a biochemical experimental system is currently difficult to construct, predicting the experimental results in advance with bioinformatics techniques could bring large progress to G-protein-related drug design. Previous methods for predicting GPCR-G-protein coupling selectivity, based on sequence pattern searches, statistical models, and HMM representations, showed high prediction sensitivity; however, none could predict with both high sensitivity and high specificity. In this work, we comprehensively extracted the physico-chemical parameters of each part of the ligand, GPCR, and G-protein, and chose the parameters with a strong correlation to G-protein coupling selectivity. These parameters were combined into a feature vector used for SVM-based GPCR classification.
Proper citation: G protein receptor interaction feature finding instrument (RRID:SCR_008343)
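Griffin's SVM stage operates on physico-chemical feature vectors, but the exact parameters it uses are not listed here. The sketch below only illustrates the general idea under assumed features: amino-acid composition plus mean Kyte-Doolittle hydropathy of the GPCR sequence, with the ligand molecular weight appended, concatenated into one numeric vector of the kind an SVM classifier consumes. The function name and feature choice are hypothetical.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def feature_vector(protein: str, ligand_mw: float) -> List[float]:
    """Hypothetical GPCR feature vector (assumed features, not Griffin's):
    20 composition fractions, mean hydropathy, ligand molecular weight."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    composition = [residues.count(aa) / n for aa in AMINO_ACIDS]
    mean_hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in residues) / n
    return composition + [mean_hydropathy, ligand_mw]
```

Vectors like these, one per receptor-ligand pair, would then be scaled and passed to an SVM implementation for training; in practice, features on very different scales (such as molecular weight) are usually standardized first.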
https://CRAN.R-project.org/package=gma
Software package to perform Granger mediation analysis (GMA) for time series. It includes a single-level GMA model and a two-level GMA model for time series with a hierarchically nested structure.
Proper citation: GMA (RRID:SCR_009212)
http://clipserve.clip.ubc.ca/topfind
An integrated knowledgebase focused on protein termini, their formation by proteases, and their functional implications. It contains information about the processing and processing state of proteins, and the functional implications thereof, derived from the research literature, contributions by the scientific community, and biological databases. It lists more than 120,000 N- and C-termini and almost 10,000 cleavages. TopFIND provides comprehensive coverage of protein N- and C-termini discovered by all available in silico, in vitro, and in vivo methodologies. It builds on existing knowledge by seamlessly integrating data from UniProt and MEROPS, and provides access to new data from community submissions and manual literature curation. It makes modifications of protein termini, such as acetylation and citrullination, easily accessible and searchable, and provides the means to identify and analyze the extent and distribution of terminal modifications across a protein. The data are presented with a strong emphasis on their relation to curated background information and on the underlying evidence that led to the observation of a terminus, its modification, or a proteolytic cleavage. In brief, it lists the protein information, its domain structure, protein termini, terminus modifications, and proteolytic processing of and by other proteins. All information is accompanied by metadata such as its original source, method of identification, confidence measurement, or related publication. A positional cross-correlation evaluation matches termini and cleavage sites with protein features (such as amino acid variants) and domains to highlight potential effects and dependencies in a unique way. A network view of all proteins, showing their functional dependency as protease, substrate, or protease inhibitor together with protein interactions, is also provided for easy evaluation of network-wide effects.
A powerful yet user-friendly filtering mechanism allows the presented data to be filtered by parameters such as the methodology used, in vivo relevance, confidence, or data source (e.g. limited to a single laboratory or publication). This provides the means to assess physiologically relevant data and to deduce functional information and hypotheses relevant to the bench scientist.
TopFIND PROVIDES:
* Integration of protein termini with proteolytic processing and protein features
* Display of proteases and substrates within their protease web, including detailed evidence information
* Full support for the Human Proteome Project through search by chromosome location
CONTRIBUTE:
* Submit your N- or C-termini datasets
* Contribute information on protein cleavages
* Provide detailed experimental descriptions, sample information, and raw data
Proper citation: TopFIND (RRID:SCR_008918)
http://go.princeton.edu/cgi-bin/GOTermFinder
The Generic GO Term Finder finds the significant GO terms shared among a list of genes from an organism, displaying the results in a table and as a graph (showing the terms and their ancestry). The user may optionally provide background information or a custom gene association file, or filter by evidence codes. The tool can batch-process multiple queries at once. GO::TermFinder comprises a set of object-oriented Perl modules. It can be used on any system on which Perl can be run, either as a command-line application, in single or batch mode, or as a web-based CGI script. This implementation, developed at the Lewis-Sigler Institute at Princeton, depends on the GO-TermFinder software written by Gavin Sherlock and Shuai Weng at Stanford University and the GO::View module written by Shuai Weng. It is made publicly available through the GMOD project. The full source code and documentation for GO::TermFinder are freely available from http://search.cpan.org/dist/GO-TermFinder/. Platform: Online tool, Windows compatible, Mac OS X compatible, Linux compatible, Unix compatible
Proper citation: Generic GO Term Finder (RRID:SCR_008870)
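Deciding whether a GO term is "significant" for a gene list is commonly framed as a hypergeometric test: with N background genes, M of them annotated to the term, and a list of n genes containing k annotated ones, the p-value is the upper tail P(X >= k) = sum over i from k to min(n, M) of C(M, i) C(N - M, n - i) / C(N, n). GO::TermFinder is generally described as using this test together with multiple-testing correction. The function below is an independent, standard-library illustration of the tail probability only; it omits the correction and is not GO::TermFinder's code.

```python
from math import comb


def hypergeom_pvalue(k: int, n: int, M: int, N: int) -> float:
    """Upper-tail P(X >= k) for X ~ Hypergeometric(N total, M annotated, n drawn).

    Illustrative only: no multiple-testing correction is applied.
    """
    if not (0 <= M <= N and 0 <= n <= N):
        raise ValueError("need 0 <= M <= N and 0 <= n <= N")
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(M, i) * comb(N - M, n - i)
               for i in range(max(k, 0), min(n, M) + 1))
    return tail / total
```

For example, with 10 background genes, 3 annotated with a term, and a 2-gene list containing 2 annotated genes, `hypergeom_pvalue(2, 2, 3, 10)` is 3/45, about 0.067.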
http://plantgrn.noble.org/LegumeIP/
LegumeIP is an integrative database and bioinformatics platform for comparative genomics and transcriptomics, designed to facilitate the study of gene function and genome evolution in legumes and ultimately to generate molecular breeding tools that improve the quality of crop legumes. LegumeIP currently hosts large-scale genomics and transcriptomics data, including:
* Genomic sequences of three model legumes, i.e. Medicago truncatula, Glycine max (soybean), and Lotus japonicus, plus two reference plant species, Arabidopsis thaliana and Populus trichocarpa, annotated using the UniProt TrEMBL, InterProScan, Gene Ontology, and KEGG databases. LegumeIP covers a total of 222,217 protein-coding gene sequences.
* Large-scale gene expression data compiled from 104 array hybridizations from L. japonicus, 156 array hybridizations from the M. truncatula gene atlas database, and 14 RNA-Seq-based gene expression profiles from G. max on different tissues, including four common tissues: nodule, flower, root, and leaf.
* Systematic synteny analysis among M. truncatula, G. max, L. japonicus, and A. thaliana.
* Reconstruction of gene families and gene family-wide phylogenetic analysis across the five hosted species.
LegumeIP features comprehensive search and visualization tools that enable flexible queries on gene annotation, gene family, synteny, and relative abundance of gene expression.
Proper citation: LegumeIP (RRID:SCR_008906)
http://hymenopteragenome.org/beebase/
Gene sequences and genomes of Bombus terrestris, Bombus impatiens, Apis mellifera, and three of its pathogens, which can be explored and analyzed via genome browsers, BLAST search, and the Apollo annotation tool. The genomes of two additional species, Apis dorsata and A. florea, are currently under analysis and will soon be incorporated. BeeBase is an archive and will not be updated. The most up-to-date bee genome data are now available through the navigation bar on the HGD Home page.
Proper citation: BeeBase (RRID:SCR_008966)
http://pages.stat.wisc.edu/~yandell/qtl/software/qtlbim/
Software library for QTL Bayesian Interval Mapping that provides a Bayesian model selection approach to map multiple interacting QTL. It works on experimentally inbred lines and performs a genome-wide search to locate multiple potential QTL. The package can handle continuous, binary and ordinal traits. (entry from Genetic Analysis Software)
Proper citation: R/QTLBIM (RRID:SCR_009375)
https://github.com/lpantano/seqbuster
Software tool for processing and analysis of small RNA datasets. Applied to human embryonic cells, it revealed ubiquitous miRNA modifications.
Proper citation: SeqBuster (RRID:SCR_009616)
http://www.sph.umich.edu/csg/abecasis/MACH/download/
Software for QTL analysis based on imputed dosages or posterior probabilities.
Proper citation: MACH (RRID:SCR_009621)
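An imputed dosage summarizes genotype posterior probabilities as the expected number of copies of the tested allele, dosage = P(heterozygous) + 2 * P(homozygous for the tested allele), giving a value between 0 and 2. A minimal sketch of dosage-based QTL analysis, assuming a simple linear model of trait on dosage fitted by ordinary least squares, is shown below; the function names are hypothetical and this is not MACH's own code or model.

```python
from typing import Sequence, Tuple


def dosage(p_het: float, p_hom: float) -> float:
    """Expected allele count (0-2): P(het) + 2 * P(homozygous for tested allele)."""
    return p_het + 2.0 * p_hom


def regress_trait_on_dosage(dosages: Sequence[float],
                            traits: Sequence[float]) -> Tuple[float, float]:
    """Assumed model: ordinary least-squares fit of trait = intercept + slope * dosage."""
    n = len(dosages)
    if n != len(traits) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(traits) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("dosages have zero variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, traits))
    slope = sxy / sxx  # estimated additive effect per allele copy
    return mean_y - slope * mean_x, slope
```

Using the expected allele count rather than a hard genotype call lets uncertain imputations contribute proportionally to the fit instead of being forced to 0, 1, or 2.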
A cross-platform software program for Bayesian MCMC analysis of molecular sequences. It is entirely oriented toward rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used to reconstruct phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted in proportion to its posterior probability. It includes an easy-to-use user-interface program for setting up standard analyses and a suite of programs for analyzing the results.
Proper citation: BEAST (RRID:SCR_010228)
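The claim that MCMC weights each tree in proportion to its posterior probability can be illustrated with a toy Metropolis sampler over a handful of discrete states standing in for tree topologies: the fraction of iterations spent in each state approaches its normalized posterior weight. The state names and unnormalized weights are made up for illustration, and no burn-in is discarded; BEAST's real proposals act on topologies, branch lengths, and model parameters.

```python
import random
from collections import Counter
from typing import Dict


def metropolis(weights: Dict[str, float], steps: int, seed: int = 1) -> Dict[str, float]:
    """Toy Metropolis sampler; returns the fraction of steps spent in each state.

    weights: positive, unnormalized posterior values (hypothetical "trees").
    """
    rng = random.Random(seed)
    states = list(weights)
    current = states[0]
    counts = Counter()
    for _ in range(steps):
        proposal = rng.choice(states)  # uniform proposal, hence symmetric
        # Accept with probability min(1, posterior ratio).
        if rng.random() < weights[proposal] / weights[current]:
            current = proposal
        counts[current] += 1
    return {s: counts[s] / steps for s in states}


if __name__ == "__main__":
    # Unnormalized posteriors 1:2:7 should give visit frequencies near 0.1, 0.2, 0.7.
    print(metropolis({"tree_A": 1.0, "tree_B": 2.0, "tree_C": 7.0}, steps=100_000))
```

Only ratios of posterior values are needed, which is why the normalizing constant (the marginal likelihood) never has to be computed.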
Web application to generate sequence logos, graphical representations of the patterns within a multiple sequence alignment. It is designed to make generating sequence logos easy.
Proper citation: WEBLOGO (RRID:SCR_010236)
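In a sequence logo, the height of each column is usually its information content, R = log2(N) - H, where N is the alphabet size (4 for DNA) and H = -sum of f_b * log2(f_b) is the Shannon entropy of the observed letter frequencies; each letter is then drawn with height f_b * R. The sketch below computes these heights for a gap-free DNA alignment. As a simplification, it omits the small-sample correction that logo tools typically apply, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

DNA = "ACGT"


def column_heights(alignment: List[str]) -> List[Dict[str, float]]:
    """Letter heights in bits for each column of a gap-free DNA alignment.

    No small-sample correction is applied (simplifying assumption).
    """
    if not alignment or len({len(s) for s in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    max_bits = math.log2(len(DNA))  # 2 bits for DNA
    heights = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = sum(counts[b] for b in DNA)
        if total == 0:
            raise ValueError("column contains no A/C/G/T characters")
        freqs = {b: counts[b] / total for b in DNA}
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        info = max_bits - entropy  # information content R of this column
        heights.append({b: freqs[b] * info for b in DNA})
    return heights
```

A fully conserved column reaches the 2-bit maximum, while a column with all four bases equally frequent has zero height.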