SciCrunch Registry is a curated repository of scientific resources, with a focus on biomedical resources, including tools, databases, and core facilities - visit SciCrunch to register your resource.
http://www.sci.unisannio.it/docenti/rampone/
Data set of Homo sapiens exons, introns and splice regions extracted from GenBank Rel. 123, intended to provide standardized material for training computational approaches to gene identification and characterization and for assessing their prediction accuracy. From the complete GenBank (Primate Sequences Division) Rel. 123 (162,557 entries), entries of human nuclear DNA including a complete CDS and more than one exon were selected, and 4523 exons and 3802 introns were extracted from these entries. Details about the extracted exons and introns are reported (locus, number, start and end position in the entry, sequence, length, G+C content, presence of non-ACGT data (nucleotide scan check)). Statistics are also reported (overall nucleotides, average G+C content, nucleotide scan check results, number of introns not starting with GT or not ending with AG, minimum / maximum / average length, length standard deviation). 3799+3799 donor and acceptor sites were extracted as windows of 140 nucleotides around each splice site. After discarding sequences that do not include canonical GT-AG junctions (65+74), that include insufficient data (not enough material for a 140-nucleotide window) (686+589), that include non-ACGT bases (29+30), or that are redundant (218+226), there are 2796+2880 windows. Finally, there are 271,937+332,296 windows of false splice sites, selected by searching for canonical GT-AG pairs in non-splicing positions. False sites within a range of +/- 60 from a true splice site are marked as proximal.
Proper citation: HS3D - Homo Sapiens Splice Sites Dataset (RRID:SCR_002939)
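The window filtering described for HS3D can be illustrated with a minimal sketch. It is not the dataset authors' code: the function names, the 0-based indexing, and the assumption that the 140-nucleotide window is split evenly (70 bases on each side of the junction) are all hypothetical choices made for illustration.

```python
VALID_BASES = set("ACGT")


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def splice_window(entry: str, junction: int, width: int = 140):
    """Return `width` bases centred on `junction`, or None if the entry is too short.

    `junction` is the 0-based index of the first intron base (assumed convention).
    """
    half = width // 2
    start, end = junction - half, junction + half
    if start < 0 or end > len(entry):
        return None
    return entry[start:end].upper()


def classify_donor(entry: str, intron_start: int, width: int = 140) -> str:
    """Label a candidate donor site with the first discard reason that applies."""
    window = splice_window(entry, intron_start, width)
    if window is None:
        return "insufficient_data"  # not enough material for a full window
    if set(window) - VALID_BASES:
        return "non_acgt"  # contains bases other than A, C, G, T
    half = width // 2
    if window[half:half + 2] != "GT":
        return "non_canonical"  # intron does not start with GT
    return "ok"
```

A redundancy check (dropping duplicate windows) could be layered on top by collecting the accepted windows in a set; acceptor sites would be handled the same way with an AG check just before the junction.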
Curated lists of genes associated with speech / language phenotypes and with structural or functional abnormalities observed in patient populations. Entrez ID gene information, as well as gene expression profiles from the Allen Brain Atlas, are available. You can also download expression data for a given gene in JSON or XML format.
Proper citation: Speech Language Disorders Database (RRID:SCR_003655)
http://www.linked-neuron-data.org/
Neuroscience data and knowledge from multiple scales and multiple data sources that has been extracted, linked, and organized to support comprehensive understanding of the brain. The core is the CAS Brain Knowledge base, a very large scale brain knowledge base based on automatic knowledge extraction and integration from various data and knowledge sources. The LND platform provides services for neuron data and knowledge extraction, representation, integration, visualization, semantic search and reasoning over the linked neuron data. Currently, LND extracts and integrates semantic data and knowledge from the following resources: PubMed, INCF-CUMBO, Allen Reference Atlas, NIF, NeuroLex, MeSH, DBPedia/Wikipedia, etc.
Proper citation: Linked Neuron Data (RRID:SCR_003658)
http://www.sgn.cornell.edu/bulk/input.pl?modeunigene
Allows users to download Unigene or BAC information using a list of identifiers, or complete datasets via FTP. THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16, 2025.
Proper citation: Sol Genomics Network - Bulk download (RRID:SCR_007161)
http://linux1.softberry.com/spldb/SpliceDB.html
Database of canonical and non-canonical mammalian splice sites. The information about verified splice site sequences for canonical and non-canonical sites is presented with the supporting evidence. Weight matrices were built for the major splice groups, which can be incorporated into gene prediction programs.
Proper citation: SpliceDB (RRID:SCR_006262)
http://aws.amazon.com/1000genomes/
A dataset containing the full genomic sequence of 1,700 individuals, freely available for research use. The 1000 Genomes Project is an international research effort coordinated by a consortium of 75 companies and organizations to establish the most detailed catalogue of human genetic variation. The project has grown to 200 terabytes of genomic data, including DNA sequenced from more than 1,700 individuals, which researchers can now access on AWS for use in disease research free of charge. The data is available to all via Amazon S3 at: http://s3.amazonaws.com/1000genomes The 1000 Genomes Project aims to include the genomes of more than 2,662 individuals from 26 populations around the world, and the NIH will continue to add the remaining genome samples to the data collection this year. Public Data Sets on AWS provide a centralized repository of public data hosted on Amazon Simple Storage Service (Amazon S3). The data can be accessed from AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic MapReduce (Amazon EMR), which provide organizations with the scalable compute resources needed to take advantage of these large data collections. AWS is storing the public data sets at no charge to the community. Researchers pay only for the additional AWS resources they need for further processing or analysis of the data. All 200 TB of the latest 1000 Genomes Project data is in a publicly accessible Amazon S3 bucket. You can access the data via simple HTTP requests, or take advantage of the AWS SDKs in languages such as Ruby, Java, Python, .NET and PHP. Researchers can use the Amazon EC2 utility computing service to work with this data without the usual capital investment required to work with data at this scale.
AWS also provides a number of orchestration and automation services to help teams make their research available to others to remix and reuse. Making the data available via a bucket in Amazon S3 also means that customers can crunch the information using Hadoop via Amazon Elastic MapReduce, and take advantage of the growing collection of tools for running bioinformatics job flows, such as CloudBurst and Crossbow.
Proper citation: 1000 Genomes Project and AWS (RRID:SCR_008801)
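As an illustration of the "simple HTTP requests" mentioned in the 1000 Genomes entry, the sketch below builds an object URL under the bucket address given there and prepares a ranged request using only the Python standard library. The object key in the usage note is a made-up placeholder, not a real path in the bucket, and the helper names are hypothetical; nothing is sent over the network until the request is passed to `urllib.request.urlopen`.

```python
from urllib.parse import quote
from urllib.request import Request

# Bucket address as quoted in the entry above.
BUCKET_URL = "http://s3.amazonaws.com/1000genomes"


def object_url(key: str) -> str:
    """Build the HTTP URL for an object key, percent-encoding unsafe characters."""
    return f"{BUCKET_URL}/{quote(key.lstrip('/'))}"


def ranged_request(key: str, first_byte: int, last_byte: int) -> Request:
    """Prepare (but do not send) a GET request for a byte range of an object.

    Requesting a range avoids downloading a very large file in full.
    """
    return Request(
        object_url(key),
        headers={"Range": f"bytes={first_byte}-{last_byte}"},
    )
```

For example, `ranged_request("some/hypothetical file.txt", 0, 1023)` prepares a request for the first 1024 bytes of a placeholder key; passing it to `urllib.request.urlopen` would perform the download.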
http://pathways.mcdb.ucla.edu/algal/
Tools to search gene lists for functional term enrichment as well as to dynamically visualize proteins onto pathway maps. Additionally, integrated expression data may be used to discover similarly expressed genes based on a starting gene of interest.
Proper citation: Algal Functional Annotation Tool (RRID:SCR_012034)
http://www.zebrafinchatlas.org
Expression atlas of in situ hybridization images from large collection of genes expressed in brain of adult male zebra finches. Goal of ZEBrA project is to develop publicly available on-line digital atlas that documents expression of large collection of genes within brain of adult male zebra finches.
Proper citation: Zebra Finch Expression Brain Atlas (RRID:SCR_012988)
http://www.viprbrc.org/brc/home.do?decorator=vipr
Provides searchable public repository of genomic, proteomic and other research data for different strains of pathogenic viruses along with suite of tools for analyzing data. Data can be shared, aggregated, analyzed using ViPR tools, and downloaded for local analysis. ViPR is an NIAID-funded resource that supports research on viral pathogens in the NIAID Category A-C Priority Pathogen lists and on those causing (re)emerging infectious diseases. It provides a dedicated gateway to SARS-CoV-2 data that integrates data from external sources (GenBank, UniProt, Immune Epitope Database, Protein Data Bank), direct submissions, analysis pipelines and expert curation, and provides a suite of bioinformatics analysis and visualization tools for virology research.
Proper citation: Virus Pathogen Resource (ViPR) (RRID:SCR_012983)
http://www.dkfz.de/en/epidemiologie-krebserkrankungen/software/software.html
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on May 24, 2023. Software program that performs estimation of power and sample sizes required to detect genetic and environmental main, as well as gene-environment interaction (GxE) effects in indirect matched case-control studies (1:1 matching). When the hypothesis of GxE is tested, power/sample size will be estimated for the detection of GxE, as well as for the detection of genetic and environmental marginal effects. Furthermore, power estimation is implemented for the joint test of genetic marginal and GxE effects (Kraft P et al., 2007). Power and sample size estimations are based on Gauderman's (2002) asymptotic approach for power and sample size estimations in direct studies of GxE. Hardy-Weinberg equilibrium and independence of genotypes and environmental exposures in the population are assumed. The estimates are based on genotypic codes (G=1 (G=0) for individuals who carry a (non-) risk genotype), which depend on the mode of inheritance (dominant, recessive, or multiplicative). A conditional logistic regression approach is used, which employs a likelihood-ratio test with respect to a biallelic candidate SNP, a binary environmental factor (E=1 (E=0) in (un)exposed individuals), and the interaction between these components. (entry from Genetic Analysis Software)
Proper citation: PIAGE (RRID:SCR_013124)
http://bioinformatics.ust.hk/BOOST.html
Software application (entry from Genetic Analysis Software) for a method for detecting gene-gene interactions. It allows examining all pairwise interactions in genome-wide case-control studies.
Proper citation: BOOST (RRID:SCR_013133)
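To give a sense of what "all pairwise interactions" means in the BOOST entry, the sketch below only enumerates and counts SNP pairs; it does not implement the BOOST interaction test itself. The function names and SNP identifiers are hypothetical.

```python
from itertools import combinations


def pair_count(n_snps: int) -> int:
    """Number of unordered SNP pairs: n * (n - 1) / 2."""
    return n_snps * (n_snps - 1) // 2


def snp_pairs(snp_ids):
    """Lazily yield every unordered pair of SNP identifiers."""
    return combinations(snp_ids, 2)
```

The count grows quadratically, so a genome-wide panel of 500,000 SNPs gives `pair_count(500_000)` = 124,999,750,000 pairs, which is why exhaustive pairwise testing calls for a fast per-pair procedure.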
http://bioconductor.org/packages/edgeR/
Bioconductor software package for Empirical analysis of Digital Gene Expression data in R. Used for differential expression analysis of RNA-seq and digital gene expression data with biological replication.
Proper citation: edgeR (RRID:SCR_012802)
Integrated database resource consisting of 16 main databases, broadly categorized into systems information, genomic information, and chemical information. In particular, gene catalogs in completely sequenced genomes are linked to higher-level systemic functions of cell, organism, and ecosystem. Analysis tools are also available. KEGG may be used as reference knowledge base for biological interpretation of large-scale datasets generated by sequencing and other high-throughput experimental technologies.
Proper citation: KEGG (RRID:SCR_012773)
http://www-sequence.stanford.edu/group/candida/
The Stanford Genome Technology Center began a whole genome shotgun sequencing of strain SC5314 of Candida albicans. After reaching its original goal of 1.5X mean coverage of the haploid genome (16Mb) in summer 1998, Stanford was awarded a supplemental grant to continue sequencing up to a coverage of 10X, performing as much assembly of the sequence as possible, using recognizable genes as nucleation points. Candida albicans is one of the most commonly encountered human pathogens, causing a wide variety of infections ranging from mucosal infections in generally healthy persons to life-threatening systemic infections in individuals with impaired immunity. Oral and esophageal Candida infections are frequently seen in AIDS patients. Few classes of drugs are effective against these fungal infections, and all of them have limitations with regard to efficacy and side-effects.
Proper citation: Sequencing of Candida Albicans (RRID:SCR_013437)
https://omictools.com/l2l-tool
THIS RESOURCE IS NO LONGER IN SERVICE, documented May 10, 2017. A pilot effort that has developed a centralized, web-based biospecimen locator that presents biospecimens collected and stored at participating Arizona hospitals and biospecimen banks, which are available for acquisition and use by researchers. Researchers may use this site to browse, search and request biospecimens to use in qualified studies. The development of the ABL was guided by the Arizona Biospecimen Consortium (ABC), a consortium of hospitals and medical centers in the Phoenix area, and is now being piloted by this Consortium under the direction of ABRC. You may browse by type (cells, fluid, molecular, tissue) or disease. Common data elements decided by the ABC Standards Committee, based on data elements on the National Cancer Institute's (NCI's) Common Biorepository Model (CBM), are displayed. These describe the minimum set of data elements that the NCI determined were most important for a researcher to see about a biospecimen. The ABL currently does not display information on whether or not clinical data is available to accompany the biospecimens. However, a requester has the ability to solicit clinical data in the request. Once a request is approved, the biospecimen provider will contact the requester to discuss the request (and the requester's questions) before finalizing the invoice and shipment. The ABL is available to the public to browse. In order to request biospecimens from the ABL, the researcher will be required to submit the requested required information. Upon submission of the information, shipment of the requested biospecimen(s) will be dependent on the scientific and institutional review approval. Account required. Registration is open to everyone. Documented on August 26, 2019.
Database of published microarray gene expression data, and a software tool for comparing that published data to a user's own microarray results. It is very simple to use - all you need is a web browser and a list of the probes that went up or down in your experiment. If you find L2L useful please consider contributing your published data to the L2L Microarray Database in the form of list files. L2L finds true biological patterns in gene expression data by systematically comparing your own list of genes to lists of genes that have been experimentally determined to be co-expressed in response to a particular stimulus - in other words, published lists of microarray results. The patterns it finds can point to the underlying disease process or affected molecular function that actually generated the observed changes in gene expression. Its insights are far more systematic than critical gene analyses, and more biologically relevant than pure Gene Ontology-based analyses. The publications included in the L2L MDB initially reflected topics thought to be related to Cockayne syndrome: aging, cancer, and DNA damage. Since then, the scope of the publications included has expanded considerably, to include chromatin structure, immune and inflammatory mediators, the hypoxic response, adipogenesis, growth factors, hormones, cell cycle regulators, and others. Despite the parochial origins of the database, the wide range of topics covered will make L2L of general interest to any investigator using microarrays to study human biology. In addition to the L2L Microarray Database, L2L contains three sets of lists derived from Gene Ontology categories: Biological Process, Cellular Component, and Molecular Function. As with the L2L MDB, each GO sub-category is represented by a text file that contains annotation information and a list of the HUGO symbols of the genes assigned to that sub-category or any of its descendants.
You don't need to download L2L to use it to analyze your microarray data. There is an easy-to-use web-based analysis tool, and you have the option of downloading your results so you can view them at any time on your own computer, using any web browser. However, if you prefer, the entire L2L project, and all of its components, can be downloaded from the download page. Platform: Online tool, Windows compatible, Mac OS X compatible, Linux compatible, Unix compatible
Proper citation: L2L Microarray Analysis Tool (RRID:SCR_013440)
Functional genomic database for malaria parasites. Database for Plasmodium spp. Provides resource for data analysis and visualization in gene-by-gene or genome-wide scale. PlasmoDB 5.5 contains annotated genomes, evidence of transcription, proteomics evidence, protein function evidence, population biology and evolution data. Data can be queried by selecting from query grid or drop down menus. Results can be combined with each other on query history page. Search results can be downloaded with associated functional data and registered users can store their query history for future retrieval or analysis. Key community database for malaria researchers, intersecting many types of laboratory and computational data, aggregated by gene.
Proper citation: PlasmoDB (RRID:SCR_013331)
Database for ESTs (Expressed Sequence Tags), consensus sequences, bacterial artificial chromosome (BAC) clones, and BES (BAC End Sequences). They have generated 69,545 ESTs from 6 full-length cDNA libraries (Porcine Abdominal Fat, Porcine Fat Cell, Porcine Loin Muscle, Liver and Pituitary gland). They have also identified a total of 182 BAC contigs from chromosome 6. It is a valuable resource for porcine quantitative trait loci (QTL) mapping and genome studies. Users can explore genomic alignments of various data types, including expressed sequence tags (ESTs), consensus sequences, singletons, QTL, markers, UniGene and BAC clones, using several options. To estimate the genomic location of the sequence datasets, the data were aligned to BES (BAC End Sequences) instead of genomic sequence, because the pig genome has only low-coverage sequencing data. The Sus scrofa Genome Database mainly provides a comparative map of four species (pig, cattle, dog and mouse) for chromosome 6.
Proper citation: PiGenome (RRID:SCR_013394)
Database of traceable, standardized, annotated gene signatures which have been manually curated from publications that are indexed in PubMed. The Advanced Gene Search will perform a one-tailed Fisher exact test (which is equivalent to a test based on the hypergeometric distribution) to test whether your gene list is over-represented in any gene signature in GeneSigDB. Gene expression studies typically result in a list of genes (gene signature) which reflect the many biological pathways that are concurrently active. We have created a Gene Signature Data Base (GeneSigDB) of published gene expression signatures or gene sets which we have manually extracted from published literature. GeneSigDB was created following a thorough search of PubMed using a defined set of cancer gene signature search terms. We would be delighted to accept or update your gene signature. Please fill out the form as best you can. We will contact you when we get it and will be happy to work with you to ensure we accurately report your signature. GeneSigDB is capable of providing its functionality through a Java RESTful web service.
Proper citation: GeneSigDB (RRID:SCR_013275)
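The over-representation test named in the GeneSigDB entry can be sketched as an upper-tail hypergeometric probability. This is a generic illustration, not GeneSigDB's own implementation; the function name, the parameter names, and the choice of gene universe are assumptions.

```python
from math import comb


def enrichment_p_value(overlap: int, list_size: int,
                       signature_size: int, universe_size: int) -> float:
    """One-tailed p-value: P(X >= overlap) for a hypergeometric X.

    X counts how many of `list_size` genes drawn without replacement from
    `universe_size` genes fall in a signature of `signature_size` genes.
    """
    max_overlap = min(list_size, signature_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(signature_size, k) * comb(universe_size - signature_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total
```

For instance, with a universe of 10 genes, a signature of 5 and a query list of 5, a complete overlap gives `enrichment_p_value(5, 5, 5, 10)` = 1/252. In practice the result depends strongly on what is taken as the gene universe, which the entry does not specify.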
http://bioinformatics.psb.ugent.be/ENIGMA/
A software tool to extract gene expression modules from perturbational microarray data, based on the use of combinatorial statistics and graph-based clustering. The modules are further characterized by incorporating other data types, e.g. GO annotation, protein interactions and transcription factor binding information, and by suggesting regulators that might have an effect on the expression of (some of) the genes in the module. Version: ENIGMA 1.1. GO annotation version used: Aug 29th 2007.
Proper citation: ENIGMA (RRID:SCR_013400)
http://folk.uio.no/thoree/FEST/
An R package for simulations and likelihood calculations of pair-wise family relationships using DNA marker data. (entry from Genetic Analysis Software)
Proper citation: R/FEST (RRID:SCR_013347)