Search for bacterial genomes and metadata
Search flexibly for isolate names (ENA, SRA), species, metadata (such as country or serotype), or any combination of these.
BacQuerya is currently 'enhanced' for the following species, and has additional linked, searchable gene and sequence data:
Queries work as they would in a search engine: misspellings are tolerated and terms can be combined (e.g. 'streptococcus pnuemoniae nepal 23F').
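To illustrate how misspelling-tolerant matching can work in principle, here is a minimal sketch based on Levenshtein edit distance. It is an assumption for illustration only: the source does not say which matching algorithm BacQuerya uses, and `levenshtein`, `fuzzy_match` and the `max_edits=2` threshold are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete ca
                            curr[j - 1] + 1,     # insert cb
                            prev[j - 1] + cost)) # substitute (or keep) ca
        prev = curr
    return prev[-1]


def fuzzy_match(query: str, record_terms: list[str], max_edits: int = 2) -> bool:
    """Hypothetical rule: every query term must be within max_edits of some record term."""
    terms = [t.lower() for t in record_terms]
    return all(
        any(levenshtein(q, t) <= max_edits for t in terms)
        for q in query.lower().split()
    )
```

With this rule, 'pnuemoniae' is two edits away from 'pneumoniae', so the example query above would still match a record tagged 'Streptococcus pneumoniae', 'Nepal' and '23F'.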
Results are ordered by how well they match your query. Within this ordering, we try to place 'high quality' samples, such as reference sequences or those with uncontaminated assemblies, at the top.
You can filter the results directly using the provided toggles; search again after adjusting them. Select 'exact matches' to remove any 'fuzzy' matches to your query.
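The ordering and the 'exact matches' filter described above can be sketched as a two-key sort. This is a hedged illustration, not BacQuerya's implementation: the `Hit` record, its fields and the `rank` function are hypothetical, and the match score is assumed to be precomputed by the search backend.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical result record; field names are illustrative only
    isolate: str
    match_score: float   # how well the record matches the query (higher is better)
    high_quality: bool   # e.g. a reference sequence or an uncontaminated assembly
    exact: bool          # True if every query term matched without fuzziness


def rank(hits: list[Hit], exact_only: bool = False) -> list[Hit]:
    """Sort by match score (descending); among equal scores, high-quality samples first."""
    if exact_only:
        # the 'exact matches' toggle: drop anything that only matched fuzzily
        hits = [h for h in hits if h.exact]
    # False sorts before True, so `not h.high_quality` puts high-quality hits first
    return sorted(hits, key=lambda h: (-h.match_score, not h.high_quality))
```

Sorting on a tuple keeps match quality as the primary criterion, so sample quality only breaks ties rather than overriding relevance.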
Click 'Download all sequences' to get download links for the results. For more than 100 sequences, the links are sent by email or made available after a short wait.
This is a work in progress, and we will add more functionality to get data out of BacQuerya in the near future.
Clicking on an individual search result will open an isolate overview page, summarising available metadata for that isolate. These include: the species, accession IDs linked to external databases, download links for assemblies or read sets if available, metadata retrieved from the NCBI BioSample database and additional metadata extracted from other information sources. A JSON file with this metadata can be downloaded.
Gene clusters have been defined using panaroo.
Clicking on a result will open a gene overview page summarising metadata for the gene of interest. The 'Names/Aliases' field lists all publicly available identifiers for this gene, and the 'Description(s)' field lists all publicly available functional annotations.
Population-level information includes gene count and frequency, and a sequence alignment viewer (rendered as an image). The alignment can be scaled to give an overview of the amount and position of variation, and can be restricted to show only SNP sites. It currently includes one sample from each strain (as defined by PopPUNK).
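Restricting an alignment to SNP sites amounts to keeping only the columns where the aligned sequences disagree. The sketch below is an assumption about how such a view could be computed, not the viewer's actual code; in particular, treating gaps ('-') and 'N' as missing rather than as a variant is a choice made here, and `snp_sites` and `snp_only` are hypothetical names.

```python
def snp_sites(alignment: list[str]) -> list[int]:
    """Return 0-based columns carrying more than one base (gaps and N ignored)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for col in range(length):
        bases = {seq[col].upper() for seq in alignment} - {"-", "N"}
        if len(bases) > 1:
            sites.append(col)
    return sites


def snp_only(alignment: list[str]) -> list[str]:
    """Keep only the SNP columns of each aligned sequence."""
    cols = snp_sites(alignment)
    return ["".join(seq[c] for c in cols) for seq in alignment]
```

For example, `snp_only(["ATGCA", "ATGTA", "A-GCT"])` keeps columns 3 and 4 and gives `["CA", "TA", "CT"]`; the gap in column 1 is not counted as variation.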
An inverse lookup table of isolates with this gene in this species is shown below, similar to the top level isolate results.
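An inverse lookup table is essentially an inverted index: instead of listing the genes in each isolate, it lists the isolates carrying each gene, which also yields the gene count and frequency directly. A minimal sketch, assuming presence/absence data keyed by hypothetical isolate and gene names (`invert` and `gene_counts` are illustrative, not BacQuerya functions):

```python
from collections import defaultdict


def invert(isolate_genes: dict[str, set[str]]) -> dict[str, list[str]]:
    """Turn isolate -> genes into gene -> sorted list of isolates carrying it."""
    index: dict[str, set[str]] = defaultdict(set)
    for isolate, genes in isolate_genes.items():
        for gene in genes:
            index[gene].add(isolate)
    return {gene: sorted(isolates) for gene, isolates in index.items()}


def gene_counts(index: dict[str, list[str]], n_isolates: int) -> dict[str, tuple[int, float]]:
    """Count and frequency of each gene across n_isolates isolates."""
    return {gene: (len(isos), len(isos) / n_isolates) for gene, isos in index.items()}
```

Usage: `invert({"iso1": {"geneA", "geneB"}, "iso2": {"geneA"}})["geneA"]` returns `["iso1", "iso2"]`, so geneA has count 2 and frequency 1.0 across these two isolates.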
Genes can also be searched using a nucleotide sequence query. Query sequences must be at least 31 bp long, as the index was built with 31-mers.
Sequences are queried against a COBS index using exact k-mer matching. Results are ranked in descending order by the proportion of the query's k-mers that are found in the indexed gene sequence, and each result links to the corresponding gene overview page.
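The ranking above can be sketched with plain Python sets. This is a simplification under stated assumptions: COBS stores k-mers in compact Bloom-filter-based signatures and can return occasional false positives, whereas this sketch uses exact sets and ignores reverse complements. The names `kmers` and `rank_genes` are hypothetical; only k = 31 comes from the text.

```python
K = 31  # k-mer length the index was built with, per the text


def kmers(seq: str, k: int = K) -> set[str]:
    """All distinct k-mers of a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_genes(query: str, genes: dict[str, str], k: int = K) -> list[tuple[str, float]]:
    """Rank genes by the fraction of the query's k-mers found in each gene."""
    if len(query) < k:
        raise ValueError(f"query must be at least {k} bp long")
    q = kmers(query, k)
    scores = []
    for name, seq in genes.items():
        frac = len(q & kmers(seq, k)) / len(q)
        if frac > 0:
            scores.append((name, frac))
    # highest proportion of matching k-mers first
    return sorted(scores, key=lambda t: t[1], reverse=True)
```

The length check mirrors the 31 bp minimum: a shorter query has no complete 31-mer, so it cannot match anything.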
Studies can be searched by selecting the 'Study' tab and searching for a title, author, DOI or study topic. At present this is an interface to PubMed search, so it offers the same features and returns the same results.
Clicking on a search result will load the metadata for that study, retrieved using the CrossRef API.
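For readers who want the same study metadata programmatically, the sketch below uses two public services: NCBI E-utilities ESearch for a PubMed query and the CrossRef REST API `works/{DOI}` endpoint for metadata. It is an assumption that BacQuerya uses these exact endpoints; the helper names are hypothetical, and the DOI in the usage note is a made-up placeholder.

```python
import json
import urllib.parse
import urllib.request

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
CROSSREF_WORKS = "https://api.crossref.org/works/"


def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL returning PubMed IDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urllib.parse.urlencode(params)


def pubmed_ids(payload: dict) -> list[str]:
    """Extract the PubMed ID list from an ESearch JSON response."""
    return payload["esearchresult"]["idlist"]


def crossref_url(doi: str) -> str:
    """Build the CrossRef works URL for a DOI (slashes kept as-is)."""
    return CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")


def summarise(work: dict) -> dict:
    """Pick a few common fields out of a CrossRef 'work' response."""
    msg = work["message"]
    authors = [f"{a.get('given', '')} {a.get('family', '')}".strip()
               for a in msg.get("author", [])]
    return {
        "doi": msg.get("DOI"),
        "title": (msg.get("title") or [""])[0],
        "journal": (msg.get("container-title") or [""])[0],
        "authors": authors,
        "year": msg.get("issued", {}).get("date-parts", [[None]])[0][0],
    }


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

Usage (network required): `pubmed_ids(fetch_json(pubmed_search_url("pneumococcal serotype")))` returns matching PubMed IDs, and `summarise(fetch_json(crossref_url("10.1234/example.5678")))` would summarise a work, with the placeholder DOI replaced by a real one.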