Sunday, May 19, 2024

Science Roundup

It's been a while since I've written something for this blog, but with the first year of my PhD done I can now devote a little more time to it. Anyways, here's a quick roundup of some cool topics I found interesting these past months.

HACE - Targeted mutations 

Geneticists are often interested in the way that mutations in DNA affect the function of a protein (remember: DNA --> RNA --> Proteins). In a laboratory setting, this can kind of be done in a variety of ways, like exposing cells to UV radiation or running knockdown experiments, but creating targeted mutations has proved difficult and complex. In a recent pre-print by Dawn Chen et al., the researchers describe a new technology called HACE that is able to make small nicks in a single DNA strand using a guided version of CRISPR-Cas9. The Cas9 bit has a section of RNA loaded on it and recruits an enzyme that makes random mutations as it moves along from where the break in the DNA was made. This way, scientists can now make much more specific evaluations of how a mutation at a given location in the DNA may affect its function.

Scale Eater Fish

I recently learned of the Scale Eater Fish (Perissodus microlepis), a species of freshwater fish in Zambia. These fish sneak up on other fish and, like their name suggests, bite off their scales and eat them. What makes scientists interested in them is that they are an unusual case study in natural selection. Scale eaters have mouths that are a bit crooked and bend either to the left or to the right.



Whether a population of scale eaters has more left or right bent individuals in a given year depends entirely on how sick of their BS the other fish in the lake are. Fish start to become protective of whichever side is currently more likely to have its scales bitten off. So, when there are more right mouthed scale eaters, the other fish learn to watch their right sides. This leads to a disadvantage for right mouthed fish but an advantage for left mouthed ones. Just as the fish think they've adapted to getting bit on one side, scale eaters that bite the other side suddenly become more common in the lake, with this cycle seeming to occur every five years or so. How genetic and non-genetic cues jointly influence the direction and the degree of mouth bent-ness is still being investigated and is of major interest to population geneticists.
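For anyone who likes to see this kind of back-and-forth in code, here is a tiny toy model of the idea in Python. To be clear, this is just my own sketch and not anything from the scale eater literature: the function name, the starting frequency, and the `selection` and `learning` values are all made up for illustration, and the model only captures the basic logic that the common mouth type gets punished once the prey catch on.

```python
def simulate_scale_eaters(generations=100, p_right=0.7,
                          selection=0.4, learning=0.3):
    """Toy model of frequency-dependent selection (all parameters are invented).

    p_right     : fraction of scale eaters that are right mouthed
    guard_right : how strongly prey currently guard against right mouthed attackers
                  (0.5 means they guard against both types equally)
    """
    guard_right = 0.5
    history = []
    for _ in range(generations):
        history.append(p_right)
        # The more the prey guard against one mouth type, the worse that type does.
        w_right = 1.0 - selection * (guard_right - 0.5)
        w_left = 1.0 + selection * (guard_right - 0.5)
        mean_w = p_right * w_right + (1.0 - p_right) * w_left
        new_p = p_right * w_right / mean_w
        # Prey slowly shift their guard toward whichever type is currently common.
        guard_right += learning * (p_right - guard_right)
        p_right = new_p
    return history


if __name__ == "__main__":
    freqs = simulate_scale_eaters()
    for generation in range(0, 60, 5):
        print(generation, round(freqs[generation], 3))
```

Because the prey's guard lags behind the actual frequency, the right mouthed fraction overshoots 50% in both directions instead of settling right away. In this toy version the swings die down over time, so it says nothing about the real five-year cycle.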

How we Lost our Tails

Humans lost our external tails about 25 million years ago, leaving only the coccyx in its place. A new paper by Xia et al. shows that this loss may have occurred due to a transposable element that inserted itself into an ancestor's gene. Transposable elements are bits of DNA that jump around, either by being converted to RNA and then converted back into DNA and placed in a new position, or by producing an enzyme that moves them to a new place in the genome. The most abundant type of transposable element is called an Alu element, which makes up about 10% of our DNA. Most of the time, because most of your DNA is non-functional, their movement doesn't do much. But it seems that one Alu element's movement into the middle of our TBXT gene led to our ancestors losing their tails!


Bichir and the Immune System

This is a project I've been working on at my new lab here at NC State. Bichir, scientifically known by their order name "Polypteriformes," are the oldest lineage of bony fish. Bichir have been around for about 300 million years and have changed very little, earning them the nickname "living fossils."

The Bony Fish Lineage - AKA Actinopterygii. The oldest lineage is at the top, the bichirs and reedfish. The lineage in the middle, the Teleostei, represents the majority of fish species.


Because bichir are so old, they are one of the only bony fish lineages that did not go through the "Teleost gene duplication event." This event, in which a common ancestor of all Teleostei had its whole genome fully duplicated, led to a massive explosion in the diversity of fish species. This makes the Polypteriformes interesting for looking into the evolution of all sorts of processes. Specifically, I've been working on characterizing several genes associated with the immune system to see how they might have looked before the duplication event.

Stonefish Venom Genes

One really cool aspect of our immune system is the membrane-attack complex. This is a structure that our body forms that pokes holes in disease-causing invaders, causing water to rush into their membranes and kill them. 


Stonefish are notorious for being some of the most venomous animals on earth, with many incidents reported yearly of divers accidentally stepping on their sharp protruding spines, leading to immense pain. They are also really ugly (apologies to any stonefish reading this). I was surprised to learn that the venom-causing protein from stonefish actually works the exact same way as our immune system, because it is an ancient branch of our own membrane-attack complex family. Just like how we use our proteins to form pores in invaders, the stonefish venom, SNTX, pokes a hole in the cells of whatever tissue is unfortunate enough to come into contact with it, leading to cell death.

Monday, January 8, 2024

BWA, Read Mapping, and Indexing

A couple of months ago I was working on a project looking at genetic differences between two populations of this marine worm as a part of one of my lab rotations. What should have taken me a week or two ended up being four because the first command I had to run to analyze the data I was given, "BWA aln," was not the correct version. After a bit of banging my head against a wall it turns out for the specific genetic data I was given I actually had to run "BWA mem." C’est la vie.


Anyways, the reason why I'm telling you this isn't because I wanted to talk about the project I was doing (I actually didn't end up finding a lot of genetic differentiation between the two populations), or because I'm warning you of the pitfalls of using aln vs mem, but rather because I think BWA is a really cool concept. BWA and other read mapping programs like it rely on different variations of the same algorithm and represent a ubiquitous first step in a lot of bioinformatics pipelines. Unless they are mathematicians, most people just run the command without really thinking in much depth about what it's actually doing.

BWA - What is it?


BWA stands for Burrows-Wheeler Aligner. It's based on an algorithm invented by two guys, Michael Burrows and David Wheeler, called the Burrows-Wheeler Transform (BWT), that was originally used as a compression tool for text files. It kind of faded into relative obscurity until it was picked up by bioinformaticians in the late 2000s, who realized that it worked great for DNA sequencing files. In fact, many of the most popular short read mapping tools use the BWT in some way or another. BWA just happens to be (at the time of writing this) one of the most popular.

It works like this: you take a piece of text (let's use the example below of the word banana), add a $ symbol to delineate the end, and create a row for each rotation of the letters. So first you would move the letter "b" to the end and shift the "a" up to the front, then in the next row move "a" to the end and shift the "n" up, and so forth.

Fig 1

After you have every rotation of the text, you sort the rows alphabetically and then simplify things by taking just the last column (highlighted in red in Figure 1). For mathematical reasons, when you do this it tends to make the same letters bunch together. You then add a number before each group of repeated letters designating how many repeats there are.
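If you'd rather see those steps as code, here is a minimal Python sketch of them. The function names are my own, and I'm assuming two details: the $ sorts before every letter, and a letter that appears only once in a row gets no number in front of it. This is the slow, obvious way of doing it. Real tools like BWA build the same thing far more cleverly.

```python
from itertools import groupby


def bwt(text, end="$"):
    """Burrows-Wheeler Transform the slow but readable way."""
    s = text + end
    # One row per rotation: move the first i letters to the back.
    rotations = [s[i:] + s[:i] for i in range(len(s))]
    # Sort the rows alphabetically ("$" sorts before the letters).
    rotations.sort()
    # Keep only the last column.
    return "".join(row[-1] for row in rotations)


def run_length_encode(s):
    """Write each run of repeated letters as <count><letter>; singles stay as-is."""
    parts = []
    for letter, run in groupby(s):
        n = len(list(run))
        parts.append(f"{n}{letter}" if n > 1 else letter)
    return "".join(parts)


if __name__ == "__main__":
    last_column = bwt("banana")
    print(last_column)                     # annb$aa
    print(run_length_encode(last_column))  # a2nb$2a
```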


Now in this specific instance the actual size of the text did not change, so the file containing the text 'banana' would not be compressed. But you can imagine how bunching repeats in this way can be useful for compressing a 50 GB genomic sequencing file of just A's, T's, C's, and G's. But we can take this a step further.

Map those reads!


See, the reason why BWA is such a ubiquitous first step in a lot of pipelines isn't just its usefulness at compressing files. It's also great for aligning newly sequenced genomes to a reference, also known as read mapping.



Read mapping basically allows you to take your fragmented genome sequence pieces and match them to the areas of best fit on a reference so you can reorganize them correctly. 

The way BWA does this is a bit counterintuitive and difficult to explain in paragraph form, but bear with me. First we must create a matrix with four columns. The first column is just a number from 0 up to one less than the number of characters. We label this as i. Then go back to the sorted rotations from before. The second and third columns of our matrix are the first and last columns of letters (highlighted below in red boxes). We label these as "First" and "Last." The fourth column is the most confusing. Using the three columns we have just created, we go row by row and ask: where does the character in the "Last" column show up in the "First" column? Since a letter can repeat, we match by order of appearance: the first "a" in "Last" pairs with the first "a" in "First," the second with the second, and so on. We then write down the corresponding number from i. We call this L2F(i). Again, here's what that looks like using 'banana':
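Here is the same four-column matrix as a short Python sketch, in case the table is easier to follow that way. The function names are mine, and the code assumes the "match the first a with the first a, the second with the second" reading of L2F. As a small bonus, the second function shows that hopping through the L2F column is enough to rebuild the original word from the "Last" column alone.

```python
def l2f_table(text, end="$"):
    """Return the First column, Last column, and L2F(i) for each row i."""
    s = text + end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    first = [row[0] for row in rotations]
    last = [row[-1] for row in rotations]

    # Row where each letter's block begins in the First column.
    block_start = {}
    for i, letter in enumerate(first):
        block_start.setdefault(letter, i)

    # The k-th copy of a letter in Last pairs with its k-th copy in First.
    seen = {}
    l2f = []
    for letter in last:
        k = seen.get(letter, 0)
        l2f.append(block_start[letter] + k)
        seen[letter] = k + 1
    return first, last, l2f


def rebuild_text(last, l2f):
    """Walk the L2F column to recover the text, reading it back to front."""
    letters = []
    i = 0  # row 0 is the rotation that starts with "$"
    for _ in range(len(last) - 1):
        letters.append(last[i])
        i = l2f[i]
    return "".join(reversed(letters))


if __name__ == "__main__":
    first, last, l2f = l2f_table("banana")
    for i in range(len(first)):
        print(i, first[i], last[i], l2f[i])
    print(rebuild_text(last, l2f))  # banana
```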


Now what seems like utterly useless playing around with letters is actually genius. Basically, in a very roundabout way, what we've done is make it easier to find where a piece of text fits best on the reference. For a more detailed explanation of how this works, see this fantastic YouTube tutorial by Niema Moshiri. It involves limiting the range in which a pattern appears using the L2F(i) column. Luckily, we don't have to think about this too hard since we can have a computer do it for us. Again, you can see how helpful this could be with large DNA files and why I get a little bit geeked out when I talk about it. With BWA done and your short read sequences mapped to a reference in their correct order, the next steps of actually analyzing them become much easier.
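To give a rough feel for the "limiting the range" idea, here is a bare-bones Python sketch that counts how many times a pattern appears in a text. It reads the pattern from its last letter to its first and keeps shrinking a window of rows in the sorted rotations. The function name is my own and this is very much a teaching toy: it only handles exact matches, and it recounts letters from scratch at every step, which a real tool like BWA would never do.

```python
def count_matches(text, pattern, end="$"):
    """Count exact occurrences of pattern in text by narrowing a range of rows."""
    s = text + end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    last = "".join(row[-1] for row in rotations)
    first = sorted(s)

    # Row where each letter's block begins in the First column.
    block_start = {}
    for i, letter in enumerate(first):
        block_start.setdefault(letter, i)

    def copies_before(letter, row):
        """How many copies of letter sit in the Last column above this row."""
        return last[:row].count(letter)

    # Start with every row, then narrow the window one letter at a time.
    low, high = 0, len(s)
    for letter in reversed(pattern):
        if letter not in block_start:
            return 0
        low = block_start[letter] + copies_before(letter, low)
        high = block_start[letter] + copies_before(letter, high)
        if low >= high:
            return 0
    return high - low


if __name__ == "__main__":
    print(count_matches("banana", "ana"))  # 2
    print(count_matches("banana", "nan"))  # 1
    print(count_matches("banana", "xyz"))  # 0
```

The number of rows left in the window at the end is the number of places the pattern matches. Swap 'banana' for a genome and the pattern for a sequencing read and you have the core of the trick.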

Thursday, July 13, 2023

What Colors Does Your Dog Actually See? And Why Did Color Vision Evolve?

I'm a bit late, but recently there was a TikTok trend where people would apply a filter that supposedly showed you what the world looked like to a dog. Reactions to the filter went semi-viral as some people sobbed that their dog couldn't enjoy the same vibrant colors that we humans do, which was pretty funny. This had me wondering though, how accurate was this filter? If a dog could see the same spectrum of light as us, could it even "appreciate" these colors in the same way we do? What does THAT even mean? So, just like a dog that can't jump very high but wants to escape your backyard, I did a little bit of digging.

I like this picture of this dog staring at me like an 18th century English orphan

A little bit of background (long groan)...

To begin we have to understand what color vision is. Anybody that has taken a basic bio course knows about rods and cones, photoreceptor cells in your eyes that absorb light and trigger a complex reaction that allows us to see things. I remember these being described to me as "one helps you see shapes and the other helps you see colors," which is an oversimplification. In fact, both cells help you see colors, just slightly differently under different conditions. 


In the dark, both rods and cones release glutamate, an important chemical that sends signals to neurons called bipolar cells, which act as an in-between to your other neurons and let the brain understand "vision". There are ON bipolar cells, which are called that because they are excited when the light is on, and OFF bipolar cells, which are excited when the light is off. Thus, OFF bipolar cells are excited by glutamate production during the dark and ON bipolar cells are excited by a complex chemical reaction that occurs when light shuts off your body's glutamate valve. The glutamate valve for rod cells is inhibited at lower levels of light, and their bipolar cells can inhibit cone OFF bipolar cells, overtaking most of the cone pathway for vision at night and becoming our main way of seeing in the dark.

But what about color? That's where cones come in. Humans have four types of light sensitive proteins called opsins. Cones have three classes of opsins: long (L), medium (M), and short (S), which are excited by the corresponding wavelengths for red, yellow-green, and purple-blue. Rods only have one, which is why in the dark, when rods are dominant, everything appears more muted and grayish.

    

A chart that every physics 101 class has seen


The possession of three cones for color vision is called trichromacy, and it's actually something that is specific to humans and other closely related primates. Most mammals (including your dog) have dichromatic vision, meaning they have some combination of only two types of opsins (usually S with M or L). Birds, amphibians, reptiles, and fish are often tetrachromatic (with 4 opsin proteins) and occasionally pentachromatic, although having five types of opsins and being able to distinguish between the colors they provide are two different things. This is why, although mantis shrimp have 12 to 16 different types of photoreceptors, the idea that they can "see more colors" than us is a bit of an oversimplification. A recent study found that mantis shrimp cannot distinguish between wavelengths less than 25 nm apart, in contrast to most humans, who can distinguish between wavelengths 1-5 nm apart. It's thought that this actually helps the shrimp out though. By not worrying themselves with all these colors, they can reduce the amount of time it takes their brains to see contrasts between different organisms, making their responses faster and helping their survival. After all, seeing the world as a kaleidoscope of colors would probably get pretty distracting.


Mantis shrimp are, however, the only animals known to see circular polarization. What does that look like? I don't know, ask your local shrimp.


So what colors do dogs see? 

The answer is we really will never be able to know for sure (my least favorite answer that scientists love to give). But because they lack an L cone they probably see something like this:

The TikTok filter was right all along!


How did we get here?

The real reason I wrote this article wasn't because I cared too much about what colors dogs can and can't see. Most people already knew somewhat about rods and cones and that their dog is a dingus. The real reason is because I wanted to know more about opsin evolution. 

Like all proteins, opsins have genes that code for them. It used to be that to study these proteins, we would have to isolate them directly from animal retinas, but with modern technologies scientists have opted instead to have cultured cells produce them for us in the lab. This is nice because it allows researchers, who are really just curious children at heart, to play around with the cells' genes and see what happens to the opsins. In addition, we have sequenced the genes for about 1000 opsins, from humans to jellyfish, which has provided even more background on their history.


Scientists theorize that around 500 million years ago, a jawless proto-vertebrate had already developed four opsins homologous to our modern day ones. Scientists dubbed the classes of these opsins SWS1 and SWS2 (homologous to human S opsins), and RH2 and LWS (homologous to human M opsins). At some point, probably around 250 million years ago, mammals became more and more nocturnal in order to escape predators that hunted mostly in the daytime. In response, over time they lost RH2 and SWS2, which is why most mammals are dichromatic, like your dog. From what I've read, it's not clear what the advantage of keeping SWS1 over SWS2 was, however, since there's really not too much of a difference between the two except a little bit more UV-sensitivity in SWS1.

Case in point, we as humans lost this UV-sensitivity in exchange for seeing blues and purples. Scientists found that exactly seven genetic mutations of the SWS1 gene changed the opsin wavelength sensitivity in primates as they switched from being nocturnal to foraging in the daytime. Seeing blues and purples might have allowed them to see berries and fruits better against green foliage, giving blue seeing primates a bit of an advantage.

Seeing more colors is useful!

But what about the LWS opsin? Over time LWS slowly shifted its sensitivity to become M, allowing us to see the color green the way we do, and the L gene evolved from that. Again, this is probably because seeing contrasting colors in the daytime is helpful for finding food. There are two hypotheses for what might have happened to create this two opsin gene system. The first is that there were two variants of this gene, one that was more sensitive to long wavelengths and one for medium ones. This means that at some point our ancestors were probably running around seeing the world slightly differently from one another. It's theorized that this gene duplicated due to unequal crossing over during meiosis in a female primate. This meant that rather than one gene with two types (M and L), two distinct genes for M and L were created on a single X chromosome. Any children from this primate would now only need a single X chromosome with this mutation to attain trichromatic vision.

The other hypothesis is that the M opsin gene duplicated. This would have allowed mutations to occur on one set of this gene whilst keeping the other intact. So while the original M gene remained, the duplicate could have had multiple mutations acting on it to eventually become L, allowing us to see the vibrant reds we know and love.

In the colorful history of our vision, it seems our ancestors were quite the sightseers, observing the world through slightly different lenses. Whether it was a tale of two gene variants, causing them to see greens and mediums with a quirky divergence, or the mischievous duplication of the M opsin gene, our vision has certainly evolved in an interesting manner. Keep those eyes wide open, my trichromats, and go hug your dog. 

Sunday, December 4, 2022

How do we connect characteristics to genes? (And vice-versa)

We oftentimes hear about studies saying things like "x gene has been shown to be responsible for y." Like this paper for instance, which found that mutations in the rhodopsin gene can affect vision in mice. That's awesome and cool but how exactly do we go about connecting those two things? Well, turns out there are lots of ways, all of which depend on what exactly you are trying to find out.


The first method is called a "forward genetic screen." This is basically when you start out knowing what phenotype (the individual characteristic) you want to examine and are trying to find what genes are responsible for it. This involves making random mutations in your test subject, seeing what characteristics changed, and then going back and seeing what specific genes caused that change. One way of inducing mutations is through chemicals that introduce stop codons into an organism's DNA in random locations, which can modify a gene's function. Another technique involves exploiting RNA interference, in which you introduce short RNA molecules that direct the cell's own enzymes to degrade a matching mRNA, destroying the instructions needed to build certain proteins and thus also affecting the phenotype.

Steps for forward genetic screenings

The second method is called a "reverse genetic screen." This is when you are trying to find the function of a gene by connecting it to a specific phenotype. Similar techniques are used to disrupt DNA function, only this time they are not random. 


A couple of handy guides!

It is important to note that often times it is not a single gene but a network of genes that is responsible for a phenotype, thus making this more complicated. I've only scratched the surface regarding methods and technologies that can be used to determine gene functions. As always, reading about this stuff makes me thankful we live in an era in which analyzing genes is easier than ever. 

Tuesday, June 28, 2022

Genome Mining and Microbes

No, not that type of mining!

That's more like it!

With the advent of next generation sequencing technologies, thousands of genomes have become available for scientists to use. As bioinformatic methods increase in their ability to sift through massive amounts of genomic data, biologists have begun searching within microbes for genes that play key roles in metabolic pathways. These genes often encode secondary metabolites - molecules that are synthesized in response to environmental cues and that provide advantages to organisms. These molecules can help facilitate nutrient acquisition, create defense mechanisms against predatory organisms, and help resist toxic compounds. Often, the discovery of these secondary metabolites, and the gene clusters that make them, has led to the creation of new life saving drugs!

Until recently, application of these techniques has mostly been focused on culturable microbes. However, with the creation of culture-independent microbiology methods, scientists have begun looking towards the world's largest environment, the oceans! Microalgae, marine protists, marine fungi and other microbial organisms in the oceans have begun to have their genomes sifted through to find genes that code for the synthesis of natural drugs that could be used by the pharmaceutical industry.

Thursday, March 10, 2022

Catching Invasive Species

We all know that invasive species cause a lot of problems. The introduced organisms are often so good at being themselves that, with the help of a lack of native predators, they outcompete already existing organisms. Famous examples of critters like lionfish, Asian shore crabs, zebra mussels and a multitude of others have wreaked havoc on local areas. One study estimated that biological invasions have cost North America roughly $26 billion a year. Thus it should be no surprise that finding ways of dealing with invasive species has been a top priority of scientists for decades. Enter environmental DNA.

As I have written about previously, I have had the fortune to work at an environmental DNA (eDNA) lab and learn quite a bit about the methods and research that are used. eDNA refers to genetic material that is just floating around in nature in the form of shed skin or fecal matter. With more advanced bioinformatic methods, we can scoop this DNA from the environment and compare it to DNA sequences in databases to find out what species it came from.


But how does this help with combatting invasive species? Picture this. You work for a bio-monitoring program, scouting specific areas in a national park for possible abnormalities. One day you find a slew of green crabs on the beaches that are already everywhere! It's too late. You now have a possible environmental disaster! eDNA offers a clever tool in which instead of monitoring invasive organisms by physically locating them after they arrive, scientists are now able to detect them before they have a chance to fully establish themselves!

Friday, February 4, 2022

Gene Editing Starts the Eradication of Salmon Viruses

Great news coming from The Roslin Institute in Scotland! Researchers have identified genes associated with resistance to a disease known as Infectious Pancreatic Necrosis (IPN) in Atlantic Salmon. Seeing as how salmon represent 4.6% of the global food supply with almost all of that being from aquaculture farms, you can see how this would be a pretty big deal. IPN is one of several diseases that can greatly disrupt aquaculture centers by infecting their salmon and causing high mortality rates. By finding the exact locations in Atlantic Salmon genomes that allow some of them to be naturally resistant to IPN, farmers can more accurately test for and select naturally resistant brood stock: the animals in a farm used for breeding purposes.



A dissected Salmon Parr infected with IPN (top) vs a healthy Parr (bottom)

But how exactly did scientists go about doing this? Well, first they performed what is called a "challenge experiment" in which they infected families of Atlantic Salmon and looked at the tanks that had the fewest dead salmon. The salmon in those tanks were deemed resistant, whereas the other ones were deemed susceptible or intermediate to IPN. They then took two of the intermediate families, tested for their parents' genotypes, and analyzed their gene expression patterns for IPN QTL-linked markers.

QTL = Quantitative Trait Locus. It's a region of a chromosome, detected by statistical analysis, that is significantly associated with variation in a quantitative trait. Oftentimes, to find QTLs, scientists link them to specific genetic markers that exist in two distinguishable forms.

After looking into the QTL pattern differences in the salmon, the authors found a specific gene within this area that was the most differentially expressed: a gene called nae1. They then used CRISPR-Cas9, a widely used method for gene editing, to block nae1 and see if it really caused a major difference in IPN resistance. Their results show that indeed, blocking nae1 significantly reduced the salmon's ability to resist being infected by the virus! Exciting stuff!