
Technology

The field of natural language processing is chasing the wrong goal

At a typical annual meeting of the Association for Computational Linguistics (ACL), the program is a parade of titles like “A Structured Variational Autoencoder for Contextual Morphological Inflection.” The same technical flavor permeates the papers, the research talks, and many hallway chats.

At this year’s conference in July, though, something felt different—and it wasn’t just the virtual format. Attendees’ conversations were unusually introspective about the core methods and objectives of natural-language processing (NLP), the branch of AI focused on creating systems that analyze or generate human language. Papers in this year’s new “Theme” track asked questions like: Are current methods really enough to achieve the field’s ultimate goals? What even are those goals?

My colleagues and I at Elemental Cognition, an AI research firm based in Connecticut and New York, see the angst as justified. In fact, we believe that the field needs a transformation, not just in system design, but in a less glamorous area: evaluation.

The current NLP zeitgeist arose from half a decade of steady improvements under the standard evaluation paradigm. Systems’ ability to comprehend has generally been measured on benchmark data sets consisting of thousands of questions, each accompanied by passages containing the answer. When deep neural networks swept the field in the mid-2010s, they brought a quantum leap in performance. Subsequent rounds of work kept inching scores ever closer to 100% (or at least to parity with humans).
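To see what this paradigm actually measures, it helps to look at what a typical benchmark scorer computes. The sketch below is a simplified illustration in the spirit of SQuAD-style exact-match scoring, not any benchmark's official code, and the example questions and answers are made up:

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, references: list[str]) -> bool:
    """Credit the prediction if it matches any reference answer after normalization."""
    return any(normalize(prediction) == normalize(r) for r in references)

# A benchmark score is just the fraction of questions answered "correctly":
preds = {"q1": "the Eiffel Tower", "q2": "1889"}
refs = {"q1": ["Eiffel Tower"], "q2": ["in 1887"]}
score = sum(exact_match(preds[q], refs[q]) for q in refs) / len(refs)
print(score)  # 0.5
```

Nothing in this metric requires the system to model the world the passage describes; any method that reproduces the reference strings scores perfectly, which is exactly how pattern exploitation can masquerade as comprehension.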

So researchers would publish new data sets of even trickier questions, only to see even bigger neural networks quickly post impressive scores. Much of today’s reading comprehension research entails carefully tweaking models to eke out a few more percentage points on the latest data sets. “State of the art” has practically become a proper noun: “We beat SOTA on SQuAD by 2.4 points!”

But many people in the field are growing weary of such leaderboard-chasing. What has the world really gained if a massive neural network achieves SOTA on some benchmark by a point or two? It’s not as though anyone cares about answering these questions for their own sake; winning the leaderboard is an academic exercise that may not make real-world tools any better. Indeed, many apparent improvements emerge not from general comprehension abilities, but from models’ extraordinary skill at exploiting spurious patterns in the data. Do recent “advances” really translate into helping people solve problems?

Such doubts are more than abstract fretting; whether systems are truly proficient at language comprehension has real stakes for society. Of course, “comprehension” entails a broad collection of skills. For simpler applications—such as retrieving Wikipedia factoids or assessing the sentiment in product reviews—modern methods do pretty well. But when people imagine computers that comprehend language, they envision far more sophisticated behaviors: legal tools that help people analyze their predicaments; research assistants that synthesize information from across the web; robots or game characters that carry out detailed instructions.

Today’s models are nowhere close to achieving that level of comprehension—and it’s not clear that yet another SOTA paper will bring the field any closer.

How did the NLP community end up with such a gap between on-paper evaluations and real-world ability? In an ACL position paper, my colleagues and I argue that in the quest to reach difficult benchmarks, evaluations have lost sight of the real targets: those sophisticated downstream applications. To borrow a line from the paper, NLP researchers have been training to become professional sprinters by “glancing around the gym and adopting any exercises that look hard.”

To bring evaluations more in line with the targets, it helps to consider what holds today’s systems back.

A human reading a passage will build a detailed representation of entities, locations, events, and their relationships—a “mental model” of the world described in the text. The reader can then fill in missing details in the model, extrapolate a scene forward or backward, or even hypothesize about counterfactual alternatives.

This sort of modeling and reasoning is precisely what automated research assistants or game characters must do—and it’s conspicuously missing from today’s systems. An NLP researcher can usually stump a state-of-the-art reading comprehension system within a few tries. One reliable technique is to probe the system’s model of the world, which can leave even the much-ballyhooed GPT-3 babbling about cycloptic blades of grass.

Imbuing automated readers with world models will require major innovations in system design, as discussed in several Theme-track submissions. But our argument is more basic: however systems are implemented, if they need to have faithful world models, then evaluations should systematically test whether they have faithful world models.

Stated so baldly, that may sound obvious, but it’s rarely done. Research groups like the Allen Institute for AI have proposed other ways to harden the evaluations, such as targeting diverse linguistic structures, asking questions that rely on multiple reasoning steps, or even just aggregating many benchmarks. Other researchers, such as Yejin Choi’s group at the University of Washington, have focused on testing common sense, which pulls in aspects of a world model. Such efforts are helpful, but they generally still focus on compiling questions that today’s systems struggle to answer.

We’re proposing a more fundamental shift: to construct more meaningful evaluations, NLP researchers should start by thoroughly specifying what a system’s world model should contain to be useful for downstream applications. We call such an account a “template of understanding.”

One particularly promising testbed for this approach is fictional stories. Original stories are information-rich, un-Googleable, and central to many applications, making them an ideal test of reading comprehension skills. Drawing on cognitive science literature about human readers, our CEO David Ferrucci has proposed a four-part template for testing an AI system’s ability to understand stories.

  • Spatial: Where is everything located and how is it positioned throughout the story?
  • Temporal: What events occur and when?
  • Causal: How do events lead mechanistically to other events?
  • Motivational: Why do the characters decide to take the actions they take?

By systematically asking these questions about all the entities and events in a story, NLP researchers can score systems’ comprehension in a principled way, probing for the world models that systems actually need.
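As a purely hypothetical sketch (not Elemental Cognition's actual evaluation format), a template of understanding could be represented as a set of probe questions, each tagged with the facet of the world model it targets, and scored per facet:

```python
from dataclasses import dataclass, field
from enum import Enum

class Facet(Enum):
    SPATIAL = "spatial"            # Where is everything located?
    TEMPORAL = "temporal"          # What events occur, and when?
    CAUSAL = "causal"              # How do events lead to other events?
    MOTIVATIONAL = "motivational"  # Why do characters act as they do?

@dataclass
class Probe:
    facet: Facet
    target: str     # entity or event the question is about
    question: str
    reference: str  # acceptable answer for scoring

@dataclass
class UnderstandingTemplate:
    story_id: str
    probes: list[Probe] = field(default_factory=list)

    def score(self, answers: dict) -> dict:
        """Per-facet accuracy, so weaknesses (e.g. causal reasoning) stay visible."""
        results = {}
        for facet in Facet:
            fps = [p for p in self.probes if p.facet is facet]
            if not fps:
                continue
            correct = sum(answers.get(p.question) == p.reference for p in fps)
            results[facet.value] = correct / len(fps)
        return results

# Hypothetical example: two probes about a made-up story
template = UnderstandingTemplate("story-1", [
    Probe(Facet.SPATIAL, "key", "Where is the key?", "under the mat"),
    Probe(Facet.CAUSAL, "door", "Why did the door open?", "Ana unlocked it"),
])
print(template.score({"Where is the key?": "under the mat"}))
# {'spatial': 1.0, 'causal': 0.0}
```

Reporting accuracy per facet rather than a single aggregate number is what makes such an evaluation diagnostic: a system that aces spatial questions but fails causal ones has a visibly incomplete world model.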

It’s heartening to see the NLP community reflect on what’s missing from today’s technologies. We hope this thinking will lead to substantial investment not just in new algorithms, but in new and more rigorous ways of measuring machines’ comprehension. Such work may not make as many headlines, but we suspect that investment in this area will push the field forward at least as much as the next gargantuan model.

Jesse Dunietz is a researcher at Elemental Cognition, where he works on developing rigorous evaluations for reading comprehension systems. He is also an educational designer for MIT’s Communication Lab and a science writer.




How Lifetime Access to Rosetta Stone Is Cheaper Than You Think

Social distancing means we’re all spending more time in the house than ever… But once you’ve played every board game you own, replayed the entire Halo series on Xbox, and turned your backyard into a botanical garden, what else are you supposed to do with your time?

Perhaps this is the moment to spread your wings and learn something new? If you’re itching for some projects to get stuck into, check out our social distancing subscription bundle.

What’s Included?

We’ve teamed up with our partners to offer MUO readers a mega saving on three premium apps:

  • Rosetta Stone
  • KeepSolid VPN Unlimited
  • 12min Premium Micro Book Library

Combined, the apps are worth nearly $850. As a MakeUseOf reader, you can pick up the entire bundle for just $199—a 76 percent discount on the regular price.

Why Use Rosetta Stone?

Rosetta Stone is one of the most comprehensive language-learning suites in the world. It caters for all levels and comes with a variety of learning aids. The program aims to help you absorb your target language the same way you did as a child.

In total, 24 languages are offered on the platform, including Spanish, French, German, Italian, Mandarin, and Russian.

This deal gives you lifetime access to all 24 languages on both desktop and mobile.

You’ll also be able to practice your pronunciation with Rosetta Stone’s TruAccent speech recognition technology and, when you feel comfortable enough, use the advanced speech engine to compare your accent with native speakers.

KeepSolid VPN Unlimited

The lockdown has meant we’ve all been binge-watching TV and movies more than ever. That’s where the next tool in the bundle—KeepSolid VPN Unlimited—is useful.

As any regular MakeUseOf reader knows, using a VPN lets you bypass geoblocking that streaming services put in place. It means you can access the Netflix catalog from another country, stream live TV from BBC iPlayer if you’re based outside the UK, and even sign up for paid subscriptions with video-on-demand providers from elsewhere in the world.

Of course, a VPN is also an essential security tool. You can protect yourself on public Wi-Fi, prevent your ISP and local government from seeing your web traffic, and encrypt all your internet data.

KeepSolid VPN Unlimited has servers in more than 80 countries and supports the major VPN protocols such as IKEv2, OpenVPN, and L2TP/IPSec.

Once again, this deal will bag you a lifetime subscription.

12min Premium Micro Book Library

The final lifetime subscription in our social distancing bundle is 12min Premium Micro Book Library.

One of the curious truths about 2020’s lockdown is that while we might all be spending more time in our homes, many of us also have less free time than ever. Anyone with kids knows what we’re talking about.

If you’re a book lover who’s not had time to read any new material in 2020, 12min Premium Micro Book Library might offer a solution. The app’s specialty is providing digestible overviews of long books. In theory, each overview only takes 12 minutes to read. Audio and text versions are available.

In total, more than 380 overviews are on the platform, with 30+ new titles added every month.

Make Social Distancing Fun!

More knowledge, more entertainment, more security. The perfect lockdown combination. If you’d like to get lifetime access to these three premium apps, make sure you grab the bundle while it’s still available—the deal expires on August 15th, 2020.





Software that monitors students during tests perpetuates inequality and violates their privacy

The coronavirus pandemic has been a boon for the test proctoring industry. About half a dozen companies in the US claim their software can accurately detect and prevent cheating in online tests. Examity, HonorLock, Proctorio, ProctorU, Respondus and others have rapidly grown since colleges and universities switched to remote classes.

While there’s no official tally, it’s reasonable to say that millions of algorithmically proctored tests are happening every month around the world. Proctorio told the New York Times in May that business had increased by 900% during the first few months of the pandemic, to the point where the company proctored 2.5 million tests worldwide in April alone.

I’m a university librarian and I’ve seen the impacts of these systems up close. My own employer, the University of Colorado Denver, has a contract with Proctorio.

It’s become clear to me that algorithmic proctoring is a modern surveillance technology that reinforces white supremacy, sexism, ableism, and transphobia. The use of these tools is an invasion of students’ privacy and, often, a civil rights violation.

If you’re a student taking an algorithmically proctored test, here’s how it works: When you begin, the software starts recording your computer’s camera, audio, and the websites you visit. It measures your body and watches you for the duration of the exam, tracking your movements to identify what it considers cheating behaviors. If you do anything that the software deems suspicious, it will alert your professor to view the recording and provide them a color-coded probability of your academic misconduct.
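Vendors do not publish their algorithms, but the behavior described above amounts to accumulating weighted “suspicious” events into a score and color-coding it for the professor. The following is a purely illustrative sketch, with invented event types and weights, not any company's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class ExamEvent:
    kind: str       # e.g. "face_lost", "gaze_away", "background_noise", "tab_switch"
    seconds: float  # duration of the event

# Arbitrary illustrative weights; real products do not disclose theirs.
WEIGHTS = {"face_lost": 0.4, "gaze_away": 0.2, "background_noise": 0.1, "tab_switch": 0.3}

def suspicion_score(events: list[ExamEvent]) -> float:
    """Accumulate weighted event durations (capped at 10s each) into a 0-1 score."""
    raw = sum(WEIGHTS.get(e.kind, 0.0) * min(e.seconds, 10) / 10 for e in events)
    return min(raw, 1.0)

def color_code(score: float) -> str:
    """Map the score to the color-coded flag a professor would see."""
    return "red" if score >= 0.7 else "yellow" if score >= 0.3 else "green"

flags = [ExamEvent("background_noise", 4.0), ExamEvent("face_lost", 10.0)]
print(color_code(suspicion_score(flags)))  # yellow
```

The sketch also shows where the harms enter: the score is only as fair as the upstream detectors feeding it. If face detection fails more often on darker skin, then "face_lost" time, and with it the suspicion score, accumulates for reasons that have nothing to do with cheating.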

Depending on which company made the software, it will use some combination of machine learning, AI, and biometrics (including facial recognition, facial detection, or eye tracking) to do all of this. The problem is that facial recognition and detection have proven to be racist, sexist, and transphobic over and over again.

In general, technology has a pattern of reinforcing structural oppression like racism and sexism. Now these same biases are showing up in test proctoring software that disproportionately hurts marginalized students.

A Black woman at my university once told me that whenever she used Proctorio’s test proctoring software, it always prompted her to shine more light on her face. The software couldn’t validate her identity and she was denied access to tests so often that she had to go to her professor to make other arrangements. Her white peers never had this problem.

Similar kinds of discrimination can happen if a student is trans or non-binary. But if you’re a white cis man (like most of the developers who make facial recognition software), you’ll probably be fine.

Students with children are also penalized by these systems. If you’ve ever tried to answer emails while caring for kids, you know how impossible it can be to get even a few uninterrupted minutes in front of the computer. But several proctoring programs will flag noises in the room or anyone who leaves the camera’s view as nefarious. That means students with medical conditions who must use the bathroom or administer medication frequently would be considered similarly suspect.

Beyond all the ways that proctoring software can discriminate against students, algorithmic proctoring is also a significant invasion of privacy. These products film students in their homes and often require them to complete “room scans,” which involve using their camera to show their surroundings. In many cases, professors can access the recordings of their students at any time, and even download these recordings to their personal machines. They can also see each student’s location based on their IP address.

Privacy is paramount to librarians like me because patrons trust us with their data. After 9/11, when the Patriot Act authorized the US Department of Homeland Security to access library patron records in their search for terrorists, many librarians started using software that deleted a patron’s record once a book was returned. Products that violate people’s privacy and discriminate against them go against my professional ethos, and it’s deeply concerning to see such products eagerly adopted by institutions of higher education.

This zealousness would be slightly more understandable if there was any evidence that these programs actually did what they claim. To my knowledge, there isn’t a single peer-reviewed or controlled study that shows proctoring software effectively detects or prevents cheating. Given that universities pride themselves on making evidence-based decisions, this is a glaring oversight.

Fortunately, there are movements underway to ban proctoring software and ban face recognition technologies on campuses, as well as congressional bills to ban the US federal government from using face recognition. But even if face recognition technology were banned, proctoring software could still exist as a program that tracks the movements of students’ eyes and bodies. While that might be less racist, it would still discriminate against people with disabilities, breastfeeding parents, and people who are neuroatypical. These products can’t be reformed; they should be abandoned.

Cheating is not the threat to society that test proctoring companies would have you believe. It doesn’t dilute the value of degrees or degrade institutional reputations, and students aren’t trying to cheat their way into being your surgeon. Technology didn’t invent the conditions for cheating, and it won’t be what stops it. The best thing we in higher education can do is to start with the radical idea of trusting students. Let’s choose compassion over surveillance.

Shea Swauger is an academic librarian and researcher at the University of Colorado Denver.




5 Reasons Why You Should Buy The New Maruti Suzuki S-Cross 2020

Ever since Maruti Suzuki discontinued diesel engines, its S-Cross SUV fell off the sales charts, as it was offered only as a diesel. However, the 2020 Maruti Suzuki S-Cross is back with a powerful new hybrid petrol engine and better looks.

As reported in our previous article, bookings for the SUV are already open, and you can book it at your nearest Nexa outlet or through online platforms. At the time, however, we knew very little about the new S-Cross’s specifications. Now that the automaker has confirmed the details, we can say the new S-Cross is quite impressive.

In fact, this SUV is worth considering if you’re shopping for a new car in the compact SUV segment. Here are the reasons why we think you should check out the all-new 2020 Maruti Suzuki S-Cross.

5 Reasons Why You Should Buy The Maruti Suzuki S-Cross

Image: Maruti Suzuki

1. Improved Build Quality And Safety Standards

Despite the automaker’s patchy reputation for build quality, the Maruti Suzuki S-Cross is quite impressive with its high-end safety standards. The SUV has a five-star safety rating in the ASEAN NCAP crash test.

With an overall score of 15.48 out of 16 for adult occupant protection, S-Cross is one of the safest cars in its segment.

Poor safety standards are one of the major reasons why people avoid Maruti cars. With this assurance, you can go for the S-Cross without any hesitation.

2. Hybrid Powertrain

Image: Maruti Suzuki

The new Maruti Suzuki S-Cross is one of the most affordable SUVs with a hybrid powertrain. The latest variant of this compact SUV is equipped with Suzuki’s Smart Hybrid Vehicle Technology (SHVS).

Under the hood is a combination of a K15B 1.5-liter BS6 compliant petrol engine and an electric motor. The electric motor gets power from two small lithium-ion battery packs.

With this powertrain, the SUV gets a total output of 103 BHP and 138 Nm of peak torque. The engine is available with both manual and automatic transmission.

Not only does the S-Cross have decent power delivery, but it also has very low carbon emissions thanks to the hybrid powertrain. The Maruti Suzuki S-Cross is a very good option for buyers who prefer eco-friendly cars.

Apart from this, Maruti Suzuki has provided features like Idle Start-Stop (ISS), Brake Energy Regeneration, and Torque assist during acceleration for a better experience.

3. Impressive Mileage

Being an SUV, the S-Cross has quite impressive fuel efficiency. The automaker claims a mileage of 18.55 km/l with the manual transmission and 18.43 km/l with the automatic. Generally, SUVs in this segment offer a fuel economy of under 15 km/l. With its mild-hybrid system and idle start-stop (ISS), however, the Maruti Suzuki S-Cross delivers impressive mileage.

The idle start-stop system turns off the engine when the vehicle is stationary, such as when waiting at red lights, and starts it again when the clutch is pressed. This way, it cuts out unnecessary fuel consumption.

With figures like these, you don’t have to worry about fuel economy.

4. Improved Braking And Ride Quality

Image: Maruti Suzuki

The new S-Cross comes with disc brakes on all four wheels, which provide excellent stopping power and let you carry more speed into corners with confidence. That doesn’t mean you should speed, of course; always drive within the safe speed limit.

The front disc brakes of the S-Cross are ventilated for better heat dissipation. Since braking force acts mostly on the front wheels, they need to be more capable than the rear ones.

5. More Features, More Comfort

Image: Maruti Suzuki

The 2020 Maruti Suzuki S-Cross features a smart infotainment system with Android Auto and Apple CarPlay connectivity. Additionally, this compact SUV is equipped with features like cruise control, auto-sensing rain wipers, push-button start, and much more.

Other than this, the car’s rear seats fold down, giving the user more cabin space. Foldable seats are very useful for carrying extra luggage.

The best part is that, with all these features and a hybrid tag, the price of the S-Cross starts at Rs 8.39 lakh (ex-showroom). The only other hybrid SUV in this segment is the MG Hector hybrid, which costs almost twice as much as the S-Cross. That also makes the S-Cross a value-for-money car.

Finally, we would like to say that though it might not be Maruti’s best-selling car, it’s much better than the company’s other best sellers.

Do share your views in the comments section below.

The post 5 Reasons Why You Should Buy The New Maruti Suzuki S-Cross 2020 appeared first on Fossbytes.


